Background
This project explores trends in popular music over time. By generating text that mimics hit songs from different decades, we can compare the dominant themes of each era and hopefully learn something about what life was like then, at least in terms of the pop-culture mindset. This works because, in some sense, the generated output is a summary or compilation of all of the input data.
We used Long Short-Term Memory (LSTM) networks, a popular type of recurrent neural network for predicting and generating sequences. In this project, a seed sequence of 40 characters is used to predict the 41st character. We then slide the window forward by one, use characters 2-41 to predict the 42nd, and repeat this 400 times until we have a paragraph of generated text. For training data, we created one corpus per decade containing the lyrics of the top 100 songs for every year of that decade.
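The sliding-window procedure described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names are made up for this example, and `toy_predict_next` is a hypothetical stand-in for a trained LSTM (it simply echoes the oldest character in the window so the loop can run without a model).

```python
WINDOW = 40       # seed length, as described in the text
N_GENERATE = 400  # number of prediction steps, as described in the text


def make_training_pairs(corpus, window=WINDOW, step=1):
    """Cut a corpus into (window-character input, next character) pairs."""
    pairs = []
    for i in range(0, len(corpus) - window, step):
        pairs.append((corpus[i:i + window], corpus[i + window]))
    return pairs


def generate(predict_next, seed, n_chars=N_GENERATE, window=WINDOW):
    """Predict one character at a time, sliding the window forward by one."""
    if len(seed) != window:
        raise ValueError(f"seed must be exactly {window} characters")
    context = seed
    generated = []
    for _ in range(n_chars):
        next_char = predict_next(context)   # e.g. characters 1-40 -> 41st
        generated.append(next_char)
        context = context[1:] + next_char   # now characters 2-41
    return "".join(generated)


def toy_predict_next(context):
    """Hypothetical placeholder for the trained model (assumption)."""
    return context[0]


if __name__ == "__main__":
    corpus = "all you need is love, love is all you need. " * 3
    pairs = make_training_pairs(corpus)
    print(len(pairs), "training pairs")
    print(generate(toy_predict_next, corpus[:WINDOW])[:80])
```

In a real run, `predict_next` would wrap the trained network and a sampling step, and one such model would be trained per decade corpus.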
Part of what makes this project cool is that it is almost impossible to measure success empirically. There is no way to say "the grammar is X% perfect," or, "the topics are Y% accurate to the decade," so deciding what counts as success and optimizing the model toward it becomes more of an art and a creative process. As part of validation, we found the following study, which performed a rigorous statistical analysis of hit songs, and I have to say our generated texts and the insights we extracted line up closely with the study's findings (and probably took a fraction of the time).
What has America been singing about? Trends in themes in the U.S. top-40 songs: 1960–2010
Peter G. Christenson, Silvia de Haan-Rietdijk, Donald F. Roberts, and Tom F.M. ter Bogt. Psychology of Music. 1/23/2018
Abstract
This study explored 19 themes embedded in the lyrics of 1,040 U.S. top-40 songs from 1960 through 2010, using R strucchange software to identify trends and breaks in trends. Findings reveal both continuity and change. As in 1960, the predominant topic of pop music remains romantic and sexual relationships. However, whereas the proportion of lyrics referring to relationships in romantic terms remained stable, the proportion including reference to sex-related aspects of relationships increased sharply. References to lifestyle issues such as dancing, alcohol and drugs, and status/wealth increased substantially, particularly in the 2000s. Other themes were far less frequent: Social/political issues, religion/God, race/ethnicity, personal identity, family, friends showed a modest occurrence in top-40 music throughout the studied period and showed no dramatic changes. Violence and death occurred in a small number of songs, and both increased, particularly since the 1990s. References to hate/hostility, suicide, and occult matters were very rare. Results are examined in the context of cultural changes in the social position of adolescents, and more specifically in light of the increased popularity of rap/hip-hop music, which may explain the increases in references to sex, partying, dancing, drug use, and wealth.
1970-1979
Here, romantic relationships are the primary topic, and the language is not very sexual in nature.
1990-1999
There is still romantic content: juxtaposed use of "I" and "you." The content is starting to get more sexual and party-themed: use of words like "bed," "club," "party."
2006-2015
Here, we see lots of focus on "club" and "party." For this decade, I show two sections of generated text with different "diversity" values; diversity is basically a randomness factor applied when choosing the next character. Diversity that is too low (first example, 0.2) results in less interesting, repetitive output. Diversity that is too high results in gibberish, as the next section shows.
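To make the effect of diversity concrete, here is a minimal sketch of one common way such a parameter is applied. It assumes diversity acts as a softmax temperature on the model's predicted character probabilities, which is how many character-level LSTM examples use it; the write-up does not confirm the exact mechanism, and the function names here are illustrative.

```python
import math
import random


def rescale_probs(probs, diversity=1.0):
    """Sharpen (diversity < 1) or flatten (diversity > 1) a distribution.

    Assumption: diversity is used as a temperature on the log-probabilities.
    """
    if diversity <= 0:
        raise ValueError("diversity must be positive")
    logits = [math.log(max(p, 1e-12)) / diversity for p in probs]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, diversity=1.0, rng=random):
    """Draw the index of the next character from the rescaled distribution."""
    rescaled = rescale_probs(probs, diversity)
    return rng.choices(range(len(rescaled)), weights=rescaled, k=1)[0]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]  # made-up probabilities for three characters
    for diversity in (0.2, 1.0, 5.0):
        print(diversity, [round(p, 3) for p in rescale_probs(probs, diversity)])
```

At low diversity almost all of the probability mass lands on the single most likely character, which is consistent with the repetitive output; at high diversity the distribution approaches uniform, so unlikely characters are picked often and the text degrades.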
Diversity Parameter Too High
Overfitting
This generator used an LSTM trained for 17 epochs instead of the usual 3. Overfitting to the training data results in the inability to form English words, no matter the diversity parameter. Generally with sequence generators, we'd like to train for more than 3 epochs, but doing that without overfitting would require more consistent training data, such as lyrics that all come from the same genre or the same artist.