Background

This project explores trends in popular music over time. By generating text that mimics hit songs from different decades, we can compare the dominant themes of each era and, hopefully, learn something about the popular-culture mindset of the time. The idea is that the generated output is, in some sense, a summary or compilation of all of the input data.

We used Long Short-Term Memory (LSTM) networks, a popular type of recurrent neural network for predicting and generating sequences. In this project, a seed sequence of 40 characters is used to predict the 41st character. The window then slides forward by one character: characters 2-41 predict the 42nd, and so on, repeating 400 times until we have a paragraph of generated text. For training data, we built one corpus per decade containing the lyrics of the top 100 songs for every year of that decade.
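The sliding-window procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: `make_training_pairs`, `generate`, and `predict_next` are hypothetical names, and `echo_oldest_char` is a dummy stand-in for the trained LSTM so that the loop runs on its own.

```python
SEED_LEN = 40   # window length stated in the text
GEN_LEN = 400   # number of characters generated per sample


def make_training_pairs(corpus, window=SEED_LEN, step=1):
    """Cut a corpus into (window_of_chars, next_char) training examples."""
    pairs = []
    for i in range(0, len(corpus) - window, step):
        pairs.append((corpus[i:i + window], corpus[i + window]))
    return pairs


def generate(seed, predict_next, n_chars=GEN_LEN, window=SEED_LEN):
    """Append one predicted character at a time, always feeding the
    most recent `window` characters back in as context."""
    text = seed
    for _ in range(n_chars):
        context = text[-window:]       # characters k .. k+window-1
        text += predict_next(context)  # predicted character k+window
    return text


def echo_oldest_char(context):
    """Hypothetical placeholder for the trained model: returns the oldest
    character in the window. A real model would return a sampled prediction."""
    return context[0]


if __name__ == "__main__":
    seed = "close to mine and i knew our joy would f"  # 40-character seed
    sample = generate(seed, echo_oldest_char)
    print(len(sample))  # seed plus 400 generated characters
```

In a real run, `predict_next` would encode the context, call the network, and sample a character from the predicted distribution.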

Part of what makes this project cool is that success is almost impossible to measure quantitatively. There is no way to say "the grammar is X% perfect," or, "the topics are Y% accurate to the decade," so deciding what counts as success and how to optimize the model becomes more of an art and a creative process. As part of validation, we found the following study, which did a rigorous statistical analysis of hit songs. Our generated texts and the insights we extracted from them line up well with the study's findings (and probably took a fraction of the time).

What has America been singing about? Trends in themes in the U.S. top-40 songs: 1960–2010

Peter G. Christenson, Silvia de Haan-Rietdijk, Donald F. Roberts, and Tom F.M. ter Bogt. Psychology of Music. 1/23/2018

Abstract

This study explored 19 themes embedded in the lyrics of 1,040 U.S. top-40 songs from 1960 through 2010, using R strucchange software to identify trends and breaks in trends. Findings reveal both continuity and change. As in 1960, the predominant topic of pop music remains romantic and sexual relationships. However, whereas the proportion of lyrics referring to relationships in romantic terms remained stable, the proportion including reference to sex-related aspects of relationships increased sharply. References to lifestyle issues such as dancing, alcohol and drugs, and status/wealth increased substantially, particularly in the 2000s. Other themes were far less frequent: Social/political issues, religion/God, race/ethnicity, personal identity, family, friends showed a modest occurrence in top-40 music throughout the studied period and showed no dramatic changes. Violence and death occurred in a small number of songs, and both increased, particularly since the 1990s. References to hate/hostility, suicide, and occult matters were very rare. Results are examined in the context of cultural changes in the social position of adolescents, and more specifically in light of the increased popularity of rap/hip-hop music, which may explain the increases in references to sex, partying, dancing, drug use, and wealth.

1970-1979

----- diversity: 0.5
----- Generating with seed: "close to mine and i knew our joy would f"

close to mine and i knew our joy would feeded my love still begin my love i got a show for the boat you are and i can take you i love you are my new rill in the mornin only when you know i love you baby i want to knows in love you say when you know its all right before the means wont you really my eyes dont stop it do it made away i have seen only the band aleaist we say shes gonna be long do you about the real real the morning baby don

Here, romantic relationships are the primary topic, and the language is not very sexual in nature.

1990-1999

----- diversity: 0.2
----- Generating with seed: "break me yeah you wreck mei put you high"

break me yeah you wreck mei put you high i said i want it i know i love you this is the way i see it who wont you can be a little shit what i want it i see it i can see it when i was are the same i want the bast when you should see it when you can see it i want it i can show you the stars im a good and i love it on the stars i got the club the bed the same i want the match the party thinkin when i should want the star i shake it off i s

There is still romantic content: juxtaposed use of "I" and "you." The content is starting to get more sexual and party-themed: use of words like "bed," "club," "party."

2006-2015

----- diversity: 0.2
----- Generating with seed: "a fox jumps over the lazy dog and then i"

a fox jumps over the lazy dog and then i stand it out i love it and i love it out the bad be the chains a girl i want your mind i wanna show you this is the place i want your love i can see it i wanna see your mind i see the way i got the lights im stronding its a good come on i can take your heart the same when you can have the same i know i should want you i dont know i want the stars i would be we got the club the way i love it on th

----- diversity: 0.5
----- Generating with seed: "a fox jumps over the lazy dog and then i"

a fox jumps over the lazy dog and then i shake it with a party to the close the girl im gonna see the club it aint gonna see it bank and i knew you look only though its all back its tracty i dont wanna say the club she just wanna see you like it and the club the same i see it i can love you go head be gone with the time i wan back and i be in rolling on a bring i want me have real do in the chase what i knee of someone the prines she be

Here, we see a heavy focus on "club" and "party." For this decade, I show two samples of generated text with different "diversity" values; diversity is basically a randomness factor. Diversity that is too low (first example, 0.2) results in less interesting, repetitive output. Diversity that is too high results in gibberish, as you'll see next.
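One common way to implement such a randomness factor is to treat it as a softmax temperature: take the log of the predicted character probabilities, divide by the diversity value, and renormalize before sampling. The sketch below assumes that is what "diversity" means here, which the writeup does not confirm; `rescale` and `sample_index` are hypothetical helper names, and the three-element distribution is made up for illustration.

```python
import math
import random


def rescale(probs, diversity=1.0):
    """Sharpen (diversity < 1) or flatten (diversity > 1) a distribution."""
    logits = [math.log(max(p, 1e-12)) / diversity for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, diversity=1.0, rng=random):
    """Draw one index from the rescaled distribution."""
    rescaled = rescale(probs, diversity)
    return rng.choices(range(len(rescaled)), weights=rescaled, k=1)[0]


if __name__ == "__main__":
    toy_probs = [0.7, 0.2, 0.1]  # illustrative values only
    for d in (0.2, 0.5, 1.2):
        print(d, [round(p, 3) for p in rescale(toy_probs, d)])
```

At low diversity nearly all of the probability mass lands on the most likely character, which would explain the repetitive output; at high diversity the distribution flattens and unlikely characters are drawn more often.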

Diversity Parameter Too High

----- diversity: 1.2
----- Generating with seed: "close to mine and i knew our joy would f"

close to mine and i knew our joy would froovingc vagneseen again loo people donten anoture at worry babya ramill heres funky aw in a skyapaveget easy oh shes got me jamie sailing crorllingte angel drull with my lordhin itstars walk to your longeestyeeered you have rockin morethe werelywild my yreages now peeck cap back soon on no if insideishoar boogie you true sexyahply sudmen for his lonile stime yeah uhoh hear what is of feelings c

Overfitting

This generator used an LSTM trained for 17 epochs instead of the usual 3. Overfitting to the training data results in the inability to form English words, no matter the diversity parameter. Generally, with sequence generators, we'd like to train for more than 3 epochs, but doing so without overfitting would require more consistent training data, such as lyrics that all come from the same genre or the same artist.

----- diversity: 0.5
----- Generating with seed: "ues down to see we dont even care as res"

ues down to see we dont even care as resoi wousone mo yera ll tan hosa m ele i wawdhe or aie tlog leerwyutar or llonormooo ooetlu maol iomas wngwy id aos nio n aneag ntanesa soat a a wd at tonoeor ne mnr inthl e ttolow em yyinnes h atthdei rinuiaihwg oehe tl hot ane sh hse sinouonne woigoy w i twm elt shallhgwayt g ay wg gar d aoeio ise w y asouine o e mae girdtwot whoeoneryarih two no wygoowgiinoonow aouuaan l ao ls e m eudrtaneaio