Description

We explored three main actors (YouTube, Spotify and Wikipedia) to build a corpus of songs, on the assumption that the content of these three sources would differ. We chose to work with playlists because they show how people assemble collections of songs around the theme of our research and which associations they make.

The visualization shown above illustrates, for each source (YouTube, Wikipedia and Spotify), how many songs actually talk about marijuana, how many only mention it, and how many do not talk about it at all but are associated with it anyway. On the left is the total number of songs for each source; the songs are then arranged on a timeline so that the composition of each corpus can be compared. Below it, a second visualization compares the artists that appear in all three sources and shows how popular each of them is on Wikipedia, Spotify and YouTube. Interestingly, the most popular artist on YouTube and Spotify, Wiz Khalifa, disappears from this comparison because he does not appear on Wikipedia, although he has more than twenty songs in the YouTube playlist.
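The artist comparison described above amounts to intersecting the artist lists of the three sources and counting songs per artist in each. The following is a minimal sketch of that idea, not the procedure actually used for the visualization; the function name, the (artist, title) data layout and the placeholder artists are assumptions made for illustration.

```python
from collections import Counter


def shared_artist_counts(songs_by_source):
    """Return {artist: {source: song count}} for artists present in every source.

    songs_by_source maps a source name to a list of (artist, title) pairs.
    This layout is an assumption, not the format of the original files.
    """
    # Count songs per artist within each source.
    counts = {
        source: Counter(artist for artist, _title in songs)
        for source, songs in songs_by_source.items()
    }
    # Keep only the artists that occur in all sources.
    shared = set.intersection(*(set(c) for c in counts.values()))
    return {
        artist: {source: counts[source][artist] for source in counts}
        for artist in sorted(shared)
    }


# Hypothetical placeholder data, not taken from the real corpus.
songs_by_source = {
    "YouTube": [("Artist A", "Song 1"), ("Artist A", "Song 2"), ("Artist B", "Song 3")],
    "Spotify": [("Artist A", "Song 1"), ("Artist B", "Song 3")],
    "Wikipedia": [("Artist B", "Song 3"), ("Artist C", "Song 4")],
}
print(shared_artist_counts(songs_by_source))
```

In this toy data "Artist A" is the most frequent artist on two sources but drops out of the result because it is missing from the third, which is the same effect noted above for Wiz Khalifa.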

Protocol

We searched YouTube for three queries, “marijuana”, “cannabis” and “weed”, filtering the results by playlist and sorting them by relevance, and then downloaded the two most popular playlists for each query with the YTDT Video List tool.

For Spotify we used the Playlist Miner with the queries “420”, “weed” and “stoned”, which we selected after trying all the other possible keywords and keeping the ones with the most playlists attached. The Playlist Miner retrieves all the playlists matching a query and then lists the most popular songs across those playlists.
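The aggregation step described above (ranking songs by how many of the retrieved playlists contain them) can be sketched as follows. This is only an illustration of the idea, not the Playlist Miner's actual implementation; the function name and the placeholder track names are hypothetical.

```python
from collections import Counter


def most_common_tracks(playlists, top_n=3):
    """Rank tracks by the number of playlists in which they appear."""
    counts = Counter()
    for tracks in playlists:
        # Use a set so a track repeated inside one playlist counts once.
        counts.update(set(tracks))
    return counts.most_common(top_n)


# Hypothetical playlists with placeholder track names.
playlists = [
    ["Track A", "Track B", "Track C"],
    ["Track A", "Track C"],
    ["Track A", "Track D", "Track D"],
]
print(most_common_tracks(playlists, top_n=2))
```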

For Wikipedia we copied the list contained in the page “cannabis songs”, searched for each song on YouTube and collected the results in a new playlist; we then downloaded its data with the YTDT tool, using the ID of our new playlist.

We manually looked up every song on Wikipedia and Genius to retrieve its publication year. We also manually checked the lyrics of every song to assign it a category: marijuana or a related term in the title, marijuana or a related term in the lyrics, or no reference to marijuana at all.
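The categorization was done by hand, but the rule itself can be written down as a small function, which may help to check the manual coding for consistency. The sketch below is an assumption-laden illustration: the keyword list is hypothetical and far shorter than the range of related terms a human reader would recognize, and the labels "title", "yes" and "no" follow the categories used in the data files.

```python
import re

# Assumed keyword list, for illustration only; the manual coding also
# recognized related terms that a short list like this would miss.
KEYWORDS = ("marijuana", "cannabis", "weed")
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def categorize(title, lyrics):
    """Return 'title', 'yes' or 'no' following the three-way scheme."""
    if PATTERN.search(title):
        return "title"  # reference in the title
    if PATTERN.search(lyrics):
        return "yes"    # reference only in the lyrics
    return "no"         # no reference found


print(categorize("Weed Song", ""))
print(categorize("Sunshine", "pass the cannabis"))
print(categorize("Seaweed", "la la la"))
```

The word boundaries in the pattern prevent a match inside a longer word such as "seaweed", so the last call falls into the "no" category.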

Data

Timestamp: 01/12/2016 - 05/12/2016

Data sources: YouTube, YTDT Video List, Spotify Playlist Miner, Wikipedia, Google Play

There are three Excel files, one for each source. Each of them contains the title, author, genre and release date of every song, plus an “explicit” column with three categories: “yes” when the lyrics mention marijuana or related terms, “title” when the title contains a reference to marijuana, and “no” when the song does not mention anything related to marijuana.
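As a rough sketch of how one of these files could be summarized for a timeline, the snippet below counts the categories per year. It assumes the sheet has been exported to CSV and that the columns are named "year" and "explicit"; both the column names and the sample rows are hypothetical and do not come from the real files.

```python
import csv
import io
from collections import Counter, defaultdict


def composition_by_year(csv_text):
    """Return {year: {category: count}} from CSV text with 'year' and 'explicit' columns."""
    by_year = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_year[row["year"]][row["explicit"]] += 1
    # Sort by year so the result reads like a timeline.
    return {year: dict(counts) for year, counts in sorted(by_year.items())}


# Hypothetical sample rows with placeholder values.
sample = """title,author,genre,year,explicit
Song 1,Artist A,Genre X,1999,yes
Song 2,Artist B,Genre X,1999,title
Song 3,Artist C,Genre Y,2010,no
Song 4,Artist A,Genre Y,2010,no
"""
print(composition_by_year(sample))
```

Running the same function on each of the three exports would give one per-year breakdown per source, which can then be compared side by side.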