Related topics

Which are the most relevant themes and how are they distributed in our movies?

Introduction


A film is a complex artwork, not only because it has several levels of interpretation, but also because the story itself connects different themes in new ways every time.
In this chapter we analyze the topics connected to our films and discover which ones have the most in common with the topic of migration.
To do that, we collected the IMDb keywords for each movie and sorted them into a set of main topics.


This visualization explores a selection of topics (job, arts, immigration, family, criminality, cultural differences, sex, violence and security forces) and shows how they are spread across our movies.

Protocol


1. Once the list of 120 movies was set, the IMDb page URL of each film was scraped using Kimono.
2. A dataset was created with the 120 URLs.
3. We appended the string “/keywords?ref_=tt_stry_kw” to every URL using regular expressions in TextWrangler, to allow automated scraping.
4. Using these 120 new URLs, a second Kimono API was created to obtain a dataset containing all the keywords related to our movies.
5. A pivot table in Excel helped us find the most frequent keywords and build a list of 5216 unique keywords.
6. We noticed that keywords were often too specific, so we decided to assign them to more general topics. The 40 most recurrent themes were: violence, travel, time, technology, sport, social issues, sex, security forces, religion, relationships, reference, politics, people, other, nationalities, media, love, language, justice, job, immigration, history, health, places, gender issues, food, film, family, emotions, education, economy, death, cultural differences, criminality, car, arts, animals, ages, addiction and sexual abuse.
7. Once this dataset was obtained, we selected the most interesting themes in order to visualize only the most relevant issues. These themes were: violence, job, arts, immigration, family, criminality, cultural differences, sex and security forces.
8. A single-column dataset was created for each movie, containing only the tags of the topics we selected.
9. Using Raw we created 120 treemap visualizations: the “Tag” value is assigned to the “Hierarchy” and “Color” dimensions. The size is given by the number of times the tag occurs in the movie. Height and width are set to 500 px and padding is set to 1 px.
10. In Illustrator the 120 treemaps were assembled and colored. The graphs are arranged in boustrophedon order (from the top left corner to the bottom left: the first row runs left to right, the next right to left, and so on to the bottom), following our ranking criteria (see previous chapter).
11. Each category was assigned a color that tends toward red if the topic has a more negative connotation and toward blue if it has a more positive one, with white in the middle for neutral themes.
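
The data-preparation steps of the protocol (3, 5, 6 and 8) were carried out with TextWrangler and Excel. As an illustration only, the same logic could be sketched in Python as follows. The function names, the placeholder URL and the sample keyword-to-topic mapping are assumptions made for this example; they are not part of the original workflow or dataset.

```python
from collections import Counter

# Suffix quoted in step 3 of the protocol.
KEYWORDS_SUFFIX = "/keywords?ref_=tt_stry_kw"

# The nine themes selected in step 7.
SELECTED_TOPICS = {
    "violence", "job", "arts", "immigration", "family",
    "criminality", "cultural differences", "sex", "security forces",
}


def keywords_url(movie_url):
    """Step 3: append the keywords suffix to a movie page URL."""
    return movie_url.rstrip("/") + KEYWORDS_SUFFIX


def keyword_frequencies(keywords_by_movie):
    """Step 5: count how often each keyword occurs across all movies."""
    counts = Counter()
    for keywords in keywords_by_movie.values():
        counts.update(keywords)
    return counts


def topic_tags(keywords, keyword_to_topic):
    """Steps 6 and 8: map keywords to topics, keep only the selected ones.

    Keywords missing from the mapping fall into "other", which is not a
    selected topic and is therefore dropped. The returned counts are the
    tag repetitions that give each treemap cell its size (step 9).
    """
    tags = (keyword_to_topic.get(k, "other") for k in keywords)
    return Counter(t for t in tags if t in SELECTED_TOPICS)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    mapping = {"murder": "violence", "mother": "family", "train": "travel"}
    movies = {
        "movie A": ["murder", "mother", "train"],
        "movie B": ["murder", "unmapped keyword"],
    }
    print(keywords_url("http://www.imdb.com/title/tt0000000/"))
    print(keyword_frequencies(movies).most_common(3))
    for title, keywords in movies.items():
        print(title, dict(topic_tags(keywords, mapping)))
```

The assignment of keywords to topics was an editorial decision made by hand, so in this sketch the mapping is simply passed in as a dictionary rather than derived automatically.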

How to read it


Each treemap of the mosaic represents one movie. The treemaps are arranged in boustrophedon order (from the top left corner to the bottom left: the first row runs left to right, the next right to left, and so on to the bottom), following our ranking criteria (see previous chapter).
Each category was assigned a color that tends toward red if the topic has a more negative connotation and toward blue if it has a more positive one, with white in the middle for neutral themes.
By switching on one theme at a time, different patterns can be visualized. These patterns show how themes are spread across our movies.
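
To make the reading order concrete, the position of a movie in the mosaic can be computed from its rank. The following is a minimal sketch, assuming a grid that is 10 treemaps wide; the actual number of columns is not stated here, and the function name is hypothetical.

```python
def boustrophedon_position(rank, columns):
    """Return the (row, column) cell for a 0-based rank.

    Even rows are filled left to right, odd rows right to left,
    so consecutive ranks are always adjacent in the grid.
    """
    row, offset = divmod(rank, columns)
    col = offset if row % 2 == 0 else columns - 1 - offset
    return row, col


if __name__ == "__main__":
    COLUMNS = 10  # assumed width of the mosaic, not taken from the source
    for rank in (0, 9, 10, 19, 119):
        print(rank, boustrophedon_position(rank, COLUMNS))
```

With 120 movies and an even number of rows, the last-ranked movie lands in the bottom left corner, which matches the description above.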

Findings


Two main kinds of pattern emerge from this visualization. On the one hand, themes with a clearly negative connotation (criminality and violence) appear in the upper part of the visualization, together with other categories (such as sex and security forces) that are not clearly negative but whose connotation leans slightly toward the negative side of the color palette.

Family is a strongly present topic, spread evenly across our films.
On the other hand, it is interesting that two themes such as “cultural differences” and “immigration” are spread more across the lower half of the graph than the upper one. Although these two categories are made up of keywords strongly related to our main theme, we can suppose that the film industry prefers to attract audiences by leveraging topics that seem more appealing (like violence and sex), while omitting other aspects that are more connected with migration but less attractive to the average spectator.