Description
Images inside articles are among the most important elements for catching the user's attention and for creating a particular mood that will influence them. Articles about North Korean defectors often use pictures of people, frequently civilians, but in many cases the pictures portray soldiers and war scenes.
Still, we can see how strongly present the parasite article is, with detailed pictures of the event. This is particularly true for English-language media, while the situation is slightly different for China. Chinese articles have a different visual impact: they depict people as well, but often in a more cheerful and lighter way, and they make far more use of graphic representations such as maps, graphs and written text. Kim Jong Un also takes up a lot of space, especially in English-language news.
Protocol
We randomly selected a sample of 200 articles, 100 written in Chinese and 100 written in English. Having picked articles from the most represented categories (see Question 6), we decided to try a random selection to see whether the results would match those of Question 6. Using Fatkun Batch we retrieved all the images from the article pages. We then cleaned the images folder, deleting all the spam images and non-pertinent results (Fatkun does not retrieve results in an intelligent way, but downloads every picture on the page, including spam).
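A minimal sketch of such a draw, assuming each article is a record with a `language` field. The field names, the fixed seed and the placeholder corpus are hypothetical and only illustrate how 100 articles per language could be selected in a repeatable way.

```python
import random


def sample_articles(articles, per_language=100, seed=42):
    """Draw the same number of articles at random from each language group."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    by_language = {}
    for article in articles:
        by_language.setdefault(article["language"], []).append(article)

    sample = []
    for language in sorted(by_language):
        pool = by_language[language]
        if len(pool) < per_language:
            raise ValueError(f"only {len(pool)} articles available for {language}")
        # random.sample draws without replacement, so no article is picked twice
        sample.extend(rng.sample(pool, per_language))
    return sample


# Hypothetical corpus: 300 placeholder records per language.
corpus = [
    {"url": f"https://example.org/{lang}/{i}", "language": lang}
    for lang in ("en", "zh")
    for i in range(300)
]
selected = sample_articles(corpus)
```

Sampling each language separately keeps the two groups the same size, which a single draw over the whole corpus would not guarantee.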
We created structured folders of pictures, then used an Imagga Python script to tag every picture automatically. The result was a JSON file with information about image color and content.
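To show how a tagging file of this kind can be turned into rows for a spreadsheet, here is a small sketch. The JSON layout (file name mapped to a list of `tag` and `confidence` pairs), the file names and the confidence threshold are assumptions made for illustration; the actual structure of the tagging output may differ.

```python
import json


def flatten_tags(tagging, min_confidence=30.0):
    """Turn {filename: {"tags": [...]}} into (filename, tag, confidence) rows."""
    rows = []
    for filename, result in sorted(tagging.items()):
        for entry in result.get("tags", []):
            # Keep only the tags the tagger is reasonably sure about.
            if entry["confidence"] >= min_confidence:
                rows.append((filename, entry["tag"], entry["confidence"]))
    return rows


# Hypothetical tagging output for two pictures.
raw = json.loads("""
{
  "en_001.jpg": {"tags": [{"tag": "soldier", "confidence": 87.5},
                          {"tag": "sky", "confidence": 12.0}]},
  "zh_001.jpg": {"tags": [{"tag": "map", "confidence": 64.0}]}
}
""")
rows = flatten_tags(raw)
```

Each row can then be written to a CSV file and opened in a spreadsheet for manual checking.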
We corrected the biases of the automatic tagging in Excel, creating a dataset with all the information about each picture. Because the tags produced too many different categories, we defined a smaller set of meaningful, hand-coded categories based on image content and assigned every picture to one of them. This gave us a broader, more general categorization with which to visualize the pictures and their color patterns.
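The recoding into broader categories can be pictured as a simple lookup. In the sketch below the tag names, the category names and the file names are illustrative assumptions, not the actual hand-coded scheme, and the rule "first matching tag wins" is only one possible choice.

```python
from collections import Counter

# Hypothetical mapping from automatic tags to hand-coded categories.
CATEGORY_BY_TAG = {
    "soldier": "military",
    "weapon": "military",
    "man": "people",
    "crowd": "people",
    "map": "graphics",
    "graph": "graphics",
    "text": "graphics",
}


def categorize(tags, default="other"):
    """Return the category of the first tag that has a hand-coded mapping."""
    for tag in tags:
        category = CATEGORY_BY_TAG.get(tag.lower())
        if category is not None:
            return category
    return default


# Hypothetical pictures with their automatic tags, most confident first.
pictures = {
    "en_001.jpg": ["sky", "soldier"],
    "zh_001.jpg": ["map", "city"],
    "zh_002.jpg": ["tree"],
}
counts = Counter(categorize(tags) for tags in pictures.values())
```

Pictures whose tags match nothing fall into `other`, which makes it easy to spot the cases that still need manual coding.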
Data
Time span: 11/2016 - 11/2017
Data source: Google News, Baidu News
Download data (4MB)
We have a single dataset with information about the pictures. All the pictures are stored in one folder, after being renamed.