Description
When news providers cover defectors, they generally do so in relation to North Korea's other internal and external problems. The closer an article gets to topics such as politics or war, the less it talks about people: attention shifts away from defectors, who are treated as a secondary or marginal topic. Providers usually choose a title that gives the reader an impression of the content but, as the visualization shows, the topic of the title often differs from the actual content of the article. The double treemap shows that English-language media tend to talk about other issues even when the search specifically targets defectors. This is particularly true for the United States and Japan. The picture is different for other “minor” countries, where the main topic is always defectors' health; as we will see in question 2, this is linked to one specific article that went viral, especially in the UK.
Looking at the Chinese results, we see that, surprisingly for a country where the debate on North Korea is highly controversial, a large number of providers talk about the human rights situation and about defectors' lives in their new countries.
Protocol
First, we searched Google News and Baidu with a more specific query, “North Korean defectors”, then scraped and labelled the results.
To create meaningful categories, we used IBM Watson NLU to tag all the titles and articles. We opted for a double categorization to see whether the first impression an article gives (its title) differs from its body text. For the Chinese part we could not use IBM Watson, so we labelled the results manually.
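The double categorization can be illustrated with a minimal sketch that counts how often a title's category differs from the article's category. It assumes the labelling is already done; the rows, the field names `title_category` and `article_category`, and the category values are hypothetical placeholders, not the project's actual data or labels.

```python
from collections import Counter

# Hypothetical labelled rows: placeholder categories, not the project's real labels.
rows = [
    {"title_category": "defectors", "article_category": "politics"},
    {"title_category": "defectors", "article_category": "defectors"},
    {"title_category": "health", "article_category": "health"},
    {"title_category": "defectors", "article_category": "war"},
]


def category_shift(rows):
    """Count (title, article) category pairs and the share of rows where they differ."""
    pairs = Counter((r["title_category"], r["article_category"]) for r in rows)
    mismatched = sum(n for (title, article), n in pairs.items() if title != article)
    share = mismatched / len(rows) if rows else 0.0
    return pairs, share


pairs, share = category_shift(rows)
print(pairs.most_common())
print(f"Titles whose category differs from the article: {share:.0%}")
```

The pair counts are the kind of input a double treemap could be built from, with the title category on one side and the article category on the other.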
Data
Timestamp: 11/2016 - 11/2017
Data source: Google News, Baidu News
Download data (4MB)
We worked on two separate datasets, one for Chinese-language and one for English-language news. We then merged the two datasets, which had identical fields: title, title category, link, article category, date and ranking.
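The merge described above could look roughly like the following sketch. It assumes each dataset is a list of rows carrying the six fields listed above; the column identifiers, the sample rows, and the `example.org` links are hypothetical, and the check simply refuses rows whose fields do not match.

```python
# Assumed column identifiers for the six fields named in the text.
COLUMNS = ["title", "title_category", "link", "article_category", "date", "ranking"]


def merge_datasets(*datasets):
    """Concatenate (name, rows) datasets after checking that every row has the same fields."""
    merged = []
    for name, rows in datasets:
        for row in rows:
            if set(row) != set(COLUMNS):
                raise ValueError(f"{name}: unexpected fields {sorted(row)}")
            # Rebuild each row so that all rows share one column order.
            merged.append({col: row[col] for col in COLUMNS})
    return merged


# Hypothetical sample rows, one per dataset.
english = [{"title": "Example title", "title_category": "defectors",
            "link": "https://example.org/a", "article_category": "politics",
            "date": "2017-03-01", "ranking": 1}]
chinese = [{"title": "Example title (translated)", "title_category": "human rights",
            "link": "https://example.org/b", "article_category": "human rights",
            "date": "2017-05-12", "ranking": 3}]

merged = merge_datasets(("english", english), ("chinese", chinese))
print(len(merged), "rows after merging")
```

Failing loudly on a mismatched row is a design choice for this sketch: it surfaces labelling differences between the manually tagged and automatically tagged data before they reach the visualization.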