Research question

Google News and Baidu News: what are the different sources talking about?

[Figure: double treemap, "Relevant topics per country" (click a country for details). Left: main topics inside titles; right: main topics inside the article corpus. Categories: HEALTH/DISEASE, LAW/HUMAN RIGHTS, POLITICS/ARMY, POLITICS/GOVERNMENT, RELIGION/CHRISTIANITY, SOCIETY/LIFE, SOCIETY/WAR, OTHER. Countries: China, USA, Japan, Ireland, India, South Korea, UK, Other.]

Description

When news providers talk about defectors, they generally do so in relation to other internal and external problems of North Korea. The closer we get to topics such as politics or war, the less the news talks about people: attention shifts away from the defectors, who are treated as a secondary or marginal topic. Providers usually choose titles that give the reader an impression of the content, but, as the visualization shows, the title topic often differs from the actual content of the article. The double treemap shows that English-language media tend to talk about other issues even when our search targets defectors. This is particularly true for the United States and Japan. It is different for other "minor" countries, where the main topic is always the defectors' health; as we will see in question 2, this is related to one specific article that went viral, especially in the UK.
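The observation that titles and article bodies often point to different topics can be checked directly on the labelled data. The following is a minimal Python sketch, assuming each news item is a row with `country`, `title_category` and `article_category` fields (hypothetical names); the three sample rows are invented for illustration and are not the project's data.

```python
from collections import defaultdict

# Illustrative rows only: one per news item, with the provider's country
# and the categories assigned to its title and to its article body.
rows = [
    {"country": "USA", "title_category": "POLITICS/ARMY", "article_category": "POLITICS/GOVERNMENT"},
    {"country": "USA", "title_category": "SOCIETY/LIFE", "article_category": "SOCIETY/LIFE"},
    {"country": "UK", "title_category": "HEALTH/DISEASE", "article_category": "HEALTH/DISEASE"},
]


def mismatch_share(rows):
    """Return, per country, the share of items whose title category
    differs from the category of the article body."""
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for row in rows:
        totals[row["country"]] += 1
        if row["title_category"] != row["article_category"]:
            mismatches[row["country"]] += 1
    # Countries with no mismatch get 0 from the defaultdict.
    return {country: mismatches[country] / totals[country] for country in totals}


print(mismatch_share(rows))  # {'USA': 0.5, 'UK': 0.0}
```

A share close to 1 for a country would mean its titles rarely announce what the article body is actually about.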

In the Chinese case, we see that, surprisingly (for a country where the North Korean debate is highly controversial), a large number of providers talk about the human rights situation and about defectors' lives in their new countries.

Protocol

[Protocol diagram: Query: Google News and Baidu News, "North Korean Defectors" → Corpus definition: Web Scraper exports to Excel (116, 40 and 100 results), manual refinement and source labelling in Excel, Baidu News advanced search for retrieving more results, Google Translate (Chinese to English) → Double tagging of titles and articles: IBM Watson Natural Language Understanding API, manual labelling in Excel → Title categories and article categories → Visualization: treemap made with RAWGraphs and Illustrator.]

First, we searched Google News and Baidu News with a more specific query, "North Korean defectors"; then we scraped and labelled the results.
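The scraping itself was done with Web Scraper, but the refinement and source-labelling step that follows can be sketched in code. This minimal Python sketch assumes each scraped result is a dict with a `link` field; `DOMAIN_COUNTRY` and its entries are a hypothetical, hand-made lookup table, not the one used in the project, and labelling by domain is an assumed shortcut for what the project did by hand.

```python
from urllib.parse import urlparse

# Hypothetical lookup table (illustrative entries): provider domain -> country.
DOMAIN_COUNTRY = {
    "nytimes.com": "USA",
    "bbc.co.uk": "UK",
    "japantimes.co.jp": "JAPAN",
}


def domain_of(link):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(link).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def refine(results):
    """Drop duplicate links and attach a country label to each result.

    Domains missing from the table are labelled 'OTHER', like the
    OTHER group in the visualization.
    """
    seen = set()
    refined = []
    for item in results:
        link = item["link"]
        if link in seen:
            continue  # the same article was scraped twice
        seen.add(link)
        country = DOMAIN_COUNTRY.get(domain_of(link), "OTHER")
        refined.append({**item, "country": country})
    return refined
```

For example, two identical BBC links and one unknown domain yield two results, labelled UK and OTHER.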

To create meaningful categories, we used IBM Watson Natural Language Understanding (NLU) to tag all titles and articles. We opted for a double categorization to see whether there were differences between the first impression given by an article (its title) and its body. For the Chinese part we could not use IBM Watson, so we labelled the results manually.
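A sketch of how double tagging could reduce NLU output to one coarse category per title and per article. It assumes the JSON returned by the NLU categories feature: a `categories` list whose entries carry a hierarchical `label` and a `score`. The prefixes in `PREFIX_TO_CATEGORY` are illustrative taxonomy-style labels (check the exact strings against the NLU taxonomy), and the mapping itself is hypothetical, since the text does not say how Watson labels were grouped into the treemap categories.

```python
# Hypothetical mapping from taxonomy prefixes to the project's categories.
PREFIX_TO_CATEGORY = {
    "/health and fitness/disease": "HEALTH/DISEASE",
    "/law, govt and politics/government": "POLITICS/GOVERNMENT",
    "/society/unrest and war": "SOCIETY/WAR",
    "/religion and spirituality/christianity": "RELIGION/CHRISTIANITY",
}


def top_label(response):
    """Return the highest-scoring category label, or None if there is none."""
    categories = response.get("categories", [])
    if not categories:
        return None
    return max(categories, key=lambda c: c["score"])["label"]


def coarse_category(response):
    """Map one categories response to a coarse category, defaulting to OTHER.

    A prefix matches the label itself or any deeper level of it; the
    longest matching prefix wins.
    """
    label = top_label(response)
    if label is None:
        return "OTHER"
    matches = [p for p in PREFIX_TO_CATEGORY if label == p or label.startswith(p + "/")]
    if not matches:
        return "OTHER"
    return PREFIX_TO_CATEGORY[max(matches, key=len)]


def double_tag(title_response, article_response):
    """Tag the title and the article body separately."""
    return coarse_category(title_response), coarse_category(article_response)
```

Keeping the two tags side by side is what makes it possible to compare the "first impression" with the article body.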

Data

Timestamp: 11/2016 - 11/2017

Data source: Google News, Baidu News

We worked on two different datasets: one for Chinese-language news and one for English-language news. We then merged them, since both had the same fields: title, title category, link, article category, date and ranking.
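A minimal Python sketch of the merge, assuming both datasets are lists of dicts whose keys follow the fields listed above (the snake_case column names and the added `language` column are assumptions). Because the fields are identical, merging amounts to concatenation; the category counts are the kind of aggregate a treemap is sized by.

```python
import csv
import io
from collections import Counter

# Column names are assumed snake_case versions of the fields in the text.
FIELDS = ["title", "title_category", "link", "article_category", "date", "ranking"]


def merge_datasets(english_rows, chinese_rows):
    """Concatenate two datasets with identical fields, tagging each row
    with its language so the origin of every item is kept."""
    for rows in (english_rows, chinese_rows):
        for row in rows:
            missing = set(FIELDS) - set(row)
            if missing:
                raise ValueError(f"row is missing fields: {sorted(missing)}")
    merged = [{**row, "language": "en"} for row in english_rows]
    merged += [{**row, "language": "zh"} for row in chinese_rows]
    return merged


def category_counts(rows, field):
    """Count items per category in the given field (e.g. 'article_category')."""
    return Counter(row[field] for row in rows)


def to_csv(rows):
    """Serialize merged rows to CSV text; extra keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS + ["language"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```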