Research question

Are there recurring words and common storylines between different topics?

[Figure: bar charts of recurring words inside the articles corpus and recurring words divided per topic, for Google News (USA) and Baidu News (CHZ). Topics shown: POLITICS, WAR, HUMAN RIGHTS, LIFE, TECHNOLOGY, HEALTH.]

Description

While scraping the news, we noticed common patterns that could tell us something about how the media treat the defectors issue. In the vertical articles in particular, the ones about the defectors' journey and their life (listed as LIFE, HUMAN RIGHTS and HEALTH), the search for detail, even the most morbid, was a recurring feature. Words such as “hell”, “death” and “food” (which generally indicates food shortages or famine-related problems) are used in multiple articles to describe the defectors' situation. One interesting word is “parasite” (or “parasites”), which is linked to a single event reported by different outlets: a soldier who was shot during his escape was found with a 27 cm parasite in his stomach.

In conclusion, if we set aside all the news about defectors that concerns politics, nuclear issues and war, we are left with a selection of articles that talk about their life in a very personal way, giving ample space to detail.

Protocol

[Protocol diagram. Query: are there recurring words among different topics that can identify how the issue is treated? Dataset: “North Korean Defectors News”. Corpus definition: merging of the Chinese and western databases; short-list selection of 10 articles per country according to source relevance, with 20 articles picked for China; 60 results. Analysis in Voyant Tools: 1st phase, single article analysis; 2nd phase, categories split and analysis of the different corpora; 3rd phase, creation of a global corpus and analysis. Cleaning in Excel: remove all words used less than 10 times. Outputs built with Tableau and Illustrator: visualization 1, stacked bar chart (global corpus); visualization 2, categories bar chart (categories split corpus).]

We selected a sample of 30 articles from the English dataset, choosing articles from the most representative categories. We did the same with the Chinese dataset, retrieving 30 articles.

We manually organized them in a new Excel file, using Voyant Tools to analyze single articles and get the recurring words per category. Then we processed the entire article corpus to see the total number of words used.
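The counting step described above was done with Voyant Tools and Excel. As a rough illustration only, the same idea can be sketched in Python: count words per category and over the whole corpus, then drop words below a minimum count (the protocol mentions removing words used less than 10 times). The sample articles, the stopword list, and the names `ARTICLES`, `STOPWORDS`, `tokenize` and `recurring_words` are all hypothetical and are not part of the original workflow.

```python
import re
from collections import Counter

# Hypothetical stand-in for the Excel sheet: (category, article text) pairs.
ARTICLES = [
    ("LIFE", "Life in the country was hell. People had little food."),
    ("LIFE", "She fled with her family; life was hell and food was scarce."),
    ("POLITICS", "The president discussed nuclear sanctions and the nuclear missile test."),
]

# Tiny illustrative stopword list (an assumption; a real list would be longer).
STOPWORDS = {"in", "the", "was", "had", "with", "her", "and", "she"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def recurring_words(articles, min_count=10):
    """Return (global counts, per-category counts), keeping only words
    that occur at least min_count times in the respective corpus."""
    global_counts = Counter()
    by_category = {}
    for category, text in articles:
        words = tokenize(text)
        global_counts.update(words)
        by_category.setdefault(category, Counter()).update(words)

    def keep(counter):
        return {w: n for w, n in counter.items() if n >= min_count}

    return keep(global_counts), {cat: keep(c) for cat, c in by_category.items()}


# The toy sample is tiny, so a threshold of 2 is used here instead of 10.
global_words, per_category = recurring_words(ARTICLES, min_count=2)
print(global_words)
print(per_category)
```

The two return values correspond loosely to the global corpus and the categories split corpus; either one could be exported to a spreadsheet and charted.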

Data

Timestamp: 11/2016 - 11/2017

Data source: Google News, Baidu News

We used four datasets: two for the categories and two for the total corpus. The first two contain information about the relevant words split by category. The last two contain the total counts of relevant words, summed over every article we analyzed.