Description

The visualization compares the 15 most common names found in news articles about ISIS with the 15 most common names found in Dabiq and Rumiyah, the official magazines of the Islamic State. Most of the people mentioned in the articles are terrorists. This is consistent with the finding from question 7 that most of the content available on news sites concerns the attacks ISIS has carried out in Europe. The names collected from the official magazines of the Islamic State are different: they mainly refer to affiliates of the organization or to political and religious figures. Religious figures do not occur in significant numbers in the news articles. Interestingly, the only two people present in both text sources are Barack Obama and Jihadi John, a known terrorist.

Protocol

Four European news sites were selected, using the Alexa ranking, as sources for the articles concerning ISIS. Two of them (Euronews and BBC Online) were the top two European news sites in the ranking of the most used sites; the other two (The Telegraph and Mirror Online) were taken from the UK ranking. The selection was meant to cover news sites that publish in English, but this turned out not to be the case (see below).

The articles were first collected through Google. Each search query combined the word "ISIS" with the name of a city (Paris, Brussels or Nice) and was restricted to the periods in which terrorist attacks occurred in Europe.

The 7 periods examined were selected with reference to the Google Trends results shown in the introduction of this website. In addition to the peaks corresponding to the attacks, 3 interim periods were selected, defined as truce intervals. For each attack or interim period, a time range of two weeks was defined for collecting the articles. The search query for each period was repeated four times, once for each news site, for a total of 28 searches.

Example: ISIS AND Paris site: euronews.com - Custom range: Nov 6, 2015 - Nov 20, 2015
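The way the searches multiply (each period repeated once per news site) can be sketched in Python. This is only an illustration, not the procedure actually used, since the searches were run by hand on Google. The function name build_queries is hypothetical, only the period and site from the example above are filled in, and treating interim periods as queries without a city is an assumption.

```python
from datetime import date, timedelta
from itertools import product


def build_queries(periods, sites, keyword="ISIS"):
    """Build one search per (period, site) pair.

    periods: list of (city, start_date); city may be None (an assumption
             for interim periods, whose exact query is not documented).
    sites:   list of site domains.
    Each search covers a two-week range starting at start_date.
    """
    queries = []
    for (city, start), site in product(periods, sites):
        terms = f"{keyword} AND {city}" if city else keyword
        queries.append({
            "query": f"{terms} site:{site}",
            "start": start,
            "end": start + timedelta(days=14),
        })
    return queries


# Only the documented example is used here; the other periods and
# site domains would be added to these lists.
periods = [("Paris", date(2015, 11, 6))]
sites = ["euronews.com"]

for q in build_queries(periods, sites):
    print(q["query"], q["start"], q["end"])
```

With 7 periods and 4 sites in the input lists, the same function returns 28 entries, matching the number of searches described above.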

The results of each search were collected with Web Scraper, a Google Chrome extension that extracts specific items from a page by reading its HTML code. In this case, the URL, section, title and text of each article were collected. Each search produced a .csv file containing the scraped results. The 1758 articles collected were then manually filtered for relevance to this research, leaving 1215 articles for analysis. Most of the articles were written in languages other than English, so they were translated into English using Google Translate.
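Since each search yields its own .csv file, merging them into one table while keeping track of the originating search might look like the sketch below. The column names (link, title, text, category) follow the dataset description further down. The function name combine_csv, the "search" column and the sample rows are hypothetical placeholders, not real data.

```python
import csv
import io


def combine_csv(files):
    """Merge several scraped CSV exports into one list of rows.

    files: mapping of a search label to the CSV text of that search.
    A "search" field is added to each row so its origin is not lost.
    """
    rows = []
    for label, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["search"] = label
            rows.append(row)
    return rows


# Placeholder content standing in for two exported files.
files = {
    "search_1": "link,title,text,category\nu1,t1,x1,c1\nu2,t2,x2,c2\n",
    "search_2": "link,title,text,category\nu3,t3,x3,c3\n",
}

combined = combine_csv(files)
print(len(combined), "rows collected before manual filtering")
```

The relevance filtering itself was manual, so it is not modelled here.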

All the generated datasets were combined into different files for analysis. The text of the articles was analyzed with the Voyant tool to bring out word occurrences, and the three most frequent words were selected for each article. Where proper names appeared among these recurring words, they were selected. The names of people were then collected and organized in a single sheet, unifying all periods in a single dataset.
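The idea of picking the three most frequent words of an article can be illustrated with a few lines of Python. This is a simplified stand-in and not how Voyant works internally: the tokenization, the small stopword list and the function name top_words are all assumptions made for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "is", "on", "by", "for"}


def top_words(text, n=3):
    """Return the n most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


sample = "Police said the police were on the scene and police closed the road to the scene."
print(top_words(sample))
```

Proper names would then be picked out of these frequent words by hand, as described above.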

Similarly, the Voyant tool was used to extract the names of people from the texts of Dabiq and Rumiyah, downloaded from the website of The Clarion Project. A dataset listing the names and their number of occurrences was created.

For both sources, news sites and ISIS magazines, each name was tagged with a category. Only the 15 most frequent names from each source were then used for the comparison.
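The final comparison step, taking the most frequent names from each source and checking which appear in both, can be sketched as follows. The function names top_names and shared_names are hypothetical, and the names and counts in the example are placeholders rather than values from the datasets.

```python
def top_names(counts, n=15):
    """Return the n names with the highest counts (ties broken alphabetically)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


def shared_names(news_counts, magazine_counts, n=15):
    """Names that are in the top n of both sources, in news ranking order."""
    news_top = top_names(news_counts, n)
    magazine_top = set(top_names(magazine_counts, n))
    return [name for name in news_top if name in magazine_top]


# Placeholder counts, shown with n=2 to keep the example short.
news = {"Person A": 10, "Person B": 7, "Person C": 1}
magazines = {"Person B": 5, "Person D": 4, "Person A": 1}

print(shared_names(news, magazines, n=2))
```

In this toy input only "Person B" is in the top 2 of both sources, which mirrors how the two shared names in the visualization were identified from the two top-15 lists.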

Data

Timestamp: 06/12/2016 - 13/12/2016

Data source: Google, Web Scraper, Voyant tool

The collected data were organized into 7 different datasets.

1_charliehebdo - 2_bataclan - 3_brussels - 4_nice
For each of the four terrorist attack periods there is a dataset containing all the articles downloaded with Web Scraper (link, title, text, category) and the elements added during analysis (translated text, keywords, names, type, length and notes). The first sheet contains the elements just mentioned, while the other summarizes the counts of these elements. The “person” sheet contains the proper name of each person and the number of times it is repeated in the articles of the period.

5_neutralperiods
For the periods defined as truce intervals there is a single file containing one sheet per period with the elements collected using Web Scraper. A sheet called "person" holds the occurrences of proper names.

6_totalnews
The file contains the sheet "person total" with the total counts of occurrences of names for the 7 periods.

7_comparison dabiq
The file contains the "person" sheet: a list of the people mentioned across all issues of the magazines, with the number of occurrences for each name.
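The relation between the per-period "person" sheets and the "person total" sheet in 6_totalnews amounts to summing name counts across periods. A minimal sketch, assuming each sheet has been loaded as a mapping from name to count (the function name total_counts and the sample values are placeholders):

```python
from collections import Counter


def total_counts(period_counts):
    """Sum name occurrences across a list of per-period count mappings."""
    total = Counter()
    for counts in period_counts:
        total.update(counts)  # adds counts rather than replacing them
    return total


# Placeholder per-period counts.
periods = [
    {"Person A": 2},
    {"Person A": 1, "Person B": 4},
]

print(total_counts(periods).most_common())
```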