This step of the research focused on identifying a thematic network within the controversy “Censorship and terrorism in the U.S”. In particular, we investigated Google Search to find out which specific topics make up the debate. The result was quite interesting: we discovered that nowadays, when we talk about terrorism, we mainly refer to ISIS, while censorship is a topic that mostly covers the fields of media and national security. The tiles in the treemap show how often each two-word phrase occurs in the texts collected from the Web, and ISIS is the biggest tile.

The analysis can be considered more specific than the previous one, since we can identify different macro-thematic areas, each containing more specific topics. We realized that the debate could shift from terrorism in general to ISIS, but further research is needed to go deeper into the controversy.


Protocol diagram (text recovered from the figure)

Tools, in order of appearance: Google Advanced Search, Google Scraper, Excel, TextRipper, Brackets, Keyword Density Analyzer, Excel, RAW, Illustrator

Queries definition
- incognito search
- 6 queries:
  - censorship terrorism usa
  - censorship terrorism united states
  - censorship information terrorism usa
  - censorship information terrorism united states
  - censorship media terrorism usa
  - censorship media terrorism united states

Corpus definition
- advanced options: usa, 20 results
- download pages url csv > 120 results

Corpus scraping
- manual scraping > 43 results

Text extraction
- from pages url to text

Cleaning
- a/an/about/are/and/at/as/be/how/I/in/is/it/of/on/or/that/the/this/to/was/what/where/when/who/will/with/than/they/if/we/you/he/she/have/has/been/were/do/its

2 word phrases count
- min word length: 4
- min occurrences: 6

Network definition
- cluster definition: terrorism, national security, human rights, online censorship, media
- creation of DATABASE B containing: 2 word phrases, occurrences, cluster

Visualization
- chart: treemap

HOW TO READ
- node types: website, terminal, software, tool
- step types: queries definition, data-source definition, corpus definition, network definition, visualization
1. Queries definition

The first part of the protocol is a deeper exploration of the controversy on Google; for this reason we started by looking, through several incognito searches, for the best queries about censorship and terrorism in the USA.
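The six queries listed in the protocol diagram follow a regular pattern: an optional modifier ("information" or "media") combined with one of two country terms. As a minimal sketch, assuming that pattern is intentional, they could be generated as follows. The names `BASE`, `MODIFIERS`, `COUNTRY_TERMS` and `build_queries` are hypothetical and only serve the illustration; the original queries were typed by hand.

```python
from itertools import product

# Hypothetical names; the values are the terms that appear in the six queries.
BASE = "censorship"
MODIFIERS = ["", "information", "media"]   # "" means no modifier
COUNTRY_TERMS = ["usa", "united states"]


def build_queries():
    """Return the query strings for every modifier/country combination."""
    queries = []
    for modifier, country in product(MODIFIERS, COUNTRY_TERMS):
        parts = [BASE, modifier, "terrorism", country]
        # Skip the empty modifier so no double space ends up in the query.
        queries.append(" ".join(part for part in parts if part))
    return queries


if __name__ == "__main__":
    for query in build_queries():
        print(query)
```

Writing the queries as combinations makes it easy to check that no variant was skipped before running the searches.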

2. Corpus definition

The corpus is made up of 38 URLs kept from the 120 extracted with Google Scraper. We decided to explore Google because it could represent the interest of people and of the media in the controversy. After the URL extraction, we obtained the texts and cleaned them with Brackets, removing conjunctions and other uninformative words that could have altered the results of the research. Finally, we used Keyword Density Analyzer to extract the most common two-word phrases and understand which topic occurred the most.
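The cleaning and counting steps can be approximated in a few lines of Python. This is only a sketch of the idea, not a reproduction of Brackets or Keyword Density Analyzer: the function name `two_word_phrases` is hypothetical, and it assumes that stop words and words shorter than the minimum length are dropped before pairs are formed, so two words separated only by a removed word count as adjacent. The stop words and the two thresholds are the ones given in the protocol diagram.

```python
import re
from collections import Counter

# Stop words copied from the cleaning step of the protocol diagram.
STOP_WORDS = set(
    "a an about are and at as be how i in is it of on or that the this to was "
    "what where when who will with than they if we you he she have has been "
    "were do its".split()
)


def two_word_phrases(text, min_word_length=4, min_occurrences=6):
    """Count adjacent word pairs after removing stop words and short words."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS and len(w) >= min_word_length]
    # Pair each kept word with the next kept word and count the pairs.
    counts = Counter(zip(kept, kept[1:]))
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_occurrences}
```

Calling `two_word_phrases(page_text)` on the cleaned text of a page returns a dictionary that maps each frequent two-word phrase to its number of occurrences. The actual tool may tokenize or filter differently, so the counts are not expected to match exactly.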


Timestamp: 24/11/2016 - 28/11/2016

Data source: Google Advanced Search

The data are collected in a .xls file with three columns: the two-word phrases mined from the texts, the number of occurrences, and the thematic cluster.
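To show how the three columns relate to the treemap, the sketch below groups rows by cluster and computes the share of the total area that each tile would take, assuming tile area is proportional to the number of occurrences. The function name `treemap_shares` is hypothetical, and the sample rows are placeholders invented for the example; they are not values from the dataset.

```python
from collections import defaultdict


def treemap_shares(rows):
    """Group (phrase, occurrences, cluster) rows by cluster.

    Returns a dict mapping each cluster to a list of (phrase, share) pairs,
    where share is the phrase's fraction of all occurrences.
    """
    total = sum(occurrences for _, occurrences, _ in rows)
    clusters = defaultdict(list)
    for phrase, occurrences, cluster in rows:
        clusters[cluster].append((phrase, occurrences / total))
    return dict(clusters)


# Placeholder rows, invented for illustration only.
sample_rows = [
    ("phrase one", 30, "terrorism"),
    ("phrase two", 10, "media"),
    ("phrase three", 10, "media"),
]

if __name__ == "__main__":
    for cluster, tiles in treemap_shares(sample_rows).items():
        print(cluster, tiles)
```

In the original workflow this grouping was done in Excel and the chart was drawn with RAW; the code only makes the relation between the columns and the tile sizes explicit.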