This step of the research focused on identifying a thematic network within the controversy “Censorship and terrorism in the U.S.”. In particular, we investigated Google Search to find out which specific topics make up the debate. The result was quite interesting: we discovered that nowadays, when people talk about terrorism, they mainly refer to ISIS, and that censorship is a topic covering mostly the media and national security fields.
The squares in the treemap show how often a pair of words occurs in the text collected from the Web; ISIS is the biggest tile.
This analysis can be considered more specific than the previous one, since we can identify different macro-thematic areas with more specific topics inside them. We realized that the debate could shift from terrorism to ISIS, but at the same time further research is needed to go deeper into the controversy.
1. Queries definition
The first part of the protocol is a deeper exploration of the controversy on Google; for this reason we started by looking, through different incognito searches, for the best queries about censorship and terrorism in the USA.
2. Corpus definition
The corpus is made up of 38 URLs, kept from the 120 extracted with Google Scraper.
We decided to explore Google since it could represent the interest of the public and of the media in the controversy. After the URL extraction, we retrieved the texts and cleaned them with Brackets, removing conjunctions and other uninformative words that could have altered the results of the research.
In the end we used Keyword Density Analyzer to extract the most common two-word phrases and understand which topic occurred the most.
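The cleaning and counting steps can be approximated with a short script. This is only a minimal sketch of the idea, not the procedure Keyword Density Analyzer actually follows: the stop-word list, the function name `two_word_phrases` and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the list used in the
# research is not given in the protocol).
STOP_WORDS = {"a", "an", "and", "the", "of", "or", "but", "in",
              "on", "to", "is", "are", "for", "with"}


def two_word_phrases(text, stop_words=STOP_WORDS):
    """Count pairs of consecutive words after dropping stop words."""
    # Lowercase the text and keep only alphabetic tokens.
    words = re.findall(r"[a-z']+", text.lower())
    # Remove conjunctions and other uninformative words before pairing.
    words = [w for w in words if w not in stop_words]
    # Pair each remaining word with the one that follows it.
    return Counter(zip(words, words[1:]))


# Hypothetical sample text, standing in for one scraped page.
sample = "ISIS propaganda and the media. ISIS propaganda on social media."
counts = two_word_phrases(sample)
print(counts.most_common(3))
```

Because the stop words are removed before the pairs are built, two words separated only by a conjunction are counted as a pair. This mirrors the order of the protocol, where the texts were cleaned first and analyzed afterwards. The resulting counts are the kind of values that could size the tiles of a treemap.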