Googling the controversy

N-grams

Introduction

This study looks at the most widely distributed n-grams in the results of the queries that returned pro or against quotes.
Once the most distributed n-grams in the pro and against sub-corpora have been identified, they can be compared on further parameters, such as frequency and relevance.

How to read the visualization

The visualization compares the hundred most distributed n-grams in the pro and against sub-corpora. The TF and TF-IDF buttons, which stand for frequency and relevance respectively, reorder the n-grams by these additional parameters and show how their ranking changes.
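As a rough illustration of what switching between the orderings does, the sketch below ranks the same n-grams by distribution, TF and TF-IDF and lets the positions be compared. The function name `rank_by`, the field names and all numbers are hypothetical placeholders, not values from the study.

```python
def rank_by(stats, field):
    """Map each n-gram to its 1-based rank when sorted by `field`, descending."""
    # Ties are broken alphabetically so the ordering is deterministic.
    ordered = sorted(stats, key=lambda g: (-stats[g][field], g))
    return {g: i + 1 for i, g in enumerate(ordered)}

# Hypothetical values for three placeholder n-grams, not taken from the data.
toy = {
    "a": {"distribution": 40, "tf": 90, "tfidf": 1.2},
    "b": {"distribution": 35, "tf": 150, "tfidf": 3.4},
    "c": {"distribution": 20, "tf": 60, "tfidf": 5.0},
}

by_distribution = rank_by(toy, "distribution")
by_tf = rank_by(toy, "tf")
by_tfidf = rank_by(toy, "tfidf")
```

An n-gram that ranks first by distribution can drop to the bottom by TF-IDF, which is the kind of shift the visualization is meant to make visible.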

How it has been done

Out of the total corpus of results, we took the articles that provided only pro or only against quotes, thus creating two sub-corpora.
We extracted the texts of the articles with the tool Zup, and extracted and analyzed the n-grams with the tool Sven. We then selected the hundred most distributed n-grams of each sub-corpus and checked how their ranking changed when they were ordered by frequency or relevance. The visualization was created with the tool Raw.
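The steps above can be approximated with a short Python sketch. It does not reproduce Sven's actual processing. It assumes that "distribution" means the number of articles in which an n-gram appears, that frequency (TF) is the total number of occurrences in the sub-corpus, and that relevance (TF-IDF) is that total weighted by log(N / distribution). The tokenizer and the names `ngrams`, `ngram_stats` and `top_distributed` are illustrative.

```python
import math
import re
from collections import Counter

def ngrams(text, n_max=3):
    """Yield word n-grams of length 1..n_max from lowercased text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def ngram_stats(documents, n_max=3):
    """Return {ngram: (distribution, tf, tfidf)} for one sub-corpus."""
    doc_freq = Counter()   # assumed "distribution": articles containing the n-gram
    term_freq = Counter()  # total occurrences across the sub-corpus
    for text in documents:
        counts = Counter(ngrams(text, n_max))
        term_freq.update(counts)
        doc_freq.update(counts.keys())
    n_docs = len(documents)
    return {
        g: (doc_freq[g], term_freq[g],
            term_freq[g] * math.log(n_docs / doc_freq[g]))
        for g in doc_freq
    }

def top_distributed(stats, k=100):
    """The k n-grams found in the most articles (ties broken alphabetically)."""
    return sorted(stats, key=lambda g: (-stats[g][0], g))[:k]

# Toy sub-corpus for illustration only.
docs = [
    "privacy is a right",
    "privacy and data protection",
    "data protection law",
]
stats = ngram_stats(docs)
top = top_distributed(stats, k=3)
```

Running the same functions on the pro and against sub-corpora separately, then sorting each top-hundred list by the second or third value of the tuple, gives the frequency and relevance orderings described above.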

Findings

In the PRO sub-corpus, ordering the n-grams by distribution shows that the concepts are more closely connected to the themes of privacy, personal data protection and fundamental rights. In the AGAINST sub-corpus, on the other hand, the distribution follows themes such as free information, censorship and justice. N-grams such as “Guardian” and “Wikipedia” also refer to cases of de-indexing and removal of online information. All of the n-grams found belong to the same area of interest, namely internet, politics and legislation.

Metadata

Timestamp: 14/11/2014 - 05/12/2014

Data source: Google

Related Protocol

Download data (455KB)