Words matter

Differences between queries

Introduction

This visualization shows the different perceptions people have of the Deep Web and the Tor browser, as a way to gauge public opinion. How do people speak of the phenomenon? Which kinds of words do they use? How do they approach the topic?

How to read the visualization

This alluvial diagram shows how the 150 most relevant words for each query are split into categories. On the left, you can select the composed query whose division among categories you want to see. The size of each flow is proportional to the number of words in that category. When a query is selected, the 25 most relevant terms by Tf-idf value (term frequency–inverse document frequency) appear on the right. The size of each term reflects its relevance.
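As a rough illustration of the Tf-idf relevance score mentioned above, here is a minimal Python sketch of the textbook formula (term frequency multiplied by the logarithm of the inverse document frequency). It is an assumption about how such a score can be computed, not a description of the tool actually used for this project; the function names `tf_idf` and `top_terms` and the tiny sample corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Score every term of every document with the textbook tf-idf formula.

    corpus: list of documents, each document a list of word tokens.
    Returns one dict per document, mapping term -> tf-idf score.
    """
    n_docs = len(corpus)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in corpus:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k):
    """Return the k highest-scoring (term, score) pairs, ties broken alphabetically."""
    return sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical sample data, invented purely for illustration.
sample_corpus = [
    ["tor", "browser", "tor"],
    ["deep", "web", "browser"],
]
sample_scores = tf_idf(sample_corpus)
```

A term that appears in every document (here "browser") scores zero, because it does not help distinguish one text from another. Terms concentrated in a single text score higher.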

How it has been done

After capturing the first 100 Google.com results for each query and selecting 25 of them by removing invalid URLs, we extracted the text contained in the final results with dev.zup.densitydesign.org. A corpus of 200 text documents for each query was composed and analyzed with dev.sven.densitydesign.org to extract the relevance of the words it contains. We assigned a category (perceptions, items, verbs, actors involved, technology and environments) to the first 150 words that emerged from the analysis, and we created a spreadsheet with terms, Tf-idf values and categories. With an alluvial diagram made in raw.densitydesign.org, we visualized the number of words in each category for each corpus. For the first 25 terms of each list, we used raw.densitydesign.org to build a Circle Packing diagram showing the proportion between the terms and the category to which they belong. We then replaced the circles with words of the same size.
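The spreadsheet step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout (keys `query`, `term`, `tfidf`, `category`), the function names and the sample rows are hypothetical and do not come from the project's data. The first function counts the words per query and category, which is the quantity that sizes the alluvial flows. The second selects the most relevant terms for one query.

```python
from collections import Counter


def category_counts(rows):
    """Count how many words fall into each (query, category) pair."""
    return Counter((row["query"], row["category"]) for row in rows)


def most_relevant(rows, query, k=25):
    """Return the k rows with the highest Tf-idf value for one query."""
    selected = [row for row in rows if row["query"] == query]
    return sorted(selected, key=lambda row: row["tfidf"], reverse=True)[:k]


# Hypothetical rows, invented for illustration only.
sample_rows = [
    {"query": "Deep Web", "term": "mystery", "tfidf": 0.42, "category": "perceptions"},
    {"query": "Deep Web", "term": "scary", "tfidf": 0.31, "category": "perceptions"},
    {"query": "Deep Web", "term": "forum", "tfidf": 0.12, "category": "environments"},
    {"query": "Tor", "term": "download", "tfidf": 0.50, "category": "verbs"},
]

counts = category_counts(sample_rows)
top_deep_web = most_relevant(sample_rows, "Deep Web", k=2)
```

In the real spreadsheet each query would contribute 150 rows, and `k` would stay at its default of 25.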

Findings

This visualization reveals the different approaches people take to the Deep Web phenomenon and to the tool that gives access to it. The categories linked to the queries show that, when speaking of the Deep Web, people use words related to current events, feelings and web environments. The Tor query is mostly connected to technical terms or to actions such as "get Tor". This may suggest a contrast between a general public that is uninformed and scared by the mystery behind the Deep Web, and computer technicians who know Tor and its benefits.

Metadata

Timestamp: 25/11/2014 - 10/12/2014

Data source: Google

Related Protocol

Download data (5 KB)