Corpus Analysis

Protocol

Text Analysis


Starting from the queries:

- Internet Addiction
- Digital Addiction
- Social Media Addiction (dropped as the analysis proceeded)

1. Google.com (English pages only) in an incognito window

2. Google.com > settings > never show instant results > results per page > 100

3. Query > right-click > view page source > cmd-a, copy and paste into Yoyo
> filter “google” > extract links

4. Excel > duplicate links removed (filters > advanced > unique records only)
> each remaining link validated by reading it and tagging its nature as:

- Medical
- Rehab
- News
- Blog
- Education
- Technology
- Wiki
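
Steps 3 and 4 (link extraction, the “google” filter and duplicate removal) could also be scripted. The following is only a minimal sketch, not a description of how Yoyo or Excel work internally: the names LinkCollector and extract_result_links are hypothetical, the sample HTML is invented, and the filter is assumed to drop links whose host contains "google".

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect every href found on <a> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_result_links(page_source, exclude="google"):
    """Return unique absolute http(s) links whose host does not contain `exclude`."""
    collector = LinkCollector()
    collector.feed(page_source)
    seen = set()
    unique = []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip relative links, mailto:, javascript:, ...
        if exclude in parsed.netloc.lower():
            continue  # assumed meaning of the "google" filter in step 3
        if link not in seen:  # keep the first occurrence only (step 4)
            seen.add(link)
            unique.append(link)
    return unique


# Invented page source, only to show the behaviour.
sample = """
<a href="https://www.google.com/preferences">Settings</a>
<a href="http://example.org/internet-addiction">Result 1</a>
<a href="http://example.org/internet-addiction">Result 1 again</a>
<a href="https://example.com/rehab">Result 2</a>
<a href="/search?q=next">Next</a>
"""
print(extract_result_links(sample))
```

The order of first appearance is preserved, so the resulting list can still be read as a ranking of the results before the manual tagging.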

5. Information added to the Excel dataset for each link:

- Author
- Latitude
- Longitude
- IP of the website
- Short summary
- Quotes

6. Zup (user: gruppo_2; password: gruppo_2) > rename project > paste (cmd-v)
the filtered Excel links > start > when done, download results > extract the zip files

7. .txt files renamed with short names (01, 02, ..., n) so that Sven does not crash

8. Sven (user: gruppo_2; password: gruppo_2). Log out of Zup first, or use
a different browser, otherwise there might be login conflicts
> upload all the .txt files related to a query and/or a specific tag

9. TF and TF-IDF (much slower) analysis > .csv download

10. .csv cleaned in Excel

11. Google Refine for the Sven analysis

12. Top 50 relevant words for each query > delta calculated in Excel between their frequencies
(TF Internet Addiction - TF Digital Addiction), so that each word has a single delta value comparing the two queries

13. Scatterplot created with Raw and Illustrator
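
The delta in step 12 can be illustrated with a short script. This is a sketch under stated assumptions, not the computation Sven or the Excel sheet actually performs: TF is assumed here to be a relative frequency (word count divided by total words), the tokenizer is a simple regular expression, and the function names and sample texts are hypothetical.

```python
import re
from collections import Counter


def term_frequencies(texts):
    """Relative frequency of each lowercase word across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def tf_deltas(tf_a, tf_b, top_n=50):
    """Delta (tf_a - tf_b) for the union of the top_n words of each corpus."""
    top_a = sorted(tf_a, key=tf_a.get, reverse=True)[:top_n]
    top_b = sorted(tf_b, key=tf_b.get, reverse=True)[:top_n]
    words = set(top_a) | set(top_b)
    # A word missing from one corpus counts as frequency 0 there.
    return {w: tf_a.get(w, 0.0) - tf_b.get(w, 0.0) for w in words}


# Invented one-line corpora standing in for the downloaded .txt files.
tf_internet = term_frequencies(["internet addiction internet use"])
tf_digital = term_frequencies(["digital addiction digital detox"])
deltas = tf_deltas(tf_internet, tf_digital)

for word in sorted(deltas, key=deltas.get, reverse=True):
    print(f"{word:10s} {deltas[word]:+.2f}")
```

With this sign convention a positive delta marks a word that is more frequent in the Internet Addiction corpus, a negative delta one that is more frequent in the Digital Addiction corpus, and a value near zero a word shared by both.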

Image Analysis


Starting from the queries:

- Internet Addiction
- Digital Addiction
- Social Media Addiction (dropped as the analysis proceeded)

1. Google.com (English pages only) in an incognito window
on Mozilla Firefox

2. Google.com > Google Images > query > first 100 images downloaded
with DownThemAll

3. Images placed in a grid according to their order of appearance and their tags
(both with Illustrator and Excel)

4. Images sorted in the grid according to the tags,
from the most common to the least common
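
The ordering in step 4 can be sketched as follows. This is only an illustration of the sorting logic, not the Illustrator or Excel procedure: the function names are hypothetical, the (rank, tag) pairs are invented, and the tag names are borrowed from the link categories above purely as placeholders. Ties between equally common tags are assumed to be broken alphabetically.

```python
from collections import Counter


def order_by_tag_frequency(images):
    """Sort (rank, tag) pairs: most common tag first, then by appearance rank."""
    tag_counts = Counter(tag for _, tag in images)
    # Equal counts fall back to the tag name so identical tags stay adjacent.
    return sorted(images, key=lambda item: (-tag_counts[item[1]], item[1], item[0]))


def to_grid(items, columns):
    """Split a flat list into rows of at most `columns` cells."""
    return [items[i:i + columns] for i in range(0, len(items), columns)]


# Invented data: (order of appearance in the results, assigned tag).
images = [
    (1, "News"), (2, "Medical"), (3, "News"),
    (4, "Blog"), (5, "Medical"), (6, "News"),
]
ordered = order_by_tag_frequency(images)
for row in to_grid(ordered, columns=3):
    print(row)
```

Within each tag the original order of appearance is kept, so the grid still shows which images ranked higher in the search results.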

Metadata

Timestamp:
22/11/2014 - 12/12/2014

Data source:
Google

Tools:
Zup, Sven,
Google Refine,
Raw