Web Pages Analysis

Protocol

Introduction

The protocol described here extracts the first 200 Google results for each of four queries:
“File Sharing” + Effects, “File Sharing” + Consequences, Piracy + Effects, Piracy + Consequences.

Scraping

Google.com opened in the browser's Incognito Mode, to limit personalization of the results.

Scraping of the first 200 results from Google for each of the 4 queries:

  • “File Sharing” + Effects;
  • “File Sharing” + Consequences;
  • Piracy + Effects;
  • Piracy + Consequences.
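The result pages needed for this step can be listed programmatically. The sketch below is a minimal illustration, not the tool used in the protocol (which relied on Url Extractor). It assumes that "+" in the queries simply joins the terms, that Google shows 10 results per page, and that result pages are addressed with the `q` and `start` URL parameters; `result_page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# The four queries of the protocol. Assumption: "+" means "together with",
# so it is not sent as a search operator; the quotes keep the exact phrase.
QUERIES = [
    '"File Sharing" Effects',
    '"File Sharing" Consequences',
    'Piracy Effects',
    'Piracy Consequences',
]

RESULTS_WANTED = 200  # first 200 results per query
PAGE_SIZE = 10        # assumption: 10 results per Google result page


def result_page_urls(query, total=RESULTS_WANTED, page_size=PAGE_SIZE):
    """Return the result-page URLs covering the first `total` results.

    Assumption: `start` is the zero-based offset of the first result
    shown on a page, so offsets 0, 10, ..., 190 cover 200 results.
    """
    urls = []
    for offset in range(0, total, page_size):
        params = urlencode({"q": query, "start": offset})
        urls.append(f"https://www.google.com/search?{params}")
    return urls


if __name__ == "__main__":
    for q in QUERIES:
        pages = result_page_urls(q)
        print(q, len(pages), pages[0])
```

With 10 results per page, each query needs 20 result pages, so the four queries give 80 pages in total.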

URL extraction and text cleaning, excluding social networks and non-text results.
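The exclusion step can be sketched as a filter over the extracted URLs. The protocol does not list which social networks or non-text formats were removed, so the domain sets and file extensions below are hypothetical placeholders, as are the helper names `is_excluded` and `clean_urls`.

```python
from urllib.parse import urlparse

# Hypothetical exclusion lists: the protocol does not name the sites or
# formats it removed, so these values are illustrative only.
SOCIAL_DOMAINS = {"facebook.com", "twitter.com", "linkedin.com", "pinterest.com"}
NON_TEXT_DOMAINS = {"youtube.com", "vimeo.com"}  # video sites
NON_TEXT_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4", ".avi")


def _matches(host, domains):
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def is_excluded(url):
    """True if the URL points to a social network or a non-text resource."""
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""  # hostname is already lower-cased
    if _matches(host, SOCIAL_DOMAINS) or _matches(host, NON_TEXT_DOMAINS):
        return True
    return parsed.path.lower().endswith(NON_TEXT_EXTENSIONS)


def clean_urls(urls):
    """Strip whitespace, drop empty lines and excluded URLs, keep order."""
    kept = []
    for url in urls:
        url = url.strip()
        if url and not is_excluded(url):
            kept.append(url)
    return kept
```

Matching on the parsed host name rather than on a substring of the whole URL avoids dropping unrelated sites such as one whose name merely contains "facebook".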

Reading and elimination of irrelevant links (for example: sea piracy, file sharing tutorials,
official pages of file sharing protocols, etc.)
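In the protocol, irrelevant links are removed by reading them. A keyword pass can at most pre-sort candidates for that reading; it does not replace it. The sketch below assumes a result title (and optionally a snippet) is available, and its category names and hint words are hypothetical, loosely derived from the examples above.

```python
# Hypothetical hint words, derived from the examples in the protocol.
# They only flag results for manual review; nothing is deleted here.
OFF_TOPIC_HINTS = {
    "sea piracy": ("maritime", "pirate ship", "hijacked", "coast guard"),
    "tutorial": ("how to", "tutorial", "step by step"),
    "protocol page": ("protocol specification", "official site", "download client"),
}


def flag_for_review(title, snippet=""):
    """Return the off-topic categories whose hint words occur in the text."""
    text = f"{title} {snippet}".lower()
    return [category for category, hints in OFF_TOPIC_HINTS.items()
            if any(hint in text for hint in hints)]
```

A result with an empty list is not thereby relevant; it still has to be read.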

Filling a dataset with the following entries:

Metadata

Timestamp:
24/11/14 - 03/12/14

Data source:
Google

Tools:
URL Extractor, Microsoft Excel, Adobe Illustrator
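The dataset-filling step can be sketched as writing one row per retained result to a CSV file that Microsoft Excel can open. The column names `query`, `rank` and `url` are hypothetical placeholders, since the protocol's list of dataset entries is not reproduced here, and `write_dataset` is an illustrative helper name.

```python
import csv
import io

# Hypothetical columns: replace with the protocol's actual dataset entries.
FIELDS = ["query", "rank", "url"]


def write_dataset(rows, stream):
    """Write result rows (dicts keyed by FIELDS) as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_dataset(
        [{"query": "Piracy Effects", "rank": 1, "url": "https://example.org/a"}],
        buf,
    )
    print(buf.getvalue())
```

When writing to a real file, open it with `open("dataset.csv", "w", newline="", encoding="utf-8")` so the `csv` module controls line endings itself.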