Introduction
The protocol described here makes it possible to visualize on a map all the geonames contained in the analysed web pages.
Opening of Google.com in Incognito Mode.
Scraping of the first 200 web page results from Google, with the 4 queries:
URL extraction and text cleaning, to exclude social networks and non-text results.
Text extraction from the results with Zup, and removal of empty or off-topic files.
Replacement of every newline with a space, using a terminal script.
Identification of all the entities in the texts using OpenRefine.
Import of the result into Microsoft Excel to finalise the dataset:
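
As an illustration of two of the steps above (excluding social networks and non-text results from the URL list, and replacing every newline with a space), the following is a minimal Python sketch. It is not the terminal script used in the protocol, which is not reproduced here. The names `SOCIAL_DOMAINS`, `NON_TEXT_EXTENSIONS`, `keep_url` and `flatten_newlines` are hypothetical, and the exclusion lists are assumptions, since the protocol does not enumerate which sites or file types were removed.

```python
from urllib.parse import urlparse

# Hypothetical exclusion lists: the protocol does not say which
# social networks or file types were actually filtered out.
SOCIAL_DOMAINS = {"facebook.com", "twitter.com", "instagram.com", "linkedin.com"}
NON_TEXT_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".mp3")


def keep_url(url):
    """Return True if the URL is neither a social network page nor a non-text file."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Match the domain itself or any of its subdomains (e.g. m.facebook.com).
    if any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS):
        return False
    # Reject paths that end with a known non-text file extension.
    return not parsed.path.lower().endswith(NON_TEXT_EXTENSIONS)


def flatten_newlines(text):
    """Replace every line break (Windows, old Mac or Unix style) with one space."""
    return text.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
```

In this sketch, `keep_url` would be applied to each scraped URL before text extraction, and `flatten_newlines` to each extracted text file before entity extraction, so that every document ends up on a single line.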