Description

Once news outlets were identified as a relevant medium through which users inform themselves, the focus of the research shifted to them. The aim of the analysis was to understand who the most relevant news providers were and how each of them influenced the debate around climate change. The result is a bump chart in which the top ten providers are ranked on a vertical scale, based on their relevance in each of the three years analyzed.

All but the first two positions change over the years, with some providers constantly appearing or disappearing. Only four providers appear in all three rankings: The Guardian and New York Times, which occupy the first two positions, and The Huffington Post and Los Angeles Times, whose positions vary over the years. In general, the variation across the three rankings does not seem tied in any particular way to the presence of Donald Trump in the debate.

Although the number of articles remains roughly constant over the three years, The Guardian, which once held a large lead over the other providers, is not as relevant in 2016 as before.

Providers’ position on Google News

To give a more global view of the research results, the second visualization shows all the scraped articles, highlighting those from the most relevant providers. Each rectangle represents an article, positioned as it was on the Google News page during that week. An article appearing in the highest position is not an indication of relevance per se: the relevance ranking was intended to weigh both a provider’s total number of articles and their positions on the page.

Protocol

Once online news was chosen as the focus, the research was carried out on Google News USA. Using the query “climate change” and the Webscraper.io tool, all articles appearing on the first page of Google News were scraped from 2014 to 2016. To make the process easier, the scrape was done at weekly intervals for three months each year, from September to November, for a total of 12 weeks a year; this period corresponds to the peak range in 2016.

The tool made it possible to gather the title and link of each article; the articles were then grouped by news provider and positioned as they appeared on the Google page. These data are what the second visualization is based on. To create the provider ranking, a relevance index was calculated, based on the provider’s frequency and the article’s position on Google News. With a maximum of 42 results per page, the first position on the page granted 52 points and the last position granted 11 points, while not appearing in the list granted no points at all. This created a difference between the lowest-ranked provider of the week and the providers that did not appear at all. These points were summed by a Java script, and the totals constituted the ranking score that defined the ten most relevant providers of each year.
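The scoring described above can be sketched in a few lines of Python. This is only an illustration, not the script used in the research: the text gives just the two endpoints (52 points for the first position, 11 for the 42nd), so the sketch assumes the points decrease by one for each position in between. The function names, the tie-breaking rule and the sample data are hypothetical.

```python
from collections import defaultdict

MAX_RESULTS = 42   # maximum results per page, from the protocol
TOP_POINTS = 52    # points for the first position, from the protocol


def position_points(position):
    """Points for a 1-based page position; 0 when the article is not listed."""
    if position is None or not 1 <= position <= MAX_RESULTS:
        return 0
    # Assumed linear scale: 52 for position 1, down to 11 for position 42.
    return TOP_POINTS - (position - 1)


def rank_providers(articles, top_n=10):
    """Sum points per provider and return the top_n (provider, score) pairs."""
    scores = defaultdict(int)
    for provider, position in articles:
        scores[provider] += position_points(position)
    # Highest score first; ties broken alphabetically (an assumption).
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical sample: (provider, position on the page) for a single week.
sample = [
    ("The Guardian", 1),
    ("New York Times", 2),
    ("The Guardian", 42),
    ("Los Angeles Times", 10),
]
ranking = rank_providers(sample)
print(ranking)
```

Because even the last position on the page still earns 11 points, a provider that appears only once at the bottom remains clearly separated from one that never appears, which is the distinction the index was designed to preserve.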

Data

Data source: Google News

The visualization of article positions is almost a 1:1 representation of the dataset; the main difference is that it can be explored more quickly. Each column is one of the twelve weeks analyzed for each year, and each position is occupied by the provider of the article that appeared there.
The dataset used to draw the provider ranking is the ranking score, ordered from highest to lowest and divided by year.