Open culture fostering the scientific publishing debate

Description

In this visualization we studied the most frequent terms related to the controversy that appear in online articles and blogs. We started from the queries that emerged from the previous Seealsology protocol as the prominent topics across the years. The queries are: Libre Knowledge, Freedom of Information, Free Content, Crypto Anarchism, Hacktivism, Open content, Open source, Free Culture Movement, Access to Knowledge, Open access, Sci-hub, Libgen. We portrayed the evolution of the top ten words year by year, from 2013 to 2016. Then we organized the terms into four semantic areas: Academic (dark green), Open (bright green), Piracy (black) and Publishing (blue). Finally, we cross-checked these results with Google Trends to see whether the two overlap. We used word frequencies rather than raw counts so that the results remain comparable across years.
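The difference between a raw count and a frequency can be illustrated with a minimal sketch. It assumes that "frequency" means a word's share of all counted words; the function name relative_frequencies and the sample tokens are hypothetical and are not taken from the project's dataset.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return each word's share of the total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Invented sample tokens, for illustration only.
sample = ["open", "access", "open", "free"]
shares = relative_frequencies(sample)
```

With shares of this kind, a year for which more text was collected does not automatically dominate the ranking.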

The terms “Open Access” and “Free” consistently occupy the top positions in our ranking. In some years the frequency of “Open Access” almost doubles. In line with the results of our research on the topics related to the controversy, articles also leave plenty of room to talk about pirates in 2016. This indicates that pirates have gained significant weight in the controversy: Sci-Hub went from zero in 2014 to the eighth position in 2016 (also in line with Google Trends). It is also interesting to note that the word “business” is a constant presence throughout the years, rising slightly in 2014-2015. Its constant appearance points to an awareness that scientific knowledge is indeed a “business”, and that there are therefore actors who profit from its diffusion. Overall, two different trends emerge between the curves of this visualization and those of the previous one: the articles tend to be more constant than the discussions, which are much more volatile.
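Observations such as a term entering the ranking in a later year can be read off by tracking its position year by year. The following is a minimal sketch under stated assumptions: the function rank_trajectory is hypothetical, and the rankings in toy_rankings are invented placeholders, NOT the project's results.

```python
def rank_trajectory(term, top_terms_by_year):
    """Return {year: 1-based rank, or None if the term is not ranked that year}."""
    trajectory = {}
    for year in sorted(top_terms_by_year):
        ranked = top_terms_by_year[year]
        trajectory[year] = ranked.index(term) + 1 if term in ranked else None
    return trajectory


# Invented placeholder rankings, for illustration only (not the project's data).
toy_rankings = {
    2014: ["alpha", "beta", "gamma"],
    2015: ["alpha", "gamma", "delta"],
    2016: ["alpha", "delta", "gamma"],
}

delta_path = rank_trajectory("delta", toy_rankings)
```

A value of None marks a year in which the term does not appear among the selected keywords.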

Protocol

Extract the same nodes as in the previous protocol and treat them as a list of queries
Search Google.com one query at a time for each time unit (2013, 2014, 2015, 2016)
Filter results using only Google News results and sort them by relevance
Select the first 25 articles for each query and extract their text content
Count all the words using the Stemmer tool
Select the first 15 keywords for each time unit (2013, 2014, 2015, 2016)
Create a flow chart to visualize changes in the terms' frequency of usage
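The counting and selection steps above can be sketched as follows. This is only an approximation under stated assumptions: Python's collections.Counter stands in for the Stemmer tool (whose exact behaviour is not described here), no stemming is performed, and STOP_WORDS, tokenize, top_keywords_by_year and the sample texts are hypothetical names and data introduced for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption, not the tool's own).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words (hyphens allowed), minus stop words."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def top_keywords_by_year(articles_by_year, n=15):
    """Count words over all articles of each year and keep the n most common."""
    ranking = {}
    for year, articles in articles_by_year.items():
        counts = Counter()
        for text in articles:
            counts.update(tokenize(text))
        ranking[year] = [word for word, _ in counts.most_common(n)]
    return ranking


# Invented sample texts, for illustration only.
sample_articles = {
    2013: ["Open access is free", "open access and business"],
    2016: ["Sci-Hub and open access"],
}
ranking = top_keywords_by_year(sample_articles, n=2)
```

The per-year lists produced this way are the kind of input a flow chart of term rankings would need.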

Data

Timestamp: 01/01/2013 - 05/12/2016

Data source: Google News Scraper

Download data (123KB)

The dataset contains the outcome of the word count performed with the Stemmer tool. Words are ranked by frequency of usage.

Research question

What are the most debated topics in online articles?
