Figuring out the most debated topics about Open Access

In depth: Reddit

Introduction

Reddit is a huge, global forum where almost any topic can be discussed, Open Access included. It is a promising source for studying the debate for several reasons, not only because of its daily activity. For example, the average user is aged 25 to 34, a well-suited profile: such a user may already hold a university degree but is still young enough to have broadly explored the field and its innovations. Reddit also has very little censorship: although many redditors are verified users, they are not expected to follow any particular etiquette, and discussion is rarely formal. Last but not least, Reddit is closely tied to internet openness: it is an open-source website and was co-founded by the prominent Open Access activist Aaron Swartz.

Protocol

We searched Reddit with its built-in search function, using the query "open access". Since the number of results was manageable, we could manually select only the relevant discussions, which were eventually narrowed down to 30 AMA threads. An AMA (Ask Me Anything) is a type of thread in which the original poster offers to answer any question, usually related to the original post and to their occupation.

Kimono is a scraping tool that helped us download the first 500 comments of each thread. All these replies were then analysed with VOSviewer, a text-mining and visualization program, which helped us clean up the results, for example by detecting and deleting duplicate posts and unifying synonyms. We were thus able to identify the main words and expressions (those found at least 20 times) and to compute a binary count for each of them, that is, the number of posts that include the expression at least once.
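The cleaning and counting steps above can be sketched in Python. This is not the actual Kimono or VOSviewer workflow, only a minimal illustration under stated assumptions: the comments are already downloaded as plain strings, duplicates are exact repeats (ignoring case and whitespace), synonyms come from a hand-written mapping, candidate expressions are given in advance (VOSviewer extracts them automatically), and "found at least 20 times" is read as 20 total occurrences across all comments. The function names, the synonym map and the sample expressions are hypothetical.

```python
import re
from collections import Counter


def dedupe(comments):
    """Drop exact duplicate posts (ignoring case and extra whitespace), keeping the first copy."""
    seen = set()
    unique = []
    for comment in comments:
        key = " ".join(comment.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique


def unify_synonyms(text, synonyms):
    """Lowercase the text and rewrite each variant as its canonical form (whole words only)."""
    text = text.lower()
    for variant, canonical in synonyms.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return text


def count_expressions(comments, expressions, synonyms, min_occurrences=20):
    """Return the binary count (posts containing the expression at least once)
    for every expression whose total number of occurrences reaches the threshold."""
    posts = [unify_synonyms(c, synonyms) for c in dedupe(comments)]
    total = Counter()   # every occurrence, across all posts
    binary = Counter()  # at most one per post
    for post in posts:
        for expr in expressions:
            n = len(re.findall(rf"\b{re.escape(expr)}\b", post))
            total[expr] += n
            if n > 0:
                binary[expr] += 1
    return {e: binary[e] for e in expressions if total[e] >= min_occurrences}


if __name__ == "__main__":
    comments = [
        "Open access journals need a business model.",
        "Open access journals need a business model.",  # duplicate post
        "Peer-review quality matters for any journal.",
        "Who pays for open access? Open access still needs a business model.",
    ]
    synonyms = {"peer-review": "peer review", "journals": "journal"}
    expressions = ["open access", "business model", "peer review", "journal"]
    print(count_expressions(comments, expressions, synonyms, min_occurrences=2))
```

With `min_occurrences=2`, "open access" occurs three times in total but in only two posts, so its binary count is 2, while "peer review" occurs once and is dropped. The binary count is what matters for the map, since it measures how many posts raise a topic rather than how often a single post repeats it.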

How to read it

The visualization has no particular reading order. Each balloon corresponds to one of the main expressions found, and its size reflects the binary count: the bigger a balloon, the more posts contained the expression at least once. Colours group the words into clusters belonging to different semantic areas, and lines represent links between some of these common words.

Findings

Reddit hosts a fairly large community interested in Open Access that actively takes part in the discussions. Users usually either work in the field or are at least knowledgeable about it. The four most debated topics are:

- the business model, including funding, editors and profits
- academic publications, including institutions and libraries
- quality of publications, including peer review issues, authors and impact factor
- scientific reliability, including the importance of open datasets and bias problems

Data

Time span: 20/11/2014 - 30/11/2014

Data source: Reddit

Download data (437kB)