Description

This visualization shows the most frequently recurring words in the comments on the Jordan pilot video in two subreddits. Each word's position reflects the subreddit it belongs to: r/worldnews on the left, r/watchpeopledie on the right. Words common to both subreddits are placed in the middle. To highlight the connections between words, we made the visualization interactive. We wanted to draw attention to each subreddit's tone of voice by connecting the keywords with the most upvoted comments.

[Interactive visualization: selecting a keyword shows the most upvoted comments that contain it. Example for the keyword "video", 11 votes: "Saw the video. My blood is boiling. I would give up my eyesight for a world without IS."]

We noticed that, despite the variety of users, both subreddits were characterized by negative, violent and offensive comments.

Protocol

[Protocol diagram. How to read: each step is marked as a website, terminal, software or tool, and the steps are grouped into data-source definition, queries definition, corpus definition, network definition and visualization.]

Data-source definition (reddit.com)
- 2 subreddits
- /r/worldnews > censored
- /r/watchpeopledie > uncensored

Corpus definition
- Comments extraction (Python + praw, see note 1)
  - /r/worldnews > 136 comments
  - /r/watchpeopledie > 200 comments
  - total > 336 comments
- Database creation (Excel): censorship level, comments level, ranking, comments text, upvotes
- Comments' text cleaning (Text Wrangler), removing: a/an/about/are/and/at/as/be/how/I/in/is/it/of/on/or/that/the/this/to/was/what/where/when/who/will/with/than/they/if/we/you/he/she/have/has/been/were/do/its

Visualization
- Textetur (see note 2)
  1. creation of two visualizations: censored comments, uncensored comments
  2. csv download
- Excel: creation of a dataset of nodes and edges for both censored and uncensored comments; new column: attributes/colors
- Gephi
  1. nodes partition, attribute: colors
  2. layout: force atlas 2, scaling 2, gravity 1, noverlap
- Illustrator

Note 1. How to extract comments from Reddit using PRAW (Python + praw)

A. Access reddit.com
- Create an account on reddit.com
- On reddit.com: go to pref > app > create new app
- select script > fill the fields > obtain your information

    >>> import praw
    >>> reddit = praw.Reddit(user_agent='Comment Extraction (by /u/USERNAME)',
    ...                      client_id='CLIENT_ID', client_secret="CLIENT_SECRET",
    ...                      username='USERNAME', password='PASSWORD')

B. Load submission
- from the url: ../r/worldnews/comments/2unfmu/isis..

    >>> submission = reddit.submission(id='example')

C. Extract top level comments

    >>> for top_level_comment in submission.comments:
    ...     print(top_level_comment.body)

D. Extract second level comments

    >>> submission.comments.replace_more(limit=0)
    >>> for top_level_comment in submission.comments:
    ...     for second_level_comment in top_level_comment.replies:
    ...         print(second_level_comment.body)

Note 2. Creation of two visualizations for the censored and uncensored word clouds

A. Censored Cloud
B. Uncensored Cloud

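The text-cleaning step in the protocol was done by hand in Text Wrangler. As a rough illustration only, the same stop-word list could be applied in Python before counting recurring words. The helper name `recurring_words`, the tokenizing rule and the second sample comment are assumptions made for this sketch, not part of the original workflow.

```python
import re
from collections import Counter

# Words removed in the protocol's text-cleaning step (lower-cased here).
STOP_WORDS = set(
    "a an about are and at as be how i in is it of on or that the this to "
    "was what where when who will with than they if we you he she have has "
    "been were do its".split()
)


def recurring_words(comments, top_n=10):
    """Count the words left after dropping stop words (hypothetical helper)."""
    counts = Counter()
    for text in comments:
        # Assumption: a word is a run of letters, apostrophes allowed.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


sample = [
    "Saw the video. My blood is boiling.",  # start of a comment quoted above
    "The video is on the news.",            # made-up comment for illustration
]
print(recurring_words(sample, top_n=3))
```

The most frequent word in this small sample is "video", which appears in both comments; a real run would take the comment bodies extracted with PRAW as input.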
1. Data-Source definition

The news video selected was the one about the Jordan pilot. We decided to analyze the debate on Reddit, in particular on the subreddits /r/worldnews and /r/watchpeopledie, which host censored and uncensored news respectively.

2. Corpus definition

The comments were extracted with Python, accessing Reddit through PRAW; we downloaded the top-level comments for each subreddit. We then cleaned the text and organized a dataset in Excel, with two spreadsheets for uncensored and censored comments. Next, we manually processed the text in Text Wrangler, removing conjunctions and other uninformative words, and used Textetur to visualize the word clouds around each discussion. The result was two separate visualizations that had to be assembled into one. For this reason, we downloaded two CSV files, reorganized them in Excel and imported the new nodes and edges into Gephi, adding an attribute column for uncensored and censored.
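The merge of the two exports was done manually in Excel. The sketch below shows one way such a combined node and edge table with an attribute column could be produced in Python. Several things here are assumptions rather than the original method: edges are taken to link words that appear in the same comment, words found in both subreddits get the attribute value "common", and the column headers (Id, Label, Source, Target, Weight) follow the names Gephi's spreadsheet import usually expects.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def build_graph(censored, uncensored):
    """Merge two lists of tokenized comments into node and edge tables.

    Each argument is a list of comments; each comment is a list of cleaned words.
    """
    seen = {"censored": set(), "uncensored": set()}
    edges = Counter()
    for label, comments in (("censored", censored), ("uncensored", uncensored)):
        for words in comments:
            unique = sorted(set(words))
            seen[label].update(unique)
            # Assumption: an edge links two words that share a comment.
            edges.update(combinations(unique, 2))
    nodes = {}
    for word in seen["censored"] | seen["uncensored"]:
        if word in seen["censored"] and word in seen["uncensored"]:
            nodes[word] = "common"  # assumed label for words in both subreddits
        elif word in seen["censored"]:
            nodes[word] = "censored"
        else:
            nodes[word] = "uncensored"
    return nodes, edges


def to_csv(header, rows):
    """Render rows as CSV text (written to a string here instead of a file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data, already cleaned and split into words.
censored = [["video", "blood"], ["video", "news"]]
uncensored = [["video", "pilot"]]

nodes, edges = build_graph(censored, uncensored)
nodes_csv = to_csv(["Id", "Label", "attribute"],
                   [(w, w, a) for w, a in sorted(nodes.items())])
edges_csv = to_csv(["Source", "Target", "Weight"],
                   [(a, b, n) for (a, b), n in sorted(edges.items())])
print(nodes_csv)
print(edges_csv)
```

In Gephi, the attribute column could then drive the node partition by color, as in the protocol.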

Data

Timestamp: 06/12/2016 - 12/12/2016

Data source: Reddit

The data, stored in an .xls file, report the top-level comments about the Jordan pilot execution. The columns contain the censorship level of each subreddit, the comment level, the comment's rank, its text and its upvotes.
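To make the column layout concrete, here is a minimal sketch of how rows with these five fields could be built from extracted comments. It is not the original spreadsheet: the field names, the helper `comment_rows`, the ranking by upvotes and the CSV output (standing in for the .xls file) are all assumptions, and the sample comments are made up. The stand-in objects mimic the `body` and `score` attributes of PRAW comments.

```python
import csv
import io
from types import SimpleNamespace

# Assumed field names for the five columns described above.
FIELDS = ["censorship", "level", "rank", "text", "upvotes"]


def comment_rows(comments, censorship, level=1):
    """Turn comment-like objects (with .body and .score) into dataset rows."""
    # Assumption: rank is the position after sorting by upvotes, highest first.
    ordered = sorted(comments, key=lambda c: c.score, reverse=True)
    return [
        {"censorship": censorship, "level": level, "rank": rank,
         "text": c.body, "upvotes": c.score}
        for rank, c in enumerate(ordered, start=1)
    ]


# Made-up stand-ins for PRAW comment objects.
sample = [
    SimpleNamespace(body="Example comment A", score=5),
    SimpleNamespace(body="Example comment B", score=11),
]

rows = comment_rows(sample, censorship="censored")

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Running the same helper once per subreddit, with "censored" and "uncensored" as labels, would give the two sheets described in the corpus definition.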