This visualization shows the most frequent words in the subreddits' comments about the Jordanian pilot video. Each word's position reflects the subreddit it belongs to: r/worldnews on the left, r/watchpeopledie on the right. Words common to both are placed in the middle.
To highlight the connections between words, we made the visualization interactive. We wanted to draw attention to the tone of voice of each subreddit by connecting the keywords with the most upvoted comments.
We noticed that, despite the variety of users, both subreddits were characterized by negative, violent and offensive comments.
Protocol
1. Data-Source definition
The news video we selected was the one about the Jordanian pilot. We decided to analyze the debate on Reddit, specifically in the subreddits /r/worldnews and /r/watchpeopledie, which host censored and uncensored news respectively.
2. Corpus definition
The comments were extracted with Python, accessing Reddit through PRAW; we downloaded the top-level comments for each subreddit. Then we cleaned the text and organized a dataset in Excel, with two spreadsheets, one for uncensored and one for censored comments.
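The extraction step could look roughly like the sketch below. It is not the original script: the function names, the placeholder credentials, and the choice to rank comments by their position in the "top" sort order are all assumptions made for illustration. Only the PRAW calls themselves (`praw.Reddit`, `reddit.submission`, `comment_sort`, `replace_more`) are part of the library.

```python
def top_level_rows(submission, censorship):
    """Return one dict per top-level comment of a PRAW-style submission."""
    # Drop the "load more comments" placeholders so only real comments remain.
    submission.comments.replace_more(limit=0)
    rows = []
    # Iterating over a PRAW comment forest yields top-level comments only.
    for rank, comment in enumerate(submission.comments, start=1):
        rows.append({
            "censorship": censorship,   # "censored" or "uncensored"
            "level": "top",             # only top-level comments are collected
            "rank": rank,               # assumed: position in the sorted thread
            "text": " ".join(comment.body.split()),  # collapse whitespace
            "upvotes": comment.score,   # Reddit exposes the net score
        })
    return rows


def fetch_submission(url):
    """Open a Reddit thread with PRAW (needs network access and API credentials)."""
    import praw  # imported lazily so top_level_rows works without PRAW installed

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder, not a real credential
        client_secret="YOUR_CLIENT_SECRET",  # placeholder, not a real credential
        user_agent="comment-corpus-sketch",
    )
    submission = reddit.submission(url=url)
    submission.comment_sort = "top"  # must be set before comments are fetched
    return submission
```

Calling `top_level_rows(fetch_submission(thread_url), "censored")` for one thread and again with `"uncensored"` for the other would give two lists of rows that map directly onto the two spreadsheets.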
Next, we processed the text manually in TextWrangler, removing conjunctions and other uninformative words, and used Textetur to visualize the word clouds around each discussion. This produced two separate visualizations that had to be combined. To do so, we downloaded two CSV files, reorganized them in Excel, and imported the new nodes and edges into Gephi, adding an attribute column that marks each entry as uncensored or censored.
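The cleaning and merging steps were done by hand, but the same idea can be sketched in code. The example below is illustrative only: the stopword list is a tiny stand-in, linking words that appear in the same comment is an assumed edge definition, and the `group` attribute name is hypothetical. The `Id`, `Label`, `Source`, `Target` and `Weight` headers follow the column names Gephi recognizes when importing spreadsheets.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption; the original list is not given).
STOPWORDS = {"the", "a", "an", "and", "or", "but", "of", "to", "in",
             "is", "it", "this", "that"}


def tokenize(text):
    """Lowercase the text and keep only the words that are not stopwords."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in STOPWORDS]


def build_graph(comments_by_group):
    """Return (nodes, edges) from a {group_name: [comment, ...]} mapping."""
    groups_of_word = {}
    edge_weights = Counter()
    for group, comments in comments_by_group.items():
        for comment in comments:
            # Sorting gives each undirected word pair one canonical order.
            words = sorted(set(tokenize(comment)))
            for word in words:
                groups_of_word.setdefault(word, set()).add(group)
            edge_weights.update(combinations(words, 2))
    nodes = [
        # Words seen in more than one group are tagged "both".
        {"Id": w, "Label": w, "group": "both" if len(g) > 1 else next(iter(g))}
        for w, g in sorted(groups_of_word.items())
    ]
    edges = [
        {"Source": s, "Target": t, "Weight": n}
        for (s, t), n in sorted(edge_weights.items())
    ]
    return nodes, edges


def to_csv(rows):
    """Serialize a non-empty list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Words tagged `both` correspond to the shared vocabulary placed in the middle of the visualization, while the other two values separate the left and right sides.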
The data are collected in an .xls file that lists the top-level comments about the Jordanian pilot execution. The columns contain the censorship level of the subreddit, the comment level, the comment's rank, its text, and its upvotes.