NSA in the TV Series

Speech context. The keywords that appear when the NSA is mentioned

Introduction

This network graph shows how the keywords are connected to one another. All the keywords represented are strictly related to the NSA and to surveillance and security at large. It is interesting to recognise a central nucleus of words with a higher in-degree and an outer sphere of less relevant words. Below it, a simple bubble chart shows all the keywords that appear in the graph; the size of each circle represents the keyword's frequency, that is, how many times the word is counted across all the TV series that mention the NSA.

How to read the visualization

In the network graph, each keyword is represented by a colored circle (node), and two keywords are connected by an edge (connecting line) if they are mentioned in the same dialogue at least once. The size of a node represents its frequency, that is, the number of times the word appears in the whole corpus of subtitles from the 153 series. Nodes are colored according to Gephi's modularity class; modularity is a measure of network structure designed to quantify the strength of the division of a network into groups. The network is spatialised with the Force Atlas 2 algorithm, which draws more strongly connected nodes closer together. Through the window on the left it is possible to select one of six groups and see all the words belonging to it. Selecting a node in the network shows only the keywords connected to the selected word.
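To make the notion of modularity more concrete, the sketch below computes the standard modularity score Q for a partition that is already given. It is only an illustration under stated assumptions: the graph is treated as undirected and unweighted, the function name `modularity` is hypothetical, and the toy edges and the two groups are invented for the example. It does not reproduce how Gephi finds the groups, only how a given grouping can be scored.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy network: two triangles joined by a single edge.
edges = [
    ("nsa", "cia"), ("cia", "fbi"), ("fbi", "nsa"),
    ("phone", "call"), ("call", "email"), ("email", "phone"),
    ("nsa", "phone"),
]
# Hypothetical grouping: agencies in group 0, communication words in group 1.
groups = {"nsa": 0, "cia": 0, "fbi": 0, "phone": 1, "call": 1, "email": 1}
q_split = modularity(edges, groups)

# Putting every node in a single group gives Q = 0 (no division at all).
q_single = modularity(edges, {node: 0 for node in groups})
```

A higher Q means that more edges fall inside the groups than would be expected from the node degrees alone, which is the sense in which modularity measures the strength of a division.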

The second graph is composed of bubbles, each representing a keyword. The size of each bubble depicts the frequency of the keyword in the whole corpus of the TV series that mention the NSA. The biggest bubble is "NSA", a word that appears 1075 times across all the TV shows.

How it has been done

Using Kimono, it is possible to gather the part of the dialogue that constitutes the context of every sentence containing the word "NSA" (1075 sentences containing this word, each with a link to its context) from the links previously collected on Subzin. These links are fed into Kimono and the crawl is started. The result is a dataset in which every line holds one sentence (5375 sentences in total). All these sentences are pasted into the website Textalyser, which returns a list of the most common words together with the number of times each appears in the text. Only words with a frequency greater than 6 have been selected. These have then been assigned to one of two categories: "connected to NSA" (keywords) or "other words". The keywords and their frequencies are used to create the bubble chart.
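The counting and filtering step was carried out with Textalyser; a rough equivalent can be sketched in Python. This is only an illustrative sketch, not the tool's actual method: the tokenisation rule, the function names `word_frequencies` and `frequent_words`, and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def word_frequencies(sentences):
    """Count how often each lowercase word occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: a word is a run of letters or apostrophes.
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return counts


def frequent_words(counts, threshold=6):
    """Keep only the words whose frequency is strictly greater than threshold."""
    return {word: n for word, n in counts.items() if n > threshold}


# Hypothetical sample; the real input would be the 5375 crawled sentences.
sample = ["The NSA tapped the phone."] * 7 + ["Call the FBI."]
counts = word_frequencies(sample)
selected = frequent_words(counts)
```

Splitting the selected words into "connected to NSA" and "other words" remains a manual judgment, so it is not part of the sketch.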

At this point a script written in Python searches for all the keywords inside each dialogue and, when keywords appear together, adds a row to a table: the first column (source) holds one keyword and the second column (target) holds another keyword found in the same dialogue. A row is therefore created every time two keywords occur together in the same dialogue, so the same row can appear twice or more, once for each dialogue in which the two words are found together. This table is used to create a map with Gephi: each node is a keyword, and two keywords are connected by a line if they appear in the same dialogue.
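The original script is not reproduced here, but the step it performs can be sketched as follows. This is a minimal sketch under assumptions: keywords are single lowercase words matched as whole words, each unordered pair produces one row per dialogue, and the function names, the sample dialogues and the "Source"/"Target" column headers are chosen for illustration.

```python
import csv
import io
import re
from itertools import combinations


def cooccurrence_rows(dialogues, keywords):
    """Return one (source, target) row per keyword pair per dialogue."""
    rows = []
    for dialogue in dialogues:
        words = set(re.findall(r"[a-z']+", dialogue.lower()))
        # Keywords present in this dialogue, sorted so pairs are stable.
        present = sorted(k for k in keywords if k in words)
        # Every pair of keywords found together yields one row.
        rows.extend(combinations(present, 2))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a Source,Target header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical dialogues and keyword list, for illustration only.
dialogues = [
    "The NSA is tracing the phone call.",
    "Security at the NSA is tight.",
    "Did the NSA tap your phone?",
]
keywords = {"nsa", "phone", "call", "security"}
rows = cooccurrence_rows(dialogues, keywords)
edge_csv = rows_to_csv(rows)
```

In this sample the pair ("nsa", "phone") is emitted twice, once for each dialogue in which both words occur, which matches the repeated rows described above.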

Findings

Looking at the bubble chart, one immediately notices a tighter group of words in the center of the graph; these keywords have a larger number of connections.

Words such as "call" and "phone" are placed in the center, so they are used frequently in the TV series when the NSA is mentioned. The word "call" is small because it appears rarely in the corpus, but it always appears together with other keywords. Consequently, this word is used especially in surveillance or security contexts. Security agencies such as the CIA and the FBI are placed in the center too.

One of the most important things the network graph shows is the prominence of the word "security" compared with the word "surveillance". This is confirmed by the next step of our analysis (which inspects the visual context): the NSA appears more frequently as a security agency than as a surveillance organization. The word "security" has many more connections, and it is never mentioned together with its counterpart "surveillance".

Metadata

Timestamp: 18/12/2014 - 20/12/2014

Data source: Subzin

Related Protocol

Download data 1 (1KB) - Keywords' frequency

Download data 2 (217KB) - Dialogues