Chapter 3 - Science vs Intermediary

In simple terms

Introduction

The following protocol analyzes the two main voices that emerge from the network of websites: on one side PubMed, representing the academic, scientific point of view, and on the other The New York Times, a more immediate and popular source of information.

Protocol

On The New York Times website, in the Health section, the articles returned by a search for “milk” are sorted by relevance and filtered by date, keeping those published after 01/01/2005. The first 300 results are downloaded with Kimono, and those concerning human milk and recipes are removed manually to obtain a list of 100 pertinent results. The full text of every article is then downloaded with Blockspring (Extract Text from URL).
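The selection step can be sketched in a few lines, assuming each search result is a record with a title, a publication date and a URL. The field names, the `select_articles` helper and the keyword list are hypothetical: in the project the exclusion of human milk and recipes was done by hand, and no Kimono or Blockspring calls are reproduced here.

```python
from datetime import date

# Hypothetical record layout: {"title": str, "date": datetime.date, "url": str}.
CUTOFF = date(2005, 1, 1)
# Assumption: a title keyword match stands in for the manual exclusion.
EXCLUDED_TOPICS = ("human milk", "breast milk", "recipe")


def is_pertinent(result):
    """Keep results published after the cutoff that are not about excluded topics."""
    title = result["title"].lower()
    return result["date"] > CUTOFF and not any(t in title for t in EXCLUDED_TOPICS)


def select_articles(results, limit=100):
    """`results` is assumed to be sorted by relevance already, as on the search page."""
    return [r for r in results if is_pertinent(r)][:limit]


sample = [
    {"title": "Milk and Heart Disease", "date": date(2010, 5, 3), "url": "https://example.com/a"},
    {"title": "A Recipe for Milk Bread", "date": date(2012, 1, 9), "url": "https://example.com/b"},
    {"title": "Milk Prices", "date": date(2003, 7, 1), "url": "https://example.com/c"},
]
print([r["title"] for r in select_articles(sample)])  # ['Milk and Heart Disease']
```

Filtering before truncating mirrors the protocol: the 100 results are the first *pertinent* ones, not the first 100 raw hits.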

On PubMed, in the Health section, the papers returned by a search for “milk” are sorted by relevance, and those about human milk are excluded manually. The titles and abstracts of the first 100 pertinent results are downloaded, and the ISS code, copyright notices, author information, French and Spanish translations, links, and labels such as “abstract” and “keywords” are deleted.
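The cleanup of the downloaded abstracts can be approximated with regular expressions. This is a minimal sketch, assuming each piece of metadata sits on its own line of the exported text; the patterns and the `clean_abstract` name are hypothetical, and removing the French and Spanish translations is left to manual work, as in the original protocol.

```python
import re

# Hypothetical patterns for whole metadata lines to drop; adjust to the real export layout.
DROP_LINE_PATTERNS = [
    re.compile(r"^\s*ISSN?\b.*$"),                            # ISS/ISSN code lines
    re.compile(r"^\s*(copyright|©).*$", re.IGNORECASE),       # copyright notices
    re.compile(r"^\s*author information.*$", re.IGNORECASE),  # author details
    re.compile(r"^\s*https?://\S+\s*$"),                      # bare links
]
# Section labels removed from the start of a line, keeping any text that follows them.
LABEL_PATTERN = re.compile(r"^(abstract|keywords)\b\s*:?\s*", re.IGNORECASE)


def clean_abstract(text):
    cleaned = []
    for line in text.splitlines():
        if any(p.match(line) for p in DROP_LINE_PATTERNS):
            continue  # skip the whole metadata line
        line = LABEL_PATTERN.sub("", line.strip()).strip()
        if line:  # drop lines left empty once a label is removed
            cleaned.append(line)
    return "\n".join(cleaned)


raw = """Abstract
Cow's milk allergy is common in infants.
Copyright 2014 Elsevier Inc.
ISSN 0022-0302
Keywords: allergy; infants"""
print(clean_abstract(raw))
```

Only the labels are removed, not the keyword values after them, matching the protocol's instruction to delete words such as “abstract” and “keywords”.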

Keyword Density Analyzer is used to find the most frequent single words and word pairs in both The New York Times and PubMed texts. The final four lists of 50 entries each (single words and word pairs for each source) are assembled manually, omitting articles, conjunctions, verbs, adjectives that carry no meaning on their own, plurals of words already listed in the singular, inflected forms of the same term, and “milk” itself, since it is the search term. Some medical abbreviations are merged with their full forms.
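What Keyword Density Analyzer produces can be approximated with plain frequency counts of single words and adjacent word pairs. This is a sketch rather than that tool's algorithm: the tiny stop-word list, the rule that folds a plural into its singular only when the singular also occurs, and the `keyword_lists` helper are all assumptions, and the manual merging of abbreviations is not reproduced.

```python
import re
from collections import Counter

# Assumption: a small illustrative stop-word list; the real lists were cleaned by hand.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "for",
              "to", "with", "is", "are", "was", "were", "be", "it", "that", "this"}
QUERY_TERM = "milk"  # excluded as a single word, but allowed inside pairs like "cow's milk"


def tokenize(text):
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def singular(token, vocabulary):
    # Fold "cows" into "cow" only when "cow" also appears somewhere in the texts.
    if token not in STOP_WORDS and token.endswith("s") and token[:-1] in vocabulary:
        return token[:-1]
    return token


def keyword_lists(texts, top_n=50):
    token_lists = [tokenize(t) for t in texts]
    vocabulary = {tok for tokens in token_lists for tok in tokens}
    singles, pairs = Counter(), Counter()
    for tokens in token_lists:
        tokens = [singular(t, vocabulary) for t in tokens]
        singles.update(t for t in tokens if t not in STOP_WORDS and t != QUERY_TERM)
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                pairs[f"{first} {second}"] += 1
    # Most frequent first: the higher an entry, the more often it occurs.
    return singles.most_common(top_n), pairs.most_common(top_n)


singles, pairs = keyword_lists(["Cows produce milk. The cow is healthy.",
                                "Raw milk and cow's milk."])
print(singles[0])                      # ('cow', 2)
print(sorted(p for p, _ in pairs))     # ['cow produce', "cow's milk", 'produce milk', 'raw milk']
```

Because punctuation is discarded, pairs can straddle a sentence boundary; splitting each text into sentences first would avoid that.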

The words in the four lists are categorized according to their definitions in OneLook Dictionary Search; the same categories are used for single words and word pairs.
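Once each word has been assigned a category, grouping a ranked list is a dictionary lookup. In this sketch the `CATEGORY_OF` table is hypothetical and covers only words named elsewhere in this chapter (the real assignments came from reading each dictionary definition), and the counts in the usage line are made up for illustration.

```python
from collections import defaultdict

# Illustrative word-to-category table; the project assigned categories by hand.
CATEGORY_OF = {
    "cancer": "diseases",
    "obesity": "diseases",
    "allergy": "diseases",
    "diabetes": "diseases",
    "risk": "diseases",
    "calcium": "nutrients",
    "meat": "food",
    "tea": "food",
    "olive oil": "food",
    "cow's milk": "milk typologies",
}


def group_by_category(ranked_words):
    """ranked_words: list of (word, count) pairs, most frequent first.
    Ranking order is preserved inside each category."""
    groups = defaultdict(list)
    for word, count in ranked_words:
        # Words without a category stay "uncategorized" (left uncolored in the chart).
        groups[CATEGORY_OF.get(word, "uncategorized")].append((word, count))
    return dict(groups)


# Made-up counts, for illustration only.
print(group_by_category([("cancer", 12), ("new", 5), ("tea", 4), ("olive oil", 2)]))
```

The same table serves both single words and word pairs, since the protocol uses identical categories for the two kinds of list.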

How to read it

The two columns on the left show the most frequent single words in The New York Times and in PubMed, while the two on the right show the most frequent word pairs, that is, words that always appear together. The higher a word sits in its list, the more frequently it occurs in the texts.

Each color corresponds to a specific category; some words are left uncolored because they do not belong to any group.

The emerging categories, mostly self-explanatory, are: food, milk types (of both animal and plant origin), diseases, who and when (subjects and temporal information), studies and treatments, human body, and nutrients.

The legend is interactive and lets you highlight each category to explore it in more detail.

Findings

Comparing the same categories, PubMed rarely mentions food, and when it does, it mostly refers to milk products; the N.Y. Times, by contrast, has no specific focus and covers very different kinds of food, such as meat, tea, olive oil, and French fries.

As expected, milk types appear more often as word pairs, and they consist almost entirely of variants of cow’s milk (also called “conventional milk”), milk from other animals, and infant milk.

On diseases, the N.Y. Times and PubMed discuss different topics: cancer, obesity, and heart attacks in the news, versus allergy, intolerance, and diabetes in the papers. The two lists share no words except “risk”, a generic term.

The picture for nutrients is similar: the N.Y. Times mentions calcium, which PubMed does not, while PubMed names more specific and detailed components.

The summary below shows more clearly that the main topic in The New York Times is food, while PubMed mostly discusses studies and treatments, milk types, and diseases, confirming expectations.
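A per-source summary like this one can be derived by adding up the occurrences of every word in each category and comparing the totals. The sketch below assumes the input is already grouped as `{category: [(word, count), ...]}` and that uncategorized words are left out of the summary; both assumptions and the `category_shares` name are hypothetical, and the toy counts are not the project's data.

```python
from collections import Counter


def category_shares(categorized, skip=("uncategorized",)):
    """Return (category, share of total occurrences) pairs, largest share first."""
    totals = Counter()
    for category, words in categorized.items():
        if category not in skip:
            totals[category] += sum(count for _, count in words)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(category, total / grand_total) for category, total in totals.most_common()]


# Toy input, for illustration only.
toy = {"nutrients": [("calcium", 3)], "food": [("tea", 1)], "uncategorized": [("new", 5)]}
print(category_shares(toy))  # [('nutrients', 0.75), ('food', 0.25)]
```

Running it once per source and comparing the top entries gives the kind of comparison the summary draws.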

Data

Timestamp: 12/12/2015

Data source: PubMed, The New York Times

Download data (152KB)