Amazon Analysis


Books Analysis

Starting from the queries:

- Internet Addiction
- Digital Addiction
- Social Media Addiction (abandoned during the analysis)

1. Each query has been typed into the Amazon search bar (using Google Chrome),
selecting the “Books” category. The first 200 results (approximately 20 pages per query)
have been collected in order of relevance

2. With the first results page open for each query,
the following fields have been created in Kimono:

- Title
- Author
- Date
- Reviews
- Image

In “Data Model View”, on the “Advanced” page of each field,
“Including href” has been unchecked; under “Crawl Setup”, “Manual Crawl” has been selected
and the URLs of the 20 following pages have been entered; once the crawl has finished, the .csv is downloaded

3. The .csv is opened in TextWrangler,
reworked by replacing “,” with “;” (the delimiters), then saved
as “unix (LF)-Unicode (UTF-8)”
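A plain find-and-replace of “,” with “;” also changes commas that belong to a title or an author name. As a hedged alternative to the manual step above, the conversion could be sketched in Python with the standard `csv` module, which only rewrites the field delimiters. The function name `convert_delimiter` and the sample row are hypothetical and are not part of the original workflow.

```python
import csv
import io


def convert_delimiter(text, old=",", new=";"):
    """Re-emit CSV text with a different field delimiter.

    Commas inside quoted fields are kept; only the separators change.
    """
    reader = csv.reader(io.StringIO(text), delimiter=old)
    out = io.StringIO()
    # lineterminator="\n" produces Unix (LF) line endings.
    writer = csv.writer(out, delimiter=new, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical sample, invented for illustration only.
sample = 'Title,Author\n"Hooked, Wired",Jane Doe\n'
converted = convert_delimiter(sample)
print(converted)
```

When working with files, opening them with `encoding="utf-8"` and `newline=""` would match the “unix (LF)-Unicode (UTF-8)” setting described in the step.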

4. Three tables have been created in Excel (one for each query),
keeping the following fields:

- Title
- Author
- Date
- Reviews

The data has then been cleaned and corrected

5. For each result the abstract has been read and labeled “relevant” or “non-relevant”.
Three labels have then been assigned to each entry:

Speaker, who speaks:
- Medical (1)
- Non medical (0)

Typology, what kind of text:
- Manual (1)
- Essay (2)
- Narrative (3)

Self help, do-it-yourself remedies:
- Self Help (1)
- Others (2)

The resulting .csv file has been arranged with the following columns:

- Query
- Title
- Author
- Reviews
- Year
- Relevant
- Speaker
- Typology
- Self Help
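The coding scheme and the column order above can be written down as a small Python sketch. The numeric codes and the column names come from the lists in this step; the function name `encode_row` and the example record are hypothetical, and the “Relevant” value is left as text because no numeric code is given for it.

```python
import csv
import io

# Column order of the final file, as listed above.
COLUMNS = ["Query", "Title", "Author", "Reviews", "Year",
           "Relevant", "Speaker", "Typology", "Self Help"]

# Numeric codes taken from the label definitions in this step.
SPEAKER = {"Medical": 1, "Non medical": 0}
TYPOLOGY = {"Manual": 1, "Essay": 2, "Narrative": 3}
SELF_HELP = {"Self Help": 1, "Others": 2}


def encode_row(row):
    """Replace label names with their codes and order fields as in COLUMNS."""
    coded = dict(row)
    coded["Speaker"] = SPEAKER[row["Speaker"]]
    coded["Typology"] = TYPOLOGY[row["Typology"]]
    coded["Self Help"] = SELF_HELP[row["Self Help"]]
    return [coded[name] for name in COLUMNS]


# Hypothetical record, invented for illustration only.
example = {
    "Query": "Internet Addiction",
    "Title": "Example Title",
    "Author": "Example Author",
    "Reviews": 12,
    "Year": 2010,
    "Relevant": "relevant",
    "Speaker": "Medical",
    "Typology": "Essay",
    "Self Help": "Others",
}

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerow(COLUMNS)
writer.writerow(encode_row(example))
table = buffer.getvalue()
print(table)
```

An unknown label raises a `KeyError`, which makes typos in the manual labeling visible early.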

6. The dataset has been loaded into Raw to generate several charts, which were then
arranged to compose the final one

Book covers

1. For each query a list of the cover URLs has been created in Excel,
copied and pasted into TextWrangler and saved as HTML
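As a rough illustration of what such an HTML file might contain, the sketch below turns a list of URLs into a page of links that a download manager can pick up. The function name `build_link_page` and the `example.com` addresses are hypothetical; the original files were assembled by hand in Excel and TextWrangler.

```python
from html import escape


def build_link_page(urls):
    """Return a minimal HTML page with one link per URL."""
    links = "\n".join(
        f'<a href="{escape(url, quote=True)}">{escape(url)}</a><br>'
        for url in urls
    )
    return f"<!DOCTYPE html>\n<html>\n<body>\n{links}\n</body>\n</html>\n"


# Hypothetical URLs, invented for illustration only.
cover_urls = [
    "https://example.com/covers/1.jpg",
    "https://example.com/covers/2.jpg",
]
page = build_link_page(cover_urls)
print(page)
```

Keeping the links in the same order as the rows of the table preserves the ranking of the results.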

2. The files have been opened in Mozilla Firefox; using the DownThemAll plug-in,
the covers have been downloaded into two different folders (one for each query), setting the download order to “normal+slow” (advanced settings > other settings)

3. Once downloaded, the images have been imported into Lightroom:
for each query the images have been exported as 160x160 squares, renamed
001, 002, 003, 004… 200, and used to replace the original ones
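The zero-padded numbering used for the renamed covers can be sketched in Python as follows. This is only an illustration of the naming scheme; the renaming itself was done in Lightroom. The function name `numbered_names` and the sample filenames are hypothetical, and the sketch assumes the input list is already in the desired order.

```python
from pathlib import Path


def numbered_names(filenames, width=3):
    """Map each filename to a zero-padded sequential name, keeping its extension."""
    return {
        name: f"{index:0{width}d}{Path(name).suffix}"
        for index, name in enumerate(filenames, start=1)
    }


# Hypothetical filenames, invented for illustration only.
plan = numbered_names(["cover_a.jpg", "cover_b.jpg", "cover_c.png"])
print(plan)

# To apply the plan on disk (assumed folder path, not run here):
# for old, new in plan.items():
#     Path("covers", old).rename(Path("covers", new))
```

A fixed width of three digits keeps the files sorted correctly from 001 to 200 in any file browser.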

4. Each cover has been tagged as follows:

- Device and technology (yellow)
- Neutral themes not linkable to any of the other categories (blue)
- Drugs and other addictions (red)
- Family and religion (purple)
- Medicine (green)
- Without cover (black)


20/11/2014 - 15/12/2014

Data source:
