Exploring Web Personalization

The Wikipedia survey

Introduction

We decided to analyze the debate about Web Personalization, but what is Web Personalization? Which areas does it involve? What are its different aspects? Starting from these questions, we searched Wikipedia, the largest free encyclopedia on the web, for the most relevant pages. The most important one is “Personalization”; from there we moved to others such as “Personalized search” and “Filter bubble”, following the links in the text and the “See also” sections at the bottom of the pages. To build a network of all these pages, we used Gephi to connect them through the “See also” links of each page.

How to read the visualization

This chart shows the main areas involved in the topic of web personalization. The most important areas are society, technology, marketing and privacy.
You can select each area and explore all its ramifications to understand the full range of the debate.
These areas also emerged as very important in protocol 2, and they define the debate well.

How it has been done

We selected 5 important pages (seeds) from Wikipedia.org, then filtered all the links in their respective “See also” sections, keeping only the pages that we thought were relevant to the topic of web personalization. In addition, we included a few more links that we wanted to see on the map, so that we could check whether they were actually relevant. We pasted these links into the tool Seealsology and obtained 2975 more nodes, which we mapped using Gephi. We then simplified this view by manually selecting the most relevant pages from the Gephi file, judged by their in-degree (the number of other pages that link to them).
Finally, we re-read those pages on Wikipedia and kept only the ones that were really relevant.
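
As an illustration of the in-degree step, the sketch below counts how many distinct pages point to each page through a “See also” link and keeps the pages above a threshold. It is only a minimal sketch, not the procedure actually used (the selection was done by hand in Gephi): the function names, the toy edge list and the threshold of 2 are all assumptions made for the example.

```python
from collections import Counter


def in_degrees(edges):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter()
    for source, target in set(edges):  # set() drops duplicate links
        if source != target:
            counts[target] += 1
    return counts


def most_linked(edges, min_in_degree):
    """Return (page, in-degree) pairs at or above the threshold, highest first."""
    counts = in_degrees(edges)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(page, n) for page, n in ranked if n >= min_in_degree]


# Hypothetical toy data: each pair means "source lists target in its See also".
# These links are invented for illustration, not taken from the real dataset.
edges = [
    ("Personalization", "Personalized search"),
    ("Personalization", "Filter bubble"),
    ("Personalized search", "Filter bubble"),
    ("Collaborative filtering", "Filter bubble"),
    ("Filter bubble", "Personalized search"),
    ("Personalization", "Collaborative filtering"),
]

# Assumed threshold: keep pages linked from at least 2 other pages.
shortlist = most_linked(edges, min_in_degree=2)
print(shortlist)
```

With this toy data, the shortlist contains “Filter bubble” and “Personalized search”. A shortlist like this would still need the manual re-reading step, since a high in-degree does not guarantee that a page is relevant to the topic.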

Findings

The main reason we made this graph was to better understand the areas involved in this topic, so that we could formulate a proper research question. We were able to identify 4 main clusters: Society, Technology, Privacy & control and Marketing. From the map we understood that Society is the most important part: it includes the social media aspect and online identity, with digital traces and digital identity. It also encompasses mass customization and collective intelligence, with the relevance paradox, collaborative filtering and the filter bubble. The Technology side is mainly composed of data mining, predictive analysis and simulated reality. Another important aspect of Web Personalization is Privacy & control, with a large part about control and mass surveillance and a smaller part about privacy on the internet. Last but not least is the Marketing side, with all its aspects of internet marketing. This map gave us a general idea of the areas involved and also helped us understand which aspects were more important to investigate.
Finding 01: While analyzing the main page of our topic, we found that a section on the advantages and disadvantages of the topic was deleted in 2011 and moved to another page.
The new page with this section is “Personalized search”, which includes not only these important parts of the debate but also a paragraph on the central debate about Web Personalization, called “Filter Bubble”.

Metadata

Timestamp: 10/11/2014 - 27/11/2014

Data source: Wikipedia

Related Protocol

Download data (4MB)