How it was done
After capturing the first 100 results for each query in and selecting 25 of them, with invalid URLs removed, the text contained in the final results was extracted with
With a text analysis was performed on each query's corpus of text documents. The analysis produced a list of words (n-grams) sorted by tf-idf value (term frequency–inverse document frequency). We took the top 10 words for each query and tagged each of them, creating a spreadsheet with xx columns: the associated keywords, the added queries, the n-grams and the related categories. With a circular dendrogram was produced to show the hierarchy of the queries and the clustering of the results.
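
The tf-idf ranking step can be illustrated with a minimal sketch. It is not the tool used for the analysis; it only shows the idea under stated assumptions: each query's corpus is a list of plain-text documents, terms are lowercase alphabetic tokens, tf is the relative frequency of a term in a document, idf is log(N / df), and a term's score is its tf-idf summed over the documents of the corpus. The names `tokenize`, `ngrams` and `top_terms`, and the sample corpus, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_terms(documents, n=1, k=10):
    """Rank the n-grams of one corpus by summed tf-idf and return the top k."""
    doc_terms = [Counter(ngrams(tokenize(doc), n)) for doc in documents]
    num_docs = len(doc_terms)

    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter()
    for counts in doc_terms:
        doc_freq.update(counts.keys())

    # Assumed scoring: sum of tf * idf over all documents in the corpus.
    scores = Counter()
    for counts in doc_terms:
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total
            idf = math.log(num_docs / doc_freq[term])
            scores[term] += tf * idf

    # Sort by score (descending), then alphabetically so ties are stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up corpus standing in for the documents collected for one query.
corpus = [
    "privacy policy and data protection rules",
    "data protection law and privacy rights",
    "encryption tools protect data",
]
for term, score in top_terms(corpus, n=1, k=5):
    print(f"{term}\t{score:.3f}")
```

With this scoring, a term that occurs in every document of the corpus gets a score of zero, so the top of the list favours words that set individual documents apart. Calling `top_terms(corpus, n=2)` ranks bigrams instead of single words.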