Trec Microblog 2012 Track: Real-Time Algorithm for Microblog Ranking Systems

GASPARETTI, FABIO
2012-01-01

Abstract

In recent years Twitter has become a major source of big data, owing to the sharp increase in its number of users and its growing popularity. Moreover, the huge volume of user profiles and raw text data continuously poses new research challenges. This paper reports our contribution to the TREC 2012 Microblog Track and the results obtained. In this challenge each participant is required to perform a "real-time" retrieval task: given a query topic, the system seeks the most recent and relevant tweets. We devised an effective real-time ranking algorithm that avoids heavy computational requirements. Our contribution is multifold: (1) adapting an existing ranking method, BM25, to microblogging; (2) combining content-based features with knowledge extracted from Wikipedia; (3) employing Pseudo Relevance Feedback techniques for query expansion; (4) performing text analysis, such as ad hoc text normalization and POS tagging, to limit noisy data and better represent useful information.
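
The abstract does not say how BM25 was adapted to tweets or how feedback terms were selected, so the following is only a minimal sketch of the two generic building blocks it names: standard Okapi BM25 scoring and a simple form of Pseudo Relevance Feedback. The function names (`tokenize`, `bm25_scores`, `expand_query`), the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed minimal normalization: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with standard Okapi BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if all documents are empty.
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def expand_query(query, docs, top_docs=2, top_terms=2):
    """Pseudo relevance feedback: treat the top-ranked documents as relevant
    and append their most frequent non-query terms to the query."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    query_terms = tokenize(query)
    seen = set(query_terms)
    counts = Counter(
        term
        for i in ranked[:top_docs]
        for term in tokenize(docs[i])
        if term not in seen
    )
    return query_terms + [term for term, _ in counts.most_common(top_terms)]
```

With `b > 0`, shorter documents that match the query are penalized less by length normalization. Tweets all have similar, short lengths, which is one plausible reason a microblog system might tune or modify this component, though the abstract does not confirm that choice.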
2012
Feltoni Gurini, D., Gasparetti, F. (2012). Trec Microblog 2012 Track: Real-Time Algorithm for Microblog Ranking Systems. In Proc. of The 21st Text REtrieval Conference, TREC 2012, Gaithersburg, Maryland, November 6-9, 2012.

Use this identifier to cite or link to this document: https://hdl.handle.net/11590/185447