Content-based approaches to research paper recommendation are important when user feedback is sparse or not available. The task of content-based matching is challenging, mainly due to the problem of determining the semantic similarity of texts. Nowadays, there exist many sentence embedding models that learn deep semantic representations by being trained on huge corpora, aiming to provide transfer learning to a wide variety of natural language processing tasks. In this work, we present a comparative evaluation among five well-known pre-trained sentence encoders deployed in the pipeline of title-based research paper recommendation. The experimented encoders are USE, BERT, InferSent, ELMo, and SciBERT. For our study, we propose a methodology for evaluating such models in reranking BM25-based recommendations. The experimental results show that the sole consideration of semantic information from these encoders does not lead to improved recommendation performance over the traditional BM25 technique, while their integration enables the retrieval of a set of relevant papers that may not be retrieved by the BM25 ranking function.
Mohamed Hassan, H.A., Sansonetti, G., Gasparetti, F., Micarelli, A., Beel, J. (2019). BERT, ELMo, use and infersent sentence encoders: The Panacea for research-paper recommendation?. In Proceedings of ACM RecSys 2019 Late-breaking Results (pp.6-10).
BERT, ELMo, use and infersent sentence encoders: The Panacea for research-paper recommendation?
Sansonetti G.
;Gasparetti F.;Micarelli A.;
2019-01-01
Abstract
Content-based approaches to research paper recommendation are important when user feedback is sparse or not available. The task of content-based matching is challenging, mainly due to the problem of determining the semantic similarity of texts. Nowadays, there exist many sentence embedding models that learn deep semantic representations by being trained on huge corpora, aiming to provide transfer learning to a wide variety of natural language processing tasks. In this work, we present a comparative evaluation among five well-known pre-trained sentence encoders deployed in the pipeline of title-based research paper recommendation. The experimented encoders are USE, BERT, InferSent, ELMo, and SciBERT. For our study, we propose a methodology for evaluating such models in reranking BM25-based recommendations. The experimental results show that the sole consideration of semantic information from these encoders does not lead to improved recommendation performance over the traditional BM25 technique, while their integration enables the retrieval of a set of relevant papers that may not be retrieved by the BM25 ranking function.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.