Sajeva, A., Iannucci, S., Marchetti, C., Merialdo, P., Torlone, R. (2024). Clustering Amendments with Semantic Embeddings. In CEUR Workshop Proceedings (pp. 312-320). CEUR-WS.
Clustering Amendments with Semantic Embeddings
Sajeva A.; Iannucci S.; Marchetti C.; Merialdo P.; Torlone R.
2024-01-01
Abstract
The Italian Senate needs to cluster amendments in order to optimize the scheduling of parliamentary sessions. Currently, this task is carried out by Similis, an application that relies on a traditional term-frequency technique and therefore clusters amendments by wording rather than by semantics. Recent advances in natural language processing have led Italian institutions to investigate the adoption of pre-trained language models (PTLMs) for text analysis. Along this line, we propose CLAMSE, an alternative to Similis that uses pre-trained Sentence-BERT models to generate embeddings and then groups similar amendments through hierarchical agglomerative clustering. Our preliminary evaluation shows that CLAMSE, using embeddings generated by pre-trained models without fine-tuning, achieves performance comparable to Similis, paving the way for a clustering method with advanced contextual understanding. This study contributes to enhancing the effectiveness of institutional decision-making processes through the adoption of PTLMs.
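The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the CLAMSE implementation: the abstract does not state the linkage criterion, the distance measure, or the stopping rule, so average linkage, cosine distance, and a distance threshold are assumptions made here. The hand-written three-dimensional vectors are hypothetical stand-ins for the Sentence-BERT embeddings of five amendments, and the function names are illustrative.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(embeddings, threshold):
    """Average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per item and repeatedly merges the two
    closest clusters until the closest pair is farther apart than
    `threshold`. Returns clusters as sorted lists of item indices.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        # Mean pairwise cosine distance between members of a and b.
        total = sum(
            cosine_distance(embeddings[i], embeddings[j])
            for i in a
            for j in b
        )
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None  # (distance, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy vectors standing in for sentence embeddings.
toy_embeddings = [
    [1.0, 0.1, 0.0],  # amendment 0
    [0.9, 0.2, 0.0],  # amendment 1, close to amendment 0
    [0.0, 1.0, 0.1],  # amendment 2
    [0.1, 0.9, 0.0],  # amendment 3, close to amendment 2
    [0.0, 0.0, 1.0],  # amendment 4, unrelated to the others
]

clusters = agglomerative_clusters(toy_embeddings, threshold=0.2)
print(clusters)
```

With real data, the toy vectors would be replaced by embeddings produced by a pre-trained sentence encoder, and the threshold would have to be tuned on labelled amendment groups. A threshold-based stopping rule is used here because the number of amendment groups is presumably not known in advance.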