The Project of a “Thesaurus Linguae Arabicae”: linguistic and computational issues

Solimando, Cristina; Lancioni, Giuliano

The Thesaurus Linguae Arabicae project aims to build a network of Arabic corpora in order to give scholars research tools such as extended concordances, historical and etymological dictionaries, and corpora that reflect linguistic reality. The absence of such tools prompted a group of scholars from different countries to meet at a workshop in Rome (“Towards a Thesaurus Linguae Arabicae”, October 11-13, 2011), where they discussed how to build communication interfaces between existing corpora and a model for developing new ones. The key points are the choice of diversified text types, which have been defined as “without adjectives”, and the choice of encoding tools and methodologies: adopting the Text Encoding Initiative standards and using automatic segmenters and lemmatizers for computer-aided annotation are the first steps towards realizing this ambitious project.

Solimando, C., Lancioni, G. (2012). The Project of a “Thesaurus Linguae Arabicae”: linguistic and computational issues. In M.J. CLIVAZ C (ed.), Lire Demain. Des manuscrits antiques à l'ère digitale / Reading Tomorrow. From Ancient Manuscripts to the Digital Era (pp. 633-648). Lausanne: Presses Polytechniques et Universitaires Romandes.