The pervasive use of advanced chatbots for content generation and translation raises intriguing linguistic questions. While acknowledging the complex ethical considerations surrounding the development and use of these language models, we note a clear gap in general linguistics on this subject, particularly for Italian, as shown by the scarcity of specific studies. This article aims to fill that gap with a linguistic analysis of a sample of texts produced by two widely used language models, GPT4 and Jasper. Methodologically, we built two corpora, each comprising 10 short stories for children, generated by ChatGPT and Jasper respectively, using prompt engineering tools to give each model highly detailed and explicit inputs. We added a third corpus of equal size, consisting of 10 classic fairy tales translated into Italian, such as Hansel and Gretel and Tom Thumb, as a human-language baseline for comparison. Through qualitative and quantitative analysis carried out with Sketch Engine, we identified distinctive linguistic features of the ChatGPT and Jasper texts, with the aim of providing an initial characterization of these language models and with specific attention to the deviations and idiosyncrasies of this form of linguistic expression. Ultimately, the paper aims to prompt further reflection on the topic and to foster a deeper understanding of the implications and potential consequences of using these language models, in both ethical and linguistic terms.
Calò, C., Palmerini, M. (2024). Exploring the linguistic landscape of GPT4 and Jasper Language Models: a corpus-based analysis of Italian short stories for children. In J.A.N.Á.y.O.S.O.G. Salud A. Flores Borjabad (Eds.), Tejiendo palabras: explorando la lengua, la lingüística y el proceso de traducción en la era de la inteligencia artificial (pp. 269-286). Madrid: Dykinson, S. L.
Exploring the linguistic landscape of GPT4 and Jasper Language Models: a corpus-based analysis of Italian short stories for children
Monica Palmerini
Conceptualization
2024-01-01
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.