Our project, In Codice Ratio, is an interdisciplinary research initiative for analyzing the content of historical documents conserved in the Vatican Secret Archives (VSA). Since most of these documents are digitized as images, Machine Transcription both enables the application of Knowledge Discovery techniques and gives paleographers a useful tool for speeding up the transcription process. Our approach combines a convolutional neural network to recognize characters, statistical language models to compose and rank word transcriptions, and crowdsourcing for scalable training data collection. We have conducted experiments on pages from the medieval manuscript collection known as the Vatican Registers. Our results show that almost all the considered words can be transcribed without significant spelling errors.
Merialdo, P., Firmani, D., Nieddu, E., Maiorino, M. (2019). In Codice Ratio: Machine Transcription of Medieval Manuscripts. In 15th Italian Research Conference on Digital Libraries (IRCDL) (pp. 185-194). Springer [10.1007/978-3-030-11226-4].
In Codice Ratio: Machine Transcription of Medieval Manuscripts
Merialdo Paolo;Firmani Donatella;Nieddu Elena;Maiorino Marco
2019-01-01