In Codice Ratio: Machine Transcription of Medieval Manuscripts

Merialdo Paolo; Firmani Donatella; Nieddu Elena; Maiorino Marco
2019-01-01

Abstract

Our project, In Codice Ratio, is an interdisciplinary research initiative for analyzing the content of historical documents conserved in the Vatican Secret Archives (VSA). Because most of these documents are digitized as images, Machine Transcription both enables the application of Knowledge Discovery techniques and gives paleographers a useful tool for speeding up the transcription process. Our approach combines a convolutional neural network to recognize characters, statistical language models to compose and rank word transcriptions, and crowdsourcing for scalable training data collection. We have conducted experiments on pages from the medieval manuscript collection known as the Vatican Registers. Our results show that almost all of the considered words can be transcribed without significant spelling errors.
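
As a rough illustration of how a character recognizer and a statistical language model can be combined to compose and rank word transcriptions, the sketch below may help. It is not the project's implementation: the character bigram model with add-alpha smoothing, the toy Latin corpus, the made-up per-glyph probabilities, and the names `train_bigram_model` and `rank_transcriptions` are all assumptions chosen for brevity.

```python
import math
from collections import Counter
from itertools import product


def train_bigram_model(words, alpha=1.0):
    """Build a character bigram model with add-alpha smoothing (illustrative only).

    Returns a function mapping a word to its smoothed log-probability.
    """
    # "^" and "$" mark word start and end (hypothetical padding symbols).
    alphabet = {ch for w in words for ch in w} | {"^", "$"}
    vocab_size = len(alphabet)
    pair_counts = Counter()
    context_counts = Counter()
    for w in words:
        padded = "^" + w + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1

    def log_prob(word):
        padded = "^" + word + "$"
        total = 0.0
        for a, b in zip(padded, padded[1:]):
            numerator = pair_counts[(a, b)] + alpha
            denominator = context_counts[a] + alpha * vocab_size
            total += math.log(numerator / denominator)
        return total

    return log_prob


def rank_transcriptions(char_candidates, log_prob, lm_weight=1.0):
    """Rank every combination of per-glyph candidates, best first.

    char_candidates: one dict {character: recognizer probability} per glyph.
    The score adds the recognizer log-probabilities to the weighted
    language model log-probability of the composed word.
    """
    scored = []
    for combo in product(*(c.items() for c in char_candidates)):
        word = "".join(ch for ch, _ in combo)
        recognition = sum(math.log(p) for _, p in combo)
        scored.append((recognition + lm_weight * log_prob(word), word))
    scored.sort(reverse=True)
    return [word for _, word in scored]


# Toy data: a tiny corpus and invented recognizer outputs for three glyphs.
corpus = ["dei", "deo", "dominus", "domini", "gratia", "ecclesia", "episcopo", "dei"]
glyphs = [{"d": 0.8, "b": 0.2}, {"e": 0.6, "c": 0.4}, {"i": 0.5, "l": 0.5}]

log_prob = train_bigram_model(corpus)
ranked = rank_transcriptions(glyphs, log_prob)
print(ranked[0])
```

Here the recognizer alone cannot separate "dei" from "del", since both have the same character scores, and the language model breaks the tie in favor of the sequence seen in the corpus. Enumerating all combinations is only practical for short words with few candidates per glyph; a real system would more likely prune with beam search.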
Publication year: 2019
ISBN: 978-3-030-11225-7
Merialdo, P., Firmani, D., Nieddu, E., Maiorino, M. (2019). In Codice Ratio: Machine Transcription of Medieval Manuscripts. In 15th Italian Research Conference on Digital Libraries (IRCDL) (pp. 185-194). Springer. DOI: 10.1007/978-3-030-11226-4.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11590/346507