Self-supervised learning for medieval handwriting identification: A case study from the Vatican Apostolic Library

Lastilla L.; Ammirati S.; Firmani D.; Komodakis N.; Merialdo P.; Scardapane S.
2022

Abstract

In this paper, we consider the task of automatically identifying whether different parts of medieval and modern manuscripts can be traced back to the same copyist (scribe), a problem of significant interest in paleography. So far, the application of deep learning techniques to scribe recognition has been hindered by the lack of a sufficiently large labeled dataset, since the labeling process is highly complex and time-consuming. Here, we propose the first successful application of self-supervised learning to digital paleography: we pretrain a convolutional neural network on large amounts of unlabeled manuscripts. To this end, we build a novel dataset for copyist identification, consisting of both labeled and unlabeled manuscripts from the Vatican Apostolic Library. We show that fine-tuning the pretrained model on the task of interest significantly outperforms other baselines, including the common setup of initializing the network from general-domain features and training the model from scratch, and that it also generalizes better. Overall, our results reveal the strong potential of self-supervised techniques in digital paleography, where unlabeled data (i.e., digitized manuscripts) is now available while labeled data remains scarce.
Lastilla, L., Ammirati, S., Firmani, D., Komodakis, N., Merialdo, P., & Scardapane, S. (2022). Self-supervised learning for medieval handwriting identification: A case study from the Vatican Apostolic Library. Information Processing & Management, 59(3), 102875. doi:10.1016/j.ipm.2022.102875
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11590/399063