
Albano, R. (2026). Attention-based Network and Explainable AI for Biometrics Recognition.

Attention-based Network and Explainable AI for Biometrics Recognition

Rocco Albano
2026-05-12

Abstract

Biometric recognition systems have become a cornerstone of modern authentication technologies, enabling automatic identity verification based on physiological traits. While deep learning has significantly improved recognition accuracy across multiple biometric modalities, current systems still face critical challenges related to robustness, generalization, and transparency. In particular, the black-box nature of deep neural networks raises concerns regarding trustworthiness, while real-world operating conditions, such as cross-session variability, unseen subjects, and signal degradation, often lead to performance drops that are difficult to interpret. This doctoral thesis investigates the role of attention-based deep learning architectures and intrinsic explainability mechanisms in biometric recognition, with a specific focus on iris and vein modalities. The central hypothesis of this work is that attention mechanisms, as implemented in Vision Transformer (ViT) architectures, can simultaneously enhance recognition robustness and provide meaningful insights into the decision-making processes of biometric systems. The thesis first provides a comprehensive overview of biometric recognition pipelines, presentation attack detection (PAD), and explainable artificial intelligence (XAI), highlighting the limitations of traditional convolutional approaches and post-hoc explainability methods. Particular emphasis is placed on the distinction between intrinsic and post-model explainability, motivating the adoption of attention-based architectures as a principled compromise between performance and interpretability. Subsequently, the effectiveness of Vision Transformers is investigated in the context of wrist-vein biometric recognition.
Experiments conducted on public datasets under open-set verification scenarios demonstrate that ViT-based models are capable of extracting highly discriminative features while exhibiting superior generalization compared to convolutional neural networks. The attention maps generated by the proposed models are analyzed to assess whether the decision process aligns with physiologically significant vein patterns, thus providing intrinsic explainability. The results show that attention mechanisms consistently focus on anatomically relevant regions, supporting the interpretability and reliability of the recognition process. The thesis further explores iris presentation attack detection under realistic conditions by analyzing the impact of image compression on recognition performance and explainability. By evaluating traditional and deep learning-based compression schemes, the study demonstrates how increasing signal degradation affects not only error rates but also the spatial distribution of attention within the model. Metrics such as the percent root-mean-square difference (PRD) and compression efficiency are used to link signal-level distortion to recognition failures, revealing how attention degradation correlates with incorrect classification in PAD scenarios. Overall, this work demonstrates that explainability in biometric systems should not be regarded solely as a transparency requirement but also as a powerful analytical tool for understanding robustness and failure modes. By integrating attention-based architectures with systematic experimental evaluation under realistic conditions, this thesis contributes to the development of more reliable, interpretable, and trustworthy biometric recognition systems.
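The distortion metrics named above can be computed directly from the original and reconstructed signals. The following is a minimal sketch: PRD uses its standard definition from the signal-compression literature, and compression efficiency is assumed here to mean the original-to-compressed size ratio (the thesis may define it differently).

```python
import numpy as np

def prd(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Percent root-mean-square difference between a signal and its
    compressed/reconstructed version. Lower values mean less distortion.
    Standard definition: 100 * sqrt(sum((x - x_hat)^2) / sum(x^2))."""
    original = original.astype(np.float64)
    reconstructed = reconstructed.astype(np.float64)
    num = np.sum((original - reconstructed) ** 2)
    den = np.sum(original ** 2)
    return 100.0 * np.sqrt(num / den)

def compression_efficiency(original_bytes: int, compressed_bytes: int) -> float:
    """Assumed definition: ratio of uncompressed to compressed size,
    so larger values indicate stronger compression."""
    return original_bytes / compressed_bytes
```

Plotting PRD against the PAD error rate over a range of compression strengths is one way to link signal-level distortion to recognition failures, as described above.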
Doctoral programme: ELETTRONICA APPLICATA
Keywords: Explainable AI; Neural Network; Biometrics Recognition; Vision Transformer
Files for this item:
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11590/541516
Notice: the data displayed have not been validated by the university.
