Neri, M. (2024). Anomaly detection and classification of audio signals with artificial intelligence techniques. Science Talks, 10. doi:10.1016/j.sctalk.2024.100351.
Anomaly detection and classification of audio signals with artificial intelligence techniques
Neri, Michael
2024-01-01
Abstract
In recent years, video data has been used extensively for surveillance. However, while a fight is visible to anyone watching, abnormal sounds can pass unnoticed. Moreover, if a dangerous event is outside the line of sight, the only cue that can be exploited is the sound produced by the threat. With Artificial Intelligence-based techniques, it is possible to detect such anomalies by inspecting the videos acquired by cameras located inside a bus. However, this approach is problematic for several reasons: video analysis is computationally expensive and requires costly hardware for processing and storage. Moreover, videos suffer from occlusions and luminance variations, so a video-based system is not suitable in all situations. For these reasons, the objective of my Ph.D. is to propose a data-driven framework that detects whether an audio recording is anomalous and, if so, identifies which dangerous events are occurring and where. The proposed architecture, denoted as Coarse-to-Fine, is composed of two elements. The first models the normal background of a target environment in an unsupervised fashion. If an anomalous recording is detected, the second element determines what the anomaly is, and when and where it occurs.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.