Eutizi, C., Benedetto, F. (2021). On the Performance Improvements of Deep Learning Methods for Audio Event Detection and Classification. In 2021 44th International Conference on Telecommunications and Signal Processing (TSP 2021), pp. 141-145. Institute of Electrical and Electronics Engineers Inc. doi: 10.1109/TSP52935.2021.9522625.
On the Performance Improvements of Deep Learning Methods for Audio Event Detection and Classification
Eutizi C.; Benedetto F.
2021-01-01
Abstract
With the rapid growth of high-precision, high-performance deep learning methods, object detection and classification have received considerable attention in recent years. More recently, similar approaches have been investigated for audio object detection, with promising results. The audio signal is first converted into a 2D time-frequency representation, and patterns contained in the recordings are then localized using multiple convolutional layers or convolutional groups. Here, we apply four state-of-the-art computer vision CNNs, adapting them to our audio classification and audio event detection tasks. Our results, obtained on the publicly available UrbanSound8K dataset, which contains more than 8700 short audio clips of urban sounds, confirm the effectiveness of the proposed approach for audio event detection and classification.
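The abstract describes the front end only as a conversion to a 2D time-frequency representation followed by convolutional layers; it does not state the exact transform, window sizes, or network settings. The sketch below is therefore an illustration under stated assumptions, not the authors' pipeline: it assumes a log-magnitude STFT (Hann window, hypothetical `n_fft=1024`, `hop=512`, sample rate 22050 Hz) and applies one hand-written 3×3 filter, to show what kind of "image" a computer-vision CNN receives and how a single convolutional filter slides over it. The names `log_spectrogram`, `conv2d_valid`, and `ridge` are hypothetical.

```python
import numpy as np


def log_spectrogram(signal: np.ndarray, n_fft: int = 1024, hop: int = 512,
                    eps: float = 1e-10) -> np.ndarray:
    """Log-magnitude STFT of a mono signal, shaped (frequency_bins, frames).

    n_fft and hop are illustrative assumptions, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    # Zero-pad very short clips so that at least one full frame exists.
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    window = np.hanning(n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop
    # Overlapping Hann-windowed frames, shape (n_frames, n_fft).
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # The real FFT of each frame yields n_fft // 2 + 1 non-negative frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Compress to decibels and transpose so rows are frequency and columns are time.
    return (20.0 * np.log10(magnitude + eps)).T


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation, the operation a CNN convolutional layer applies."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw); weight and sum each patch.
    return np.einsum("ijkl,kl->ij", windows, kernel)


if __name__ == "__main__":
    sr = 22050                                  # assumed sample rate (Hz)
    t = np.arange(sr) / sr                      # one second of samples
    tone = np.sin(2 * np.pi * 1000.0 * t)       # steady 1 kHz test tone
    spec = log_spectrogram(tone)
    print(spec.shape)                           # (513, 42)
    # Hypothetical hand-made filter that responds to horizontal (steady-pitch) ridges.
    ridge = np.array([[-1.0, -1.0, -1.0],
                      [2.0, 2.0, 2.0],
                      [-1.0, -1.0, -1.0]])
    response = conv2d_valid(spec, ridge)
    print(response.shape)                       # (511, 40)
```

In a trained CNN the filter weights are learned rather than hand-set, and many filters are stacked in successive layers; the single fixed filter here only makes the sliding-window computation visible. Real pipelines commonly use a mel-scaled spectrogram instead of the linear-frequency STFT assumed above.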