
Eutizi, C., Benedetto, F. (2021). On the Performance Improvements of Deep Learning Methods for Audio Event Detection and Classification. In 2021 44th International Conference on Telecommunications and Signal Processing, TSP 2021 (pp. 141-145). Institute of Electrical and Electronics Engineers Inc. [10.1109/TSP52935.2021.9522625].

On the Performance Improvements of Deep Learning Methods for Audio Event Detection and Classification

Eutizi C.; Benedetto F.
2021-01-01

Abstract

With the explosive growth of high-precision, high-performance deep learning methods, object detection and classification have received considerable attention in recent years. More recently, similar approaches have been investigated for audio object detection, with promising results. The audio signal is first converted into a 2D time-frequency representation; patterns contained in the recordings are then localized using multiple convolutional layers or convolutional groups. Here, we adapt four state-of-the-art computer vision CNNs to our audio classification and audio event detection tasks. Our results on the publicly available UrbanSound8K dataset, which comprises more than 8,700 short audio clips of urban sounds, confirm the effectiveness of the proposed approach for audio event detection and classification.
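The abstract describes a two-stage pipeline: turn the waveform into a 2D time-frequency image, then let convolutional layers localize patterns in it. The sketch below illustrates only that general idea; it is not the authors' implementation. The abstract does not say which time-frequency representation or parameters were used, so this assumes a plain log-magnitude STFT with a Hann window and hypothetical settings (`SAMPLE_RATE`, `N_FFT`, `HOP`, `EPS`). It also replaces a learned CNN layer with a single hand-made 3x3 kernel. All function and variable names are illustrative.

```python
import numpy as np

# Hypothetical front-end parameters (assumptions, not taken from the paper).
SAMPLE_RATE = 22050   # Hz
N_FFT = 512           # samples per analysis frame
HOP = 256             # samples between frame starts
EPS = 1e-10           # avoids log(0) on silent frames


def log_spectrogram(signal, n_fft=N_FFT, hop=HOP, eps=EPS):
    """Log-magnitude STFT with a Hann window.

    Returns a 2D array of shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    """
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Overlapping windowed frames, one frame per row.
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # One-sided FFT of each frame, then transpose to (frequency, time).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log(magnitude + eps)


def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation without padding.

    This is what a CNN "convolutional" layer computes per channel,
    before adding a bias and applying a nonlinearity.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


# Toy 1 s clip: 0.5 s of silence followed by 0.5 s of a 1 kHz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clip = np.zeros(SAMPLE_RATE)
clip[SAMPLE_RATE // 2:] = 0.5 * np.sin(2 * np.pi * 1000 * t[SAMPLE_RATE // 2:])

spec = log_spectrogram(clip)  # shape (257, 85) with the settings above

# Hand-made kernel (a stand-in for learned weights) that responds to
# horizontal ridges: energy concentrated in one frequency band over time.
ridge_kernel = np.array([[-1.0, -1.0, -1.0],
                         [ 2.0,  2.0,  2.0],
                         [-1.0, -1.0, -1.0]])
response = conv2d_valid(spec, ridge_kernel)

# Position of the strongest response as (frequency row, time column).
peak_row, peak_col = np.unravel_index(np.argmax(response), response.shape)
```

With these settings, the strongest frequency bin in a tone frame is bin 23 (about 990 Hz), which is the bin closest to 1 kHz. Over the silent half the kernel output is zero, so the peak response can only fall in columns that overlap the tone. In a real CNN, many such kernels are learned from labelled clips rather than designed by hand. Many audio pipelines also use a log-mel spectrogram instead of the linear-frequency STFT shown here.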
Year: 2021
ISBN: 978-1-6654-2933-7

Use this identifier to cite or link to this document: https://hdl.handle.net/11590/393053
Citations: PubMed Central n/a; Scopus 4; Web of Science (ISI) 5