End-to-end convolutional neural networks for sound event detection in urban environments
Pablo Zinemanas, Pablo Cancela, Martín Rocamora
Proceedings of the 24th Conference of Open Innovations Association FRUCT, 2nd IEEE FRUCT International Workshop on Semantic Audio and the Internet of Things, Moscow, Russia, 8-12 Apr 2019, pages 533-539
Research group(s):  Procesamiento de Audio (gpa)
Department(s):  Procesamiento de Señales
Download the publication : ZCR19.pdf [541KB]  


We present a novel approach to tackle the problem of sound event detection (SED) in urban environments using end-to-end convolutional neural networks (CNN). It consists of a 1D CNN that extracts the energy in mel-frequency bands from the audio signal based on a simple filter bank, followed by a 2D CNN for the classification task. The main goal of this two-stage architecture is to bring more interpretability to the first layers of the network and to allow their reuse in other problems in the same domain. We present a novel model to calculate the mel-spectrogram using a neural network that outperforms existing work in both simplicity and matching performance. We also implement a recently proposed approach to normalize the energy of the mel-spectrogram (per-channel energy normalization, PCEN) as a layer of the neural network. We show how the parameters of this normalization can be learned by the network and why this is useful for SED in urban environments. We study how training modifies the filter bank as well as the PCEN normalization parameters. The resulting system achieves classification results comparable to the state of the art while reducing the number of parameters involved.
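
For readers unfamiliar with PCEN, the normalization mentioned in the abstract can be sketched as follows. This is a minimal illustration of the commonly published PCEN formula, not the authors' implementation. The function name `pcen`, the default parameter values, and the plain-list input format are assumptions made for the example. In the paper the parameters are learned by the network, whereas here they are fixed scalars.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a mel-spectrogram (sketch).

    energy: list of frames, each a list of non-negative mel-band energies.
    s, alpha, delta, r, eps: illustrative fixed values (assumption); in a
    trainable layer these would be learned parameters.
    """
    if not energy:
        return []
    # Initialise the per-band smoother with the first frame.
    smoothed = list(energy[0])
    out = []
    for frame in energy:
        # First-order IIR smoother per band: M[t] = (1 - s) * M[t-1] + s * E[t]
        smoothed = [(1.0 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Adaptive gain control followed by root compression:
        # PCEN = (E / (eps + M)**alpha + delta)**r - delta**r
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out
```

Dividing each band by its own smoothed energy attenuates slowly varying background levels. This is one plausible reason such a normalization suits urban recordings with stationary noise.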
