Singing voice detection in polyphonic music
Martín Rocamora
Master's thesis from Universidad de la República (Uruguay). Facultad de Ingeniería. IIE - Aug. 2011
Advisor: Alvaro Pardo
Co-advisor: Alvaro Pardo
Research Group(s): Procesamiento de Audio (gpa)
Department(s): Procesamiento de Señales
Download the publication: Roc11.pdf [8 MB]


When listening to music, most people are able to distinguish the sound of different musical instruments, though this ability may require some training. When it comes to the singing voice, however, anyone can easily distinguish it from the other instruments in a musical piece. This dissertation deals with the automatic detection of singing voice in polyphonic music recordings. It is motivated by the idea that automatically identifying the segments of a song that contain vocals would be a helpful tool in music content processing research and related applications. In addition, the effort of building such a tool could contribute, to some extent, to the understanding of sound perception and its emulation by machines. Two computer systems are developed in this work, each corresponding to a different conceptual approach. Both process a polyphonic music audio file and produce labels indicating the time intervals in which singing voice is present.
The first system is a pattern recognition system based on acoustic features computed from the audio mixture, and can be regarded as the standard solution. A significant effort was put into improving it by considering different acoustic features and machine learning techniques. The results suggest that it is rather difficult to surpass a certain performance bound through variations on this approach. For this reason, a novel way of addressing the singing voice detection problem is proposed, which involves separating the harmonic sound sources from the polyphonic audio mixture. It is based on the hypothesis that sounds can be better characterized after being separated, which would in turn improve classification. A non-traditional time-frequency representation was implemented, devised for analysing non-stationary harmonic sound sources such as the singing voice. In addition, a polyphonic pitch tracking algorithm is proposed, which tries to identify and follow the most prominent harmonic sound sources in the audio mixture. Classification performance indicates that the proposed approach is a promising alternative, in particular for polyphonies that are not very dense, where the singing voice can be correctly tracked. As an outcome of this work, an automatic singing voice separation system is obtained, with encouraging results.
