This work investigates how to detect emergency vehicles such as ambulances, fire engines, and police cars based on their siren sounds. Recognizing that drivers may be unaware of siren warnings from emergency vehicles, especially when in-vehicle audio systems are in use, we propose an automatic detection system that determines whether siren sounds from emergency vehicles are present nearby and alerts drivers to pay attention. A convolutional neural network (CNN)-based ensemble model (SirenNet) with two network streams is designed to classify sounds of the traffic soundscape into siren sounds, vehicle horns, and noise; the first stream (WaveNet) directly processes the raw waveform, and the second (MLNet) works with a combined feature formed from MFCC (Mel-frequency cepstral coefficients) and the log-mel spectrogram. Our experiments on a diverse dataset show that the raw data can complement the MFCC and log-mel features, achieving a promising accuracy of 98.24% in siren sound detection. In addition, the proposed system works well with variable input lengths; even for short samples of 0.25 seconds, it still achieves a high accuracy of 96.89%. The proposed system could be helpful not only for drivers but also for autopilot systems.

INDEX TERMS: Audio recognition, convolutional neural networks, emergency vehicle detection, siren sounds.
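To make the two-stream idea concrete, below is a minimal sketch of how such a classifier might be assembled, using librosa for the features and PyTorch for the model. The layer sizes, the embedding dimension, and the simple embedding-averaging ensemble are illustrative assumptions, not the published SirenNet architecture.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def extract_features(path, sr=16000, n_mels=40, n_mfcc=40):
    """Build the two inputs: the raw waveform and stacked MFCC + log-mel."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)            # (n_mfcc, T)
    logmel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))    # (n_mels, T)
    combined = np.concatenate([mfcc, logmel], axis=0)                 # (n_mfcc+n_mels, T)
    return torch.from_numpy(y).float(), torch.from_numpy(combined).float()

class TwoStreamSirenClassifier(nn.Module):
    """Illustrative two-stream CNN: a 1-D conv stream on raw audio and a
    2-D conv stream on the MFCC/log-mel stack; the per-stream embeddings
    are averaged before a 3-way output (siren / horn / noise)."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.wave = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 64))
        self.spec = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64))
        self.head = nn.Linear(64, n_classes)

    def forward(self, wave, feats):
        # wave: (B, samples); feats: (B, n_mfcc + n_mels, T)
        e1 = self.wave(wave.unsqueeze(1))
        e2 = self.spec(feats.unsqueeze(1))
        return self.head((e1 + e2) / 2)   # average the two stream embeddings
```

Because both streams end in adaptive pooling, a model of this shape can accept variable-length clips, which matches the abstract's observation that even 0.25-second samples can be classified.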
This paper presents an effective technique for automatically clustering undocumented music recordings based on their associated singer. This serves as an indispensable step towards indexing and content-based retrieval of music by singer. The proposed clustering system operates in an unsupervised manner, in which no prior information is available regarding either the characteristics of the singers' voices or the size of the singer population. Methods are presented to separate vocal from non-vocal regions, to isolate the singers' vocal characteristics from the background music, to compare the similarity between singers' voices, and to determine the total number of unique singers in a collection of songs. Experimental evaluations conducted on a 200-track pop music database confirm the validity of the proposed system.
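The abstract does not spell out the algorithms, so the following is only a rough sketch of unsupervised singer clustering with an unknown singer count: each track is reduced to a crude voice embedding (MFCC statistics over high-energy frames, a stand-in for the paper's vocal-region detection and accompaniment suppression), tracks are clustered agglomeratively, and the number of singers is chosen by silhouette score rather than whatever criterion the paper actually uses. All names and thresholds here are assumptions.

```python
import numpy as np
import librosa
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def track_voice_embedding(path, sr=16000, n_mfcc=20):
    """Per-track voice signature: mean and std of MFCCs over
    high-energy frames (a rough proxy for vocal regions)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, T)
    energy = librosa.feature.rms(y=y)[0]                     # (T,)
    T = min(mfcc.shape[1], energy.shape[0])
    keep = energy[:T] > np.median(energy[:T])                # "vocal-ish" frames
    sel = mfcc[:, :T][:, keep]
    return np.concatenate([sel.mean(axis=1), sel.std(axis=1)])

def cluster_by_singer(paths, max_singers=20):
    """Cluster tracks and pick the singer count that maximizes the
    silhouette score over a plausible range of cluster counts."""
    X = np.stack([track_voice_embedding(p) for p in paths])
    best_k, best_score, best_labels = 2, -1.0, None
    for k in range(2, min(max_singers, len(paths) - 1) + 1):
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```

The key design point this illustrates is that both the cluster assignments and the cluster count are estimated jointly from the data, with no labeled singers required.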
This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively; and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the onset points of a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary by no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarity of melodies. The results of experiments conducted on a karaoke database comprising 1071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.

Index Terms: Bayesian information criterion, dynamic time warping, karaoke, music information retrieval, query-by-singing.
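To make the matching step concrete, here is a single-pass DTW sketch for scoring a sung query's note sequence against segments of a song's melody starting at each detected phrase onset. The paper's multiple-pass DTW with multi-level data abstraction refines this basic recurrence for speed; the onset indices, semitone note sequences, and the window factor below are all assumed inputs for illustration.

```python
import numpy as np

def dtw_distance(query, song):
    """Classic DTW alignment cost between two note sequences in
    semitones; lower means more similar melodies."""
    n, m = len(query), len(song)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - song[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalized, so scores are comparable

def match_from_onsets(query, song_notes, onsets, window=1.5):
    """Score the query only against song segments that begin at a
    detected phrase onset (onsets given as note indices here, an
    assumption) and are roughly as long as the query."""
    span = int(len(query) * window)
    return min(dtw_distance(query, song_notes[s:s + span]) for s in onsets)
```

Restricting the alignment to phrase onsets is what makes the search tractable: instead of sliding the query across every position in every song, only a handful of musically plausible start points per song need to be scored.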