Implementing a second-screen service requires a technology for quick, accurate content identification, which enables the service to trace the channel of the broadcast program that a user is watching or listening to. One approach is to record an audio signal with the user's mobile device and match it against the signals in a reference database. However, reverberation and exogenous noise distort the recorded audio signal, making accurate identification more difficult. This paper presents a new fingerprinting method for content identification that is robust against reverberation and noise. It employs pseudo-sinusoidal components, which are components that can be regarded as sinusoidal over a short period of time. The method generates a fingerprint that represents the distribution of pseudo-sinusoidal components in the time-frequency domain. Experimental results show that the method can match a 5-s-long input signal against 792 hours of reference signals in 1.29 s on a single PC, and can identify the correct program with a recall of over 92% and a precision of 100% in a realistic setting.
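The general idea of a fingerprint built from sinusoid-like components in the time-frequency domain can be illustrated with a minimal sketch. It is not the method of the paper: plain spectral peak picking stands in for the pseudo-sinusoidal component detection, and the names `frame_peaks`, `fingerprint`, and `similarity`, the frame length, the hop size, and the threshold are all assumptions made for illustration.

```python
import cmath
import math


def frame_peaks(frame, rel_threshold=0.5):
    """Return the DFT bins that are strong local magnitude maxima in one frame.

    Assumption: a strong, isolated spectral peak is used as a stand-in for a
    component that is approximately sinusoidal over the frame.
    """
    n = len(frame)
    # A Hann window limits leakage so a steady tone produces one clear peak.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(abs(acc))
    floor = rel_threshold * max(mags)
    return {k for k in range(1, len(mags) - 1)
            if mags[k] > floor
            and mags[k] >= mags[k - 1]
            and mags[k] >= mags[k + 1]}


def fingerprint(signal, frame_len=64, hop=32):
    """Fingerprint = one set of peak bins per overlapping frame."""
    return [frame_peaks(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def similarity(fp_a, fp_b):
    """Fraction of peak positions shared by two time-aligned fingerprints."""
    shared = sum(len(a & b) for a, b in zip(fp_a, fp_b))
    total = sum(len(a | b) for a, b in zip(fp_a, fp_b))
    return shared / total if total else 0.0
```

In this sketch a query would be identified by computing `similarity` against each candidate reference segment and keeping the best score. Searching hundreds of hours of audio in about a second, as reported above, would need an indexing scheme that this example does not attempt.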
We previously proposed [1] nonlinear operators that decompose the changing energy of a sound in the wavelet domain into three orthogonal components: loudness and pitch as coherent changes, and timbre as incoherent change. We showed that these operators can detect discontinuities in a single sound stream with excellent temporal resolution and sensitivity. In this paper, we extend the coherency principle so that it can describe and track the individual coherency of non-overlapping sound streams in the wavelet domain. This is realized by Parzen's non-parametric estimates and Kalman filtering of the loudness change rate and pitch shift rate. Using this method, we present experiments on extracting the most salient stream from multiple sound streams.
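As a rough illustration of the Kalman filtering step mentioned above, the following sketch tracks one slowly varying rate (for example a pitch shift rate) from noisy per-frame measurements. The abstract does not specify the state model, so the scalar random-walk model, the function name `kalman_track`, and the variance values are assumptions.

```python
def kalman_track(measurements, process_var=1e-4, meas_var=0.04, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk state model (assumed here).

    measurements: noisy observations of one rate, one value per frame.
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the rate is assumed to drift slowly, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement according to their variances.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates
```

A larger `process_var` lets the estimate follow rapid changes at the cost of passing more measurement noise. A smaller value smooths more strongly but reacts more slowly when a stream changes.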
SUMMARY
We propose in this paper a new algorithm for a computational implementation of auditory scene analysis. The algorithm has a three-layer structure: (1) subband decomposition by wavelet transform, (2) characterization of subband signal fragments by instantaneous frequency, frequency change rate, and amplitude change rate, and (3) frequency integration of subband signal features by a voting method. Grouping and integration are performed by voting the subband signal fragments into a nonparametric multipeak probability density distribution expressing the "possibility of streams"; the streams are then recognized, and their parameters extracted, by tracing the maximum of this distribution. Basic experiments on synthesized sounds and voices confirm that the fundamental frequency, frequency change rate, and amplitude change rate can be separated and estimated from multiple streams.
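The voting step described above can be sketched with a one-dimensional Parzen (Gaussian kernel) density estimate. This is a simplified illustration, not the authors' implementation: each subband fragment is assumed to cast a single vote in one feature dimension (for example a relative frequency change rate), and the names `parzen_density` and `dominant_peak`, the bandwidth, and the grid resolution are hypothetical.

```python
import math


def parzen_density(votes, grid, bandwidth):
    """Gaussian-kernel Parzen estimate of the vote density on the grid points."""
    norm = 1.0 / (len(votes) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in votes)
            for g in grid]


def dominant_peak(votes, lo, hi, steps=201, bandwidth=0.05):
    """Return (location, density) of the highest point of the vote density.

    The highest point plays the role of the most supported stream hypothesis.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    density = parzen_density(votes, grid, bandwidth)
    best = max(range(steps), key=density.__getitem__)
    return grid[best], density[best]
```

Fragments that belong to the same stream cast nearby votes and reinforce one peak, while fragments from other streams form separate, lower peaks. Repeating the search frame by frame and following the maximum corresponds to the tracing step in the summary.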