Recently, many researchers have attempted automatic pitch estimation of polyphonic music (e.g., Li et al., IEEE Trans ASLP, 2009). Most of these attempts concern the estimation of individual pitches (F0s) without associating the estimated pitches with the particular instruments that produce them. Estimating pitches for each instrument would lead to full music transcription, and individual instrument F0 tracks can be used in music information retrieval systems to better organize and search music. We propose a method to estimate the F0 tracks of a set of harmonic instruments in a sound mixture, using probabilistic latent component analysis (PLCA) and collections of basis spectra, indexed by F0 and instrument, learned in advance. The PLCA model is extended hierarchically to explain the observed mixture spectra as a sum of basis spectra from the note(s) of various instruments. The polyphonic pitch tracking problem is posed as inferring the most likely combination of active note(s) from the different instruments. Continuity and sparsity constraints are enforced to better model how the music is produced. The method was trained on a common instrument spectrum library and evaluated on an established polyphonic audio dataset.
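The core decomposition can be sketched as follows. This is a minimal, hypothetical illustration of the basic (flat) PLCA step — estimating time-varying activations for fixed, pre-learned basis spectra via EM — not the authors' implementation: the hierarchical extension and the continuity/sparsity constraints are omitted, and all names are ours.

```python
import numpy as np

def plca_weights(V, W, n_iter=300):
    """Estimate activations H so that V[f, t] ~ sum_z W[f, z] * H[z, t].

    V : (F, T) nonnegative magnitude spectrogram, each column summing to 1
        (a distribution P(f | t)).
    W : (F, Z) fixed basis spectra, one column per (instrument, F0) pair,
        each column summing to 1 (a distribution P(f | z)).
    Returns H : (Z, T) column-stochastic activations P(z | t).
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        R = W @ H                     # current model reconstruction of V
        R[R == 0] = 1e-12             # guard against division by zero
        # E-step posterior folded into the multiplicative M-step update:
        H *= W.T @ (V / R)
        H /= H.sum(axis=0, keepdims=True)  # renormalize P(z | t)
    return H
```

For each frame, the dominant entries of H indicate which (instrument, F0) bases are active, which is the quantity the pitch-tracking inference operates on.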
A sinusoidal model for solo musical sounds, consisting of time-varying harmonic amplitudes and frequencies, allows convenient temporal and spectral modifications. With a harmonic model, analysis frames can be grouped by fundamental frequency (F0) and then clustered in terms of their harmonic spectra; the resulting cluster centroid spectra are used as spectral libraries. When continuous monophonic audio passages are analyzed into harmonic components, F0-vs.-time data guide the extraction of parameters from the sound in order to find appropriate library spectra for resynthesis. Two methods for finding appropriate spectra are: 1) best rms match with the incoming spectrum and 2) best spectral-centroid match. These give similar results, but centroid matching yields smoother spectra over time. Timbre transposition is performed by using a library that belongs to another instrument. We have found that when the target instrument has a unique timbral quality based on its spectrum, the synthesis sounds mostly like that instrument. However, if the target instrument's spectral characteristic is not sufficiently differentiated from the source's, the source timbral quality may dominate, probably because its temporal behavior is transmitted. Results will be demonstrated by audio examples.
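The two matching criteria can be sketched as below. This is an illustrative reading, not the authors' code: each spectrum is assumed to be a vector of harmonic amplitudes a_1..a_K on a common harmonic grid, the centroid is taken in units of harmonic number, and all function names are ours.

```python
import numpy as np

def spectral_centroid(amps):
    """Centroid of a harmonic-amplitude vector, in harmonic-number units:
    sum_k k * a_k / sum_k a_k, for k = 1..K."""
    k = np.arange(1, len(amps) + 1)
    return float(np.sum(k * amps) / np.sum(amps))

def best_rms_match(frame, library):
    """Method 1: index of the library spectrum with the smallest
    rms amplitude difference from the incoming frame."""
    errs = [np.sqrt(np.mean((frame - s) ** 2)) for s in library]
    return int(np.argmin(errs))

def best_centroid_match(frame, library):
    """Method 2: index of the library spectrum whose spectral centroid
    is closest to that of the incoming frame."""
    c = spectral_centroid(frame)
    return int(np.argmin([abs(spectral_centroid(s) - c) for s in library]))
```

Because the centroid is a single smooth scalar per frame, tracking it frame-to-frame tends to select neighboring library entries, which is consistent with the smoother spectral evolution reported for method 2.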
Acoustics 08 Paris