SUMMARY

In the preceding paper, we proposed a method for auditory scene analysis in which the instantaneous frequency, frequency change rate, and amplitude change rate in time-frequency space are accumulated by a voting method into a multipeak probability density distribution, realizing the grouping of mixed sounds into streams. In this paper, as the core of the second half of the method, we introduce the assumption that the stream parameters vary slowly according to known dynamics, and we propose an integration method along the time axis in which the probability density distribution of the stream parameters is optimally estimated as a time series by a nonparametric Kalman filter. This realizes mechanisms of higher-level auditory scene analysis, such as improving the accuracy of the stream parameters, interpolating and connecting breaks in the streams, and introducing a priori knowledge into stream selection. Moreover, we construct a system that separates and reconstructs the sounds corresponding to the streams, and we verify the proposed technique through fundamental experiments on synthesized sounds and on musical sounds and voices.
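The time-axis integration described above can be sketched as a grid-based Bayesian filter; this is a simplified stand-in for the paper's nonparametric Kalman filter, assuming a random-walk model for the slowly varying stream parameter (all grid sizes, kernel widths, and the drifting test signal below are hypothetical choices for illustration). The predict step diffuses the density under the slow-dynamics model, and the update step multiplies it by the voted observation density and renormalizes:

```python
import numpy as np

def gaussian_kernel(grid, sigma):
    """Row-stochastic transition matrix modeling slow parameter drift."""
    k = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / sigma**2)
    return k / k.sum(axis=1, keepdims=True)

def bayes_filter_step(prior, observation, transition):
    """One predict/update step of a grid-based Bayesian filter."""
    # Predict: diffuse the density under the slow-dynamics assumption.
    predicted = transition.T @ prior
    # Update: weight by the voted observation density, then renormalize.
    posterior = predicted * observation
    return posterior / posterior.sum()

# Demo on a 1-D parameter grid (e.g., normalized frequency).
grid = np.linspace(0.0, 1.0, 101)
transition = gaussian_kernel(grid, sigma=0.02)
density = np.full(grid.size, 1.0 / grid.size)  # flat prior

for t in range(20):
    true_f = 0.3 + 0.01 * t  # slowly drifting stream parameter
    # Voted observation density around the true value (noisy in practice).
    obs = np.exp(-0.5 * (grid - true_f) ** 2 / 0.05**2)
    density = bayes_filter_step(density, obs, transition)

estimate = grid[np.argmax(density)]  # tracks the drifting parameter
```

Because the posterior is carried forward between frames, a frame with a weak or missing observation still yields a usable density from the prediction step, which is one way to view the interpolation of stream breaks mentioned above.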