2018
DOI: 10.1007/978-3-030-01692-0_1

Automatic Recognition of Sound Categories from Their Vocal Imitation Using Audio Primitives Automatically Found by SI-PLCA and HMM

Cited by 3 publications
(4 citation statements)
References 12 publications
“…Some phoneticians have turned their attention to non-speech voice production, trying to identify the most relevant phonetic components that are found in vocal imitations [30]. They identified the broad categories of phonation (i.e., quasi-periodic oscillations due to vocal fold vibrations), turbulence, supraglottal myoelastic vibrations, and clicks, which can be extracted automatically from audio with time-frequency analysis and supervised [31] or unsupervised [32] machine learning. These categories can be made to correspond to categories of sounds as they are perceived [33], and as they are produced in the physical world.…”
Section: Voice As Embodied Sound (mentioning)
confidence: 99%
“…A projector system $\Pi_i$ in the (Hilbert) space of states is Hermitian, idempotent, and complete. If the system is in state $|\psi\rangle$ before measurement, the probability that the outcome of a measurement through a projector system returns $j$ is $p_m(j|\psi) = \langle\psi|\Pi_j|\psi\rangle$ (32), and as a result of the measurement, the system collapses in state ψ(j|ψ). Given an orthonormal basis of measurement vectors $|a_j\rangle$, the elementary projectors are $\Pi_j = |a_j\rangle\langle a_j|$, $p_m(j|\psi) = |\langle\psi|a_j\rangle|^2$, and the system (by neglecting a unitary phasor) collapses into ψ…”
Section: Measurement (mentioning)
confidence: 99%
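The measurement rule quoted above can be illustrated with a minimal sketch, assuming a two-dimensional state space and an orthonormal measurement basis. The function names (`measure_probabilities`, `collapse`) and the example vectors `up`, `down`, and `psi` are hypothetical and chosen only for illustration; they do not come from the citing paper.

```python
import math

def inner(a, psi):
    """Inner product <a|psi> for vectors given as lists of complex numbers."""
    return sum(x.conjugate() * c for x, c in zip(a, psi))

def measure_probabilities(psi, basis):
    """Probabilities p(j|psi) = |<a_j|psi>|^2 for an orthonormal basis.

    psi is normalised first, so the probabilities sum to 1.
    """
    norm = math.sqrt(sum(abs(c) ** 2 for c in psi))
    psi = [c / norm for c in psi]
    return [abs(inner(a, psi)) ** 2 for a in basis]

def collapse(psi, a):
    """Apply the elementary projector |a><a| to psi and renormalise.

    Assumes outcome a has non-zero probability; otherwise the norm is zero.
    """
    amplitude = inner(a, psi)
    projected = [amplitude * x for x in a]
    norm = math.sqrt(sum(abs(c) ** 2 for c in projected))
    return [c / norm for c in projected]

# Hypothetical two-outcome measurement basis and an unnormalised state.
up = [1 + 0j, 0j]
down = [0j, 1 + 0j]
psi = [3 + 0j, 4j]

probs = measure_probabilities(psi, [up, down])
after = collapse(psi, down)
```

With these example values the two probabilities are 9/25 and 16/25, and the collapsed state equals `down` multiplied by the unit phase `1j`, which is the phase factor the quoted passage says can be neglected.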
“…Some phoneticians have turned their attention to non-speech voice production, trying to identify the most relevant phonetic components that are found in vocal imitations [24]. They identified the broad categories of phonation (i.e., quasi periodic oscillations due to vocal fold vibrations), turbulence, supraglottal myoelastic vibrations, and clicks, which can be extracted automatically from audio with time-frequency analysis and supervised [19] or unsupervised [36] machine learning. These categories can be made to correspond to categories of sounds as they are perceived [32], and as they are produced in the physical world.…”
Section: Voice As Embodied Sound (mentioning)
confidence: 99%
“…The density matrix (18) would evolve according to equation (24), where the unitary operator U(0, t) is defined as in (25). When a pitch measurement is taken, the outcome would be up or down according to equation (35), and the density matrix that results from collapsing would be given by equation (36).…”
Section: Mixed As In a Mixer (mentioning)
confidence: 99%