Interspeech 2019
DOI: 10.21437/interspeech.2019-2398

Using Speech Production Knowledge for Raw Waveform Modelling Based Styrian Dialect Identification

Abstract: This paper addresses the Styrian Dialect sub-challenge of the INTERSPEECH 2019 Computational Paralinguistics Challenge. We treat this challenge as dialect identification with no linguistic resources/knowledge and with limited acoustic resources, and develop end-to-end raw waveform modelling based methods that incorporate knowledge related to speech production. In this direction, we investigate two methods: (a) modelling the signals after source system decomposition and (b) transferring knowledge from articulat…

Cited by 5 publications (7 citation statements)
References: 14 publications
“…As shown in this figure, the input to the system consists of pairs of reference and test representations of utterances. We follow the same procedure as in [22] to extract AP features for the representations of utterances (cf. Section 3.2).…”
Section: Technical Approach (mentioning)
confidence: 99%
“…AP representations are extracted as in [22], where frame-level posteriors of four articulatory categories are computed, i.e., manner of articulation (e.g., degree of constriction), place of constriction, height of the tongue, and vowel. Posteriors for each category are estimated using CNNs trained on healthy speech data from the AMI corpus [25] based on acoustic phoneme-to-articulatory feature mappings [21].…”
Section: Articulatory Posterior Representation (mentioning)
confidence: 99%
“…As shown in this figure, the input to the system consists of pairs of reference and test representations of utterances. We follow the same procedure as in [20] to extract AP features for the representations of utterances (cf. Section 3.2).…”
Section: Technical Approach (mentioning)
confidence: 99%
“…Currently, some applications of speech explored learning directly from raw waveform such as speech recognition [23]- [26], speaker verification [27], emotion recognition [28], and environment sound recognition [29]. In [30], raw waveform modeling approaches are used in Styrian dialect identification which performed better than the baseline methods. Inspired by this, we focus on analyzing the CNN filters trained on raw waveform for accent classification.…”
Section: Introduction (mentioning)
confidence: 99%