Interspeech 2017
DOI: 10.21437/interspeech.2017-1445

End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum

Abstract: In this paper, we propose an end-to-end deep learning framework to detect speech paralinguistics using a perception aware spectrum as input. Existing studies show that speech produced with a cold has a distinctly different energy distribution in its low-frequency components than speech produced in a 'healthy' condition. This motivates us to use a perception aware spectrum as the input to an end-to-end learning framework trained on a small-scale dataset. In this work, we try both Constant Q Transform (CQT) spectrum and Gammatone sp…
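As a rough illustration of why a CQT spectrum suits the low-frequency cue mentioned in the abstract, the sketch below computes the geometrically spaced center frequencies of a constant-Q filter bank and their bandwidths. The settings (`f_min=32.7`, 12 bins per octave, 84 bins) and the helper names are assumptions chosen for illustration; the excerpt above does not state the paper's actual CQT parameters.

```python
def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Center frequencies f_k = f_min * 2**(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Q = f_k / bandwidth_k, identical for every bin (hence 'constant Q')."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Hypothetical settings, not taken from the paper.
freqs = cqt_center_frequencies(f_min=32.7, bins_per_octave=12, n_bins=84)
q = cqt_q_factor(bins_per_octave=12)

# Bandwidth grows with frequency, so low-frequency bins are narrow
# and resolve fine spectral detail there.
bandwidths = [f / q for f in freqs]

for k in (0, 12, 83):
    print(f"bin {k:2d}: center {freqs[k]:8.1f} Hz, bandwidth {bandwidths[k]:7.2f} Hz")
```

Because the bins are spaced geometrically, each octave gets the same number of bins, so the low-frequency region receives far more bins per hertz than it would in a linearly spaced spectrum.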

Cited by 17 publications (7 citation statements)
References 15 publications (22 reference statements)
“…E2E models rely purely on the statistics of the available data to learn optimal features, therefore it is possible that the available data, due to the imbalance, did not contain a sufficient variation for the cold class. An E2E approach was also presented by one of challenge participants [138]. In this system, Constant Q Transform (CQT) spectrum and Gammatone spectrum features were fed into 5 CNN layers followed by a single gated recurrent unit layer.…”
Section: Cold and Flu (2017)
type: mentioning
confidence: 99%
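The statement above describes a stack of 5 CNN layers followed by a single gated recurrent unit layer. The sketch below only traces tensor shapes through such a stack, to show how a spectrogram can become a frame sequence for the recurrent layer. Every hyperparameter here (3x3 kernels, padding of 1, frequency-only pooling by 2, the channel counts, the 128x300 input) is a hypothetical choice for illustration; none of them is given in the quoted text.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_shapes(n_freq, n_frames, channels=(16, 32, 64, 128, 128),
                 kernel=3, padding=1, pool_freq=2):
    """Return (channels, freq, frames) after each conv + pooling layer."""
    shapes = []
    f, t = n_freq, n_frames
    for c in channels:
        f = conv_out(f, kernel, padding=padding)
        t = conv_out(t, kernel, padding=padding)
        f = f // pool_freq  # pool along frequency only; keep every time frame
        shapes.append((c, f, t))
    return shapes


# Hypothetical input: 128 frequency bins by 300 frames.
shapes = trace_shapes(n_freq=128, n_frames=300)
c, f, t = shapes[-1]

# Flatten channels and frequency per frame to form the recurrent input.
gru_input_size = c * f
gru_steps = t

for i, s in enumerate(shapes, start=1):
    print(f"after CNN layer {i}: {s}")
print(f"GRU sees {gru_steps} steps of size {gru_input_size}")
```

Pooling only along frequency is one possible design choice: it keeps the time axis intact so the recurrent layer still sees one vector per input frame.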
“…Deep convolutional neural network plays an important role in many areas, including paralinguistic speech attribute recognition in recent years [15,21]. The convolutional structure can capture the patterns on the images, and more generally, on spectrograms or other time-frequency representations.…”
Section: Embedding
type: mentioning
confidence: 99%
“…DenseNet [24] connects every layer with other layers in a feedforward manner thus has the potential to reduce the problem of gradient vanishing. Compared to other network structures applied in previous paralinguistic challenges [15,21], the networks we adopt here are much deeper. Therefore we focus on the orca activity detection task due to the relatively large size of training data.…”
Section: Embedding
type: mentioning
confidence: 99%
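The dense connectivity mentioned in the statement above can be illustrated by counting channels: each layer in a dense block receives the concatenation of the block input and the outputs of all earlier layers. The sketch below is a minimal bookkeeping example; the function names and the values (64 input channels, growth rate 32, 6 layers) are illustrative assumptions and do not come from the citing paper.

```python
def dense_block_input_channels(in_channels, growth_rate, n_layers):
    """Channels seen by each layer: block input plus all earlier layer outputs."""
    return [in_channels + i * growth_rate for i in range(n_layers)]


def dense_block_output_channels(in_channels, growth_rate, n_layers):
    """Channels leaving the block after the final concatenation."""
    return in_channels + n_layers * growth_rate


# Illustrative values only.
inputs = dense_block_input_channels(in_channels=64, growth_rate=32, n_layers=6)
output = dense_block_output_channels(in_channels=64, growth_rate=32, n_layers=6)

print("input channels per layer:", inputs)
print("block output channels:", output)
```

Since every layer has a direct path to the block input and to the block output through concatenation, gradients reach early layers without passing through every intermediate transformation.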
“…For this task, we utilise the Upper Respiratory Tract Infection Corpus (URTIC) dataset as featured in the INTERSPEECH 2017 Computational Paralinguistics Challenge (COMPARE) [21]. A range of different approaches have already been undertaken on this data, from conventional OPENSMILE [22] based systems [21,23], to more contemporary deep learning systems [24]. As the aim of the presented work is to explore the advantages of network pruning and optimisation, we opted to train standard multi-layer dense neural networks, the suitability of which have been demonstrated for this data [25].…”
Section: Introduction
type: mentioning
confidence: 99%