Interspeech 2019
DOI: 10.21437/interspeech.2019-1386
The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge

Abstract: This paper introduces our approaches for the orca activity and continuous sleepiness tasks in the Interspeech ComParE Challenge 2019. For the orca activity detection task, we extract deep embeddings using several deep convolutional neural networks, followed by a Support Vector Machine (SVM) back-end classifier. Both the STFT spectrogram and the log mel-spectrogram are explored as input features. To increase the size of the training data and address the class imbalance, we propose four kinds of data augmentation. …
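As a rough illustration of the front end the abstract describes, the sketch below computes a log mel-spectrogram from a raw waveform using plain NumPy. The settings (n_fft=512, hop=160, n_mels=40) are illustrative assumptions, not the authors' configuration, and the SVM back end (e.g. an `sklearn.svm.SVC` on pooled embeddings) is omitted.

```python
import numpy as np

def log_mel_spectrogram(wave, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the STFT magnitude, apply a triangular mel
    filterbank, and return log energies of shape (frames, n_mels)."""
    # -- framing with a Hann window --
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # -- magnitude STFT spectrogram: (frames, n_fft//2 + 1) --
    spec = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

    # -- triangular mel filterbank --
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        if center > left:   # rising slope of the triangle
            fbank[m - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:  # falling slope of the triangle
            fbank[m - 1, center:right] = (right - np.arange(center, right)) / (right - center)

    # small floor keeps the log finite for silent frames
    return np.log(spec @ fbank.T + 1e-8)
```

The STFT spectrogram input mentioned in the abstract corresponds to `spec` before the filterbank is applied.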

Cited by 12 publications (10 citation statements)
References 24 publications
“…In [22], a CC of .367 was obtained by an early fusion of the representations learnt by attention and sequence-to-sequence autoencoders. Fisher Vector encodings were fused with the outputs of the ComParE functionals in [23] to obtain a CC of .365. In both [24] and [25], CNNs were exploited in an end-to-end deep learning approach: the former study applied no fusion and reached a CC of .335, while the latter fused its CNN models to reach a CC of .325.…”
Section: Results (mentioning)
confidence: 99%
“…Feature aggregation has been a key component of successful systems for speech paralinguistic tasks [64]- [66], and in recent years we have also had success using these methods [67]- [69]. In this work, we experimented with three types of feature aggregation methods that are based on functionals, Bag of Words [70], and Fisher Vector Encoding [71].…”
Section: Feature Aggregation (mentioning)
confidence: 99%
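The functionals-based aggregation mentioned in the excerpt above can be illustrated with a minimal sketch: frame-level features of any length are collapsed into a fixed-length utterance vector by applying per-dimension statistics. The particular statistics here (mean, std, min, max, range) are a small illustrative subset; real ComParE functional sets are far larger.

```python
import numpy as np

def aggregate_functionals(frame_feats):
    """Collapse a variable-length (frames x dims) feature matrix into a
    fixed-length utterance vector via statistical functionals applied
    independently to each feature dimension."""
    lo = frame_feats.min(axis=0)
    hi = frame_feats.max(axis=0)
    stats = [
        frame_feats.mean(axis=0),  # mean per dimension
        frame_feats.std(axis=0),   # standard deviation per dimension
        lo,                        # minimum per dimension
        hi,                        # maximum per dimension
        hi - lo,                   # range per dimension
    ]
    return np.concatenate(stats)   # shape: (5 * dims,)
```

Because the output length depends only on the number of feature dimensions, utterances of different durations map to vectors a standard classifier (such as the SVM back end above) can consume directly.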
“…There are also benchmark solutions for LID. Google's Compact Language Detector (CLD) and TextCat employ an n-gram based method¹; LogR [20] uses a discriminative strategy with regularized logistic regression [16]. Cavnar and Trenkle [14] provided outstanding results compared to other state-of-the-art methods.…”
Section: Literature Review (mentioning)
confidence: 99%
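A minimal sketch of the Cavnar-and-Trenkle-style character n-gram approach referenced above: each language is represented by a ranked n-gram profile, and a document is assigned to the language whose profile minimises the "out-of-place" rank distance. The profile size, n-gram orders, and any training texts are illustrative assumptions.

```python
from collections import Counter

def ngram_profile(text, n_max=3, top=300):
    """Build a ranked character n-gram profile: count all 1..n_max grams
    and map each of the `top` most frequent grams to its rank."""
    counts = Counter()
    padded = " " + text.lower() + " "
    for n in range(1, n_max + 1):
        counts.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; grams absent from the language profile
    incur the maximum penalty."""
    penalty = len(lang_profile)
    dist = 0
    for gram, rank in doc_profile.items():
        dist += abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
    return dist

def identify(text, lang_profiles):
    """Return the language whose profile is closest to the document's."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))
```

In practice the language profiles are built from sizeable training corpora rather than single sentences; the rank-distance idea is what makes the method robust to document length.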
“…Natural language processing (NLP) techniques play an important role in the classification and processing of the huge volume of digital documents on the Web [1,2]. Determining the language of a text's content is called Language Identification (LID).…”
Section: Introduction (mentioning)
confidence: 99%