Interspeech 2020
DOI: 10.21437/interspeech.2020-1242
Towards Learning a Universal Non-Semantic Representation of Speech

Cited by 96 publications (101 citation statements)
References 24 publications
“…CBoW [16,25], SG [16,25], TemporalGap [16,25], Triplet Loss [16,25], TRILL [13]: Table 2 shows that COLA embeddings consistently outperform all these methods. In particular, on acoustic scene classification, we obtain a competitive accuracy of 94% compared to 73% achieved with a triplet loss in [16].…”
Section: Results
confidence: 99%
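The snippet above compares COLA against triplet-loss baselines. For readers unfamiliar with that objective, the following is a minimal sketch of a margin-based triplet loss as commonly used for self-supervised audio embeddings; the function name, toy embeddings, and margin value are illustrative assumptions, not code from the cited papers.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: pull anchor toward positive, push it from negative.

    All arguments are embedding vectors of equal dimension; the loss is
    zero once the negative is farther than the positive by `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared L2 to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared L2 to negative
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: anchor is close to positive, far from negative.
a = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])
n = np.array([0.0, 1.0])
loss = triplet_loss(a, p, n)
```

With these toy vectors the positive is already well separated from the negative, so the hinge clamps the loss to zero; swapping the roles of positive and negative yields a positive loss, which is what drives the embedding updates during training.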
“…It contains 2 million excerpts of 10-second audio from YouTube videos, annotated in a multi-label fashion with over 500 classes. This dataset has been used by [16,25,13] for self-supervised pre-training. Since our method is self-supervised, we never use AudioSet labels.…”
Section: Datasets and Tasks
confidence: 99%