2020
DOI: 10.48550/arxiv.2007.13465
Preprint

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

Abstract: We propose a self-supervised representation learning model for the task of unsupervised phoneme boundary detection. The model is a convolutional neural network that operates directly on the raw waveform. It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle. At test time, a peak detection algorithm is applied over the model outputs to produce the final boundaries. As such, the proposed model is trained in a fully unsupervised manner with no manual annotatio…
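
The test-time step in the abstract (peak detection over the model outputs) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the model output is reduced to one dissimilarity score per pair of adjacent frames (cosine dissimilarity is used here as a stand-in), and that peaks are filtered by a prominence threshold chosen on validation data. The function names `frame_dissimilarity`, `peak_prominence`, and `detect_boundaries`, and the threshold value, are hypothetical and chosen only for illustration.

```python
import math


def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def frame_dissimilarity(frames):
    """Score i measures the change between frame i and frame i + 1."""
    return [
        cosine_dissimilarity(frames[i], frames[i + 1])
        for i in range(len(frames) - 1)
    ]


def peak_prominence(scores, i):
    """Height of the peak at index i above the higher of its two bases.

    Each base is the minimum reached while walking away from the peak
    until a strictly higher value or the end of the signal is met.
    """
    peak = scores[i]
    left_min = peak
    j = i - 1
    while j >= 0 and scores[j] <= peak:
        left_min = min(left_min, scores[j])
        j -= 1
    right_min = peak
    j = i + 1
    while j < len(scores) and scores[j] <= peak:
        right_min = min(right_min, scores[j])
        j += 1
    return peak - max(left_min, right_min)


def detect_boundaries(scores, min_prominence):
    """Return indices of strict local maxima with sufficient prominence."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        is_peak = scores[i] > scores[i - 1] and scores[i] > scores[i + 1]
        if is_peak and peak_prominence(scores, i) >= min_prominence:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    # Toy scores: two clear peaks (indices 2 and 6) and one shallow bump (index 4).
    toy_scores = [0.0, 0.1, 0.9, 0.1, 0.2, 0.15, 0.8, 0.0]
    print(detect_boundaries(toy_scores, min_prominence=0.3))  # prints [2, 6]
```

Filtering by prominence rather than raw height discards the shallow bump at index 4 even though it is a local maximum, which is why a single threshold can be tuned on a validation set. In practice a library routine such as `scipy.signal.find_peaks` with its `prominence` argument serves the same purpose.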

Cited by 6 publications
(29 citation statements)
References 23 publications
“…The frames with high dissimilarity are considered segment boundary candidates. Similar to [23], [35], [36], we apply a peak detection algorithm to find the final segment boundaries. The peak prominence value for the peak detection algorithm is fine-tuned on the validation dataset [23].…”
Section: E Inference | mentioning
confidence: 99%
“…Similar to [23], [35], [36], we apply a peak detection algorithm to find the final segment boundaries. The peak prominence value for the peak detection algorithm is fine-tuned on the validation dataset [23]. For word segmentation, the model outputs a dissimilarity score between context representation and segment representation.…”
Section: E Inference | mentioning
confidence: 99%