2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461761

Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the “Speaking Rosetta” JSALT 2017 Workshop

Cited by 34 publications (29 citation statements)
References 3 publications
“…In (Kamper et al) a dynamic time warping alignment is used to discover similar segment pairs. Our work is inspired by the research efforts in reducing the dependence on labeled data for building ASR systems through unsupervised unit discovery and acoustic representation learning (Park and Glass, 2008; Glass; et. al., a,f), and through multi- and cross-lingual transfer learning in low-resource conditions (et.…”
Section: Discussion and Related Work (citation type: mentioning)
confidence: 99%
“…background noise, recording channel, speaker identity, accent, emotional state, topic under discussion, and the language used in communication. The practical need for building ASR systems for new conditions with limited resources spurred a lot of work focused on unsupervised speech recognition and representation learning (Park and Glass, 2008; Glass; et. al., a,f; van den Oord et al, 2018), in addition to semi- and weakly-supervised learning techniques aiming at reducing the supervised data needed in real-world scenarios (Vesely et al; Li et al, b; Krishnan Parthasarathi and Strom; Chrupała et al; Kamper et al, 2017).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Several recent studies have trained models on images paired with unlabelled speech [4][5][6][22][23][24][25][26]. Most approaches map images and speech into a common space, allowing images to be retrieved using speech and vice versa.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…The speech and images are then projected into the same "semantic" space. The DNN then learns to associate [Footnote 1: Note, a summary and initial results of this work were presented in [59], also available in the HAL repository: https://hal.archives-ouvertes.fr/hal-01709578/document. The current paper provides more details on the experimental setups of the experiments, including more details on the used Deep Neural Network architectures and algorithms and rationales for the experiments.]…”
Section: Introduction (citation type: mentioning)
confidence: 99%