Interspeech 2019
DOI: 10.21437/interspeech.2019-3130

Multiview Shared Subspace Learning Across Speakers and Speech Commands

Abstract: In many speech processing applications, the objective is to model different modes of variability to obtain robust speech features. In this paper, we learn speech representations in a multiview paradigm by constraining the views to known modes of variability such as speakers or spoken words. We use deep multiset canonical correlation (dMCCA) because it can model more than two views in parallel to learn a shared subspace across them. In order to model thousands of views (e.g., speakers), we demonstrate that stoc…

Citations: cited by 5 publications (6 citation statements)
References: 23 publications
“…Schroff et al [19] have introduced FaceNet and the triplet-loss for projecting images onto a latent space that quantifies similarity in a supervised-learning manner. Recently, Somandepalli et al [20] used tracking of faces in a photo-realistic video, followed by clustering and verification using MvCorr [21] and Improved Triplet [22] to adapt available face representation data to perform better on racially diverse images following [23]. Aneja et al [24] have suggested DeepExpr model for facial expression recognition for multiple styles.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…We first review the multi-view correlation (mv-corr) objective developed by Somandepalli et. al., (2019a;2019b).…”
Section: Proposed Approach
Type: mentioning
confidence: 99%
“…recordings from over 1800 speakers saying one or more of 30 commands such as "On" and "Off". The application of mv-corr for spoken-word recognition and text-dependent speaker recognition in SCD was studied by Somandepalli et al (2019a) for speaker recognition task compared to the SoA in this domain (Snyder et al, 2017). Building upon their work, in this paper, we analyze spoken-word recognition on SCD in a greater detail.…”
Section: Speech Commands Dataset
Type: mentioning
confidence: 99%