2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6639335

Zero resource spoken audio corpus analysis

Abstract: Zero-resource speech processing involves the automatic analysis of a collection of speech data in a completely unsupervised fashion without the benefit of any transcriptions or annotations of the data. In this paper, our zero-resource system seeks to automatically discover important words, phrases and topical themes present in an audio corpus. This system employs a segmental dynamic time warping (S-DTW) algorithm for acoustic pattern discovery in conjunction with a probabilistic model which treats the topic an…

Cited by 16 publications (8 citation statements)
References 22 publications (15 reference statements)

“…Given a spoken query, the system then expanded the query using other commonly co-occurring discovered terms, thereby allowing retrieval of semantically related words (other than the one spoken). In [81], a topic model was applied to discovered words in a similar way. Here we are also doing semantic speech retrieval in the absence of transcriptions.…”
Section: E Abstractpotting and (Semantic) Speech Search | Citation type: mentioning
confidence: 99%
“…Moreover, Harwath and Hazen utilized PLSA to represent the topics of a transcribed conversation, and then ranked words in the transcript based on topical similarity to the topics found in the conversation [6]. Similarly, Harwath et al extracted the keywords or key phrases of an audio file by directly applying PLSA on the links among audio frames obtained using segmental dynamic time warping, and then using mutual information measure for ranking the key concepts in the form of audio file snippets [28]. A semi-supervised latent concept classification algorithm was presented by Celikyilmaz and Hakkani-Tur using LDA topic modeling for multi-document information extraction [29].…”
Section: B Keyword Extraction Methods | Citation type: mentioning
confidence: 99%
“…As a result, based on the characteristics of the documents, the acoustic patterns, the probabilities of observing the acoustic patterns given the latent topics, and the latent topic distribution for the spoken documents were jointly learned from the spoken archive. This approach has not yet been applied on semantic retrieval without ASR at the time of writing this article, but the experiments conducted on a set of telephone calls from the Fisher Corpus have demonstrated that the framework successfully provided a means of summarizing the topical structure of an spoken archive by extracting a small set of audio intervals which are actually instances of representative words or phrases for the discovered latent topics [296].…”
Section: F Semantic Retrieval Without ASR | Citation type: mentioning
confidence: 98%
“…With the topic models, for example, spoken documents can be expanded by acoustic patterns semantically related to its topics but originally not in the documents. The word-level acoustic patterns can also be discovered jointly with the latent topic models [296]. In this approach, segmental DTW mentioned in Section V-B was employed first to discover a set of audio intervals, and similar audio intervals very probably sharing the same underlying text transcription were linked together [234].…”
Section: F Semantic Retrieval Without ASR | Citation type: mentioning
confidence: 99%