2005
DOI: 10.1109/tsa.2005.852088

SpeechFind: advances in spoken document retrieval for a National Gallery of the Spoken Word

Abstract: Advances in formulating spoken document retrieval for a new National Gallery of the Spoken Word (NGSW) are addressed. NGSW is the first large-scale repository of its kind, consisting of speeches, news broadcasts, and recordings from the 20th century. After presenting an overview of the audio stream content of the NGSW, with sample audio files from U.S. Presidents from 1893 to the present, an overall system diagram is proposed with a discussion of critical tasks associated with effective audio informat…

Cited by 78 publications (56 citation statements)
References 64 publications
“…(see Fig. 6), and the inclusion in complete retrieval systems such as Rough 'n' Ready [55] and SpeechFind [56] allow users to see the current speaker information, understand the general flow of speakers throughout the broadcast, or search for a particular speaker within the audio. Experiments are also underway to ascertain if additional tasks, such as the process of annotating data, can be facilitated using diarization output.…”
Section: Discussion (mentioning)
confidence: 99%
“…Previous experiment demonstrates that under-segmentation, caused by a high number of miss detections, is more cumbersome to remedy than over-segmentation caused by a high number of false alarms [12], [13], [15], [16], [23], [40]. For example, over-segmentation could be alleviated by clustering and/or merging.…”
Section: B Mathematical Properties Of the IG Distribution And Its Ap (mentioning)
confidence: 99%
“…The window size is also set equal to r taking into consideration as many data as possible. When more data are available, more accurate Gaussian models are built, since BIC behaves better for large windows, whereas short changes are not easily detectable by BIC [12], [16]. Moreover, it was shown in [22], that the bigger the window size, the better the performance.…”
Section: BIC-based Speaker Segmentation (mentioning)
confidence: 99%
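
Editorial illustration (not taken from the cited paper or from SpeechFind): the window-size remark in the statement above can be made concrete with the standard ΔBIC test for a single hypothesised change point. This is a minimal sketch that assumes a one-dimensional feature sequence modelled by Gaussians; the function name `delta_bic`, the parameter `penalty_weight`, and the synthetic data are all hypothetical choices made for the example.

```python
import math
import random
from statistics import pvariance


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for a hypothesised change at index `split` (1-D features).

    Compares one Gaussian over the whole window against two Gaussians,
    one on each side of `split`. A positive value favours a change.
    """
    n = len(frames)
    left, right = frames[:split], frames[split:]
    # Log-likelihood gain from modelling the two sides separately.
    gain = 0.5 * (
        n * math.log(pvariance(frames))
        - len(left) * math.log(pvariance(left))
        - len(right) * math.log(pvariance(right))
    )
    # Penalty for the extra Gaussian: d + d(d+1)/2 parameters with d = 1.
    penalty = 0.5 * (1 + 1) * math.log(n)
    return gain - penalty_weight * penalty


# Synthetic data (assumption): one homogeneous window, one with a mean shift.
rng = random.Random(0)
same = [rng.gauss(0.0, 1.0) for _ in range(400)]
changed = [rng.gauss(0.0, 1.0) for _ in range(200)] + [
    rng.gauss(5.0, 1.0) for _ in range(200)
]

print("homogeneous window:", delta_bic(same, 200))
print("window with change:", delta_bic(changed, 200))
```

In this formulation the likelihood gain scales with the number of frames while the penalty grows only with its logarithm, which is consistent with the quoted observation that BIC is more reliable on large windows and tends to miss short changes.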
“…Having segmented speech regions, it is also often necessary to segment these further in terms of homogeneous speaker turns. In addition to improving ASR systems, speaker turn information can be helpful for speaker adaptation in rich transcription of videos and meetings (Bonastre et al, 2000) and for content based audio classification and retrieval (Hansen et al, 2005) which have a wide range of applications in the entertainment industry, audio archive management, surveillance, etc. Audio segmentation would also be an important tool in summarizing meetings, which has recently gained a lot of interest in the research community.…”
Section: Introduction (mentioning)
confidence: 99%