1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings
DOI: 10.1109/asru.1997.659132

Topic extraction based on continuous speech recognition in broadcast-news speech

Abstract: This paper reports on topic extraction in Japanese broadcast-news speech. Using continuous speech recognition, we studied the extraction of several topic-words from broadcast news. A combination of multiple topic-words represents the content of the news, which is more detailed and more flexible than a single word or a single category. A topic-extraction model shows the degree of relevance between each topic-word and each word in the articles. For all words in an article, topic-words which have high total re…
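
The abstract is cut off, so the exact selection rule is not available here. As a rough illustration only, the sketch below assumes that each topic-word is scored by summing its relevance over all words in an article and that the highest-scoring topic-words are kept. The function name `rank_topic_words`, the `relevance` table, and every number in it are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_topic_words(article_words, relevance, top_n=3):
    """Rank topic-words by relevance summed over the words of one article.

    relevance maps an article word to {topic_word: score}; words that are
    missing from the table contribute nothing. (Assumed scoring rule.)
    """
    totals = defaultdict(float)
    for word in article_words:
        for topic_word, score in relevance.get(word, {}).items():
            totals[topic_word] += score
    # Highest total first; ties broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical relevance table, invented purely for illustration.
relevance = {
    "election": {"politics": 2.0, "economy": 0.5},
    "vote": {"politics": 1.5},
    "yen": {"economy": 2.5},
}
article = ["election", "vote", "yen", "weather"]
top = rank_topic_words(article, relevance, top_n=2)
```

In a speech setting, `article_words` would be the recognizer's output for one news story, so recognition errors would simply appear as words that add little or misleading relevance.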

Cited by 11 publications (7 citation statements)
References 13 publications

“…To compare scenes on a computer, the topic information for each scene must be converted to some kind of index. TF-IDF [2,4], the quantity of mutual information [5] considering TF-IDF, and the χ² value [9] are often used as indexes. Also, a method is proposed wherein text is converted to indexes, in which the dimensionality is independent of the vocabulary size, using latent semantic analysis (LSA) [10] of word and text matrices and/or latent semantic analysis [11] of co-occurrence matrices.…”
Section: Methods For Converting Scenes To Indexes (mentioning)
confidence: 99%
“…The system first extracts speech intervals via the audio segmentation process. The extracted speech intervals are then fed to the automatic speech recognition process [2]. The linguistic topic segmentation process next analyzes the resulting timestamped transcript.…”
Section: Introduction (mentioning)
confidence: 99%
“…An example is the classification of written documents such as newspaper articles [4]. Another type of classification is the classification of spoken documents such as news programs [5][6][7][8]. In most of these approaches, the relevance scores for descriptions and topics are calculated from the mutual information or TF-IDF for the word and the topic in the description.…”
Section: Introduction (mentioning)
confidence: 99%
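
Several of the statements above mention TF-IDF as an index or relevance score. For orientation, here is a minimal sketch of one common TF-IDF variant (relative term frequency multiplied by the log of the inverse document frequency). The citing papers may well use a different weighting, and the function name `tf_idf` and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf score} dict per tokenized document.

    tf  = count of the word in the document / document length
    idf = log(number of documents / documents containing the word)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Toy documents, invented for illustration.
docs = [["yen", "market", "yen"], ["election", "market"]]
scores = tf_idf(docs)
```

A word that occurs in every document, such as `market` here, receives a score of zero, which is why TF-IDF tends to favor words that distinguish one story from the others.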