Automatic speech recognition and its application to information extraction

Furui, Sadaoki

doi:10.3115/1034678.1034680

Cited by 12 publications

(5 citation statements)

References 13 publications

“…Although, the process is indeed automatic, transcripts remain very expensive to produce, both in terms of time (generating transcripts is often performed in dozen multiple of real-time recordings, the higher the accuracy, the higher the cost) and computer processing power. In addition, acceptable results can not be guaranteed, as inappropriate language models, poor recording conditions, individual speakers' accents etc, can cause dramatic reductions in the recognition rates [7]. This is clearly illustrated in Fig.…”

Section: Using Speech Transcripts (mentioning)

confidence: 96%

An analytical evaluation of search by content and interaction patterns on multimodal meeting records

Bouamrane

Luz

2007

Multimedia Systems

It has been suggested that combining content-based indexing with automatically generated temporal metadata might help improve search and browsing of recordings of computer-mediated collaborative activities such as on-line meetings, which are characterised by extensive multimodal communication. This paper presents an analytical evaluation of the effectiveness of these techniques as implemented through automatic speech recognition and temporal mapping. In particular, it assesses the extent to which this strategy can help uncover contextual relationships between audio and text segments in recorded remote meetings. Results show that even simple temporal mapping can effectively support retrieval of recorded audio segments, improve retrieval performance in situations where speech recognition alone would have exhibited prohibitively high word error rates, and provide a basic form of semantic adaptation.

“…Unconstrained LVCSR is a difficult task for a number of reasons including speech disfluencies in spontaneous dialogues, lack of word or sentence boundaries, poor recording conditions, crosstalk, inappropriate language models, out-of-vocabulary items and variations in accent and pronunciation. These conditions combined can cause substantial decreases in recognition rates [21]. Speech recognition is the task of automatically identifying a sequence of spoken words according to the speech signal [52,79,36].…”

Section: Automatic Speech Recognition (mentioning)

confidence: 99%

Meeting browsing

Bouamrane

Luz

2006

Multimedia Systems

Meeting, to discuss and share information, take decisions and allocate tasks, is a central aspect of human activity. Computer mediated communication offers enhanced possibilities for synchronous collaboration by allowing seamless capture of meetings, thus relieving participants from time-consuming documentation tasks. However, in order for meeting systems to be truly effective, they must allow users to efficiently navigate and retrieve information of interest from recorded meetings. In this article, we review the state of the art in multimedia segmentation, indexing and browsing techniques and show how existing meeting browser systems build on these techniques and integrate various modalities to meet their users' information needs.

“…The speech-to-text automation of nursing records can lighten the burden of administrative work. Although automatic speech recognition in the medical domain was first reported in the 1980s [13], all subsequent studies up to 1999 tested the transcription of single words as opposed to continuous speech in this context [14]. In recent years, a few studies have been conducted on speech recognition in the medical domain in terms of the word error rate (WER) [15][16][17].…”

Section: Introduction (mentioning)

confidence: 99%

Code-Switching Automatic Speech Recognition for Nursing Record Documentation: System Development and Evaluation

Hou

Chen

Chang et al.

2022

JMIR Nursing

Background: Taiwan has insufficient nursing resources due to the high turnover rate of health care providers. Therefore, reducing the heavy workload of these employees is essential. Herein, speech transcription, which has various potential clinical applications, was employed for the documentation of nursing records. The requirement of including only one speaker per transcription facilitated data collection and system development. Moreover, authorization from patients was unnecessary.

Objective: The aim of this study was to construct a speech recognition system for nursing records such that health care providers can complete nursing records without typing or with only a few edits.

Methods: Nursing records in Taiwan are mainly written in Mandarin, with technical terms and abbreviations presented in both Mandarin and English. Therefore, the training set consisted of English code-switching information. Next, transfer learning (TL) and meta-TL (MTL) methods, which perform favorably in code-switching scenarios, were applied.

Results: As of September 2021, the China Medical University Hospital Artificial Intelligence Speech (CMaiSpeech) data set was established by manually annotating approximately 100 hours of recordings from 525 speakers. The word error rate (WER) of the benchmark model of syllable-based TL was 29.54% in code-switching. The WER of the proposed model of syllable-based MTL was 22.20% in code-switching. The test set comprised 17,247 words. Moreover, in a clinical case, the proposed model of syllable-based MTL yielded a WER of 31.06% in code-switching. The clinical test set contained 1159 words.

Conclusions: This paper has two main contributions. First, the CMaiSpeech data set, a Mandarin-English corpus, has been established. Health care providers in Taiwan are often compelled to use a mixture of Mandarin and English in nursing records. Second, an automatic speech recognition system for nursing record document conversion was proposed. The proposed system can shorten the work handover time and further reduce the workload of health care providers.
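For readers unfamiliar with the word error rate figures quoted in this abstract, the following is a minimal sketch of the standard definition: the token-level edit distance (substitutions, deletions and insertions) between a hypothesis and a reference, divided by the reference length. The function name `word_error_rate` and the example sentences are hypothetical, and whitespace tokenization is an assumption; the cited study's syllable-based, code-switching scoring is not reproduced here.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference).

    Illustrative only: tokens are whatever the caller passes in
    (words here; the cited study works with syllable-based models).
    """
    if not reference:
        raise ValueError("reference must contain at least one token")

    # Levenshtein distance over tokens, keeping one row of the table at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # cost of deleting the first i reference tokens
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1      # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example: one word dropped from a four-word reference.
ref = "the patient is stable".split()
hyp = "the patient stable".split()
print(word_error_rate(ref, hyp))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.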

Automatic speech recognition and its application to information extraction

Abstract: This paper describes recent progress and the author's perspectives of speech recognition technology. Applications of speech recognition technology can be classified into two main areas, dictation and human-computer dialogue systems.
