Benefiting from knowledge of speech, language and hearing ─ accumulated by many researchers over nearly a century ─ new technology is beginning to serve users of complex information systems. This technology aims for a natural communication environment, capturing attributes that humans favor in face-to-face exchange. Ideally, the environment provides three-dimensional spatial realism in the sensory dimensions of sight, sound and touch. Conversational interaction bears a central burden, with visual and manual signaling simultaneously supplementing the communication process. Current research therefore addresses multimodal interfaces that can transcend the limitations of mouse and keyboard. In addition to instrumenting sensors for each mode, the interface must incorporate context-aware algorithms for fusing and interpreting multiple sensory channels. The ultimate objective is a reliable estimate of user intent, from which actionable responses can be made. This report describes the early status of multimodal interfaces and identifies emerging opportunities for enhanced usability and naturalness. It concludes by advocating focused research on a frontier issue ─ the formulation of a quantitative language framework for multimodal communication.

PERSPECTIVE

Over the past few years, society has enjoyed exceptional gains in productivity. A large part of this advance is due to the benefits of information technology ─ computing, networking, and software. Processor speeds steadily increase as costs decline, and broadband transport capacity is becoming pervasive. A central issue is how to employ these advantages to maintain the hard-won momentum in productivity. This report argues that advanced computing and networking are appropriately employed in creating a communication environment for human users that is as natural and habitable as face-to-face information exchange. Implied is the (presently unrealistic) ideal of three-dimensional spatial realism.