In collaboration with colleagues at UW, OGI, IBM, and SRI, we are developing technology to process spoken language from informal meetings. The work includes a substantial data collection and transcription effort, and has required a nontrivial degree of infrastructure development. We are undertaking this because the new task area provides a significant challenge to current HLT capabilities, while offering the promise of a wide range of potential applications. In this paper, we give our vision of the task, the challenges it represents, and the current state of our development, with particular attention to automatic transcription.

D: There's a-there are-there's a whole bunch of tools
J: Yes. / D: web page, where they have a listing.
D: like 10 of them or something.
J: Are you speaking about Mississippi State per se? or
D: No no no, there's some .. I mean, there just-there are-there are a lot of / J: Yeah.
J: Actually, I wanted to mention- / D: (??)
J: There are two projects, which are .. international .. huge projects focused on this kind of thing, actually .. one of them's MATE, one of them's EAGLES .. and um.
D: Oh, EAGLES.
D: (??) / J: And both of them have
J: You know, I shou-, I know you know about the big book.
E: Yeah.
J: I think you got it as a prize or something.
E: Yeah. / D: Mhm.
J: Got a surprise. {laugh} {J. thought "as a prize" sounded like "surprise"}

Note that interruptions are quite frequent; this is, in our experience, quite common in informal meetings, as is acoustic overlap
The Mel-Frequency Cepstral Coefficient (MFCC) and Perceptual Linear Prediction (PLP) feature extraction methods typically used for automatic speech recognition (ASR) employ several principles that have known counterparts in the cochlea and auditory nerve: frequency decomposition, mel- or bark-warping of the frequency axis, and compression of amplitudes. It seems natural to ask whether one can profitably employ a counterpart of the next physiological processing step, synaptic adaptation. We therefore incorporated a simplified model of short-term adaptation into MFCC feature extraction. We evaluated the resulting ASR performance on the AURORA 2 and AURORA 3 tasks, in comparison to ordinary MFCCs, MFCCs processed by RASTA, and MFCCs processed by cepstral mean subtraction (CMS), and both in comparison to and in combination with Wiener filtering. The results suggest that our approach offers a simple, causal robustness strategy that is competitive with RASTA, CMS, and Wiener filtering and performs well in combination with Wiener filtering. Compared to the structurally related RASTA, our adaptation model provides superior performance on AURORA 2 and, if Wiener filtering is used prior to both approaches, on AURORA 3 as well.
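To make the contrast between a causal strategy and CMS concrete, the following is a minimal sketch in plain Python. It is not the adaptation model from the paper, whose details the abstract does not give. The function names, the `decay` parameter, and the first-order running-average form are all assumptions chosen for illustration. CMS needs the mean over the whole utterance before it can emit any frame, whereas the causal variant subtracts only a running average of past frames, so a sustained input fades while onsets are preserved.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the whole utterance.

    Non-causal: every output frame depends on all frames, past and future.
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


def causal_adaptation(frames, decay=0.9):
    """Subtract a running exponential average of past frames.

    Hypothetical first-order stand-in for short-term adaptation (an
    assumption, not the paper's model). Causal: each output frame depends
    only on the current frame and earlier ones.
    """
    n_coeffs = len(frames[0])
    state = [0.0] * n_coeffs  # running average of past frames
    out = []
    for f in frames:
        out.append([f[k] - state[k] for k in range(n_coeffs)])
        # Update the running average only after emitting the current frame.
        state = [decay * state[k] + (1.0 - decay) * f[k] for k in range(n_coeffs)]
    return out
```

With a constant input, `causal_adaptation` produces a response that starts at the full input value and decays geometrically toward zero, which is the qualitative onset-emphasis behaviour usually associated with adaptation; `cepstral_mean_subtraction` maps the same constant input to all zeros, but only once the full utterance is available.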