Proceedings of the Workshop on Speech and Natural Language (HLT '90), 1990
DOI: 10.3115/116580.116614

Developing an evaluation methodology for spoken language systems

Abstract: There has been a long-standing methodology for evaluating work in speech recognition (SR), but until recently no community-wide methodology existed for either natural language (NL) researchers or speech understanding (SU) researchers for evaluating the systems they developed. Recently considerable progress has been made by a number of groups involved in the DARPA Spoken Language Systems (SLS) program to agree on a methodology for comparative evaluation of SLS systems, and that methodology is being used in prac…

Cited by 22 publications (18 citation statements); references 2 publications.
“…Annotation is done manually by a trained group of annotators. Once annotated, the data can be run repeatedly and answers can be scored automatically using the comparator program (11). This methodology has evolved over four evaluations.…”
Section: Discussion (mentioning)
confidence: 99%
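The workflow quoted above (annotate once, then rerun systems and score automatically) can be illustrated with a short sketch. All file names and helper functions here are hypothetical; the actual evaluations used the dedicated comparator program (11), which implements a much richer answer-matching specification.

import json

def load_annotations(path):
    # Reference answers keyed by query id, produced once by trained
    # annotators; each query may allow several acceptable answers.
    with open(path) as f:
        return json.load(f)

def matches(hypothesis, references):
    # True if the system's answer matches any annotated reference,
    # treating each answer as an order-insensitive collection of rows.
    return any(sorted(hypothesis) == sorted(ref) for ref in references)

def score_run(run_path, annotations):
    # Score one system run: fraction of queries answered correctly.
    with open(run_path) as f:
        answers = json.load(f)
    correct = sum(matches(ans, annotations[qid]) for qid, ans in answers.items())
    return correct / len(annotations)

# The same annotations are reused across successive system runs:
# annotations = load_annotations("atis_references.json")
# for run in ["run1.json", "run2.json"]:
#     print(run, score_run(run, annotations))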
“…At the start of the Spoken Language Systems program in 1989, an accepted metric had evolved for speech recognition, namely word accuracy (10); however, no comparable metric was available for measuring understanding. Over the past 4 years, the research community has developed an understanding metric for database interface tasks, using either speech or typed input (4,11). To date, there is still no agreed-upon metric for the rich multidimensional space of interactive systems, which includes the system's ability to communicate effectively with the user, as well as its ability to understand what the user is trying to accomplish.…”
(mentioning)
confidence: 99%
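Word accuracy, the speech-recognition metric cited above, is conventionally computed from a Levenshtein alignment of the hypothesis against the reference transcription: accuracy = 1 - (substitutions + deletions + insertions) / N, where N is the number of reference words. A minimal textbook sketch, not the NIST scoring tool itself:

def word_accuracy(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return 1.0 - d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution over four reference words -> 0.75 accuracy.
print(word_accuracy("show flights to boston", "show flights from boston"))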
“…We have continued our strong participation in developing a methodology for common evaluation of spoken language systems, especially the evaluation of natural language understanding systems [15], [12], [14]. For example, we made our database expertise available to TI and helped them in their effort to produce a relational ATIS database from the original ATIS data obtained from the OAG. We also helped in specifying various aspects of the Wizard data collection scenario and the performance evaluation process with NIST, including specification of general templates for descriptions of the CSR and NL systems submitted for evaluation.…”
Section: BBN Report No. 7715 (mentioning)
confidence: 99%
“…For the first two years of the DARPA Spoken Language Program, common evaluation in the ATIS domain has been performed solely with the Common Answer Specification (CAS) protocol [4], whereby a system's performance is determined by comparing its output, expressed as a set of database tuples, with one or more predetermined reference answers [1]. The CAS protocol has the advantage that system evaluation can be carried out automatically, once the principles for generating the reference answers have been established and a corpus has been annotated accordingly.…”
Section: Introduction (mentioning)
confidence: 99%
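A minimal sketch of a CAS-style comparison as described above, assuming each answer is an unordered set of database tuples and a query may carry several predetermined reference answers. The full CAS protocol also specifies scalar answers, column conventions, and tolerance rules, all omitted here; the names below are illustrative only.

from typing import Iterable, Tuple

def normalize(rows: Iterable[Tuple]) -> frozenset:
    # Treat an answer as an unordered set of database tuples,
    # so row order in the system's output does not matter.
    return frozenset(tuple(r) for r in rows)

def cas_correct(system_rows, reference_answers) -> bool:
    # Correct iff the system's output matches any predetermined
    # reference answer for the query.
    hyp = normalize(system_rows)
    return any(hyp == normalize(ref) for ref in reference_answers)

# Hypothetical example: a flights query with two acceptable references.
system = [("AA", 101), ("DL", 202)]
refs = [[("DL", 202), ("AA", 101)],                 # same rows, different order
        [("AA", 101), ("DL", 202), ("UA", 303)]]   # a broader acceptable answer
print(cas_correct(system, refs))  # True: matches the first reference

Because the comparison is fully mechanical once the reference answers are fixed, evaluation can be rerun automatically whenever a system changes, which is exactly the advantage the passage attributes to the CAS protocol.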