Developing an evaluation methodology for spoken language systems

Bates, Madeleine; Boisen, Sean; Makhoul, John

doi:10.3115/116580.116614

Cited by 22 publications

(18 citation statements)

References 2 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Annotation is done manually by a trained group of annotators. Once annotated, the data can be run repeatedly and answers can be scored automatically using the comparator program (11). This methodology has evolved over four evaluations.…”

Section: Discussionmentioning

confidence: 99%

“…At the start of the Spoken Language Systems program in 1989, an accepted metric had evolved for speech recognition, namely word accuracy (10); however, no comparable metric was available for measuring understanding. Over the past 4 years, the research community has developed an understanding metric for database interface tasks, using either speech or typed input (4,11). To date, there is still no agreed upon metric for the rich multidimensional space of interactive systems, which includes the system's ability to communicate effectively with the user, as well as an ability to understand what the user is trying to accomplish.…”

mentioning

confidence: 99%

See 1 more Smart Citation

The roles of language processing in a spoken language interface.

Hirschman¹

1995

Proc. Natl. Acad. Sci. U.S.A.

View full text Add to dashboard Cite

This paper provides an overview of the colloquium's discussion session on natural language understanding, which followed presentations by M. Bates [Bates, M. (1995) Proc This paper provides an overview of the natural language understanding session at the Colloquium on HumanMachine Communication by Voice held by the National Academy of Sciences (NAS). The aim of the paper is to review the role that language understanding plays in spoken language systems and to summarize the discussion that followed the two presentations by Bates (1) and Moore (2). A number of questions were raised during the discussion, including whether a single system could provide both understanding and constraint, what the future role of discourse should be, how to evaluate performance on interactive systems, and whether we are moving in the right direction toward realizing the goal of interactive human-machine communication.Background: The ARPA Spoken Language Program

show abstract

Section: Discussionmentioning

confidence: 99%

mentioning

confidence: 99%

The roles of language processing in a spoken language interface.

Hirschman¹

1995

Proc. Natl. Acad. Sci. U.S.A.

View full text Add to dashboard Cite

show abstract

“…We have continued our strong participation in developing a methodology for common evaluation of spoken language systems, especially the evaluation of natural language understanding systems [15], [12], [14] For example, we made our database expertise available to I TI and helped them in their effort to produce a relational ATIS database from the original ATIS data obtained from OAG. We also helped in specifying various aspects of the Wizard data collection scenario and the performance evaluation process with NIST, including specification of general templates for descriptions of the CSR and NL systems submitted for evaluation.…”

Section: Bbn Report No 7715mentioning

confidence: 99%

Development of a spoken language system

Makhoul¹,

Bates²

1992

Proceedings of the Workshop on Speech and Natural Language - HLT '91

Self Cite

View full text Add to dashboard Cite

“…For the first two years of the DARPA Spoken Language Program, common evaluation in the ATIS domain has been performed solely with the Common Answer Specification (CAS) protocol [4], whereby a system's performance is determined by comparing its output, expressed as a set of database tuples, with one or more predetermined reference answers [1]. The CAS protocol has the advantage that system evaluation can be carried out automatically, once the principles for generating the reference answers have been established and a corpus has been annotated accordingly.…”

Section: Introductionmentioning

confidence: 99%

Experiments in evaluating interactive spoken language systems

Polifroni

Hirschman

Seneff

et al. 1992

Proceedings of the Workshop on Speech and Natural Language - HLT '91

View full text Add to dashboard Cite

As the DARPA spoken language community moves towards developing useful systems for interactive problem solving, we must explore alternative evaluation procedures that measure whether these systems aid people in solving problems within the task domain. In this paper, we describe several experiments exploring new evaluation procedures. To look at end-to-end evaluation, we modified our data collection procedure slightly in order to experiment with several objective task completion measures. We found that the task completion time is well correlated with the number of queries used. We also explored log file evaluation, where evaluators were asked to judge the clarity of the query and the correctness of the response based on examination of the log file. Our results show that seven evaluators were unanimous on more than 80% of the queries, and that at least 6 out of 7 evaluators agreed over 90% of the time. Finally, we applied these new procedures to compare two systems, one system requiring a complete parse and the other using the more flexible robust parsing mechanism. We found that these metrics could distinguish between these systems: there were significant differences in ability to complete the task, number of queries required to complete the task, and score (as computed through a log file evaluation) between the robust and the non-robust modes.

show abstract

Developing an evaluation methodology for spoken language systems

Cited by 22 publications

References 2 publications

The roles of language processing in a spoken language interface.

The roles of language processing in a spoken language interface.

Development of a spoken language system

Experiments in evaluating interactive spoken language systems

Contact Info

Product

Resources

About