Performance Evaluation Techniques for an Automatic Question Answering System

Gunawardena, Tilani; Pathirana, Nishara; Lokuhetti, Medhavi; Ragel, Roshan; Deegalla, Sampath

doi:10.7763/ijmlc.2015.v5.523

Cited by 5 publications

(2 citation statements)

References 7 publications

“…Automatic evaluation for QA was addressed by Magnini et al (2002) and also for multiple sub-domain QA systems (Leidner and Callison-Burch, 2003;Lin and Demner-Fushman, 2006;Shah and Pomerantz, 2010;Gunawardena et al, 2015). However, little progress has been made in the past two decades towards obtaining a standard method.…”

Section: Related Work (mentioning)

confidence: 99%

AVA: an Automatic eValuation Approach for Question Answering Systems

Vu, Moschitti

2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

We introduce AVA, an automatic evaluation approach for Question Answering, which given a set of questions associated with Gold Standard answers (references), can estimate system Accuracy. AVA uses Transformer-based language models to encode question, answer, and reference texts. This allows for effectively assessing answer correctness using similarity between the reference and an automatic answer, biased towards the question semantics. To design, train, and test AVA, we built multiple large training, development, and test sets on public and industrial benchmarks. Our innovative solutions achieve up to 74.7% F1 score in predicting human judgment for single answers. Additionally, AVA can be used to evaluate the overall system Accuracy with an error lower than 7% at 95% of confidence when measured on several QA systems.

“…QA has recently been used to evaluate a summarization task (Eyal et al, 2019). Automatic evaluation for QA was addressed by Magnini et al (2002) and also for multiple subdomain QA systems (Leidner and Callison-Burch, 2003;Lin and Demner-Fushman, 2006;Shah and Pomerantz, 2010;Gunawardena et al, 2015). However, little progress has been made in the past two decades towards obtaining a standard method.…”

Section: Related Work (mentioning)

confidence: 99%

AVA: an Automatic eValuation Approach to Question Answering Systems

Vu, Moschitti

2020

Preprint

We introduce AVA, an automatic evaluation approach for Question Answering, which given a set of questions associated with Gold Standard answers, can estimate system Accuracy. AVA uses Transformer-based language models to encode question, answer, and reference text. This allows for effectively measuring the similarity between the reference and an automatic answer, biased towards the question semantics. To design, train and test AVA, we built multiple large training, development, and test sets on both public and industrial benchmarks. Our innovative solutions achieve up to 74.7% in F1 score in predicting human judgement for single answers. Additionally, AVA can be used to evaluate the overall system Accuracy with an RMSE, ranging from 0.02 to 0.09, depending on the availability of multiple references.
