2016
DOI: 10.1007/s10462-016-9505-7

Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement

Abstract: The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and compet…

Citations: cited by 107 publications (86 citation statements)
References: 181 publications (196 reference statements)
“…For a final outlook towards the impact of especially future AI developments [82][83][84][85][86][87][88][89] in the supply chain and business domain a metaphor might be adapted: Bostrom started his 2014 book about superintelligence – hence AI – with a fable of sparrows looking out for an owl, symbolizing humans on the verge of developing AI instances (see [14], p. v). However, this unfinished fable lacks three important features: First, sparrows and owls are – though quite different – still both quite similar bird species.…”
Section: Discussion (mentioning)
confidence: 99%
“…The categories and overlaps between problems could be assessed via theoretical models, instead of using factor analysis as in psychometrics. In other words, a theoretical alternative to the classification of mental abilities should be endeavoured (see Hernández-Orallo, 2016; Hernández-Orallo, 2017).…”
Section: Discussion (mentioning)
confidence: 99%
“…One way or the other, there seems to be an agreement that there will be an increasing number of machines in the near future which show a range of cognitive abilities, and that we will require evaluation mechanisms for them [Hernández-Orallo, 2017]. These mechanisms will have to give scores for several cognitive abilities so that we can compare them with humans and other animals.…”
Section: Discussion (mentioning)
confidence: 99%
“…Actually, some of these platforms can integrate any task and hence in principle they supersede many existing AI benchmarks [2] in their aim to test "general problem solving ability".…”
Section: AI Experimentation and Evaluation (mentioning)
confidence: 99%