2021
DOI: 10.48550/arxiv.2111.15366
Preprint
AI and the Everything in the Whole Wide World Benchmark

Abstract: There is a tendency across different subfields in AI to valorize a small collection of influential benchmarks. These benchmarks operate as stand-ins for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress towards these long-term goals. In this position paper, we explore the limits of such benchmarks in order to reveal th…

Cited by 15 publications (27 citation statements). References 52 publications.
“…Benchmarking is a way of evaluating and comparing new methods in ML for performance on a particular dataset [81]. Following Raji et al [85], we can understand a benchmark, for the purposes of this paper, as a dataset plus a metric for measuring the performance of a particular model on a specific task. For example, suppose that the current state of the art of image classification on ImageNet for top-1 accuracy is 85%.…”
Section: Benchmarking Ethical Decisions (citation type: mentioning; confidence: 99%)
“…However, if the new method performs better than this benchmark, it is the best-performing algorithm to date (again, modulo efficiency, the volume of training data, etc.). By and large, state-of-the-art progress on certain benchmarks is typically taken to indicate progress on a particular task or set of tasks [85].…”
Section: Benchmarking Ethical Decisions (citation type: mentioning; confidence: 99%)
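
The two excerpts above treat a benchmark as a dataset paired with a metric, with improvement over the reported state of the art taken as the signal of progress. The short Python sketch below illustrates that reading; the top1_accuracy helper, the toy label set, and the 85% threshold are illustrative assumptions, not values taken from the cited papers or from ImageNet itself.

```python
# Minimal sketch of the "dataset + metric" notion of a benchmark quoted above:
# compute top-1 accuracy on a labelled evaluation set and compare it against a
# reported state-of-the-art figure. All data here is synthetic and illustrative.
import numpy as np

def top1_accuracy(predicted_labels, true_labels):
    """Fraction of examples whose single predicted class matches the true class."""
    predicted = np.asarray(predicted_labels)
    true = np.asarray(true_labels)
    return float((predicted == true).mean())

# Hypothetical evaluation data standing in for a benchmark test split.
true_labels = [0, 1, 2, 2, 1, 0, 1, 2]
model_predictions = [0, 1, 2, 1, 1, 0, 1, 2]

ASSUMED_STATE_OF_THE_ART = 0.85  # the 85% figure from the quoted example

score = top1_accuracy(model_predictions, true_labels)
if score > ASSUMED_STATE_OF_THE_ART:
    print(f"New method at {score:.1%} would claim state of the art.")
else:
    print(f"New method at {score:.1%} does not beat {ASSUMED_STATE_OF_THE_ART:.0%}.")
```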
“…The model can make fast inferences and predictions, with less than 1 second required to successfully classify a heartbeat into its respective class. 3. Generality: This paper uses the definition of generality used by [19]. This defines generality as the ability of a model to provide a balanced or generalized approach towards classification within the scope of the application domain.…”
Section: Efficiency (citation type: mentioning; confidence: 99%)
“…Frameworks like Data Statements for Natural Language Processing (Bender & Friedman, 2018), The Dataset Nutrition Label (Holland et al, 2018), Model Cards for Model Reporting (Mitchell et al, 2019), Datasheets for Datasets (Gebru et al, 2021), Closing the AI accountability gap (Raji et al, 2021), The Ethical Pipeline for Healthcare Model Development (Chen et al, 2020), and The Clinician and Dataset Shift in Artificial Intelligence (Finlayson et al, 2021) offer guidance for breaking down the data generating process into relevant component parts to identify potential dimensions on which context shift could lead to performance changes from a test benchmark to production environments. Likewise, meta-frameworks offer guidance for ensuring data documentation frameworks are useful and actionable (Heger et al, 2022).…”
Section: Contextualizing the Benchmark-Production Gap (citation type: mentioning; confidence: 99%)