It has become mainstream in computer vision and other machine learning domains to reuse backbone networks pre-trained on large datasets as preprocessors. Typically, the last layer is replaced by a shallow learning machine; the newly-added classification head and (optionally) deeper layers are then fine-tuned on a new task. Due to its strong performance and simplicity, ResNet152 is a common choice of pre-trained backbone. However, ResNet152 is relatively large and incurs substantial inference latency. In many cases, a compact and efficient backbone with similar performance would be preferable to a larger, slower one. This paper investigates techniques for reusing a pre-trained backbone with the objective of creating a smaller and faster model. Starting from a large ResNet152 backbone pre-trained on ImageNet, we first reduce it from 51 blocks to 5 blocks, cutting its parameter count and FLOPs by more than a factor of 6, without significant performance degradation. Then, we split the model after 3 blocks into several branches, while preserving the same number of parameters and FLOPs, to create an ensemble of sub-networks that improves performance. Our experiments on a large benchmark of 40 image classification datasets from various domains suggest that our techniques match, if not exceed, the performance of classical backbone fine-tuning while achieving a smaller model size and faster inference.
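To make the reuse recipe concrete, the following is a minimal PyTorch/torchvision sketch of truncating an ImageNet pre-trained ResNet152 to a handful of residual blocks, freezing it, and attaching a new shallow classification head. The cut point, the number of kept blocks, and the 10-class head are illustrative assumptions, not the exact configuration studied in this paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet pre-trained ResNet152 backbone.
# (Depending on the torchvision version, `weights="IMAGENET1K_V1"`
# or `pretrained=True` is the expected argument.)
backbone = models.resnet152(weights="IMAGENET1K_V1")

# Keep only the stem and a few bottleneck blocks; this particular
# cut is a hypothetical example of reducing the backbone's depth.
truncated = nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    *list(backbone.layer1.children()),        # first residual stage (3 blocks)
    *list(backbone.layer2.children())[:2],    # first 2 blocks of the second stage
)

# Freeze the reused backbone; only the new head is trained
# (last-layer fine-tuning). Deeper layers could also be unfrozen.
for p in truncated.parameters():
    p.requires_grad = False

# New shallow classification head for the target task
# (10 classes here is an arbitrary placeholder).
num_classes = 10
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(512, num_classes),  # layer2 bottlenecks output 512 channels
)

model = nn.Sequential(truncated, head)

# Sanity check with a dummy batch of ImageNet-sized inputs.
x = torch.randn(2, 3, 224, 224)
print(model(x).shape)  # -> torch.Size([2, 10])
```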
Background and motivations

Over the last decade, Deep Learning has set new standards in computer vision. Tasks in this area include the recognition of street signs, placards, and living beings. While it has achieved state-of-the-art results in various academic and industrial fields, training deep networks from scratch requires massive amounts of data and hours of GPU training, which prevents deployment in data-scarce and resource-scarce scenarios.

This limitation has mainly been addressed through the notion of transfer learning [1]. Here, knowledge is transferred from a source domain (typically learned from a large dataset) to one or several target domains (typically with less available data). A common transfer learning approach is last-layer fine-tuning [2], in which a considerable part (the backbone) of a pre-trained deep network is reused; only the last layer is replaced with a new classifier and trained on the new task at hand. Depending on the distribution shift between the source and target domains, more layers may be fine-tuned. Pre-trained networks used for fine-tuning range from the historical AlexNet [3] to various ResNets [4,5].

Modern neural networks are often regarded as "the bigger, the better", as large networks keep topping large benchmarks (such as ImageNet [6]). However, they are considerably over-parameterized when applied to smaller tasks, and there is evidence that low-complexity models can, under some conditions, achieve comparable or better performance [7]. Our goal is to elaborate on the basic "Reuse" methodology described above (replacing the last layer with a new classifier) by app...