Transformers Can Do Bayesian Inference

Müller, S.; Hollmann, Noah; Arango, Sebastian Pineda; Grabocka, Josif; Hutter, Frank

doi:10.48550/arxiv.2112.10510

Cited by 7 publications

(5 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“… 26 Unlike meta-learning, which is centered around enhancing the learning process, PFNs are pretrained on synthetic data to approximate Bayesian inference on new data. 43 Bayesian inference is a statistical method that allows for the quantification and management of uncertainty, a crucial feature in clinical prognosis scenarios. 44 The pretraining on synthetic data enables TabPFN to adeptly navigate complex patterns within real-world data, showcasing a nuanced approach to tabular data handling.…”

Section: Discussionmentioning

confidence: 99%

Advancing precision prognostication in neuro-oncology: Machine learning models for data-driven personalized survival predictions in IDH-wildtype glioblastoma

Karabacak,

Jagtiani,

et al. 2024

Neuro-Oncology Advances

View full text Add to dashboard Cite

Background Glioblastoma (GBM) remains associated with a dismal prognosis despite standard therapies. While population-level survival statistics are established, generating individualized prognosis remains challenging. We aim to develop machine learning (ML) models that generate personalized survival predictions for GBM patients to enhance prognostication. Methods Adult patients with histologically confirmed IDH-wildtype GBM from the National Cancer Database (NCDB) were analyzed. ML models were developed with TabPFN, TabNet, XGBoost, LightGBM, and Random Forest algorithms to predict mortality at 6, 12, 18, and 24 months post-diagnosis. SHapley Additive exPlanations (SHAP) were employed to enhance the interpretability of the models. Models were primarily evaluated using the area under the receiver operating characteristic (AUROC) values, and the top-performing models indicated by the highest AUROCs for each outcome were deployed in a web application that was created for individualized predictions. Results A total of 7,537 patients were retrieved from the NCDB. Performance evaluation revealed the top-performing models for each outcome were built using the TabPFN algorithm. The TabPFN models yielded mean AUROCs of 0.836, 0.78, 0.732, and 0.724 in predicting 6-, 12-, 18-, and 24-month mortality, respectively. Conclusions This study establishes ML models tailored to individual patients to enhance GBM prognostication. Future work should focus on external validation and dynamic updating as new data emerge.

show abstract

Section: Discussionmentioning

confidence: 99%

Advancing precision prognostication in neuro-oncology: Machine learning models for data-driven personalized survival predictions in IDH-wildtype glioblastoma

Karabacak,

Jagtiani,

et al. 2024

Neuro-Oncology Advances

View full text Add to dashboard Cite

show abstract

“…There is a long line of work investigating the capabilities [Vaswani et al, 2017, Dehghani et al, 2018, Yun et al, 2019, Pérez et al, 2019, Yao et al, 2021, Bhattamishra et al, 2020b, Zhang et al, 2022, limitations [Hahn, 2020, Bhattamishra et al, 2020a, applications [Lu et al, 2021a, Dosovitskiy et al, 2020, Parmar et al, 2018, and internal workings [Elhage et al, 2021, Snell et al, 2021, Weiss et al, 2021, Edelman et al, 2022, Olsson et al, 2022 of Transformer models. Most similar to our work, Müller et al [2021] introduce a "Prior-data fitted transformer network" that is trained to approximate Bayesian inference and generate predictions for downstream learning problems. However, while their focus is on performing Bayesian inference faster than existing methods (e.g., MCMC) and using their network for downstream tasks (with or without parameter fine-tuning), we focus on formalizing and understanding in-context learning through simple function classes.…”

Section: Related Workmentioning

confidence: 99%

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

Garg¹,

Tsipras²,

Liang³

et al. 2022

Preprint

View full text Add to dashboard Cite

In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class? We show empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions-that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator. In fact, in-context learning is possible even under two forms of distribution shift: (i) between the training data of the model and inference-time prompts, and (ii) between the in-context examples and the query input during inference. We also show that we can train Transformers to in-context learn more complex function classes-namely sparse linear functions, two-layer neural networks, and decision trees-with performance that matches or exceeds task-specific learning algorithms. 1 * Equal contribution. 1 Our code and models are available at https://github.com/dtsip/in-context-learning.

show abstract

“…Decision-Pretrained Transformer. DPT is an alternative approach inspired by the Bayesian inference approximation (Müller et al, 2021). Unlike AD, it trains a transformer to predict the optimal action for a query state given a random, task specific, context.…”

Section: In-context Reinforcement Learningmentioning

confidence: 99%