2022
DOI: 10.48550/arxiv.2210.06423
Preprint

Foundation Transformers

Abstract: A big convergence of model architectures across language, vision, speech, and multimodal is emerging. However, under the same name "Transformers", the above areas use different implementations for better performance, e.g., Post-LayerNorm for BERT, and Pre-LayerNorm for GPT and vision Transformers. We call for the development of Foundation Transformer for true general-purpose modeling, which serves as a go-to architecture for various tasks and modalities with guaranteed training stability. In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal. Specifically, we propose Sub-LayerNorm for good expressivity, and the initialization strategy theoretically derived from DeepNet for stable scaling up.
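The Post-LayerNorm/Pre-LayerNorm distinction the abstract draws comes down to where the LayerNorm sits relative to the residual connection. Below is a minimal PyTorch sketch of the two placements, using a single attention sublayer for brevity; the module names and dimensions are illustrative, not from the paper:

```python
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Post-LayerNorm block (BERT-style): normalize after the residual add."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        return self.norm(x + attn_out)   # LN applied to the residual sum

class PreLNBlock(nn.Module):
    """Pre-LayerNorm block (GPT/ViT-style): normalize the sublayer input."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm(x)                 # LN applied before the sublayer
        attn_out, _ = self.attn(h, h, h)
        return x + attn_out              # identity residual path stays un-normalized
```

Magneto's Sub-LayerNorm keeps the Pre-LN residual path but adds a second LayerNorm inside each sublayer, which, together with the DeepNet-derived initialization, is what the abstract credits for combining expressivity with stable scaling.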

Cited by 2 publications (2 citation statements)
References 16 publications

“…BEiT-3 [14] is a general-purpose multimodal foundation model based on Magneto [15]. And it can be used by importing TorchScale [16], which is an open-source toolkit that enables scaling Transformers both efficiently and effectively.…”
Section: Introduction
Confidence: 99%
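Since the citing work names TorchScale [16] as the entry point for Magneto-style models, a minimal usage sketch may help. It follows the pattern from the microsoft/torchscale repository; the vocabulary size is arbitrary, and the `subln` flag name for the Magneto (Sub-LayerNorm) variant is my assumption rather than a confirmed API:

```python
# pip install torchscale  (https://github.com/microsoft/torchscale)
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# A plain encoder; vocab_size is illustrative.
config = EncoderConfig(vocab_size=64000)
model = Encoder(config)

# Assumption: the Magneto/Sub-LayerNorm variant is enabled via a config flag,
# reportedly `subln=True` -- check the repository for the exact option name.
magneto_config = EncoderConfig(vocab_size=64000, subln=True)
magneto = Encoder(magneto_config)
```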
“…However, the novel aspect of GPT scaling [18][19] suggests that the current generators do not need preparation for such challenges. They perform as zero-shot [20] or few-shot [21] models that generate meaningful answers from prompts without specialization or fine-tuning cycles.…”
Section: Introduction
Confidence: 99%
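The zero-shot/few-shot distinction in that statement is purely a matter of prompt construction, not model weights. A toy illustration follows; the translation task echoes the well-known GPT-3 examples and is not from this document:

```python
# Zero-shot: the task is stated in the prompt; no demonstrations are given.
zero_shot_prompt = "Translate English to French:\nsea otter =>"

# Few-shot: a handful of in-context demonstrations precede the actual query;
# the model is expected to continue the pattern without any fine-tuning.
few_shot_prompt = (
    "Translate English to French:\n"
    "cheese => fromage\n"
    "house => maison\n"
    "sea otter =>"
)
```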