Gregor Geigle scite author profile

Gregor Geigle

5Publications

59Citation Statements Received

202Citation Statements Given

How they've been cited

How they cite others

116

198

Affiliations

Technical University of Darmstadt

Publications

Order By: Most citations

AdapterDrop: On the Efficiency of Adapters in Transformers

Rücklé¹,

Geigle²,

Glockner³

et al. 2021

View full text Add to dashboard Cite

Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters. In this paper, we propose AdapterDrop, removing adapters from lower transformer layers during training and inference, which incorporates concepts from all three directions. We show that Adap-terDrop can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performances. We further prune adapters from AdapterFusion, which improves the inference efficiency while maintaining the task performances entirely. 1 We experiment with newer and older GPUs, Nvidia V100 and Titan X, respectively. See Appendix A.1 for details. 2 We include detailed plots in Appendix G.1.

show abstract

AdapterDrop: On the Efficiency of Adapters in Transformers

Rücklé¹,

Geigle²,

Glockner³

et al. 2020

Preprint

View full text Add to dashboard Cite

xGQA: Cross-Lingual Visual Question Answering

Pfeiffer¹,

Geigle²,

Kamath³

et al. 2022

View full text Add to dashboard Cite

Recent advances in multimodal vision and language modeling have predominantly focused on the English language, mostly due to the lack of multilingual multimodal datasets to steer modeling efforts. In this work, we address this gap and provide xGQA, a new multilingual evaluation benchmark for the visual question answering task. We extend the established English GQA dataset (Hudson and Manning, 2019) to 7 typologically diverse languages, enabling us to detect and explore crucial challenges in cross-lingual visual question answering. We further propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual, and-vice versa-multilingual models to become multimodal. Our proposed methods outperform current state-of-the-art multilingual multimodal models (e.g., M 3 P) in zeroshot cross-lingual settings, but the accuracy remains low across the board; a performance drop of around 38 accuracy points in target languages showcases the difficulty of zero-shot cross-lingual transfer for this task. Our results suggest that simple cross-lingual transfer of multimodal models yields latent multilingual multimodal misalignment, calling for more sophisticated methods for vision and multilingual language modeling. 1

show abstract

Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

Geigle¹,

Pfeiffer²,

Reimers³

et al. 2021

Preprint

View full text Add to dashboard Cite

Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image. While offering unmatched retrieval performance, such models: 1) are typically pretrained from scratch and thus less scalable, 2) suffer from huge retrieval latency and inefficiency issues, which makes them impractical in realistic applications. To address these crucial gaps towards both improved and efficient cross-modal retrieval, we propose a novel fine-tuning framework which turns any pretrained textimage multi-modal model into an efficient retrieval model. The framework is based on a cooperative retrieve-and-rerank approach which combines: 1) twin networks to separately encode all items of a corpus, enabling efficient initial retrieval, and 2) a cross-encoder component for a more nuanced (i.e., smarter) ranking of the retrieved small set of items. We also propose to jointly fine-tune the two components with shared weights, yielding a more parameter-efficient model. Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.

show abstract

xGQA: Cross-Lingual Visual Question Answering

Pfeiffer¹,

Geigle²,

Kamath³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Gregor Geigle

AdapterDrop: On the Efficiency of Adapters in Transformers

AdapterDrop: On the Efficiency of Adapters in Transformers

xGQA: Cross-Lingual Visual Question Answering

Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval

xGQA: Cross-Lingual Visual Question Answering

Contact Info

Product

Resources

About