Iulia Turc scite author profile

Iulia Turc

5 Publications

69 Citation Statements Received

71 Citation Statements Given


Affiliations

Peoria Hospital, Google (United States)

Publications


Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Clark, Garrette, Turc et al. 2022


Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a model’s ability to adapt. In this paper, we present Canine, a neural encoder that operates directly on character sequences (without explicit tokenization or vocabulary) and a pre-training strategy that operates either directly on characters or optionally uses subwords as a soft inductive bias. To use its finer-grained input effectively and efficiently, Canine combines downsampling, which reduces the input sequence length, with a deep transformer stack, which encodes context. Canine outperforms a comparable mBERT model by 5.7 F1 on TyDi QA, a challenging multilingual benchmark, despite having fewer model parameters.


The MultiBERTs: BERT Reproductions for Robustness Analysis

Sellam, Yadlowsky, Lee et al. 2021

Preprint


Measuring Attribution in Natural Language Generation Models

Rashkin, Nikolaev, Lamm et al. 2021

Preprint


With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output is only sharing verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models, when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline for allowing annotators to appropriately evaluate model output according to AIS guidelines. We empirically validate this approach on three generation datasets (two in the conversational QA domain and one in summarization) via human evaluation studies that suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release guidelines for the human evaluation studies.


High Performance Natural Language Processing

Ilharco, Ilharco, Turc et al. 2020


Scale has played a central role in the rapid progress natural language processing has enjoyed in recent years. While benchmarks are dominated by ever larger models, efficient hardware use is critical for their widespread adoption and further progress in the field. In this cutting-edge tutorial, we will recapitulate the state of the art in natural language processing with scale in perspective. After establishing these foundations, we will cover a wide range of techniques for improving efficiency, including knowledge distillation, quantization, pruning, and more efficient architectures, along with case studies and practical implementation tricks.


Measuring Attribution in Natural Language Generation Models

Rashkin, Nikolaev, Lamm et al. 2023


Large neural models have brought a new challenge to natural language generation (NLG): it has become imperative to ensure the safety and reliability of the output of models that generate freely. To this end, we present an evaluation framework, Attributable to Identified Sources (AIS), stipulating that NLG output pertaining to the external world is to be verified against an independent, provided source. We define AIS and a two-stage annotation pipeline for allowing annotators to evaluate model output according to annotation guidelines. We successfully validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset). We provide full annotation guidelines in the appendices and publicly release the annotated data at https://github.com/google-research-datasets/AIS.

