Recently, pre-trained language models (LMs) have achieved strong performance when fine-tuned on difficult benchmarks like SuperGLUE. However, performance can suffer when very few labeled examples are available for fine-tuning. Pattern-Exploiting Training (PET) is a recent approach that leverages patterns for few-shot learning, but it relies on task-specific unlabeled data. In this paper, we focus on few-shot learning without any unlabeled data and introduce ADAPET, which modifies PET's objective to provide denser supervision during fine-tuning. As a result, ADAPET outperforms PET on SuperGLUE without any task-specific unlabeled data. Our code can be found at https://github.com/rrmenon10/ADAPET.
Background and Objective: Hong Kong, like many parts of Asia, faces a high burden of hepatocellular carcinoma (HCC) caused by high endemic rates of hepatitis B virus infection. Hong Kong clinicians have developed a high level of expertise in HCC treatment across surgical, transarterial, ablative, radiotherapeutic and systemic modalities. This publication summarizes the latest evidence-based recommendations on how these modalities should be used.
Methods: In two meetings held in 2020, a multidisciplinary panel of surgeons, oncologists and interventional radiologists performed a narrative review of evidence on the management of HCC, with an emphasis on treatment of HCC not amenable to surgical resection. Close attention was paid to new evidence published since the previous version of these statements in 2018.
Key Content and Findings: The expert panel has formulated 60 consensus statements to guide the staging and treatment of unresectable HCC. Since the previous version of these statements, considerable additions have been made to the recommendations on the use of targeted therapies and immunotherapies because of the large volume of new evidence.
Conclusions: Our consensus statements offer guidance on how to select HCC patients for surgical or non-surgical treatment and for choosing among non-surgical modalities for patients who are not candidates for resection. In particular, there is a need for more evidence to aid physicians in the selection of second-line systemic therapies, as currently most data are limited to patients with disease progression on first-line sorafenib.
String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B³ F1 over the previous state-of-the-art approach.
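The alignment step described in this abstract can be illustrated with a short sketch. Below is a minimal, hedged example of Sinkhorn iteration over a character-level cost matrix; the character encoder and the convolutional scorer are omitted, and all tensor shapes, names, and hyperparameters are illustrative assumptions rather than STANCE's actual implementation.

```python
import torch

def sinkhorn_alignment(cost, n_iters=20, epsilon=0.1):
    """Soft alignment of two character sequences via Sinkhorn iteration.

    cost: (n, m) matrix of pairwise transport costs between the character
    encodings of two strings. Returns an approximate optimal-transport plan
    whose entries indicate how strongly each pair of characters is aligned.
    """
    n, m = cost.shape
    K = torch.exp(-cost / epsilon)              # kernel of the entropic OT problem
    a = torch.full((n,), 1.0 / n)               # uniform marginal over string 1
    b = torch.full((m,), 1.0 / m)               # uniform marginal over string 2
    u = torch.ones(n)
    v = torch.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                         # rescale rows toward marginal a
        v = b / (K.t() @ u)                     # rescale columns toward marginal b
    return u.unsqueeze(1) * K * v.unsqueeze(0)  # transport plan, shape (n, m)

# Toy usage with random stand-ins for learned character encodings.
enc_a = torch.randn(8, 32)                      # encoding of an 8-character string
enc_b = torch.randn(11, 32)                     # encoding of an 11-character string
cost = torch.cdist(enc_a, enc_b)                # (8, 11) pairwise distances
plan = sinkhorn_alignment(cost)
print(plan.shape)                               # torch.Size([8, 11])
```

In the full model described by the authors, such an alignment would then be scored by a convolutional network; the sketch stops at the alignment itself.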
Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new parameter-efficient fine-tuning method called (IA)³ that scales activations by learned vectors, attaining stronger performance while introducing only a relatively tiny number of new parameters. We also propose a simple recipe based on the T0 model [1] called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark [2], attaining super-human performance for the first time and outperforming the state of the art by 6% absolute. All of the code used in our experiments is publicly available at https://github.com/r-three/t-few.
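To make the activation-scaling idea concrete, the following is a minimal, hedged sketch of (IA)³-style rescaling: learned vectors that multiply the attention keys, attention values, and intermediate feed-forward activations of a frozen pre-trained model. Class and method names, and exactly where the scaling hooks into a Transformer block, are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class IA3Scalers(nn.Module):
    """Learned (IA)^3-style scaling vectors for one Transformer block.

    Only these vectors are trained; the base model's weights stay frozen.
    Initializing at 1.0 leaves the pre-trained behaviour unchanged before
    fine-tuning begins.
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.l_k = nn.Parameter(torch.ones(d_model))   # scales attention keys
        self.l_v = nn.Parameter(torch.ones(d_model))   # scales attention values
        self.l_ff = nn.Parameter(torch.ones(d_ff))     # scales FFN hidden activations

    def scale_keys(self, keys):        # keys: (..., d_model)
        return keys * self.l_k

    def scale_values(self, values):    # values: (..., d_model)
        return values * self.l_v

    def scale_ffn(self, hidden):       # hidden: (..., d_ff), after the nonlinearity
        return hidden * self.l_ff

# Toy usage: rescale stand-in activations from one block.
scalers = IA3Scalers(d_model=64, d_ff=256)
keys = torch.randn(2, 10, 64)          # (batch, seq_len, d_model)
hidden = torch.randn(2, 10, 256)       # (batch, seq_len, d_ff)
print(scalers.scale_keys(keys).shape, scalers.scale_ffn(hidden).shape)
```

Because only a few scaling vectors per block are trained, the number of new parameters is tiny relative to the base model, which is what makes this family of methods parameter-efficient.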