Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g., images or time series) by writing rules over an auxiliary modality (e.g., text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine.

Modern machine learning approaches have achieved impressive empirical successes on diverse clinical tasks that include predicting cancer prognosis from digital pathology,1,2 classifying skin lesions from dermatoscopy,3 characterizing retinopathy from fundus photographs,4 detecting intracranial hemorrhage on computed tomography,5,6 and performing automated interpretation of chest radiographs.7,8 Remarkably, these applications typically build on standardized reference neural network architectures9 supported in professionally maintained open-source frameworks,10,11 suggesting that model design is no longer a major barrier to entry in medical machine learning. However, each of these successes was predicated on a not-so-hidden cost: massive hand-labeled training datasets, often produced through years of institutional investment and expert clinician labeling time, totaling hundreds of thousands of dollars per task or more.4,12 In addition to being extremely costly, these training sets are inflexible: given a new classification schema, imaging system, patient population, or other change in the data distribution or modeling task, the training set generally needs to be relabeled from scratch. These factors suggest