2017
DOI: 10.1007/jhep05(2017)145

Weakly supervised classification in high energy physics

Abstract: As machine learning algorithms become increasingly sophisticated at exploiting subtle features of the data, they often become more dependent on simulations. This paper presents a new approach called weakly supervised classification, in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics, quark versus gluon tagging, we show that weakly supervised classification can match the performance of fully supervis…
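The core idea of the abstract, training with class proportions as the only supervision, is often called learning from label proportions. Below is a minimal sketch of that general idea, not the paper's implementation. Everything in it is an assumption for illustration: toy one-dimensional Gaussian data standing in for quark and gluon jets, a logistic model, a squared loss between each sample's mean prediction and its known fraction, and the hypothetical helper names `make_bag` and `train_on_proportions`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_bag(n, signal_fraction, rng):
    """Hypothetical toy sample: one feature, signal ~ N(+1, 1), background ~ N(-1, 1).

    Returns (features, hidden_labels); the labels are never used for training.
    """
    xs, ys = [], []
    for _ in range(n):
        is_signal = rng.random() < signal_fraction
        xs.append(rng.gauss(1.0 if is_signal else -1.0, 1.0))
        ys.append(1 if is_signal else 0)
    return xs, ys


def train_on_proportions(bags, fractions, lr=1.0, steps=300):
    """Fit h(x) = sigmoid(w*x + b) using only each bag's known signal fraction.

    Loss (an assumed choice): sum over bags k of (mean_i h(x_i) - f_k)**2,
    minimised by full-batch gradient descent with hand-derived gradients.
    """
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for xs, f in zip(bags, fractions):
            n = len(xs)
            hs = [sigmoid(w * x + b) for x in xs]
            diff = sum(hs) / n - f  # predicted minus known proportion
            # d(mean h)/dw = mean(h(1-h)x), d(mean h)/db = mean(h(1-h))
            grad_w += 2 * diff * sum(h * (1 - h) * x for h, x in zip(hs, xs)) / n
            grad_b += 2 * diff * sum(h * (1 - h) for h in hs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = random.Random(0)
fractions = [0.2, 0.5, 0.8]  # assumed, known class proportions per bag
bags = [make_bag(500, f, rng)[0] for f in fractions]  # keep features only
w, b = train_on_proportions(bags, fractions)

# Score on a separate toy sample; labels are used here only for evaluation.
test_x, test_y = make_bag(2000, 0.5, rng)
acc = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in zip(test_x, test_y)) / len(test_x)
print(f"w={w:.2f}, b={b:.2f}, accuracy={acc:.3f}")
```

The loss only ever compares a bag-level average with a bag-level fraction, so no per-event label enters the gradient. At least two bags with different fractions are needed: with a single bag, a constant output equal to its fraction already minimises the loss and the model learns nothing about the feature.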


Cited by 105 publications (97 citation statements). Reference list: 31 publications.
“…Understanding these issues will be a crucial next step. Possibilities are to include uncertainties in the training or to train on data in a weakly supervised or unsupervised fashion [64].…”
Section: Comparison (mentioning; confidence: 99%)
“…refs. [33–38] and [39–43]. They rely on categorizing and comparing datasets with different expected signal and background admixtures or identifying anomalous events inside large datasets.…”
Section: Introduction (mentioning; confidence: 99%)
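Why comparing datasets with different signal and background admixtures can work is captured by a standard identity. It is sketched here under an assumption that the snippet does not state: two samples $M_1$ and $M_2$ share the same signal density $p_S$ and background density $p_B$ and differ only in their signal fractions, with $f_1 > f_2$ (notation ours).

```latex
p_{M_i}(x) = f_i\,p_S(x) + (1 - f_i)\,p_B(x), \qquad L(x) \equiv \frac{p_S(x)}{p_B(x)}

\frac{p_{M_1}(x)}{p_{M_2}(x)} = \frac{f_1 L(x) + 1 - f_1}{f_2 L(x) + 1 - f_2},
\qquad
\frac{d}{dL}\left(\frac{f_1 L + 1 - f_1}{f_2 L + 1 - f_2}\right) = \frac{f_1 - f_2}{\left(f_2 L + 1 - f_2\right)^2} > 0
```

Because the mixture likelihood ratio is strictly increasing in $L(x)$ when $f_1 > f_2$, a classifier trained only to separate $M_1$ from $M_2$ ranks events in the same order as the optimal signal-versus-background discriminant.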
“…Further, studies thus far have used simulation to train the models, which is not reality, and this risks learning the idiosyncrasies of the simulation, and not real physics. Recent ideas for training directly on the data [74–77] are closely related to the notions of power counting and parametric discrimination power developed here [16,17]. More generally, in order to trust the output of the model and identify the relevant physics that drives the discrimination power, first-principles theoretical calculations that parallel the machine as best as possible are necessary.…”
Section: Discussion (mentioning; confidence: 99%)