Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL (ACL '06), 2006
DOI: 10.3115/1220175.1220203

Training conditional random fields with multivariate evaluation measures

Abstract: This paper proposes a framework for training Conditional Random Fields (CRFs) to optimize multivariate evaluation measures, including non-linear measures such as F-score. Our proposed framework is derived from an error minimization approach that provides a simple solution for directly optimizing any evaluation measure. Specifically focusing on sequential segmentation tasks, i.e., text chunking and named entity recognition, we introduce a loss function that closely reflects the target evaluation measure for thes…
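To make the non-linearity concrete: the F-score the paper targets is a ratio of corpus-level counts, so it does not decompose into a sum of per-token losses the way standard log-likelihood training does. A standard formulation in our own notation, not quoted from the paper:

```latex
% F_beta as a function of corpus-level counts: true positives (TP),
% false positives (FP), and false negatives (FN).
F_\beta = \frac{(1+\beta^2)\,\mathit{TP}}{(1+\beta^2)\,\mathit{TP} + \beta^2\,\mathit{FN} + \mathit{FP}}
```

Because TP, FP, and FN are aggregated over the whole corpus before the ratio is taken, a surrogate loss that tracks this quantity cannot simply be summed token by token, which is why direct optimization needs a framework like the one the paper proposes.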

Cited by 30 publications (37 citation statements)
References 17 publications
“…As a measure of classification accuracy we use, similarly to , the token-and-separator variant (proposed in [Esuli and Sebastiani 2010]) of the well-known F1 measure, according to which an information extraction system is evaluated on an event space consisting of all the t-units in the text. In other words, each t-unit x_t (rather than each mention, as in the traditional "segmentation F-score" model [Suzuki et al 2006]) counts as a true positive, true negative, false positive, or false negative for a given concept c_r, depending on whether x_t belongs to c_r or not in the predicted annotation and in the true annotation. This model has the advantage that it credits a system for partial success (i.e., degree of overlap between a predicted mention and a true mention for the same concept), and that it penalizes both overannotation and underannotation.…”
Section: Evaluation Measures
Citation type: mentioning; confidence: 99%
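A minimal sketch of token-level F1 in the spirit of the token-and-separator model described above; the function name and the simplified treatment of t-units as plain tokens are our own illustration, not code from the cited paper:

```python
def token_f1(true_labels, pred_labels, concept):
    """Token-level F1 for one concept: every token (t-unit) is scored
    as TP/FP/FN depending on its true and predicted labels."""
    tp = fp = fn = 0
    for true, pred in zip(true_labels, pred_labels):
        if pred == concept and true == concept:
            tp += 1
        elif pred == concept:
            fp += 1
        elif true == concept:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# A partially overlapping prediction still earns partial credit,
# unlike under mention-level exact-match scoring:
true = ["O", "ORG", "ORG", "ORG", "O"]
pred = ["O", "O",   "ORG", "ORG", "O"]
print(token_f1(true, pred, "ORG"))  # 0.8: 2 TP, 0 FP, 1 FN
```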
“…It would, therefore, be useful to train the parameters in the probability distribution with respect to the target accuracy measures. This type of training is called "MEA training" in general, and there have been several studies of MEA training in the field of machine learning (Suzuki et al., 2006; Gross et al., 2007b; Jansche, 2007). There are, however, few studies applying MEA training to problems in bioinformatics (Gross et al., 2007a), and further studies in that area would be enlightening.…”
Section: Training Probabilistic Models from the Viewpoint of MEA (MEA Training)
Citation type: mentioning; confidence: 99%
“…As a result, this evaluation model is sometimes called segmentation F-score [5]. In this paper we argue that the segmentation F-score model has several shortcomings, and propose a new evaluation model that does not suffer from them.…”
Section: TP/(TP+FP), TP/(TP+FN) (precision and recall)
Citation type: mentioning; confidence: 99%
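The fractions in the section heading above are the standard precision and recall from which any F-score, including the segmentation F-score, is derived; these are textbook definitions, not formulas specific to the cited paper:

```latex
\pi = \frac{\mathit{TP}}{\mathit{TP}+\mathit{FP}}, \qquad
\rho = \frac{\mathit{TP}}{\mathit{TP}+\mathit{FN}}, \qquad
F_1 = \frac{2\pi\rho}{\pi+\rho}
```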
“…According to the exact match model (currently the most frequently used model; see e.g., [8,9,10,11,12,5]) this should never be the case. This seems too harsh a criterion: for instance, given true segment σ = "Ronald Reagan Presidential Library" for tag ORG, a tagger that tags as ORG the segment σ̂ = "Reagan Presidential Library" would receive no credit at all for this (σ would generate a false negative and σ̂ would generate a false positive).…”
Section: Problems with the Current Evaluation Model
Citation type: mentioning; confidence: 99%
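A minimal sketch of the exact-match scoring this snippet criticizes, run on its own example; the (tag, start, end) span encoding and helper name are our own illustration, not from the cited paper:

```python
def exact_match_counts(true_segments, pred_segments):
    """Exact-match model: a predicted (tag, start, end) segment is a TP
    only if an identical segment exists in the gold standard; any
    partial overlap yields both a FP and a FN."""
    true_set, pred_set = set(true_segments), set(pred_segments)
    tp = len(true_set & pred_set)
    fp = len(pred_set - true_set)   # predicted but not in gold
    fn = len(true_set - pred_set)   # gold but not predicted
    return tp, fp, fn

# "Reagan Presidential Library" overlaps the gold ORG segment
# "Ronald Reagan Presidential Library" but is not identical to it,
# so it earns no credit at all under exact match:
gold = [("ORG", 0, 4)]  # tokens 0..3: Ronald Reagan Presidential Library
pred = [("ORG", 1, 4)]  # tokens 1..3: Reagan Presidential Library
print(exact_match_counts(gold, pred))  # (0, 1, 1)
```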