Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL (ACL '06), 2006
DOI: 10.3115/1220175.1220203

Training conditional random fields with multivariate evaluation measures

Abstract: This paper proposes a framework for training Conditional Random Fields (CRFs) to optimize multivariate evaluation measures, including non-linear measures such as F-score. Our proposed framework is derived from an error minimization approach that provides a simple solution for directly optimizing any evaluation measure. Specifically focusing on sequential segmentation tasks, i.e., text chunking and named entity recognition, we introduce a loss function that closely reflects the target evaluation measure for thes…
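To make the non-linearity concrete: the F-score the paper targets is a ratio of corpus-level counts, so it does not decompose into a sum of per-token losses the way standard log-likelihood training does. A standard formulation in our own notation, not quoted from the paper:

```latex
% F_beta as a function of corpus-level counts: true positives (TP),
% false positives (FP), and false negatives (FN).
F_\beta = \frac{(1+\beta^2)\,\mathit{TP}}{(1+\beta^2)\,\mathit{TP} + \beta^2\,\mathit{FN} + \mathit{FP}}
```

Because TP, FP, and FN are aggregated over the whole corpus before the ratio is taken, a surrogate loss that tracks this quantity cannot simply be summed token by token, which is why direct optimization needs a framework like the one the paper proposes.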

Cited by 30 publications (37 citation statements)
References 17 publications
“…As a measure of classification accuracy we use, similarly to , the token-and-separator variant (proposed in [Esuli and Sebastiani 2010]) of the well-known F1 measure, according to which an information extraction system is evaluated on an event space consisting of all the t-units in the text. In other words, each t-unit x_t (rather than each mention, as in the traditional "segmentation F-score" model [Suzuki et al 2006]) counts as a true positive, true negative, false positive, or false negative for a given concept c_r, depending on whether x_t belongs to c_r or not in the predicted annotation and in the true annotation. This model has the advantage that it credits a system for partial success (i.e., degree of overlap between a predicted mention and a true mention for the same concept), and that it penalizes both overannotation and underannotation.…”
Section: Evaluation Measures
Citation type: mentioning; confidence: 99%
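A minimal sketch of token-level F1 in the spirit of the token-and-separator model described above; the function name and the simplified treatment of t-units as plain tokens are our own illustration, not code from the cited paper:

```python
def token_f1(true_labels, pred_labels, concept):
    """Token-level F1 for one concept: every token (t-unit) is scored
    as TP/FP/FN depending on its true and predicted labels."""
    tp = fp = fn = 0
    for true, pred in zip(true_labels, pred_labels):
        if pred == concept and true == concept:
            tp += 1
        elif pred == concept:
            fp += 1
        elif true == concept:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# A partially overlapping prediction still earns partial credit,
# unlike under mention-level exact-match scoring:
true = ["O", "ORG", "ORG", "ORG", "O"]
pred = ["O", "O",   "ORG", "ORG", "O"]
print(token_f1(true, pred, "ORG"))  # 0.8: 2 TP, 0 FP, 1 FN
```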
“…It would, therefore, be useful to train the parameters in the probability distribution with respect to the target accuracy measures. This type of training is called "MEA training" in general, and there have been several studies of MEA training in the field of machine learning (Suzuki et al., 2006; Gross et al., 2007b; Jansche, 2007). There are, however, few studies applying MEA training to problems in bioinformatics (Gross et al., 2007a), and further studies in that area would be enlightening.…”
Section: Training Probabilistic Models from the Viewpoint of MEA (MEA Training)
Citation type: mentioning; confidence: 99%
“…As a result, this evaluation model is sometimes called segmentation F-score [5]. In this paper we argue that the segmentation F-score model has several shortcomings, and propose a new evaluation model that does not suffer from them.…”
Section: TP/(TP+FP), TP/(TP+FN) (precision and recall)
Citation type: mentioning; confidence: 99%
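The fractions in the section heading above are the standard precision and recall from which any F-score, including the segmentation F-score, is derived; these are textbook definitions, not formulas specific to the cited paper:

```latex
\pi = \frac{\mathit{TP}}{\mathit{TP}+\mathit{FP}}, \qquad
\rho = \frac{\mathit{TP}}{\mathit{TP}+\mathit{FN}}, \qquad
F_1 = \frac{2\pi\rho}{\pi+\rho}
```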
“…According to the exact match model (currently the most frequently used model; see e.g., [8,9,10,11,12,5]) this should never be the case. This seems too harsh a criterion: for instance, given true segment σ = "Ronald Reagan Presidential Library" for tag ORG, a tagger that tags as ORG the segment σ̂ = "Reagan Presidential Library" would receive no credit at all for this (σ would generate a false negative and σ̂ would generate a false positive).…”
Section: Problems with the Current Evaluation Model
Citation type: mentioning; confidence: 99%
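A minimal sketch of the exact-match scoring this snippet criticizes, run on its own example; the (tag, start, end) span encoding and helper name are our own illustration, not from the cited paper:

```python
def exact_match_counts(true_segments, pred_segments):
    """Exact-match model: a predicted (tag, start, end) segment is a TP
    only if an identical segment exists in the gold standard; any
    partial overlap yields both a FP and a FN."""
    true_set, pred_set = set(true_segments), set(pred_segments)
    tp = len(true_set & pred_set)
    fp = len(pred_set - true_set)   # predicted but not in gold
    fn = len(true_set - pred_set)   # gold but not predicted
    return tp, fp, fn

# "Reagan Presidential Library" overlaps the gold ORG segment
# "Ronald Reagan Presidential Library" but is not identical to it,
# so it earns no credit at all under exact match:
gold = [("ORG", 0, 4)]  # tokens 0..3: Ronald Reagan Presidential Library
pred = [("ORG", 1, 4)]  # tokens 1..3: Reagan Presidential Library
print(exact_match_counts(gold, pred))  # (0, 1, 1)
```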