2008
DOI: 10.1007/s10579-008-9079-3
Evaluation of machine learning-based information extraction algorithms: criticisms and recommendations

Abstract: We survey the evaluation methodology adopted in information extraction (IE), as defined in a few different efforts applying machine learning (ML) to IE. We identify a number of critical issues that hamper comparison of the results obtained by different researchers. Some of these issues are common to other NLP-related tasks: e.g., the difficulty of exactly identifying the effects on performance of the data (sample selection and sample size), of the domain theory (features selected), and of algorithm parameters…

Cited by 17 publications (17 citation statements) · References 13 publications
“…The typical approach (Lavelli et al., 2008) is to define a predicate match(g, p), with values in {True, False}, which determines if there is a match between two annotations g ∈ G and p ∈ P, and then use this predicate to compute an approximate version of precision (π) and recall (ρ):…”
Section: Annotation-based Model (mentioning)
confidence: 99%
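
As a concrete illustration, here is a minimal Python sketch of this matching-based evaluation. The exact-match predicate and the (start, end, label) encoding of annotations are assumptions made for the example, not definitions taken from the cited paper; any overlap- or boundary-based criterion could be substituted for match.

    def match(g, p):
        # Illustrative exact-match criterion (an assumption): two
        # annotations agree iff their spans and labels coincide.
        return g == p

    def precision_recall(G, P):
        # A prediction counts as correct if it matches some gold
        # annotation; a gold annotation counts as found if some
        # prediction matches it.
        matched_p = sum(1 for p in P if any(match(g, p) for g in G))
        matched_g = sum(1 for g in G if any(match(g, p) for p in P))
        pi = matched_p / len(P) if P else 0.0   # approximate precision
        rho = matched_g / len(G) if G else 0.0  # approximate recall
        return pi, rho

    # Annotations as (start, end, label) tuples.
    G = [(0, 5, "PER"), (10, 14, "ORG")]   # gold
    P = [(0, 5, "PER"), (20, 24, "LOC")]   # predicted
    print(precision_recall(G, P))          # -> (0.5, 0.5)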
“…is obtained by (i) computing the category-specific values TP_i, FP_i and FN_i, (ii) obtaining TP as the sum of the TP_i's (and likewise for FP and FN), and then (iii) applying Equation (1). The macro-averaged F1^M is instead obtained by first computing the category-specific F1 values and then averaging them across the c_i's.…”
Section: The Event Space (mentioning)
confidence: 99%
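
To make the micro/macro distinction concrete, a short sketch follows. It assumes Equation (1) is the standard F1 = 2·TP / (2·TP + FP + FN), which is what step (iii) implies; the per-category counts are made up for illustration.

    # Sketch: micro- vs macro-averaged F1 from per-category counts.
    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Hypothetical per-category (TP_i, FP_i, FN_i) counts.
    counts = {"PER": (90, 10, 20), "ORG": (40, 30, 10), "LOC": (5, 5, 45)}

    # Micro-average: sum the counts first, then apply Equation (1).
    TP, FP, FN = map(sum, zip(*counts.values()))
    micro_f1 = f1(TP, FP, FN)

    # Macro-average (F1^M): per-category F1 first, then the mean.
    macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

    print(f"micro F1 = {micro_f1:.3f}, macro F1 = {macro_f1:.3f}")

Note how the micro-average is dominated by the large PER category, while the macro-average weights the poorly performing LOC category equally.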
“…A recent review paper on the evaluation of IE systems [1], while discussing in detail other undoubtedly important evaluation issues (such as datasets, training set/test set splits, and evaluation campaigns), devotes surprisingly little space to discussing the mathematical measures used in evaluating IE systems; and the same happens for a recent survey on information extraction methods and systems [2]. That the issue is far from solved is witnessed by a long discussion¹, which appeared on a popular NLP-related blog, in which prominent members of the NLP community voice their discontent with the evaluation measures currently used in the IE literature, and come to the conclusion that no satisfactory measure has yet been found.…”
Section: Introduction (mentioning)
confidence: 99%
“…Rule learning algorithms can be trained to achieve higher precision and recall, but using cross-validation over a data set can give us an estimate of how an algorithm will behave in practice. Despite its importance, the majority of proposals in the literature do not use this validation method [17].…”
Section: Introduction (mentioning)
confidence: 99%
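
As an illustration of the validation method being recommended, here is a minimal k-fold cross-validation sketch. scikit-learn, the bag-of-words classifier, and the toy data are stand-in assumptions; an actual IE rule learner would need its own fit/predict wrapper and a task-specific scorer.

    # Sketch: estimating generalization with 5-fold cross-validation.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # Toy binary task (finance vs. health), purely for illustration.
    texts = ["buy acme shares", "sell acme stock", "flu season starts",
             "new vaccine trial", "acme merger rumor", "hospital beds full"] * 5
    labels = [1, 1, 0, 0, 1, 0] * 5

    clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

    # Each fold is held out once; the mean score estimates performance
    # on unseen data, which a single train/test split cannot do reliably.
    scores = cross_val_score(clf, texts, labels, cv=5, scoring="f1")
    print(f"5-fold F1: mean={scores.mean():.3f} +/- {scores.std():.3f}")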