Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005
DOI: 10.1145/1076034.1076091

Learning to extract information from semi-structured text using a discriminative context free grammar

Abstract: In recent work, conditional Markov chain models (CMM) have been used to extract information from semi-structured text (one example is the Conditional Random Field [10]). Applications range from finding the author and title in research papers to finding the phone number and street address in a web page. The CMM framework combines a priori knowledge encoded as features with a set of labeled training data to learn an efficient extraction process. We will show that similar problems can be solved more effectively…
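To make the contrast with chain models concrete, the following is a minimal sketch of grammar-based extraction: a tiny weighted context free grammar whose parses are scored by feature functions, with the highest-scoring parse found by CKY dynamic programming. The grammar, the labels (`Contact`, `Name`, `First`, `Last`, `Phone`), the function names, and the hand-set weights are all assumptions made for illustration; they are not taken from the paper, where the weights would be learned from labeled training data.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (illustrative only).
# parent -> list of (left child, right child, rule weight)
BINARY = {
    "Contact": [("Name", "Phone", 0.0)],
    "Name": [("First", "Last", 0.0)],
}
PRETERMINALS = ["First", "Last", "Phone"]


def token_score(label, token):
    """Hand-set feature weights standing in for learned ones (assumption)."""
    digits = sum(ch.isdigit() for ch in token)
    if label == "Phone":
        return 2.0 if digits >= 7 else -2.0
    # First / Last: reward capitalized alphabetic tokens.
    return 1.0 if token.isalpha() and token[0].isupper() else -1.0


def cky(tokens, root="Contact"):
    """Return (labels, score) of the best parse, or (None, -inf) if none exists."""
    n = len(tokens)
    neg_inf = float("-inf")
    best = defaultdict(lambda: neg_inf)  # (start, end, symbol) -> best score
    back = {}                            # (start, end, symbol) -> (split, left, right)

    # Width-1 spans: score every preterminal label on every token.
    for i, tok in enumerate(tokens):
        for label in PRETERMINALS:
            best[(i, i + 1, label)] = token_score(label, tok)

    # Wider spans: combine two adjacent sub-spans with a binary rule.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for parent, rules in BINARY.items():
                for left, right, weight in rules:
                    for k in range(i + 1, j):
                        score = best[(i, k, left)] + best[(k, j, right)] + weight
                        if score > best[(i, j, parent)]:
                            best[(i, j, parent)] = score
                            back[(i, j, parent)] = (k, left, right)

    if best[(0, n, root)] == neg_inf:
        return None, neg_inf

    def labels(i, j, symbol):
        # A span with no backpointer is a single labeled token.
        if (i, j, symbol) not in back:
            return [symbol]
        k, left, right = back[(i, j, symbol)]
        return labels(i, k, left) + labels(k, j, right)

    return labels(0, n, root), best[(0, n, root)]


if __name__ == "__main__":
    print(cky(["Ada", "Lovelace", "5550100123"]))
```

Unlike a linear chain, the score of a parse here depends on whole spans (for example, a `Name` must cover a `First` followed by a `Last`), which is the kind of longer-range structure a grammar can express.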

Citations: cited by 57 publications (36 citation statements)
References: 11 publications
“…Paul Viola and Mukund Narasimhand [15], present a classification algorithm based on discriminatively trained Context Free Grammar (CFG) to extract information from HTML text. The challenge is in converting the HTML information of customer (which is already available in an unstructured form on web sites and in email) into the regularized or schematized form required by a database system.…”
Section: Related Work (mentioning)
confidence: 99%
“…As baseline models we used a Maximum Entropy classifier (the local classifier described in Section 3.1), standard linear-chain CRF 5 and a grammar-based extraction approach similar to the ones presented by [8] or [22]. Because of its computational complexity, the SVM'ISO approach we described in Section 2.2 cannot be used on our corpora.…”
Section: Results (mentioning)
confidence: 99%
“…Another popular idea consists in capturing interaction among labels in a hierarchical approach [7,8]. For instance, in an Information Extraction task, [8] proposes to use a Context Free Grammar to escape the "linear tyranny" of chain-models.…”
Section: Existing Methods For Sequence Labeling: Long-term Output Dep (mentioning)
confidence: 99%