An endogeneous corpus-based method for structural noun phrase disambiguation

Bourigault, Didier

doi:10.3115/976744.976755

Cited by 23 publications

(26 citation statements)

References 4 publications


“…Such approaches are difficult to evaluate without a golden standard and evaluations vary according to the methods. However, the recall is generally good ([2] estimates the silence to 5%), while the precision is rather low ([2] rejects 50% of the extracted term candidates, the system discussed in [10] has an error rate of 20%).…”

Section: Which Approach To Identify Terms?

mentioning

confidence: 99%

“…Several strategies have been used and sometimes associated to finally extract the term candidates: statistical filtering [1], manual filtering through the tool interface [2] or the exploitation of external resources. We propose a combination of the three methods.…”

Section: Introduction

mentioning

confidence: 99%

“…Different strategies can be applied: term extraction based on lexico-syntactic markers [1], chunking based syntactic frontiers and endogenous parsing [2], and distributional analysis [3]. Those different techniques show satisfying extraction results regarding the recall [4].…”

Section: Introduction

mentioning

confidence: 99%


Improving Term Extraction with Terminological Resources

Aubin

Hamon

2006

Advances in Natural Language Processing


Abstract. Studies of different term extractors on a corpus of the biomedical domain revealed decreasing performances when applied to highly technical texts. Facing the difficulty or impossibility to customize existing tools, we developed a tunable term extractor. It exploits linguistic-based rules in combination with the reuse of existing terminologies, i.e. exogenous disambiguation. Experiments reported here show that the combination of the two strategies allows the extraction of a greater number of term candidates with a higher level of reliability. We further describe the extraction process involving both endogenous and exogenous disambiguation implemented in the term extractor YATEA.


“…Background and Introduction NLP techniques have been applied to extraction of information from corpora for tasks such as free indexing (extraction of descriptors from corpora), (Metzler and Haas, 1989; Schwarz, 1990; Sheridan and Smeaton, 1992; Strzalkowski, 1996), term acquisition (Smadja and McKeown, 1991; Bourigault, 1993; Justeson and Katz, 1995; Daille, 1996), or extraction of linguistic information e.g. support verbs (Grefenstette and Teufel, 1995), and event structure of verbs (Klavans and Chodorow, 1992).…”

mentioning

confidence: 99%

“…The same system has been effectively applied both to English and French, although this paper focuses on French (see (Jacquemin, 1994) for the case of syntactic variants in English). All evaluation experiments were performed on two corpora: a training corpus [ECI] (ECI, 1989 and (Bourigault, 1993) The following section describes methods for grouping multi-word term variants; Section 4 presents a linguistically-motivated method for lexical analysis (inflectional analysis, part of speech tagging, and derivational analysis); Section 5 explains term expansion methods: constructions with a local parse through syntactic transformations preserving dependency relations; Section 6 illustrates the empirical tuning of linguistic rules; Section 7 presents an evaluation of the results in terms of precision and recall. • Semantic (Type 3): synonyms are found in the variant; the structure may be modified, e.g.…”

mentioning

confidence: 99%

Expansion of multi-word terms for indexing and retrieval using morphology and syntax

Jacquemin

Klavans

Tzoukermann

1997

Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics


A system for the automatic production of controlled index terms is presented using linguistically-motivated techniques. This includes a finite-state part of speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.

Motivation

Terms are known to be excellent descriptors of the informational content of textual documents (Srinivasan, 1996), but they are subject to numerous linguistic variations. Terms cannot be retrieved properly with coarse text simplification techniques (e.g. stemming); their identification requires precise and efficient NLP techniques. We have developed a domain independent system for automatic term recognition from unrestricted text. The system presented in this paper takes as input a list of controlled terms and a corpus; it detects and marks occurrences of term variants within the corpus. The system takes as input a precompiled (automatically or manually) term list, and transforms it dynamically into a more complete term list by adding automatically generated variants.

This method extends the limits of term extraction as currently practiced in the IR community: it takes into account multiple morphological and syntactic ways linguistic concepts are expressed within language. Our approach is a unique hybrid in allowing the use of manually produced precompiled data as input, combined with fully automatic computational methods for generating term expansions. Our results indicate that we can expand term variations at least 30% within a scientific corpus.

We would like to thank the NLP Group of Columbia University, Bell Laboratories - Lucent Technologies, and the Institut Universitaire de Technologie de Nantes for their support of the exchange visitor program for the first author. We also thank the Institut de l'Information Scientifique et Technique (INIST-CNRS) for providing us with the agricultural corpus and the associated term list, and Didier Bourigault for providing us with terms extracted from the newspaper corpus through LEXTER (Bourigault, 1993).

2 Background and Introduction

NLP techniques have been applied to extraction of information from corpora for tasks such as free indexing (extraction of descriptors from corpora), (Metzler and Haas, 1989; Schwarz, 1990; Sheridan and Smeaton, 1992; Strzalkowski, 1996), term acquisition (Smadja and McKeown, 1991; Bourigault, 1993; Justeson and Katz, 1995; Daille, 1996), or extraction of linguistic information e.g. support verbs (Grefenstette and Teufel, 1995), and event structure of verbs (Klavans and Chodorow, 1992). Although useful, these approaches suffer from two weaknesses which we address. First is the issue...


Natural Language Processing and Digital Libraries

Chanod

1999

Information Extraction

