Biocomputing 2002 (2001)
DOI: 10.1142/9789812799623_0031

Mining Medline: Abstracts, Sentences, or Phrases?

Abstract: A growing body of work addresses automated mining for biochemical information from digital repositories of scientific literature such as MEDLINE. Some of this work uses abstracts as the unit of text from which to extract facts. Other work uses sentences for this purpose, while still other work uses phrases. Here, we compare abstracts, sentences, and phrases in MEDLINE using the standard information retrieval performance measures of recall, precision, and effectiveness for the task of mining interactions among …

Cited by 145 publications (120 citation statements)
References 17 publications
“…Many public databases and bioinformatics tools have been developed and are currently available for use (Ding & Berleant, 2002). The primary goal of bioinformaticians is to develop reliable databases and effective analysis tools capable of handling bulk amount of biological data.…”
Section: Bioinformatics Workflow and Platform Design
citation type: mentioning
confidence: 99%
“…In the latter case, if the two co-occurring words/ phrases are physically positioned very far apart, co-occurrence may have no meaning. A recent study quantifies some of the precision-recall tradeoffs for different units, ranging from phrases to Abstracts [33].…”
Section: Semantic Boundaries
citation type: mentioning
confidence: 99%
“…For protein name tagging, accuracies as high as around 95% have been reported [67], but care should be given to the test set composition. It is known that for some organisms or some protein subdomains, the nomenclature is fairly rigidly standardized and excellent tagging accuracy can be reached there.…”
Section: Named Entity Tagging
citation type: mentioning
confidence: 99%
“…The effect of accidental co-occurrence could be minimized by requiring frequent corroboration of any pairing. Using a similar co-occurrence approach, Ding et al [67] found that precision and recall traded off when the length of the used text segment was varied. Working with phrases gave generally better precision, while working with entire abstracts gave best recall; sentences scored in between.…”
Section: Fact Extraction
citation type: mentioning
confidence: 99%
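
The trade-off described in the citation statements above (larger text units favor recall, smaller ones favor precision) can be illustrated with a minimal sketch. This is not the authors' method or data: the protein names, the toy abstract, the gold-standard pairs, and the naive sentence splitter are all hypothetical, and plain co-occurrence stands in for whatever extraction technique the paper actually used.

```python
import re

def cooccurring_pairs(units, names):
    """Return every unordered pair of names that appears together in one text unit."""
    pairs = set()
    for unit in units:
        tokens = set(re.findall(r"\w+", unit.lower()))
        present = sorted(n for n in names if n.lower() in tokens)
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                pairs.add((a, b))
    return pairs

def precision_recall(predicted, gold):
    """Precision = TP / predicted, recall = TP / gold (0.0 when a denominator is empty)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy data, for illustration only.
names = ["ProtA", "ProtB", "ProtC", "ProtD"]
abstract = "ProtA binds ProtB. It also activates ProtC. ProtD was used as a control."
gold = {("ProtA", "ProtB"), ("ProtA", "ProtC")}

# Unit = whole abstract: every name co-occurs with every other name.
abs_p, abs_r = precision_recall(cooccurring_pairs([abstract], names), gold)

# Unit = sentence (naive split on end punctuation followed by whitespace).
sentences = re.split(r"(?<=[.!?])\s+", abstract)
sent_p, sent_r = precision_recall(cooccurring_pairs(sentences, names), gold)

print(f"abstracts: precision={abs_p:.2f} recall={abs_r:.2f}")
print(f"sentences: precision={sent_p:.2f} recall={sent_r:.2f}")
```

On this toy input the abstract-level unit recovers both gold pairs but also proposes four spurious ones, while the sentence-level unit proposes only one pair, which is correct, and misses the interaction expressed through the pronoun in the second sentence.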