2018
DOI: 10.1186/s12859-018-2021-9

LocText: relation extraction of protein localizations to assist database curation

Abstract: Background: The subcellular localization of a protein is an important aspect of its function. However, the experimental annotation of locations is not even complete for well-studied model organisms. Text mining might aid database curators to add experimental annotations from the scientific literature. Existing extraction methods have difficulties to distinguish relationships between proteins and cellular locations co-mentioned in the same sentence. Results: LocText was created as a new method to extract protein loc…

citations
Cited by 129 publications
(26 citation statements)
references
References 48 publications
“…We hypothesize that the evidence for annotations is provided in the k-nearest sentences to the sentence where the protein (or its coding gene) is mentioned. Indeed, as shown by Cejuela et al [14], the k-1 sentences accounts for 89% of all unique relationships in the case of protein location evidence. To identify the candidate evidence sentences, occurrence of protein features, such as accession identifier, protein name (recommended, alternative and short), gene name and their synonyms, are searched in the sentences.…”
Section: Candidate Sentences For Annotation Evidence (mentioning)
confidence: 88%
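The candidate-sentence idea in the statement above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' implementation: the function names, the naive regex sentence splitter, and the window parameter `k` are all assumptions introduced here, and real pipelines would use a proper sentence tokenizer and a curated synonym list.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_sentences(text, protein_names, k=1):
    """Return sentences that mention a protein name, plus k neighbours on each side.

    `protein_names` stands in for the protein features listed in the quote
    (accession identifier, protein names, gene names, synonyms).
    """
    sentences = split_sentences(text)
    # Whole-word, case-insensitive match; assumes names start and end with word characters.
    patterns = [
        re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in protein_names
    ]
    selected = set()
    for i, sentence in enumerate(sentences):
        if any(p.search(sentence) for p in patterns):
            lo = max(0, i - k)
            hi = min(len(sentences), i + k + 1)
            selected.update(range(lo, hi))
    # Preserve document order.
    return [sentences[i] for i in sorted(selected)]


text = (
    "We studied yeast cells. Pho85 is a kinase. "
    "It localizes to the nucleus. Unrelated remark here. Another one."
)
hits = candidate_sentences(text, ["Pho85"], k=1)
```

With `k=1`, the sentence naming the protein and its two immediate neighbours are kept, so a location stated in the following sentence ("It localizes to the nucleus.") is still captured.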
“…Text mining has also supported more specific curation tasks, such as protein localization. LocText, for example, implements a NER and RE for proteins based on SVM, achieving 86% precision (56% F1-score) [14]. To address the common issue of class imbalance in biocuration, an ensemble of SVM classifiers along with random under-sampling were proposed for automatically identifying relevant papers for curation in the Gene Expression Database [15].…”
Section: Introduction (mentioning)
confidence: 99%
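The statement above quotes precision and F1 but not recall. Since F1 is the harmonic mean of precision and recall, the implied recall can be back-calculated. The sketch below does only that arithmetic; the function name is hypothetical, the result is not a figure reported in the source, and because the quoted inputs are rounded it is approximate.

```python
def recall_from_precision_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denominator


# Figures as quoted in the citation statement: 86% precision, 56% F1-score.
implied_recall = recall_from_precision_f1(0.86, 0.56)
print(f"implied recall: {implied_recall:.3f}")
```

Under these rounded inputs the implied recall comes out at roughly 0.42, which illustrates a precision-oriented operating point.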
“…Overall, most prediction methods came relatively close (Fig. 2A: the more white the fewer the mistakes) to estimating the 7-class location spectrum for the new data sets that had not been used for developing the methods (HPA: new data from The Human Protein Atlas, and the LocText text mining results [9]). When error-correcting the predicted spectra according to the performance confusion matrix [14], the observations and predictions became more similar for most methods (Fig.…”
Section: Accurate Predictions Of Location Spectrum For Organisms (mentioning)
confidence: 88%
“…This resulted in a set of 5,563 proteins with experimental annotations from Swiss-Prot (release 2017_1; human proteome release up000005640), and in 12,036 from The Human Protein Atlas (version 15; confined to 32 localization classes). We had access to one additional set of experimental annotation in GO format extracted from scientific literature by the tool LocText [9].…”
Section: Experimental Annotations (mentioning)
confidence: 99%