2017
DOI: 10.1093/bioinformatics/btx083

nala: text mining natural language mutation mentions

Abstract: Motivation: The extraction of sequence variants from the literature remains an important task. Existing methods primarily target standard (ST) mutation mentions (e.g. ‘E6V’), leaving relevant mentions in natural language (NL) largely untapped (e.g. ‘glutamic acid was substituted by valine at residue 6’). Results: We introduced three new corpora suggesting named-entity recognition (NER) to be more challenging than anticipated: 28–77% of all articles contained mentions only available in NL. Our new method nala captured …

Citations: cited by 20 publications (15 citation statements)
References: 30 publications
“…To date, a handful of automatic tools have attempted to address this issue. For instance, command-line automatic variation detection tools such as EMU (10), MutationFinder (11) or Nala (12) can recognize variation mentions in text and return the results in wNm format (e.g. ‘A146T’), while SETH (13) and tmVar (6) can further map the extracted mentions to the specific dbSNP identifiers (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…Inversely, whereas all other variant corpora were identified to have the ‘+’ character as significant (a symbol commonly used to denote mutations), VariomeCorpus was the only not to share such characteristic. Indeed, VariomeCorpus was recently reported to annotate many vague mentions such as ‘de novo mutation’ and ‘large deletion’, with only a subset mentioning position-specific variants (Cejuela et al., 2017). Due to such differences, this was excluded from subsequent power analyses.…”
Section: Results (mentioning)
confidence: 99%
“…The annotation of variants involves a critical step that finds literature to explain the potential influence of a discovered variant on protein functions, cell behaviors, organ normality, etc. In this regard, many efforts have been made to improve the search performance [4,5]. Even though, we observed that the sensitivity of variant queries still has space to increase.…”
Section: Introduction (mentioning)
confidence: 85%