2017
DOI: 10.1093/bioinformatics/btx799

GeoBoost: accelerating research involving the geospatial metadata of virus GenBank records

Abstract: Supplementary data are available at Bioinformatics online.

Cited by 11 publications (13 citation statements)
References 8 publications
“…While georeferenced datasets describing environmental and climactic phenomena are more readily available, emerging genetic data sets present some challenges, as location of isolation are generally extracted manually from public records or publications. To address visualization of genetic sequence data, there have been efforts to extract geospatial metadata such as location and host from GenBank records, to ease automation of linking relevant sequence data for spatial modeling of disease [131]. Efforts to model outbreaks within a decision support environment which integrate data collected on different spatial scales need to address automated data extraction and transformation such as aggregation of case reports, host population densities, and locations from which isolates were sequenced for a region under study such that visualization of a multifaceted scenario is possible.…”
Section: Discussion
confidence: 99%
“…This paucity of high resolution geographic metadata has inspired researchers to develop new methods and tools to ascertain the LOIH for viral sequences represented in GenBank records (Tahsin et al, 2014; Tahsin et al, 2017; Magge et al, 2018). Indeed, available pipelines for discerning the LOIH are configured such that they output not only the most probable location for a specific sequence, but also a vector of other possible locations along with their relative probabilities (Magge et al, 2018).…”
Section: Introduction
confidence: 99%
“…In our prior work, we developed GeoBoost and other automated language processing methods to address the lack of geospatial certainty in sequence databases. GeoBoost improves the granularity of the location of the infected host (LOIH) for GenBank records (Tahsin et al. 2018).…”
Section: Introduction
confidence: 99%
“…From these, GeoBoost extracts all geospatial mentions and assigns a probability of the LOIH given the GenBank record, P(L_i | R_i), where L_i represents the unknown location and R_i indicates the linked record information for taxon i. The probabilities are currently based on a set of predefined rules that assign higher probabilities to more specific and accurate locations found in papers that can be used jointly with information scanned from the GenBank record (Tahsin et al. 2018).…”
Section: Introduction
confidence: 99%
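
The rule-based assignment of P(L_i | R_i) described in the statement above can be illustrated with a minimal sketch. This is not GeoBoost's actual rule set, which the statement does not spell out: the `Mention` type, the `LEVEL_WEIGHT` table, and the function `location_probabilities` are hypothetical names, and the weights are arbitrary assumptions chosen only so that more specific locations score higher. Mentions from the GenBank record and from linked papers are pooled and the scores are normalized into relative probabilities.

```python
from dataclasses import dataclass

# Hypothetical specificity weights (an assumption for illustration only,
# not GeoBoost's real rules): finer-grained locations get larger weights.
LEVEL_WEIGHT = {"country": 1.0, "state": 2.0, "city": 3.0}


@dataclass(frozen=True)
class Mention:
    """A geospatial mention linked to one GenBank record (hypothetical type)."""
    name: str    # location name, e.g. "Shenzhen"
    level: str   # one of the keys in LEVEL_WEIGHT
    source: str  # where the mention was found, e.g. "genbank" or "paper"


def location_probabilities(mentions):
    """Return {location name: relative probability} for one record.

    Each mention adds its specificity weight to its location's score;
    scores are then divided by their total so they sum to 1.
    """
    scores = {}
    for m in mentions:
        scores[m.name] = scores.get(m.name, 0.0) + LEVEL_WEIGHT[m.level]
    total = sum(scores.values())
    if total == 0:
        return {}
    # Sort from most to least probable candidate location.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {name: score / total for name, score in ranked}


# Example: one mention from the record, two from a linked paper.
mentions = [
    Mention("China", "country", "genbank"),
    Mention("Guangdong", "state", "paper"),
    Mention("Shenzhen", "city", "paper"),
]
probs = location_probabilities(mentions)
best = next(iter(probs))  # most probable location under these toy weights
```

Returning the full dictionary rather than only the top entry mirrors the behaviour noted in the citation statements, where pipelines output the most probable location together with other candidate locations and their relative probabilities.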