2019
DOI: 10.17129/botsci.2226

Datataxa: a new script to extract metadata sequence information from GenBank, the Flora of Bajío as a case study

Abstract: Background: GenBank is a public repository that houses millions of nucleotide sequences. Several software have been developed to extract information stored in GenBank. However, none of them are useful to extract and organize GenBank accession based on metadata. We developed a new script called Datataxa, which works to mine GenBank information. The checklist of the Flora del Bajío y de Regiones Adyacentes (FBRA) was used as a case study to apply our script. Questions: How many species occurring in the FBRA have …

Cited by 3 publications (2 citation statements)
References 28 publications
“…For the compilation of phylogenetic and biogeographical hypotheses, we performed a metasearch of evolutionary and biogeographical scientific publications based on molecular phylogenetic and/or morphological evidence of endemic and quasi-endemic taxa of the YPBP. First, we used the Datataxa script (Ruiz-Sánchez et al, 2019) to extract meta-information on articles from the Genbank database (Benson et al, 2018). The search categories were Phylogenetic studies (including the terms phylogen*, monop*, systemat*, sistemat*, relationsh*, relacio*), Phylogeographic studies (phylogeog*, filogeog*), Phylogenomic analysis (phylogenom*, genome-scale, “plastid genome”), Diversity studies (diver*, geneti*, pop*, pobl*) and Biogeography (biogeog*).…”
Section: Methods (mentioning)
confidence: 99%
“…A question that may arise at this point is: What connection is there to specimens? Megaphylogenetic approaches, generally rooted primarily in GenBank resources, tend to anonymize the original data sources, but because (at least evaluated in terms of species coverage) most plant sequence data derives from molecular systematic or DNA barcoding studies (e.g., Ruiz-Sanchez et al, 2019), there usually are specimen links even if these are not reflected in inconsistently applied metadata fields (Chen and Sarkar, 2011; Tahsin et al, 2018; Troudet et al, 2018) or inconsistent voucher citation practices (Funk et al, 2018). Recent promising attempts have been made to discover and take advantage of linkages between molecular repositories and specimen metadata, which are particularly essential for studies conducted at the population level (Tahsin et al, 2016, 2018; Pelletier and Carstens, 2018), and other situations where sequence data without provenances are generally useless.…”
Section: Toward a Tree Of Life (mentioning)
confidence: 99%