2019
DOI: 10.21203/rs.2.4295/v3
Preprint

Shared Data Science Infrastructure for Genomics Data

Abstract: Background: Creating a scalable computational infrastructure to analyze the wealth of information contained in data repositories is difficult due to significant barriers in organizing, extracting and analyzing relevant data. A shared data science infrastructure like Boa_g is needed to efficiently process and parse data contained in large data repositories. The main features of Boa_g are inspired by existing languages for data-intensive computing, and Boa_g can easily integrate data from biological data repositories.…

Cited by 3 publications (10 citation statements)
References 13 publications (13 reference statements)
“…MG1655]’. BoaG is a domain-specific language that uses a Hadoop-based infrastructure for biological data (Bagheri et al., 2019). A BoaG program is submitted to the BoaG infrastructure.…”
Section: Methods (mentioning)
confidence: 99%
“…We utilize a genomics-specific language, BoaG, that uses the Hadoop cluster (Bagheri et al., 2019), to explore annotations in the NR database that is not available in other works.…”
Section: Introduction (mentioning)
confidence: 99%
“…When a BoaG program is executing in parallel, it emits values to the output aggregator that collects all data and provides the final output. Aggregators, for example, top, mean, maximum, and minimum, also can contain indices that would be a grouping operation similar to traditional query languages [9].…”
Section: BoaG Domain-specific Language (mentioning)
confidence: 99%
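The aggregator behaviour described in the statement above (parallel workers emit values, an aggregator collects them, and an index acts like a grouping key) can be illustrated with a minimal Python sketch. This is not BoaG syntax or its API: the `IndexedAggregator` class, its `emit` and `result` methods, and the sample records are all hypothetical names and data chosen for illustration, and the sketch runs sequentially rather than on a Hadoop cluster.

```python
from collections import defaultdict
from statistics import mean


class IndexedAggregator:
    """Hypothetical stand-in for an indexed output aggregator.

    Values are emitted under an index key; the reducer is applied
    to each key's values when the final output is requested.
    """

    def __init__(self, reducer):
        self.reducer = reducer            # e.g. max, min, statistics.mean
        self.groups = defaultdict(list)   # index key -> emitted values

    def emit(self, key, value):
        # Each worker would call this; here the calls are sequential.
        self.groups[key].append(value)

    def result(self):
        # Reduce every group independently, like GROUP BY in a query language.
        return {key: self.reducer(values) for key, values in self.groups.items()}


# Made-up records for illustration only: (taxon, sequence length).
records = [("E. coli", 120), ("E. coli", 300), ("B. subtilis", 210)]

longest = IndexedAggregator(max)
average = IndexedAggregator(mean)
for taxon, length in records:
    longest.emit(taxon, length)
    average.emit(taxon, length)

print(longest.result())
print(average.result())
```

Using the taxon as the index plays the role of the grouping operation the statement compares to traditional query languages; an aggregator without an index would correspond to a single key shared by all emitted values.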
“…To this end, we utilized BoaG to address these challenges at scale. BoaG belongs to the family of a domain-specific language and shared infrastructure, called Boa, that has been applied to address challenges in mining software repositories [9], genomics data [10], and big data transportation [11]. Boa can process and query terabytes of raw data and uses a backend based on map-reduce to effectively distribute computational analyses and querying tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…To this end, we utilized BoaG to address these challenges at scale. BoaG belongs to the family of a domain-specific language and shared infrastructure, called Boa, that has been applied to address challenges in mining software repositories [28], genomics data [12], and big data transportation [36]. Boa can process and query terabytes of raw data and uses a backend based on map-reduce to effectively distribute computational analyses and querying tasks.…”
Section: Discussion (mentioning)
confidence: 99%