2015
DOI: 10.1155/2015/502795

Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

Abstract: Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very la…

Cited by 18 publications (10 citation statements)
References 19 publications
“…We use Cassandra for two main reasons. First, Cassandra is one of the most used and best performing NoSQL databases today, with applications in several different domains (Duarte & Bernardino, 2016; Díaz, Martín & Rubio, 2016; Mahgoub et al, 2017a; Le et al, 2014; Aniceto et al, 2015; Pinheiro et al, 2017). Second, the existing documentation is very complete, and it allows to easily replicate and generalize the experiments carried out in this work.…”
Section: Methods (mentioning)
confidence: 99%
“…Our approach is completely general, and can be applied to different relational and NoSQL databases with little effort. In this work we choose to study the performance of irace on the Cassandra database, one of the most popular NoSQL databases, used in several real-world applications such as Internet of Things, genomics, or electric consumption data (Cassandra, 2014; Duarte & Bernardino, 2016; Díaz, Martín & Rubio, 2016; Mahgoub et al, 2017a; Le et al, 2014; Aniceto et al, 2015; Pinheiro et al, 2017). We measure the performance in terms of throughput using the YCSB benchmark (Cooper et al, 2010; Wang & Tang, 2012), observing a speedup of up to 30% over the default configuration.…”
Section: Introduction (mentioning)
confidence: 99%
“…Ferreira et al [37] compared relational and NoSQL DBMS approaches by migrating source data from a PostgreSQL RDBMS to Cassandra, a column-based NoSQL system. Aniceto et al [38] extended this research by including MongoDB, a Document-Oriented NoSQL, comparing both Cassandra and MongoDB regarding PostgreSQL performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…Two research studies were performed using the NoSQL Column-Oriented Cassandra database system for data provenance management using the PROV-Wf model [37,38]. The researchers analyzed the performance by running a Bioinformatics workflow. Ferreira et al [37] compared relational and NoSQL DBMS approaches by migrating source data from a PostgreSQL RDBMS to Cassandra, a column-based NoSQL system.…”
Section: Related Work (mentioning)
confidence: 99%
“…Data are stored in the database as a hash. Compression of such data can be done efficiently in column-oriented databases. Genomic data generated from sequencing experiments can be stored and analysed using these databases (Aniceto et al, 2015).…”
Section: Horizontal Scaling Techniques (mentioning)
confidence: 99%