Choosing the right NoSQL database for the job: a quality attribute evaluation

Lourenço, João Ricardo; Cabral, Bruno; Carreiro, Paulo; Vieira, Marco; Bernardino, Jorge

doi:10.1186/s40537-015-0025-0

Cited by 92 publications

(41 citation statements)

References 55 publications


“…(c) Different technologies have been developed for stream processing of data [32]. Technology selection for a business case may require extensive evaluation, before an optimal decision can be made [56,57]. This study aims to facilitate decision making process of technology selection by providing new information regarding feasibility.…”

Section: Research Design and Methodology (mentioning)

confidence: 99%

Feasibility analysis of AsterixDB and Spark streaming with Cassandra for stream-based processing

Pääkkönen

2016

J Big Data


Introduction: Streaming data is increasingly important for online services. Facebook [1] and LinkedIn [2] have analysed event related data for understanding usage in their ecosystems. Twitter has created a big data streaming architecture, which is able to serve and process thousands of tweets in a second [3][4][5]. Also, several systems [6], methods [7,8], and benchmarking tools [9][10][11] have been created for facilitating implementation of tweet related processing and analysis by 3rd parties. Especially, new stream processing technologies (Spark [12], AsterixDB [13]) have been created, which could be selected for implementation of stream extraction, storage and analysis functionalities. Although stream processing performance has been studied [12,14], comparative feasibility analysis of the technologies has not been extensively performed in the context of semi-structured data processing.

Abstract: For getting up-to-date insight into online services, extracted data has to be processed in near real time. For example, major big data companies (Facebook, LinkedIn, Twitter) analyse streaming data for development of new services. Several technologies have been developed, which could be selected for implementation of stream processing functionalities. The contribution of this paper is feasibility analysis of technologies for stream-based processing of semi-structured data. Particularly, feasibility of a Big Data management system for semi-structured data (AsterixDB) will be compared to Spark streaming, which has been integrated with Cassandra NoSQL database for persistence. The study focuses on stream processing in a simulated social media use case (tweet analysis), which has been implemented to Eucalyptus cloud computing environment on a distributed shared memory multiprocessor platform. The results indicate that AsterixDB is able to provide significantly better performance both in terms of throughput and latency, when data feed functionality of AsterixDB is used, and stream processing has been implemented with Java. AsterixDB also scaled on the same level or better, when the amount of nodes on the cloud platform was increased. However, stream processing in AsterixDB was delayed by batching of data, when tweets were streamed into the database with data feeds.

Pääkkönen, J Big Data (2016) 3:6

This article focuses on performance analysis of Spark streaming, Cassandra, and AsterixDB technologies for stream processing of semi-structured social media data (tweets). Especially, Spark streaming has been integrated with Cassandra for data persistence, which has been compared to AsterixDB on Eucalyptus cloud environment on a DSM multiprocessor platform. The results indicated that AsterixDB achieved significantly higher throughput and lower latency, when data feeds were utilized and stream processing was implemented with Java. Performance of AsterixDB also scaled better, when the amount of nodes on the cloud platform was increased. However, stream processing in AsterixDB was delayed by batching of streamed tw...


“…In terms of performance and the execution of different types of operations, NoSQL databases fall into two categories: read-optimized and write-optimized [5]. The faster performance arises because a NoSQL database, when storing data, does not need to check the data structure or primary and foreign keys.…”

Section: Performance (unclassified)

“…This system relies on horizontal and "elastic" scalability, adding more nodes to the system rather than upgrading the hardware. The term "elastic" refers to elasticity, which characterizes the way a cluster reacts to the addition or removal of nodes [5].…”

Section: E. Scalability (unclassified)

Perbandingan Kemampuan Database NoSQL dan SQL dalam Kasus ERP Retail

Bhaswara

Sarno

Sunaryono

2017

JTITS


Abstract: This paper discusses two types of database: relational (SQL) and non-relational (NoSQL). Developers today compete to provide the right database with the best performance for their applications. The paper therefore presents a comparative analysis of NoSQL and SQL databases in terms of performance, flexibility, and scalability. Once the appropriate database has been identified, it is applied to a multitenancy-oriented ERP Retail application, so that the application performs well, stores data flexibly, and supports data storage that keeps growing over time. In the tests conducted, the NoSQL database proved faster than SQL at storing data in CRUD operations. It also has a flexible data storage structure, because its data model is BSON (Binary JSON), and it can scale by means of sharding. In this case, therefore, a NoSQL database is the better choice for ERP Retail. Keywords: ERP (Enterprise Resource Planning), Retail, Multitenancy, NoSQL


“…simple hash tables) and graph databases (which are ideal for situations that are modeled as graph problems). Considering the last presented fact a comparison is presented at (Lourenço et al, 2015), where several DBMS are classified in a 5-point scale (Great, good, average, mediocre and bad) regarding a set of quality attributes.…”

Section: NoSQL - Not Only SQL (mentioning)

confidence: 99%

Real Time Analytics for Characterizing the Computer User's State

Carneiro

Araújo

Pimenta

et al. 2016

ADCAIJ


In the last years, the amount of devices that can be connected to a network grew significantly allowing to, among other tasks, collect data about the environment or the people in it in a non-intrusive way. This generated nowadays well-known topics such as Big Data or the Internet of Things. This also opened the door to the development of novel and interesting applications. In this paper we propose a distributed system for acquiring data about the users of technological devices in a non-intrusive way. We describe how this data can be collected and transformed to produce meaningful interaction features, that reveal the state of the individuals. We analyse the requirements of such a system, namely in terms of storage and speed, and describe three prototypes currently being used in three different domains of application.

