Atrak: a MapReduce-based data warehouse for big data

Barkhordari, Mohammadhossein; Niamanesh, Mahdi

doi:10.1007/s11227-017-2037-3

Cited by 10 publications

(9 citation statements)

References 23 publications


“…Data format unification can be applied to problems in other fields. This method can be used in data warehouse like Aras and Atrak methods [22,23], graph processing [24], integrating multidimensional data sources [25] and specific problems like finding patient similarity [26]. For future works, this method can also be used for interactive query processing, online data mining and stream processing.…”

Section: Results (mentioning, confidence: 99%)

Kavosh: an effective Map-Reduce-based association rule mining method

Barkhordari

Niamanesh

2018

J Big Data

Self Cite


“…Historical data is stored in warehouse system in huge volume of data comparing to OLTP, Online Transaction Processing [8]. Historical data can be useful in helping to predict the future when conducting predictive analyses [43].…”

Section: Historical Database (mentioning, confidence: 99%)

“…Data is managed through several ways to ensure data integrity and its consistency considering its fault tolerance, so several NewSQL databases provide horizontal and vertical scalability to ensure previous features and others. Horizontal scaling concerns of increasing the commodity nodes (hardware) whereas vertical scaling considers of the CPU and RAM power enhancement into current nodes/commodities [8]. Figure 12: Taxonomy 4 reveals mechanism and technologies categories supported in NewSQL.…”

Section: Horizontal and Vertical Scalability Of Distributed System Ge (mentioning, confidence: 99%)


Top NewSQL Databases and Features Classification

Almassabi¹,

Bawazeer²,

Adam³

2018

IJDMS


The versatility of NewSQL databases lies in achieving low latency while reducing the cost of commodity nodes. Our work emphasizes how big data is addressed by the top NewSQL databases, considering their features. This paper collects the features of some of the top NewSQL databases [54], selected for their high demand and usage. In the first part, around 11 NewSQL databases are investigated to elicit, compare, and examine their features, in order to give a high-level view of NewSQL databases and to reveal their similarities and differences. Our taxonomy involves four categories describing how NewSQL databases handle and process big data, considering the technologies they offer or support. The survey presents the advantages and disadvantages of each NewSQL database. In the second part, we record our findings across several categories and aspects. Our first taxonomy classifies feature characteristics as either functional or non-functional. A second taxonomy addresses data integrity and data manipulation; we found data features classified as supervised, semi-supervised, or unsupervised. The third taxonomy concerns how well each NewSQL database can deal with different types of databases. Surprisingly, NewSQL databases not only process regular (raw) data but are also capable of supporting diverse types of data, such as historical, vertically distributed, real-time, streaming, and timestamp databases. We therefore find that NewSQL databases are significant enough to survive and to integrate with other technologies that support other database types, such as NoSQL, traditional, distributed, and semi-relational systems; this forms our fourth taxonomy. We visualize our results for these categories using charts.
Finally, NewSQL databases motivated us to analyze their big data throughput, which we classify as good data or bad data. We conclude with a couple of suggestions on how to manage big data using Predictable Analytics and other techniques.


“…Flink supports various concepts in time-based windows such as event-based processing, timebased processing, and row count-based processing. Aras [40], Atrak [41] and Hengam [42] use data unification and in-Memory database to achieve higher performance on data warehouse query execution.…”
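The windowing concepts named in the quotation can be illustrated with a minimal sketch in plain Python. This is not Flink's API: the function names `count_windows` and `time_windows` are hypothetical, and the event-time version assumes each event carries an integer timestamp and ignores late data and watermarks.

```python
from typing import Iterable, Iterator


def count_windows(events: Iterable[float], size: int) -> Iterator[list[float]]:
    """Row count-based tumbling windows: emit a window every `size` events.

    An incomplete trailing window is not emitted.
    """
    window: list[float] = []
    for value in events:
        window.append(value)
        if len(window) == size:
            yield window
            window = []


def time_windows(
    events: Iterable[tuple[int, float]], width: int
) -> dict[int, list[float]]:
    """Event-time tumbling windows keyed by each window's start timestamp."""
    buckets: dict[int, list[float]] = {}
    for ts, value in events:
        start = (ts // width) * width  # align the timestamp to its window start
        buckets.setdefault(start, []).append(value)
    return buckets
```

A count-based window closes purely on the number of rows seen, whereas an event-time window closes on the timestamps carried by the data, so the same stream can yield windows of very different sizes under the two policies.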

Section: Related Work (mentioning, confidence: 99%)

Chabok: a Map-Reduce based method to solve data warehouse problems

Barkhordari

Niamanesh

2018

J Big Data

Self Cite


Existing information is a valuable asset for many different types of organizations. Storing and analysing information can solve many problems within an organization [1]. The results from data analyses help organizations make correct decisions and provide better services for customers. Thus, high-speed storage and retrieval of large volumes of data generated by electronic devices and software systems are critical issues [2-4]. Many organizations consider big data solutions because they cannot manage their data with traditional database management systems [5]; therefore, they must take drastic measures in designing and implementing new systems according to big data architectures. These organizations must change their architectures from …

Abstract

Currently, immense quantities of data cannot be managed by traditional database management systems. Instead, they must be managed by big data solutions using shared-nothing architectures. Data warehouse systems handle very large amounts of information. The most prominent data warehouse model is the star schema, which consists of a fact table and a number of dimension tables. Executing queries on the data warehouse requires joining the facts and dimensions. In a shared-nothing architecture, not all of the required information is placed on a single node, so information must be retrieved from other nodes, which causes network congestion and slow query execution. To avoid this problem and achieve maximum parallelism, dimensions can be replicated across nodes if they are not too large. However, if a dimension's data volume exceeds the capacity of a node, or the combined volume of the dimensions exceeds node capacity, query execution faces serious problems. In big data problems, the amount of data is immense, so replicating it cannot be considered an appropriate method.
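The replication strategy described above can be sketched as a map-side join: each node joins its own fact partition against full local copies of the dimension tables, so no dimension rows cross the network. The tables, node layout, and helper names below (`replicated_join`, `can_replicate`) are hypothetical, and sizes are plain numbers standing in for data volume.

```python
# Hypothetical toy star schema: small dimension tables keyed by surrogate id.
DATE_DIM = {1: {"year": 2017}, 2: {"year": 2018}}
STORE_DIM = {10: {"state": "TX"}, 11: {"state": "CA"}}

# The fact table is split across two hypothetical nodes; dimensions are replicated.
FACT_PARTITIONS = [
    [{"date_id": 1, "store_id": 10, "amount": 5.0},
     {"date_id": 2, "store_id": 11, "amount": 7.5}],
    [{"date_id": 2, "store_id": 10, "amount": 2.5},
     {"date_id": 2, "store_id": 11, "amount": 1.0}],
]


def local_join(partition, date_dim, store_dim):
    """Join one fact partition against full local copies of the dimensions."""
    for row in partition:
        yield {
            "year": date_dim[row["date_id"]]["year"],
            "state": store_dim[row["store_id"]]["state"],
            "amount": row["amount"],
        }


def replicated_join(partitions, dims):
    """Join every partition independently (sequentially here, in parallel on a cluster)."""
    joined = []
    for part in partitions:
        joined.extend(local_join(part, *dims))
    return joined


def can_replicate(dim_sizes, node_capacity):
    """Replication is only viable if the dimensions together fit on one node."""
    return sum(dim_sizes) <= node_capacity
```

`can_replicate` captures the failure case the abstract describes: once the combined dimension volume exceeds a node's capacity, every node can no longer hold a full replica and the local join breaks down.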
In this paper, we propose a method called Chabok, which uses two-phase Map-Reduce to solve the data warehouse problem. In this method, aggregation is performed completely on the Mappers, and intermediate results are sent to the Reducer. Chabok does not need data replication to omit joins. The proposed method was implemented on Hadoop, and TPC-DS queries were executed for benchmarking. Chabok outperformed prominent big data products for data warehousing in query execution time.
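Performing aggregation entirely on the Mappers and sending only intermediate results to a Reducer resembles the in-mapper combining pattern. The sketch below shows only that general pattern in plain Python. It is not Chabok's implementation and does not reproduce how Chabok avoids joins; it assumes, hypothetically, that each fact row already carries its grouping key, and the names `map_phase` and `reduce_phase` are illustrative.

```python
from collections import Counter
from typing import Iterable


def map_phase(partition: Iterable[tuple[str, float]]) -> Counter:
    """Aggregate one partition completely on the mapper.

    Emits one partial sum per group key rather than one record per input row.
    """
    partial: Counter = Counter()
    for group_key, amount in partition:
        partial[group_key] += amount
    return partial


def reduce_phase(partials: Iterable[Counter]) -> dict[str, float]:
    """Merge the small per-mapper partial aggregates into the final result."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds values key by key
    return dict(total)


# Usage: two hypothetical mapper partitions of (group key, measure) pairs.
partitions = [[("TX", 5.0), ("CA", 7.5)], [("TX", 2.5), ("CA", 1.0)]]
result = reduce_phase(map_phase(p) for p in partitions)
```

Each mapper ships at most one value per group, so the intermediate data sent to the Reducer grows with the number of groups rather than the number of fact rows.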
