2017
DOI: 10.1109/tbdata.2017.2666201
A Distributed Stream Library for Java 8

Abstract: An increasingly popular application of parallel computing is Big Data, which concerns the storage and analysis of very large datasets. Many of the prominent Big Data frameworks are written in Java or JVM-based languages. However, as a base language for Big Data systems, Java still lacks a number of important capabilities such as processing very large datasets and distributing the computation over multiple machines. The introduction of Streams in Java 8 has provided a useful programming model for data-parallel …
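For context, the Stream model the abstract refers to looks like the following minimal sketch: a data-parallel pipeline that Java 8 evaluates on local cores, and that a distributed stream library would instead evaluate across machines. The data and pipeline here are illustrative, not taken from the paper.

import java.util.Arrays;
import java.util.List;

public class StreamSketch {
    public static void main(String[] args) {
        List<String> records = Arrays.asList("3", "14", "15", "92", "65");

        // A data-parallel pipeline: parallelStream() splits the work over
        // the local fork/join pool; a distributed stream library would run
        // the same map/filter/reduce pipeline across multiple machines.
        long sum = records.parallelStream()
                          .mapToLong(Long::parseLong)
                          .filter(v -> v > 10)
                          .sum();

        System.out.println("sum = " + sum); // 14 + 15 + 92 + 65 = 186
    }
}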


Cited by 7 publications (6 citation statements) | References 36 publications
“…Big data technologies such as MapReduce help agile businesses thrive [24], so the test-bed was implemented around the MapReduce operation. Another reason for choosing MapReduce is the Java Stream API, introduced in Java 8, which simplifies the implementation of MapReduce operations [25]. This section explains the evaluation scenario along with a discussion of the experimental results.…”
Section: Experimentation and Results
confidence: 99%
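The point about the Stream API simplifying MapReduce is easiest to see with the canonical word-count example; the sketch below is illustrative and not code from the cited test-bed. The map phase (flatMap over words) and the reduce phase (groupingBy with counting) fit into a single pipeline.

import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {
    public static void main(String[] args) {
        Stream<String> lines = Stream.of("to be or not to be", "be quick");

        // Map phase: split each line into words.
        // Reduce phase: group identical words and count occurrences.
        Map<String, Long> counts = lines
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .collect(Collectors.groupingBy(Function.identity(),
                                           Collectors.counting()));

        System.out.println(counts); // e.g. {not=1, be=3, or=1, to=2, quick=1}
    }
}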
“…As the literature shows, a set of solutions is available for handling big data at each level [2, 6-8, 11-13, 46, 47], including domain-specific fourth-generation languages, in-memory databases, data compression techniques, programming models, third-party libraries, accelerator hardware, and storage devices [8, 45, 48-54]. Each time a big data problem emerges, a large-scale dedicated solution is built for it, which requires huge computing and personnel resources and substantially raises overall project expenditures. The higher layers provide more abstraction and usability at the cost of reduced overall performance [8, 55], so big data solutions are frequently provided at the higher levels, ignoring the performance opportunities at lower layers, as discussed below.…”
Section: Big Data Processing Stack
confidence: 99%
“…It is less abstracted than the application level, so design and management tasks are more difficult for end programmers. Nevertheless, progress has been significant by means of domain-specific programming models and third-party libraries [8, 45, 50-52].…”
Section: Background and Motivation
confidence: 99%
“…It is less abstracted than the application level, so design and management tasks are more difficult for end programmers. Growth has been significant by means of programming models and third-party libraries [25]-[27].…”
Section: Middleware and Management Level
confidence: 99%
“…The detection stage can automatically trigger the specialized 3Vs optimizations only if the application is big data; otherwise routine processing continues. The 3Vs optimizations can be incorporated at various levels of the big data stack, such as hardware (GPGPUs, FPGAs) [20], [21], compilers (garbage collection, parallelization, loop, type-inference, and data-layout optimizations) [22]-[24], third-party libraries (Hadoop, Spark, Flink, Storm) [25]-[27], and databases (MongoDB, VoltDB) [20]. Several benefits of automatic big data detection include optimal computational resource utilization, selection of appropriate tools, triggering of relevant code either for general-purpose processors (GPPs) or for varying accelerator architectures, and minimal overhead and user intervention.…”
Section: Introduction
confidence: 99%
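The detect-then-dispatch flow described in this statement can be sketched as a simple guard; the threshold, class, and method names below are hypothetical, chosen only to illustrate routing big data inputs to a specialized path while routine inputs take the general-purpose path.

import java.util.List;

public class BigDataDispatch {
    // Hypothetical threshold: treat inputs above 1 GiB as big data.
    private static final long BIG_DATA_BYTES = 1L << 30;

    static void process(List<String> inputPaths, long totalInputBytes) {
        if (totalInputBytes >= BIG_DATA_BYTES) {
            // Detection fired: hand off to a specialized back end,
            // e.g. a Hadoop/Spark-style library or an accelerator path.
            runOptimizedPipeline(inputPaths);
        } else {
            // Otherwise routine processing continues unchanged.
            runRoutinePipeline(inputPaths);
        }
    }

    // Placeholders standing in for the specialized and routine paths.
    static void runOptimizedPipeline(List<String> paths) { }
    static void runRoutinePipeline(List<String> paths) { }
}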