Handling Data Skew in MapReduce Cluster by Using Partition Tuning

Gao, Yufei; Zhou, Yanjie; Zhou, Bing; Shi, Lei; Zhang, Jiacai

doi:10.1155/2017/1425102

Cited by 23 publications

(7 citation statements)

References 19 publications

“…Execute the Map Reduce jobs on dissimilar datasets, and work out the mean total percentage error (MAPE) of all partitions in every state. The MAPE is defined as follows [formula not reproduced], where N is the amount of reduce tasks in a job, and [symbols not reproduced] are the predicted and calculated value of partition size of reduce task i, respectively [18]. …”

Section: Results (mentioning)

confidence: 99%

Enhancement on Traffic Aware Partition in Mapreduce Using Clustering Techniques

Sindhupriya

2017

ijarcs

Abstract: MapReduce is a programming model for processing huge datasets using map and reduce tasks that run in parallel across distributed nodes. Many efforts have been made to improve MapReduce performance, but they neglect the network traffic produced in the shuffle stage. Existing traffic-aware MapReduce partitioning suffers from partition skew, where the output of map tasks is unevenly distributed among reduce tasks. Existing solutions follow a similar principle of repartitioning the workload among reduce tasks. However, those approaches often incur high performance overhead because of partition size prediction and repartitioning. The proposed work adopts dynamic data aware parallel with k-Means algorithm (DDAP-kM), a framework that provides dynamic partition skew reduction and clustering for MapReduce jobs. It copes with partitioning skew by adjusting runtime resource allocation to reduce tasks. In the experiments, network traffic cost is compared between the traffic-aware partition algorithm and the DDAP-kM algorithm.
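
The citation statement above evaluates partition-size prediction with a MAPE over the N reduce tasks of a job, without showing the formula here. As a rough illustration only, the sketch below assumes the conventional mean absolute percentage error: the average over reduce tasks of |predicted - calculated| / calculated, expressed in percent. The function name `partition_mape`, the choice of the calculated size as the denominator, and the sample partition sizes are assumptions made for this example; the citing paper's exact definition may differ.

```python
from typing import Sequence


def partition_mape(predicted: Sequence[float], calculated: Sequence[float]) -> float:
    """Mean absolute percentage error (in percent) over reduce-task partitions.

    Assumes the conventional MAPE definition; `predicted[i]` and
    `calculated[i]` are the predicted and calculated partition sizes
    of reduce task i.
    """
    if len(predicted) != len(calculated):
        raise ValueError("both sequences must have one entry per reduce task")
    if len(calculated) == 0:
        raise ValueError("at least one reduce task is required")
    if any(size == 0 for size in calculated):
        raise ValueError("calculated partition sizes must be non-zero")

    n = len(calculated)  # N: number of reduce tasks in the job
    total = sum(abs(p - c) / c for p, c in zip(predicted, calculated))
    return 100.0 * total / n


if __name__ == "__main__":
    # Hypothetical partition sizes (e.g. in MB) for a job with four reduce tasks.
    predicted_sizes = [110.0, 95.0, 210.0, 48.0]
    calculated_sizes = [100.0, 100.0, 200.0, 50.0]
    print(f"MAPE: {partition_mape(predicted_sizes, calculated_sizes):.2f}%")
```

A lower value means the predicted partition sizes track the calculated ones more closely; a perfect prediction yields 0.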

“…In distributed execution, where each cluster of model elements is in a partition and localization, the model transformation processing can be minimized, with less network traffic overhead for sending data between executors (Worker Nodes). In both strategies there are open issues, such as data balancing (Le et al, 2014), data skew processing (Gao et al, 2017), and data locality (Jin et al, 2011) that need be contemplated in our approach.…”

Section: Executing Model Transformations Using Graphframe (mentioning)

confidence: 99%

Data-centric Model Transformation Approach using Model2GraphFrame Transformations

Camargo

Fabro

2021

JSERD

Data-centric (Dc) approaches are being used for data processing in several application domains, such as distributed systems, natural language processing, and others. There are different data processing frameworks that ease the task of parallel and distributed data processing. However, few research approaches study how to execute model manipulation operations, such as model transformations, on such frameworks. In addition, it is often necessary to extract XMI-based formats into possibly distributed models. In this paper, we present a Model2GraphFrame operation to extract a model in a modeling technical space into the Apache Spark framework and its supported GraphFrame format. It generates a GraphFrame from the input models, which can be used for partitioning and processing model operations. We used two model partitioning strategies: based on subgraphs, and clustering. The approach allows model analysis by applying operations on the generated graphs, as well as Model Transformations (MT). The proof-of-concept results, such as model2GraphFrame, GraphFrame partitioning, GraphFrame connectivity, and GraphFrame model transformations, indicate that our Model Extraction can be used in various application domains, since it enables the specification of analytical expressions on graphs. Furthermore, its model graph elements are used in model transformations on a scalable platform.

“…In the past few years, the prevalence of big data has paved the way for applications of deep learning techniques [37], [38]. With the development of computational intelligence [39], deep learning has been successful in healthcare engineering and neuroscience, providing intelligent solutions with data volumes for significant neural image data processing and analytics. To overcome the limitation of traditional MVPA approaches and improve the performance of cross-subject decoding, Koyamada et al [13] introduced a feedforward deep neural network to classify different brain features representing of various tasks from fMRI data.…”

Section: B Deep Transfer Learning (mentioning)

confidence: 99%

Decoding Behavior Tasks From Brain Activity Using Deep Transfer Learning

et al. 2019

Self Cite

Recently, advances in noninvasive detection techniques have shown that it is possible to decode visual information from measurable brain activities. However, these studies typically focused on the mapping between neural activities and visual information, such as the image or video stimulus, on the individual level. Here, common decoding models across individuals that classify behavior tasks from brain signals were investigated. We proposed a cross-subject decoding approach using deep transfer learning (DTL) to decipher the behavior tasks from functional magnetic resonance imaging (fMRI) recordings made while subjects performed different tasks. We connected parts of the state-of-the-art networks pre-trained on the ImageNet dataset to our defined adaption layers to classify the behavior tasks from fMRI data. Our experiments on the Human Connectome Project (HCP) dataset showed that the proposed method achieved a higher decoding accuracy across subjects than the previous studies. We also conducted an experiment on five subsets of HCP data, which further demonstrated that our DTL approach is more effective on small datasets than the traditional methods.

Index terms: Neural decoding, functional magnetic resonance imaging, cross-subject, deep transfer learning.
