Comparative Analysis of Skew-Join Strategies for Large-Scale Datasets with MapReduce and Spark

In this paper, we first mine the interconnections between data in large-scale datasets using association rule models from machine learning, and then perform K-Means clustering T times on the mined datasets to realize large-scale data integration. On this basis, we propose a classification prediction model based on an enhanced ChebNet model, which combines the efficient feature extraction capability of graph convolutional neural networks with the accurate prediction advantage of big data analysis to process large-scale datasets effectively. Taking tobacco production monitoring data as an example, the model performs well in predicting the correlation of cigarette sensory indexes, and its performance is best when the sliding window size is 30 and the prediction jump step is 1. This provides strong support for quality control in cigarette production and shows that the model is capable of processing large-scale tobacco production datasets.

Cited by 3 publications

References 16 publications

How Big Data Analytical Framework has Redefined Academic Sciences

RelJoin: Relative-cost-based selection of distributed join methods for query plan optimization

Research on machine learning based processing strategies for large-scale datasets
