2021
DOI: 10.1515/jisys-2020-0113

Research on parallel data processing of data mining platform in the background of cloud computing

Abstract: The efficient processing of large-scale data has very important practical value. In this study, a data mining platform based on Hadoop distributed file system was designed, and then K-means algorithm was improved with the idea of max-min distance. On Hadoop distributed file system platform, the parallelization was realized by MapReduce. Finally, the data processing effect of the algorithm was analyzed with Iris data set. The results showed that the parallel algorithm divided more correct samples than the tradi…

Cited by 2 publications (2 citation statements)
References 22 publications
“…Load balancing efficiency refers to how better a system can distribute tasks to the available resources to enhance the overall performance [25]. A higher acceleration ratio means the system would perform better in case of more cores or computers [26]. From Table 1, we can see that the evaluation indexes of this method are better than those of the two other methods we studied, and the performance and effectiveness of this method are high.…”
Section: Experimental Analysis
Type: mentioning
confidence: 98%
“…In this study, a data mining platform based on the Hadoop distributed file system was developed, and then the K-means algorithm was enhanced with the concept of max-min distance. This parallel processing can be described as a class of techniques that enable the system to attain concurrently data-processing tasks to increase the computational speed of a computer system, but it also increases the cost of computers because more hardware is required.…”
Section: Introduction
Type: mentioning
confidence: 99%