2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC)
DOI: 10.1109/i-smac49090.2020.9243418
Performance Analysis of Small Files in HDFS using Clustering Small Files based on Centroid Algorithm

Cited by 5 publications (3 citation statements)
References 9 publications
“…N. Alange et al. [26] compared various existing techniques for the small file problem based on performance throughput. The CSFC technique, developed by R. Rathidevi et al. [27], involves grouping together small files that are connected in some way. After clustering, the merged small files are passed to HDFS for further processing.…”
Section: Related Work (mentioning)
confidence: 99%
“…Li et al. [21] proposed a virtual file pool technology, which uses a memory-mapped file method to achieve an efficient data reading process and provides a data exchange strategy within the pool. Rathidevi and Parameswari [22] proposed a centroid-based small file clustering method, which places related files in clusters to improve data access performance. Xiang et al. [23] proposed a Hadoop distributed file system storage strategy based on the response time of data nodes, and simulation results show that this strategy can realize the distributed storage of big data and avoid the emergence of hot nodes.…”
Section: Medical Big Data Storage and Access (mentioning)
confidence: 99%
“…The authors have also given an idea about the tools used for Big Data storage and processing, along with the algorithms used in Big Data processing. Hadoop, HDFS [3], CDH [4], MongoDB [5], Apache Spark [6], Apache Solr [7], [8], Alteryx Designer [9], Datameer [9], Google BigQuery [9], etc. are the tools used to store, process, and analyze Big Data, while Support Vector Machine [10], Neural Network, Logistic Regression [11], Linear Regression [11], Nearest Neighbor, Decision Tree, and Naive Bayes are the algorithms used in healthcare analysis.…”
Section: Introduction (mentioning)
confidence: 99%