2016
DOI: 10.5120/ijca2016910611

An Efficient Approach for Storing and Accessing Small Files with Big Data Technology

Abstract: Hadoop is an open source Apache project and a software framework for distributed processing of large datasets across large clusters of commodity hardware. Large datasets here means terabytes or petabytes of data, whereas large clusters means hundreds or thousands of nodes. Hadoop uses a master-slave architecture, with one master node and up to thousands of slave nodes. The NameNode acts as the master node and stores all the metadata of files, while the DataNodes act as slave nodes which store all t…
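To make the metadata cost behind the abstract concrete, the following is a minimal sketch of writing one file to HDFS with the Hadoop client API. The NameNode URI and path are hypothetical, and hadoop-client is assumed to be on the classpath; this is an illustration, not code from the paper.

import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; normally read from core-site.xml.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // Each file created here becomes one metadata entry held in the
        // NameNode's memory, which is why very large numbers of small files
        // can exhaust a single master node.
        try (FSDataOutputStream out = fs.create(new Path("/data/sample.txt"))) {
            out.write("small file payload".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}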

Cited by 15 publications (8 citation statements)
References 7 publications
“…The metadata file contains the meta information for the L-Store, the connector file contains index entries mapping identifiers to identifiers, and the data file contains the medical fragment corresponding to each identifier. In this way, a large amount of data can be stored in a relatively small number of files. This strategy enables the preferred mode of MapReduce data processing: a small number of large files [27, 28]. As a result, the UHPr is able to achieve transactional consistency.…”
Section: Ubiquitous Health Profile (UHPr)
Citation type: mentioning
Confidence: 99%
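The packing pattern this statement describes, many small records appended to one large data file with a separate index mapping each identifier to an offset and length, can be sketched as follows. This is an illustration of the general idea only, not the authors' L-Store code, and all names here are invented.

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class PackedStore {
    // Connector-file analogue: identifier -> (offset, length) in the data file.
    private final Map<String, long[]> index = new HashMap<>();
    private final RandomAccessFile data;

    public PackedStore(String path) throws Exception {
        this.data = new RandomAccessFile(path, "rw");
    }

    // Append one fragment to the shared data file and record where it landed.
    public void put(String id, byte[] fragment) throws Exception {
        long offset = data.length();
        data.seek(offset);
        data.write(fragment);
        index.put(id, new long[] { offset, fragment.length });
    }

    // Seek straight to the recorded offset; no per-fragment file is needed.
    public byte[] get(String id) throws Exception {
        long[] entry = index.get(id);
        byte[] buf = new byte[(int) entry[1]];
        data.seek(entry[0]);
        data.readFully(buf);
        return buf;
    }
}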
“…HAR is an effective way to relieve NameNode metadata congestion: a HAR file allows the files inside it to be accessed directly, and creating one takes only a few simple commands. On the other hand, a HAR file cannot be altered after it is created; files can be neither added to it nor deleted from it. Its most serious shortcoming is that reading any file inside a HAR requires consulting two index files (the master index and the index) [23], which means reading a file directly from HDFS is considerably simpler than reading it from a HAR. Another limitation is storage: HAR files put extra pressure on the file system because creating an archive copies the original files, consuming as much additional space as the originals [23].…”
Section: Solutions
Citation type: mentioning
Confidence: 99%
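For reference, a HAR archive is built offline with the hadoop archive command and then read back through the har:// scheme, as in the sketch below; all paths and file names here are hypothetical. The two extra lookups the statement mentions correspond to the archive's _masterindex and _index files, which HarFileSystem consults before reading the data itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HarReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The archive would have been built beforehand, e.g. with:
        //   hadoop archive -archiveName files.har -p /user/in /user/out
        // HAR is immutable: adding or removing entries means rebuilding it.
        Path inside = new Path("har:///user/out/files.har/small-file-001.txt");
        // The har:// scheme routes the read through HarFileSystem, which
        // consults _masterindex and _index before touching the data.
        FileSystem fs = inside.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(inside)) {
            IOUtils.copyBytes(in, System.out, conf, false);
        }
    }
}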
“…nHAR (new Hadoop Archive) is a revision of the HAR format: the design is almost the same, but with two architectural differences. First, an nHAR file needs only one index file for reads; second, an nHAR archive can be edited, so more files can be added after it is created.…”
Section: Solutions
Citation type: mentioning
Confidence: 99%
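No public nHAR implementation is available, so the following is purely a hypothetical sketch of the single-index, appendable design the statement attributes to it; every name here is invented. Appending writes the new bytes at the end of the data file and adds one record to a single index file, which is what would make such an archive editable after creation.

import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintWriter;

public class AppendableArchive {
    // Appends one entry: the bytes go to the end of the data file, and a
    // single line ("name offset length") goes to the one index file, so a
    // reader needs exactly one index lookup per entry.
    public static void append(String dataPath, String indexPath,
                              String name, byte[] content) throws Exception {
        long offset = new File(dataPath).length(); // next write position
        try (FileOutputStream data = new FileOutputStream(dataPath, true)) {
            data.write(content);
        }
        try (PrintWriter idx = new PrintWriter(
                new FileOutputStream(indexPath, true))) {
            idx.println(name + " " + offset + " " + content.length);
        }
    }
}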
“…A central problem in small-file storage is creating the indices [10]. The small files are grouped into clusters.…”
Section: A. Techniques for Managing Small Files in Hadoop
Citation type: mentioning
Confidence: 99%
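One standard way to realize this kind of clustering in stock Hadoop, often paired with index-based approaches in this literature, is to pack the small files into a SequenceFile, using each file's name as the key and its bytes as the value. The sketch below uses hypothetical paths and assumes hadoop-client on the classpath; the indexing problem then reduces to locating keys inside one large file, and the NameNode tracks a single file instead of millions.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("/data/packed.seq")),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            // One record per small file: name as key, contents as value.
            byte[] contents = "example payload".getBytes(StandardCharsets.UTF_8);
            writer.append(new Text("small-file-001.txt"),
                          new BytesWritable(contents));
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}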