2011 IEEE 27th International Conference on Data Engineering 2011
DOI: 10.1109/icde.2011.5767933

RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems

Abstract: MapReduce-based data warehouse systems are playing important roles of supporting big data analytics to understand quickly the dynamics of user behavior trends and their needs in typical Web service providers and social network sites (e.g., Facebook). In such a system, the data placement structure is a critical factor that can affect the warehouse performance in a fundamental way. Based on our observations and analysis of Facebook production systems, we have characterized four requirements for the data…

Cited by 220 publications (112 citation statements)
References: 15 publications
“…Additionally, there has been extensive research into the performance and compression benefits of column-oriented storage within databases as noted in the related work. For Hive, RCFile's primary benefit is due to the additional compression that can be applied, rather than any performance benefit in lazily materializing rows [22].…”
Section: Columnar (mentioning)
confidence: 99%
“…It can use less space for storage to reduce the disk I/O requirements, and at the same time it has more advantages in terms of the network transmission. Similar to RCFile [13], in the spatio-temporal grid index, we will store the data of the same period of time in column and will store the data of the different periods of time in row. In the QaDTree index, we will write the MBC data of leaf nodes according to the column into data block.…”
Section: The Storage Structure Of Spatio-temporal Data (mentioning)
confidence: 99%
“…With the help of this technique they could significantly improve the query execution of range selection, update and OLAP queries. Other solutions like the RCFile approach [11] adopt the idea of the PAX system, but optimize the storage for specific use-cases. In the case of RCFile the data placement is optimized for data warehouses that follow the map reduce scheme.…”
Section: Related Work (mentioning)
confidence: 99%
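
The citation statements above describe RCFile as adopting the PAX idea: records are first split horizontally into groups of rows, and within each group the values are stored column by column. The following is a minimal Python sketch of that layout only, not the actual RCFile on-disk format; the function names `to_row_groups` and `read_column`, the `group_size` parameter, and the sample table are all hypothetical and chosen for illustration.

```python
from typing import Any, List, Sequence


def to_row_groups(rows: Sequence[Sequence[Any]], group_size: int) -> List[List[List[Any]]]:
    """Split rows horizontally into groups, then lay out each group column by column."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Transpose the chunk so each column's values sit together.
        columns = [list(col) for col in zip(*chunk)]
        groups.append(columns)
    return groups


def read_column(groups: List[List[List[Any]]], col_index: int) -> List[Any]:
    """Collect one column across all row groups, skipping the other columns."""
    values = []
    for columns in groups:
        values.extend(columns[col_index])
    return values


# Hypothetical sample table: (id, name, age)
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
    (4, "dave", 37),
    (5, "erin", 29),
]

groups = to_row_groups(rows, group_size=2)
ages = read_column(groups, 2)
```

Because all fields of a given row stay within one group, a row can be reassembled without looking at other groups, while a query that needs only one column (here, the ages) can skip the rest. Values of the same column also sit next to each other, which is what makes the column-wise compression mentioned in the first citation statement possible.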