2016
DOI: 10.14778/2994509.2994513

Lifetime-based memory management for distributed data processing systems

Abstract: In-memory caching of intermediate data and eager combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in distributed data processing systems like Spark and Flink. However, it has also been widely reported that these techniques would create a large amount of long-living data objects in the heap, which may quickly saturate the garbage collector, especially when handling a large dataset, and hence would limit the scalability of the system. To elim…


Cited by 43 publications (23 citation statements). References 23 publications.

“…Therefore, the collector can avoid periodically tracing these objects at each young/full GC. This method is similar to, but more lightweight than, the lifetime-based memory management proposed in Deca [42], which uses multiple self-defined data containers (byte arrays) to manage objects with different lifetimes. The key advantage of our proposed approach is that it does not need to determine the number of allocated data containers, merge the data objects into different lifetimes, or manually reclaim the objects.…”
Section: Lessons and Insights
confidence: 99%
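
The citation above contrasts its approach with Deca's use of byte-array data containers. For readers unfamiliar with that technique, the sketch below illustrates the general idea of lifetime-based management: records that share a lifetime are serialized into one large byte array, so the garbage collector traces a single container instead of millions of small objects, and the whole container is dropped at once when its lifetime ends. The class and method names (LifetimeContainer, tryAppend, release) are hypothetical illustrations, not Deca's actual API.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: a simplified "lifetime container" in the spirit of
// lifetime-based memory management. Records that share a lifetime are packed
// into one large byte array, so the GC sees a single object rather than one
// object per record, and the whole batch is released at once when the
// lifetime (e.g., a cache or shuffle phase) ends.
public final class LifetimeContainer {
    private ByteBuffer buffer;   // one large backing byte array in the heap
    private int recordCount;

    public LifetimeContainer(int capacityBytes) {
        this.buffer = ByteBuffer.allocate(capacityBytes);
    }

    /** Append a fixed-size record (here: a long key and a double value). */
    public boolean tryAppend(long key, double value) {
        if (buffer.remaining() < Long.BYTES + Double.BYTES) {
            return false;        // container full; caller allocates another one
        }
        buffer.putLong(key);
        buffer.putDouble(value);
        recordCount++;
        return true;
    }

    /** Read the i-th record's value without materializing a per-record object. */
    public double valueAt(int i) {
        int offset = i * (Long.BYTES + Double.BYTES) + Long.BYTES;
        return buffer.getDouble(offset);
    }

    public int size() {
        return recordCount;
    }

    /** Release the whole container when its lifetime ends (e.g., stage finishes). */
    public void release() {
        buffer = null;           // dropping one reference frees all contained records
    }
}
```

The point of the contrast drawn in the citation is that this style requires the application (or a compiler pass, as in Deca) to decide how many containers to allocate, which records belong together, and when to call release.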
“…Framework memory management optimization. Researchers have proposed memory configuration tuning strategies [52], region-based or lifetime-based memory management [33,37,43,45,42] for improving memory utilization and GC optimization. MemTune [52] dynamically adjusts the data cache size and cache policy based on memory usage statistics.…”
Section: Related Work
confidence: 99%
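
The MemTune behavior described above amounts to a feedback loop driven by memory statistics. The sketch below illustrates only that general idea; the thresholds, scaling factors, and class name (AdaptiveCacheBudget) are invented for illustration and do not reflect MemTune's actual algorithm or parameters.

```java
// Illustrative sketch only, not MemTune's algorithm: adjust an in-memory
// data-cache budget from runtime heap-usage statistics. Under pressure the
// cache shrinks to leave room for execution memory; with headroom it grows
// to avoid re-computation of evicted intermediate data.
public final class AdaptiveCacheBudget {
    private long cacheBudgetBytes;
    private final long minBudget;
    private final long maxBudget;

    public AdaptiveCacheBudget(long initial, long min, long max) {
        this.cacheBudgetBytes = initial;
        this.minBudget = min;
        this.maxBudget = max;
    }

    /** Called periodically with current heap usage statistics. */
    public long adjust(long usedHeapBytes, long maxHeapBytes) {
        double utilization = (double) usedHeapBytes / maxHeapBytes;
        if (utilization > 0.85) {
            // Memory pressure: shrink the cache budget.
            cacheBudgetBytes = Math.max(minBudget, (long) (cacheBudgetBytes * 0.8));
        } else if (utilization < 0.60) {
            // Plenty of headroom: grow the cache budget.
            cacheBudgetBytes = Math.min(maxBudget, (long) (cacheBudgetBytes * 1.2));
        }
        return cacheBudgetBytes;
    }
}
```

In a JVM setting, the inputs could come from Runtime.getRuntime().totalMemory() minus freeMemory() and maxMemory(), sampled between tasks.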
“…There are some solutions based on off-heap memory [41,44,46] (i.e., allocating memory for the application outside the GC-managed heap). While this is an effective approach to allocate and keep data out of the range of the GC (and therefore reduce object copying), it has several important drawbacks: i) off-heap data needs to be serialized to be saved in off-heap memory, and de-serialized before being used by the application (this obviously has performance overheads); ii) off-heap memory must be explicitly collected by the application developer (which is error-prone [11,14] and completely ignores the advantages of running inside a memory-managed environment); iii) the application must always have objects identifying the data stored off-heap (these so-called header objects are stored in the managed heap, therefore stressing the GC).…”
Section: Off-heap Based Solutions
confidence: 99%
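
To make the three drawbacks listed in this citation concrete, here is a minimal Java sketch of storing a record off-heap through a direct ByteBuffer. The class name (OffHeapRecordStore) and record layout are hypothetical; the comments mark where serialization happens, why reclamation is decoupled from normal GC, and where a small header object remains in the managed heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: one record stored off-heap via a direct ByteBuffer.
public final class OffHeapRecordStore {
    // (iii) header object: this small ByteBuffer object lives in the managed
    // heap and is traced by the GC, even though the data it points to is not.
    private final ByteBuffer offHeap;

    public OffHeapRecordStore(int capacityBytes) {
        // (ii) the backing native memory is outside the GC-managed heap. With
        // raw allocations (e.g., malloc via JNI or sun.misc.Unsafe) the
        // application must free it explicitly; even for direct buffers,
        // release is only indirectly tied to collection of the header object.
        this.offHeap = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** (i) serialize: the String must be encoded to bytes before storing. */
    public void put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        offHeap.putInt(bytes.length);
        offHeap.put(bytes);
    }

    /** (i) de-serialize: bytes must be copied back and decoded before use. */
    public String getFirst() {
        ByteBuffer view = offHeap.duplicate();
        view.flip();
        int len = view.getInt();
        byte[] bytes = new byte[len];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```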