2016
DOI: 10.14778/2994509.2994513

Lifetime-based memory management for distributed data processing systems

Abstract: In-memory caching of intermediate data and eager combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in distributed data processing systems like Spark and Flink. However, it has also been widely reported that these techniques would create a large amount of long-living data objects in the heap, which may quickly saturate the garbage collector, especially when handling a large dataset, and hence would limit the scalability of the system. To elim…


Cited by 43 publications (23 citation statements). References 23 publications.

“…Therefore, the collector can avoid periodically tracing these objects at each young/full GC. This method is similar to, but more lightweight than, the lifetime-based memory management proposed in Deca [42], which uses multiple self-defined data containers (byte arrays) to manage objects with different lifetimes. The key advantage of our proposed approach is that it does not need to determine the number of allocated data containers, merge the data objects into different lifetimes, or manually reclaim the objects.…”
Section: Lessons and Insights
confidence: 99%
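
The citation above contrasts its approach with Deca's use of byte-array data containers. For readers unfamiliar with that technique, the sketch below illustrates the general idea of lifetime-based management: records that share a lifetime are serialized into one large byte array, so the garbage collector traces a single container instead of millions of small objects, and the whole container is dropped at once when its lifetime ends. The class and method names (LifetimeContainer, tryAppend, release) are hypothetical illustrations, not Deca's actual API.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: a simplified "lifetime container" in the spirit of
// lifetime-based memory management. Records that share a lifetime are packed
// into one large byte array, so the GC sees a single object rather than one
// object per record, and the whole batch is released at once when the
// lifetime (e.g., a cache or shuffle phase) ends.
public final class LifetimeContainer {
    private ByteBuffer buffer;   // one large backing byte array in the heap
    private int recordCount;

    public LifetimeContainer(int capacityBytes) {
        this.buffer = ByteBuffer.allocate(capacityBytes);
    }

    /** Append a fixed-size record (here: a long key and a double value). */
    public boolean tryAppend(long key, double value) {
        if (buffer.remaining() < Long.BYTES + Double.BYTES) {
            return false;        // container full; caller allocates another one
        }
        buffer.putLong(key);
        buffer.putDouble(value);
        recordCount++;
        return true;
    }

    /** Read the i-th record's value without materializing a per-record object. */
    public double valueAt(int i) {
        int offset = i * (Long.BYTES + Double.BYTES) + Long.BYTES;
        return buffer.getDouble(offset);
    }

    public int size() {
        return recordCount;
    }

    /** Release the whole container when its lifetime ends (e.g., stage finishes). */
    public void release() {
        buffer = null;           // dropping one reference frees all contained records
    }
}
```

The point of the contrast drawn in the citation is that this style requires the application (or a compiler pass, as in Deca) to decide how many containers to allocate, which records belong together, and when to call release.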
“…Framework memory management optimization. Researchers have proposed memory configuration tuning strategies [52], region-based or lifetime-based memory management [33,37,43,45,42] for improving memory utilization and GC optimization. MemTune [52] dynamically adjusts the data cache size and cache policy based on memory usage statistics.…”
Section: Related Work
confidence: 99%
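
The MemTune behavior described above amounts to a feedback loop driven by memory statistics. The sketch below illustrates only that general idea; the thresholds, scaling factors, and class name (AdaptiveCacheBudget) are invented for illustration and do not reflect MemTune's actual algorithm or parameters.

```java
// Illustrative sketch only, not MemTune's algorithm: adjust an in-memory
// data-cache budget from runtime heap-usage statistics. Under pressure the
// cache shrinks to leave room for execution memory; with headroom it grows
// to avoid re-computation of evicted intermediate data.
public final class AdaptiveCacheBudget {
    private long cacheBudgetBytes;
    private final long minBudget;
    private final long maxBudget;

    public AdaptiveCacheBudget(long initial, long min, long max) {
        this.cacheBudgetBytes = initial;
        this.minBudget = min;
        this.maxBudget = max;
    }

    /** Called periodically with current heap usage statistics. */
    public long adjust(long usedHeapBytes, long maxHeapBytes) {
        double utilization = (double) usedHeapBytes / maxHeapBytes;
        if (utilization > 0.85) {
            // Memory pressure: shrink the cache budget.
            cacheBudgetBytes = Math.max(minBudget, (long) (cacheBudgetBytes * 0.8));
        } else if (utilization < 0.60) {
            // Plenty of headroom: grow the cache budget.
            cacheBudgetBytes = Math.min(maxBudget, (long) (cacheBudgetBytes * 1.2));
        }
        return cacheBudgetBytes;
    }
}
```

In a JVM setting, the inputs could come from Runtime.getRuntime().totalMemory() minus freeMemory() and maxMemory(), sampled between tasks.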
“…There are some solutions based on off-heap memory [41,44,46] (i.e., allocating memory for the application outside the GC-managed heap). While this is an effective approach to allocate and keep data out of the range of the GC (and therefore reduce object copying), it has several important drawbacks: i) off-heap data needs to be serialized to be saved in off-heap memory, and de-serialized before being used by the application (this obviously has performance overheads); ii) off-heap memory must be explicitly collected by the application developer (which is error-prone [11,14] and completely ignores the advantages of running inside a memory-managed environment); iii) the application must always have objects identifying the data stored off-heap (these so-called header objects are stored in the managed heap, therefore stressing the GC).…”
Section: Off-heap Based Solutions
confidence: 99%
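
To make the three drawbacks listed in this citation concrete, here is a minimal Java sketch of storing a record off-heap through a direct ByteBuffer. The class name (OffHeapRecordStore) and record layout are hypothetical; the comments mark where serialization happens, why reclamation is decoupled from normal GC, and where a small header object remains in the managed heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: one record stored off-heap via a direct ByteBuffer.
public final class OffHeapRecordStore {
    // (iii) header object: this small ByteBuffer object lives in the managed
    // heap and is traced by the GC, even though the data it points to is not.
    private final ByteBuffer offHeap;

    public OffHeapRecordStore(int capacityBytes) {
        // (ii) the backing native memory is outside the GC-managed heap. With
        // raw allocations (e.g., malloc via JNI or sun.misc.Unsafe) the
        // application must free it explicitly; even for direct buffers,
        // release is only indirectly tied to collection of the header object.
        this.offHeap = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** (i) serialize: the String must be encoded to bytes before storing. */
    public void put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        offHeap.putInt(bytes.length);
        offHeap.put(bytes);
    }

    /** (i) de-serialize: bytes must be copied back and decoded before use. */
    public String getFirst() {
        ByteBuffer view = offHeap.duplicate();
        view.flip();
        int len = view.getInt();
        byte[] bytes = new byte[len];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```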