NUMA obliviousness through memory mapping

Gawade, Mrunal; Kersten, Martin

doi:10.1145/2771937.2771948

Cited by 6 publications

(3 citation statements)

References 11 publications


“…A recent prototype of SAP Hana implements an adaptive algorithm to decide between data placement and thread stealing when load imbalance is detected [5] (Li et al [20] presented a similar tradeoff strategy). However, the number of threads changes based on the core utilization opposed to MonetDB [24] and SQL Server [6] that bound the number of worker threads and maximum data partitions to the number of cores available per socket. MonetDB and SQL Server implement similar vector structures internally (i.e., BAT) to boost parallel access to disjoint partitions of columns (discussed in Section II).…”

Section: Related Work (mentioning)

confidence: 99%

An Elastic Multi-Core Allocation Mechanism for Database Systems

Dominico,

Almeida,

Meira

et al. 2018

2018 IEEE 34th International Conference on Data Engineering (ICDE)


“…MCC-DB classifies queries in cache-sensitive and cache-insensitive to feed the query execution scheduler. Similar to our mechanism, in [5] NUMA cores are allocated one by one to mitigate access to memory banks in distant nodes when the OS tries to keep data locality. However, these approaches are intrusive requiring modifications in the source-code of the DBMS (e.g., MonetDB and PostgreSQL).…”

Section: Related Work (mentioning)

confidence: 99%

A PetriNet mechanism for OLAP in NUMA

Dominico

Almeida

Meira

2017

Proceedings of the 13th International Workshop on Data Management on New Hardware


In the parallel execution of queries in Non-Uniform Memory Access (NUMA), the operating system maps database processes/threads (i.e., workers) to the available cores across the NUMA nodes. However, this mapping results in poor cache activity with many minor page faults and slower query response time when workers and data are allocated in different NUMA nodes. The system needs to move large volumes of data around the NUMA nodes to catch up with the running workers. Our hypothesis is that we mitigate the data movement to boost cache hits and response time if we only hand out to the system the local optimum number of cores instead of all the available ones. In this paper we present a PetriNet mechanism that represents the load of the database workers for dynamically computing and allocating the local optimum number of CPU cores to tackle such load. Preliminary results show that data movement diminishes with the local optimum number of CPU cores.

CCS Concepts: Computer systems organization → Multicore architectures; Information systems → Data management systems

Keywords: Multi-core CPUs; OLAP; Abstract Model; NUMA

“…Figure 4b shows that when the same experiment is repeated on the 4 socket NUMA machine on a 100GB dataset, the results are quite different. No explicit NUMA aware data partitioning is used as MonetDB uses memory mapped storage [13]. Execution with up to 48 threads uses the physical threads (12 threads on each socket with numactl [2] process and memory affinity), whereas 72 and 96 threaded execution also uses the hyper-threads.…”

Section: Socket Numa (mentioning)

confidence: 99%

Multi-core column-store parallelization under concurrent workload

Gawade

Kersten

Simitsis

2016

Proceedings of the 12th International Workshop on Data Management on New Hardware

Self Cite


Columnar database systems, designed for an optimal OLAP workload performance, strive for maximum multi-core utilization under concurrent query executions. However, multicore parallel plan generated for isolated execution leads to suboptimal performance during concurrent query execution. In this paper, we analyze the concurrent workload resource contention effects on multi-core plans using three intra-query parallelization techniques, static, adaptive, and cost model parallelization. We focus on a plan level comparison of selected TPC-H queries, using in-memory multicore columnar systems. Excessive partitions in statically parallelized plans result into heavy L3 cache misses leading to memory contention, degrading query performance severely. Overall, adaptive plans show more robustness, less scheduling overheads, and an average 50% execution time improvement compared to statically parallelized plans, and cost model based plans.

