Best of both latency and throughput

Grochowski, Ed; Ronen, Ronny; Shen, John Paul; Wang, Hong

doi:10.1109/iccd.2004.1347928

Cited by 91 publications

(69 citation statements)

References 27 publications

“…On the other hand, parallel phases can be executed on numerous processors in parallel. Therefore, the lowest execution time for the parallel phases is achieved by executing them on many simple processors that consume less energy per instruction (EPI) [7]. We claim that a choice of symmetric cores is suboptimal due to the contradicting requirements of the serial and parallel phases within the same application.…”


“…Kumar et al [9] have shown how a heterogeneous multiprocessor could achieve similar performance to a homogeneous multiprocessor for less power and area. Grochowski et al [2], [7] have proposed and demonstrated an asymmetric multiprocessor by employing voltage and frequency scaling on a symmetric multiprocessor. Menasce et al [10] have shown the analytic benefit of heterogeneous systems using queuing models.…”

Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors

Morad

Weiser

Kolodny

et al. 2006

IEEE Comput. Arch. Lett.

Abstract: This paper evaluates asymmetric cluster chip multiprocessor (ACCMP) architectures as a mechanism to achieve the highest performance for a given power budget. ACCMPs execute serial phases of multithreaded programs on large high-performance cores whereas parallel phases are executed on a mix of large and many small simple cores. Theoretical analysis reveals a performance upper bound for symmetric multiprocessors, which is surpassed by asymmetric configurations at certain power ranges. Our emulations show that asymmetric multiprocessors can reduce power consumption by more than two thirds with similar performance compared to symmetric multiprocessors.
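The abstract's claim that symmetric designs hit a performance ceiling under a fixed budget, which a mix of one large core and many small cores can exceed, can be illustrated with a textbook Amdahl's-law model. This is a minimal sketch, not the authors' analysis: it assumes Pollack's rule (a core built from r resource units delivers sqrt(r) sequential performance), treats power as proportional to core resources, and measures the budget n in units of one small core. All function names and the workload numbers are hypothetical.

```python
import math


def perf(r: int) -> float:
    """Sequential performance of one core built from r base units.
    Assumption (Pollack's rule): performance grows as sqrt(resources)."""
    return math.sqrt(r)


def symmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup over one base core for n // r identical cores of size r.

    f is the parallel fraction; the serial part (1 - f) runs on one core,
    the parallel part is spread over all n // r cores."""
    cores = n // r
    serial_time = (1 - f) / perf(r)
    parallel_time = f / (perf(r) * cores)
    return 1 / (serial_time + parallel_time)


def asymmetric_speedup(f: float, n: int, r: int) -> float:
    """Speedup for one large core of size r plus (n - r) unit-size cores.

    The serial part runs on the large core; the parallel part uses the
    large core and every small core together."""
    serial_time = (1 - f) / perf(r)
    parallel_time = f / (perf(r) + (n - r))
    return 1 / (serial_time + parallel_time)


def best_config(speedup_fn, f: float, n: int) -> tuple[float, int]:
    """Return (best speedup, core size r) over power-of-two sizes r <= n."""
    sizes = [2 ** k for k in range(n.bit_length()) if 2 ** k <= n]
    return max((speedup_fn(f, n, r), r) for r in sizes)


if __name__ == "__main__":
    f, n = 0.975, 64  # hypothetical parallel fraction and budget
    sym, r_sym = best_config(symmetric_speedup, f, n)
    asym, r_asym = best_config(asymmetric_speedup, f, n)
    print(f"symmetric:  {sym:.1f}x with cores of size {r_sym}")
    print(f"asymmetric: {asym:.1f}x with one core of size {r_asym}")
```

Under these assumptions, with f = 0.975 and n = 64, the best symmetric configuration reaches about 25.5x (cores of size 2) while the best asymmetric one reaches 40x (one core of size 16). This matches the qualitative point that the serial phase favors one large core and the parallel phase favors many small ones. The exact figures depend entirely on the assumed model.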

“…This transition from LQS to GQS is related to tradeoffs between the signaling network's latency, or speed of activation, and its throughput, or the total spatial range over which all the components of the system communicate [32]. Communities in the LQS regime have a reduced time to activation, but are restricted to short-range communication.…”

Section: Fig 4: (A)

Spatial dispersal of bacterial colonies induces a dynamical transition from local to global quorum sensing

Yusufaly

Boedicker

2016

Phys. Rev. E

Bacteria communicate using external chemical signals called autoinducers (AI) in a process known as quorum sensing (QS). QS efficiency is reduced by both limitations of AI diffusion and potential interference from neighboring strains. There is thus a need for predictive theories of how spatial community structure shapes information processing in complex microbial ecosystems. As a step in this direction, we apply a reaction-diffusion model to study autoinducer signaling dynamics in a single-species community as a function of the spatial distribution of colonies in the system. We predict a dynamical transition between a local quorum sensing (LQS) regime, with the AI signaling dynamics primarily controlled by the local population densities of individual colonies, and a global quorum sensing (GQS) regime, with the dynamics being dependent on collective intercolony diffusive interactions. The crossover from LQS to GQS is intimately connected to a tradeoff between the signaling network's latency, or speed of activation, and its throughput, or the total spatial range over which all the components of the system communicate.

Multicellular communities, such as colonies of bacteria, communicate with each other to coordinate changes in their collective group behavior. This communication usually takes the form of the production and secretion of extracellular signaling molecules called autoinducers (AI), as illustrated in Figure 1. Released autoinducers diffuse through the environment, and each cell senses the local concentration of signal to inform changes in gene regulation. This intercellular signaling network, known as quorum sensing (QS), is crucial for a wide array of important microbial processes, including biofilm formation, regulation of virulence and horizontal gene transfer [1][2][3].

Decades of research have advanced our knowledge of QS, but several subtleties remain unresolved. In particular, AI signals may convey information about many aspects of the cellular network and local environment beyond simply the total number of cells in the system [4]. Far from being reducible to homogeneous, uniform density populations, microbial communities are typically characterized by high spatial heterogeneity [5]. As a result, several new phenomena emerge due to crosstalk between spatially segregated populations [6]. Consequently, it appears that AI molecules can be an indicator of increased local population density, and can also be proxies of other variables, such as population dispersal [7][8][9][10][11].

In recent years, advances in the ability to experimentally probe the properties of cellular populations at the single-cell level [12] have resulted in a growing community of theoretical physicists working to catalogue the different classes of collective behavior found in interacting communities of organisms [13][14][15][16][17]. This approach has already successfully yielded insight into a wide variety of ecological problems, with notable recent examples including the effects of invasion in cooperative populations [18], opti...

“…Past works have also used atomicity in other ways to (i) simplify the core microarchitecture, (ii) enable better scalability and/or performance, or (iii) enable optimizations or other code transformations. Heterogeneous cores: Several works have examined the use of either multiple heterogeneous cores [4,56,22,20,7,12], one core with multiple heterogeneous backends [35], or a core with variable parameters [5,22] in order to adapt to the running application at a coarse granularity for better energy efficiency. We quantitatively compared to a coarse-grained heterogeneous approach [35] in §5 and showed that although coarse-grained designs can achieve good energy-efficiency, HBA does better by exploiting much finer-grained heterogeneity.…”

Section: Related Work
“…To exploit this diversity, past works proposed core-level heterogeneity. These heterogeneous designs either combine multiple separate cores (e.g., [29,22,3,20,53,7,12,26,4,56]), or else combine an in-order pipeline and an out-of-order pipeline with a shared frontend in a single core [35]. Past works demonstrate energy-efficiency improvements with usually small impact to performance.…”

Section: Introduction
The heterogeneous block architecture

Fallin

Wilkerson

Mutlu

2014

2014 IEEE 32nd International Conference on Computer Design (ICCD)

This paper makes two new observations that lead to a new heterogeneous core design. First, we observe that most serial code exhibits fine-grained heterogeneity: at the scale of tens or hundreds of instructions, regions of code fit different microarchitectures better (at the same point or at different points in time). Second, we observe that by grouping contiguous regions of instructions into blocks that are executed atomically, a core can exploit this heterogeneity: atomicity allows each block to be executed independently on its own execution backend that fits its characteristics best.

Based on these observations, we propose a fine-grained heterogeneous design that combines heterogeneous execution backends into one core. Our core design, the heterogeneous block architecture (HBA), breaks the program into blocks of code, determines the best backend for each block, and specializes the block for that backend. As an initial, concrete design, we combine out-of-order, VLIW, and in-order backends, using simple heuristics to choose backends. We compare HBA to multiple baseline core designs (including monolithic out-of-order, clustered out-of-order, in-order and a state-of-the-art heterogeneous core design) and show that HBA can provide significantly better energy efficiency than all designs at similar performance. Averaged across 184 traces from a wide variety of workloads, HBA reduces core power by 36.4% and energy per instruction by 31.9% compared to a 4-wide out-of-order core. We conclude that HBA provides a flexible substrate for exploiting fine-grained heterogeneity, enabling new energy-performance tradeoff points in core design.