Sandya Subramanian scite author profile

Switches today provide a small menu of scheduling algorithms. While we can tweak scheduling parameters, we cannot modify algorithmic logic, or add a completely new algorithm, after the switch has been designed. This paper presents a design for a programmable packet scheduler, which allows scheduling algorithms-potentially algorithms that are unknown today-to be programmed into a switch without requiring hardware redesign.Our design uses the property that scheduling algorithms make two decisions: in what order to schedule packets and when to schedule them. Further, we observe that in many scheduling algorithms, definitive decisions on these two questions can be made when packets are enqueued. We use these observations to build a programmable scheduler using a single abstraction: the push-in first-out queue (PIFO), a priority queue that maintains the scheduling order or time.We show that a PIFO-based scheduler lets us program a wide variety of scheduling algorithms. We present a hardware design for this scheduler for a 64-port 10 Gbit/s sharedmemory (output-queued) switch. Our design costs an additional 4% in chip area. In return, it lets us program many sophisticated algorithms, such as a 5-level hierarchical scheduler with programmable decisions at each level.

show abstract

SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering

Daya

Chen

Subramanian

et al. 2014

View full text Add to dashboard Cite

In the many-core era, scalable coherence and on-chip interconnects are crucial for shared memory processors. While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy relies on ordered interconnects which do not scale. However, directory-based coherence does not scale beyond tens of cores due to excessive directory area overhead or inaccurate sharer tracking. Prior techniques supporting ordering on arbitrary unordered networks are impractical for full multicore chip designs.We present SCORPIO, an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to achieve distributed global ordering. Message delivery is decoupled from the ordering, allowing messages to arrive in any order and at any time, and still be correctly ordered. The architecture is designed to plug-and-play with existing multicore IP and with practicality, timing, area, and power as top concerns. Full-system 36 and 64-core simulations on SPLASH-2 and PARSEC benchmarks show an average application runtime reduction of 24.1% and 12.9%, in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively.The SCORPIO architecture is incorporated in an 11 mm-by-13 mm chip prototype, fabricated in IBM 45nm SOI technology, comprising 36 Freescale e200 Power Architecture TM cores with private L1 and L2 caches interfacing with the NoC via ARM AMBA, along with two Cadence on-chip DDR2 controllers. The chip prototype achieves a post synthesis operating frequency of 1 GHz (833 MHz post-layout) with an estimated power of 28.8 W (768 mW per tile), while the network consumes only 10% of tile area and 19 % of tile power.

show abstract

A scalable architecture for ordered parallelism

Jeffrey¹,

Subramanian²,

Yan³

et al. 2015

View full text Add to dashboard Cite

We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits.We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51-122× speedups over a single-core system, and outperforms software-only parallel algorithms by 3-18×.

show abstract

SMART: A Single-Cycle Reconfigurable NoC for SoC Applications

Chen

Park

Krishna

et al. 2013

View full text Add to dashboard Cite

Abstract-As technology scales, SoCs are increasing in core counts, leading to the need for scalable NoCs to interconnect the multiple cores on the chip. Given aggressive SoC design targets, NoCs have to deliver low latency, high bandwidth, at low power and area overheads. In this paper, we propose Single-cycle Multihop Asynchronous Repeated Traversal (SMART) NoC, a NoC that reconfigures and tailors a generic mesh topology for SoC applications at runtime. The heart of our SMART NoC is a novel low-swing clockless repeated link circuit embedded within the router crossbars, that allows packets to potentially bypass all the way from source to destination core within a single clock cycle, without being latched at any intermediate router. Our clockless repeater link has been proven in silicon in 45nm SOI. Results show that at 2GHz, we can traverse 8 mm within a single cycle, i.e. 8 hops with 1 mm cores. We implement the SMART NoC to layout and show that SMART NoC gives 60% latency savings, and 2.2X power savings compared to a baseline mesh NoC.

show abstract

Using network analysis to localize the epileptogenic zone from invasive EEG recordings in intractable focal epilepsy

Chennuri

Subramanian

et al. 2018

Network Neuroscience

View full text Add to dashboard Cite

Treatment of medically intractable focal epilepsy (MIFE) by surgical resection of the epileptogenic zone (EZ) is often effective provided the EZ can be reliably identified. Even with the use of invasive recordings, the clinical differentiation between the EZ and normal brain areas can be quite challenging, mainly in patients without MRI detectable lesions. Consequently, despite relatively large brain regions being removed, surgical success rates barely reach 60–65%. Such variable and unfavorable outcomes associated with high morbidity rates are often caused by imprecise and/or inaccurate EZ localization. We developed a localization algorithm that uses network-based data analytics to process invasive EEG recordings. This network algorithm analyzes the centrality signatures of every contact electrode within the recording network and characterizes contacts into susceptible EZ based on the centrality trends over time. The algorithm was tested in a retrospective study that included 42 patients from four epilepsy centers. Our algorithm had higher agreement with EZ regions identified by clinicians for patients with successful surgical outcomes and less agreement for patients with failed outcomes. These findings suggest that network analytics and a network systems perspective of epilepsy may be useful in assisting clinicians in more accurately localizing the EZ.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.