Many "big data" applications need to act on data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring either hot replication or long recovery times. We propose a new programming model, discretized streams (D-Streams), that offers a high-level functional API, strong consistency, and efficient fault recovery. D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup schemes in streaming databasesparallel recovery of lost state-and unlike previous systems, also mitigate stragglers. We implement D-Streams as an extension to the Spark cluster computing engine that lets users seamlessly intermix streaming, batch and interactive queries. Our system can process over 60 million records/second at sub-second latency on 100 nodes.
Acknowledgements

Many thanks to Yuan Zhong, Ion Stoica, and Scott Shenker for making this thesis possible. Also thanks to David Zats, Shivaram Venkataraman, and Neeraja Yadwadkar for providing feedback on earlier versions of the text. Finally, a very special thanks to both my advisers, Scott Shenker and Ion Stoica, for guiding me through my ups and downs in life and putting up with my idiosyncrasies.
Abstract

The need for real-time processing of "big data" has led to the development of frameworks for distributed stream processing in clusters. It is important for such frameworks to be robust against variable operating conditions such as server failures, changes in data ingestion rates, and workload characteristics. To provide fault tolerance and efficient stream processing at scale, recent stream processing frameworks have proposed to treat streaming workloads as a series of batch jobs on small batches of streaming data. However, the robustness of such frameworks against variable operating conditions has not been explored.

In this paper, we explore the effect of batch size on the performance of streaming workloads. The throughput and end-to-end latency of the system can have complicated relationships with batch size, data ingestion rate, variations in available resources, workload characteristics, and so on. We propose a simple yet robust control algorithm that automatically adapts batch sizes as the situation necessitates. We show through extensive experiments that this algorithm is powerful enough to ensure system stability and low end-to-end latency for a wide class of workloads, despite large variations in data rates and operating conditions.
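To illustrate the shape of the problem, the following sketch shows a naive feedback rule for adapting the batch interval: grow the interval when processing falls behind, and shrink it gently otherwise. This is only an illustration of the control setting under assumed constants (interval bounds, headroom, step factors); it is not the control algorithm proposed in this thesis.

```scala
object BatchIntervalSketch {
  val minIntervalMs = 100.0
  val maxIntervalMs = 10000.0

  /** Propose the next batch interval from the last interval and the time the
    * last batch actually took to process. `headroom` (< 1.0) leaves slack so
    * that transient spikes do not immediately destabilize the system.
    */
  def nextInterval(lastIntervalMs: Double,
                   lastProcessingMs: Double,
                   headroom: Double = 0.9): Double = {
    val proposed =
      if (lastProcessingMs > headroom * lastIntervalMs)
        lastIntervalMs * 1.5   // falling behind: larger batches for stability
      else
        lastIntervalMs * 0.95  // keeping up: probe toward lower latency
    math.min(maxIntervalMs, math.max(minIntervalMs, proposed))
  }
}
```

A rule of this form keeps processing time below the batch interval (the stability condition) while trading off against end-to-end latency, which is exactly the tension that a robust controller must manage under changing data rates and resource availability.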