Detecting failures in distributed systems with the Falcon spy network

Leners, Joshua B.; Wu, Hao; Hung, Wei-Lun; Aguilera, Marcos K.; Walfish, Michael

doi:10.1145/2043556.2043583

Cited by 69 publications

(48 citation statements)

References 38 publications

(42 reference statements)


“…If a client fails during a write the system might end up in a vulnerable state where only a limited number of further failures can be tolerated. To limit this vulnerable time inconsistencies must be detected as fast as possible, by other client accesses or a failure detector [3,16]. Clients send their read or write requests directly to the data servers; the access granularity is one block.…”

Section: Distributed System Model (mentioning)

confidence: 99%

Consistency and fault tolerance for erasure-coded distributed storage systems

Peter

Reinefeld

2012

Proceedings of the Fifth International Workshop on Data-Intensive Distributed Computing Date


One challenge in applying erasure codes (or error-correcting codes) to distributed storage systems is to maintain consistency between data and redundancy blocks in the face of crashing servers. We present two access protocols that provide sequential consistency and maximum distance separable fault tolerance at the same time. The protocols use sequence numbers to recover a consistent version in the presence of failures or partial writes. The first (pessimistic) PSW protocol uses a master per stripe to execute updates in sequence. The second (optimistic) OCW protocol allows concurrent writes to blocks in the same stripe to happen in parallel at the cost of additional buffer space. We present empirical performance results for PSW and OCW and compare them to other protocols. Our results show that OCW is as fast as simple replication while providing better fault tolerance and/or reduced storage overhead. This demonstrates that erasure coding can be used as a space-efficient alternative to replication in distributed storage systems.


“…The information stored in network was used for detection of loss and errors in the network. Leners et al [17] presented a FALCON framework for network monitoring in which the error or malicious activities were tracked for the network in a distributed system. However, the end points and their data packets were not the prime focus.…”

Section: Literature Review (mentioning)

confidence: 99%

Internet Traffic Surveillance & Network Monitoring in India: Case Study of NETRA

Gupta

Muttoo

2017

NPA


Internet traffic surveillance is gaining importance in today's digital world. Lots of international agencies are putting in efforts to monitor the network around their countries to see suspicious activities and illegal or illegitimate transmission of messages. India, being a center of attraction for terrorist activities, is also working towards development of such surveillance systems. NETRA or Network Traffic Analysis is one such effort being taken by the Indian Government to filter suspicious keywords from messages in the network. But is it good enough to be used at highest level for security analysis or does the system design needs to be improved as compared to other similar systems around the world; this question is answered through this study. The comparison of NETRA is done against Dish Fire, Prism and Echelon. The design of the NETRA scheme and implementation level analysis of the system shows few weaknesses like limited memory options, limited channels for monitoring, pre-set filters, ignoring big data demands, security concerns, social values breach and ignoring ethical issues. These can be covered through alternate options which can improve the existing system. Inclusion of self-similarity models, Self-Configuring Network Monitoring and smart monitoring through early intrusion detections can be embedded in the architecture of existing surveillance system to give it more depth and make it more robust.


“…Several works have presented modifications and additions to ZooKeeper (e.g., [21,23,25,36,40,55]), but (almost) none of them deals with changing the service's programming model. A notable exception is a recent short paper by Kalantari et al [36] which identifies inefficiencies related to ZooKeeper's watch mechanism.…”

Section: Related Work (mentioning)

confidence: 99%

Extensible distributed coordination

Distler

Bahn

Bessani

et al. 2015

Proceedings of the Tenth European Conference on Computer Systems


Most services inside a data center are distributed systems requiring coordination and synchronization in the form of primitives like distributed locks and message queues. We argue that extensibility is a crucial feature of the coordination infrastructures used in these systems. Without the ability to extend the functionality of coordination services, applications might end up using sub-optimal coordination algorithms, possibly leading to low performance. Adding extensibility, however, requires mechanisms that constrain extensions to be able to make reasonable security and performance guarantees. We propose a scheme that enables extensions to be introduced and removed dynamically in a secure way. To avoid performance overheads due to poorly designed extensions, it constrains the access of extensions to resources. Evaluation results for extensible versions of ZooKeeper and DepSpace show that it is possible to increase the throughput of a distributed queue by more than an order of magnitude (17x for ZooKeeper, 24x for DepSpace) while keeping the underlying coordination kernel small.
