Abstract: Current infrastructures for developing big-data applications can process huge amounts of data through big-data analytics, using clusters of machines that collaborate to perform parallel computations. However, these infrastructures were not designed to meet the requirements of time-critical applications; they target general-purpose applications instead. Addressing this issue from the perspective of the real-time systems community, this paper considers time-critical big data. It defines a time-critical big-data system from the point of view of requirements, analyzing the specific characteristics of some popular big-data applications. This analysis is complemented by the challenges stemming from the infrastructures that support the applications. The paper proposes an architecture and offers initial performance patterns that connect application costs with infrastructure performance.
Highlights
• Model combining stream processing technology and real-time.
• Extensions to the Storm processor.
• Performance evaluation of the extension on a cluster.

Abstract: Next-generation real-time applications demand big-data infrastructures to process huge and continuous data volumes under complex computational constraints. This type of application raises new issues for current big-data processing infrastructures. The first issue is that most current infrastructures for big-data processing were defined for general-purpose applications. Thus, they set aside real-time performance, which is in some cases an implicit requirement. A second important limitation is the lack of clear computational models that could be supported by current big-data frameworks. In an effort to reduce this gap, this article contributes along several lines. First, it provides a set of improvements to a computational model called distributed stream processing in order to formalize it as a real-time infrastructure. Second, it proposes some extensions to Storm, one of the most popular stream processors. These extensions are designed to gain extra control over the resources used by the application in order to improve its predictability. Lastly, the article presents some empirical evidence on the performance that can be expected from this type of infrastructure.
Abstract: In recent years, big data systems have become an active area of research and development. Stream processing is one of the potential application scenarios of big data systems where the goal is to process a continuous, high-velocity flow of information items. High-frequency trading (HFT) in stock markets or trending topic detection in Twitter are some examples of stream processing applications. In some cases (for instance, in HFT), these applications have end-to-end quality-of-service requirements and may benefit from the use of real-time techniques. Taking this into account, the present article analyzes, from the point of view of real-time systems, a set of patterns that can be used when implementing a stream processing application. For each pattern, we discuss its advantages and disadvantages, as well as its impact on application performance, measured as response time, maximum input frequency, and changes in utilization demands due to the pattern.