Architecting Time-Critical Big-Data Systems

Val, Pablo Basanta; Audsley, Neil; Wellings, Andy; Gray, Ian; Fernandez-Garcia, Norberto

doi:10.1109/tbdata.2016.2622719

Cited by 59 publications (25 citation statements)

References 44 publications


“…First, the authors are analyzing the advantages stemming from the use of different parallel and distributed computing techniques as architectural building blocks that help reduce the total computation time of our current engine. The second line refers to the development of an alternative approach using common off-the-shelf big data engines (based on Storm (Apache Storm, 2014; Marz and Warren, 2015; Basanta Val et al, 2015; Basanta Val et al, 2016), efficient map reduce strategies (Anjos et al, 2015; Lee et al, 2013), and Hadoop (Zikopoulos and Eaton, 2011)) to run stream analytics.…”

Section: Discussion (mentioning, confidence: 99%)

T-Hoarder: A framework to process Twitter data streams

Congosto, Val, Sánchez-Fernandez

2017

Journal of Network and Computer Applications

Self Cite


With the eruption of online social networks such as Twitter and Facebook, a series of new APIs have appeared that allow access to the data these new sources of information accumulate. One of the most popular online social networks is the microblogging site Twitter. Its APIs allow many machines to access the torrent of Twitter data simultaneously, listening to tweets and accessing other useful information such as user profiles. A number of tools have appeared for processing Twitter data with different algorithms and for different purposes. This paper describes T-Hoarder, a framework that enables tweet crawling and data filtering and that can also display, in a web page, summarized and analytical information about Twitter activity related to a certain topic or event. This information is updated daily. The tool has been validated with real use cases, which allow a series of analyses of the performance one may expect from this type of infrastructure.


“…This requires a different processing model than the batch paradigm. Current architectures of Big Data processing platforms require technologies that can handle both batch and stream workloads [18]. These frameworks simplify diverse processing requirements by allowing the same or related components and application programming interfaces (APIs) to be used for both types of data.…”

Section: Related Work (mentioning, confidence: 99%)

Big Data Processing and Analytics Platform Architecture for Process Industry Factories

Sarnovský, Bednár, Smatana

2018

BDCC


This paper describes the architecture of a cross-sectorial Big Data platform for the process industry domain. The main objective was to design a scalable analytical platform that supports the collection, storage and processing of data from multiple industry domains. Such a platform should be able to connect to the existing environment in the plant and use the gathered data to build predictive functions that optimize the production processes. The analytical platform will contain a development environment in which to build these functions, and a simulation environment to evaluate the models. The platform will be shared among multiple sites from different industry sectors, and this cross-sectorial sharing will enable the transfer of knowledge across domains. During development, we adopted a user-centered approach to gather requirements from different stakeholders; these requirements were used to design architectural models from different viewpoints, from contextual to deployment. The deployed architecture was tested in two process industry domains: aluminium production and plastic molding.


“…Given the relevance of performance (throughput and response time) for SPEs, several proposals aim to model the performance characteristics of SPEs with the goal of predicting or improving quality-of-service metrics or resource allocation [33][34][35]. These works are complementary to the proposal of this paper, since they focus on predicting the performance of SPEs rather than modeling their execution semantics.…”

Section: Modeling Stream Processing (mentioning, confidence: 99%)

Defining the execution semantics of stream processing engines

Affetti et al., J Big Data (2017) 4:12 (survey paper)

Abstract
The ability to process large volumes of data on the fly, as soon as they become available, is a fundamental requirement in today's information systems. Modern distributed stream processing engines (SPEs) address this requirement and provide low-latency and high-throughput data stream processing in cluster platforms, offering high-level programming interfaces that abstract from low-level details such as data distribution and hardware failures. The last decade saw a rapid increase in the number of available SPEs. However, each SPE defines its own processing model, and standardized execution semantics have not emerged yet. This paper tackles this problem and analyzes the execution semantics of some widely adopted modern SPEs, namely Flink, Storm, Spark Streaming, Google Dataflow, and Azure Stream Analytics. We specifically target the notions of windowing and time, traditionally considered the key factors that distinguish the behavior of SPEs. We rely on the SECRET model, introduced in 2010 to analyze the windowing semantics of the SPEs available at that time. We show that SECRET models some aspects of the behavior of modern SPEs well, and we shed light on the evolution of SPEs since the introduction of SECRET by analyzing the elements that SECRET cannot fully capture. In this way, the paper contributes to research in the area of stream processing by: (1) contrasting and comparing some widely used modern SPEs based on a formal model of their execution semantics; (2) discussing the evolution of SPEs since the introduction of the SECRET model; (3) suggesting promising research directions to guide further modeling efforts.

Introduction
Several modern data-intensive applications need to process large volumes of data on the fly as they are produced. Examples range from credit card fraud detection systems, which analyze massive streams of credit card transactions to identify suspicious patterns, to environmental monitoring applications that continuously analyze sensor data, to click-stream analysis of websites that identifies frequent patterns of interaction. More generally, stream processing is a central requirement in today's information systems. This state of affairs has driven the development of several stream processing engines (SPEs) that continuously analyze streams of data, producing new results as new elements enter the streams. Unfortunately, existing SPEs adopt different processing models, and standardized execution semantics have not yet emerged. This severely hampers the usability and interoperability of SPEs, since a user needs to understand system-specific aspects to compare the various alternatives and select the ones that best suit her needs. The main factors that differentiate the behaviors of SPEs are the models of windows and time they adopt [1]. Windows enable computations that would otherwise be infeasible on unbounded datasets such as streams. For instance, counting the number o...
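To make the role of windows concrete, here is a minimal sketch of a count over tumbling (fixed-size, non-overlapping) event-time windows. It assumes events are hypothetical `(timestamp, key)` pairs that arrive in non-decreasing timestamp order; the function name `tumbling_window_counts` is illustrative, and the sketch does not reproduce the windowing semantics of Flink, Storm, or any other SPE, nor the SECRET model. Because only the currently open window is held in memory, the count stays feasible even on an unbounded input.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], width: float
) -> Iterator[Tuple[float, Counter]]:
    """Count keys per tumbling window of size `width` (illustrative sketch).

    Assumption: events arrive in non-decreasing timestamp order, so a window
    can be closed as soon as an event belonging to a later window is seen.
    Windows that receive no events are simply not emitted.
    """
    current_start: Optional[float] = None
    counts: Counter = Counter()
    for ts, key in events:
        start = (ts // width) * width  # start of the window containing ts
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The event belongs to a later window: emit and reset the open one.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Bounded input ended: flush the last open window.
        yield current_start, counts


if __name__ == "__main__":
    sample = [(0.5, "a"), (1.2, "b"), (4.9, "a"), (5.0, "a"), (11.0, "b")]
    for window_start, window_counts in tumbling_window_counts(sample, 5.0):
        print(window_start, dict(window_counts))
```

Handling out-of-order events (watermarks, allowed lateness) and overlapping sliding windows is deliberately left out; those are exactly the aspects in which real SPEs differ.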
