Abstract: In the era of big data, as the volume of streaming data continues to grow, stream processing tasks (SPTs) face serious challenges in real-time scenarios that demand low latency and high throughput. However, much of the current literature on SPT performance focuses on reactive approaches, which cannot reliably prevent system crashes caused by inherent performance volatility. In this paper, a novel throughput prediction method for SPTs based on ExtraTree is proposed…
Table 1: Data stream frequency patterns found in the literature.
Increasing: [6, 7, 18, 21, 30, 32-35]
Wave: [1, 6, 18, 24, 30-32, 36, 37]
Binary: [1, 5, 6, 18, 19, 22-24]
Spike: [32, 37-39]
Section: Data Frequency Strategy Related Work
Latency and throughput are often critical performance metrics in stream processing. Application performance can fluctuate depending on the input stream; this unpredictability is tied to variations in data arrival frequency, data size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios to further evaluate these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generate the frequency patterns most commonly used for benchmarking stream processing in related work, which allows the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow, two libraries that leverage stream parallelism for multi-core architectures. Our results demonstrate that most test cases benefited from micro-batches, especially high-throughput applications with ordering constraints. For different data stream frequency configurations, TBB ensured the lowest latency, while FastFlow achieved higher throughput in shorter pipelines.
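To make the four frequency patterns in Table 1 concrete, the following sketch generates illustrative target input rates over time for each pattern. The shapes, parameter names, and values are assumptions for this example; SPBench's actual pattern-generation algorithms are not reproduced here.

```python
# Hedged sketch of the four data stream frequency patterns from Table 1,
# expressed as target input rates (items/s) over time. Illustrative only.
import numpy as np

def increasing(t, start=100.0, end=1000.0):
    """Linearly rising rate over the whole duration."""
    return np.linspace(start, end, t.size)

def wave(t, base=500.0, amplitude=300.0, period=60.0):
    """Sinusoidal oscillation around a base rate."""
    return base + amplitude * np.sin(2 * np.pi * t / period)

def binary(t, low=100.0, high=1000.0, period=60.0):
    """Square wave alternating between a low and a high rate."""
    return np.where((t // (period / 2)) % 2 == 0, low, high)

def spike(t, base=200.0, peak=2000.0, spike_at=120.0, width=5.0):
    """Mostly flat rate with a short burst around spike_at."""
    return np.where(np.abs(t - spike_at) <= width / 2, peak, base)

t = np.arange(0, 300, 1.0)  # 5-minute workload sampled every second
rates = {f.__name__: f(t) for f in (increasing, wave, binary, spike)}
```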
“…It uses a random value for the split at each node, which leads to more diversified trees and fewer candidate splits to evaluate. Previous studies have used ExtraTree in both prediction [44] and classification [45]. (2) Random Forest (RF) is a supervised ensemble learning model introduced by Ho [46], and its construction is based on ensembles of unpruned classification or regression trees.…”
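The snippet above contrasts ExtraTree's randomized split selection with Random Forest's search for locally optimal splits. The short sketch below compares the two scikit-learn regressors on a synthetic dataset; the dataset and hyperparameters are assumptions for illustration, not taken from the cited studies [44-46].

```python
# Illustrative comparison of ExtraTrees vs. Random Forest regression.
# ExtraTrees draws split thresholds at random for each candidate feature,
# while RandomForest searches for the locally optimal threshold.
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

for Model in (ExtraTreesRegressor, RandomForestRegressor):
    scores = cross_val_score(Model(n_estimators=100, random_state=0),
                             X, y, cv=5, scoring="r2")
    print(f"{Model.__name__}: mean R2 = {scores.mean():.3f}")
```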
Artificial intelligence is widely applied to estimate ground-level fine particulate matter (PM2.5) from satellite data by constructing the relationship between aerosol optical thickness (AOT) and surface PM2.5 concentration. However, aerosol size properties, such as the fine mode fraction (FMF), are rarely considered in satellite-based PM2.5 modeling, especially in machine learning models. This study investigated the linear and non-linear relationships between fine mode AOT (fAOT) and PM2.5 over five AERONET stations in China (Beijing, Baotou, Taihu, Xianghe, and Xuzhou) using AERONET fAOT and five years (2015-2019) of ground-level PM2.5 data. Results showed that the fAOT separated by the FMF (fAOT = AOT × FMF) had significant linear and non-linear relationships with surface PM2.5. Then, the Himawari-8 V3.0 and V2.1 FMF and AOT (FMF&AOT-PM2.5) data were tested as input to a deep learning model and four classical machine learning models. The results showed that FMF&AOT-PM2.5 performed better than AOT alone (AOT-PM2.5) in modeling PM2.5 estimations. The FMF was then applied in satellite-based PM2.5 retrieval over China during 2020, and FMF&AOT-PM2.5 showed better agreement with ground-level PM2.5 than AOT-PM2.5 on dust and haze days. The stronger linear correlation between PM2.5 and fAOT on both haze and dust days (dust days: R = 0.82; haze days: R = 0.56) compared to AOT (dust days: R = 0.72; haze days: R = 0.52) partly explains the superior accuracy of FMF&AOT-PM2.5. This study demonstrates the importance of including the FMF to improve PM2.5 estimations and emphasizes the need for a more accurate FMF product that enables superior PM2.5 retrieval.
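The core feature-engineering step described above, fAOT = AOT × FMF, is simple to reproduce. A minimal sketch is shown below, assuming synthetic AOT and FMF columns and comparing Pearson correlations of AOT and fAOT with PM2.5; the column names and data are placeholders, not the Himawari-8 or AERONET products.

```python
# Minimal sketch of the fine-mode AOT feature: fAOT = AOT * FMF,
# compared against surface PM2.5 via Pearson correlation. Synthetic data only.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "aot": rng.uniform(0.05, 2.0, n),   # aerosol optical thickness
    "fmf": rng.uniform(0.2, 1.0, n),    # fine mode fraction
})
df["faot"] = df["aot"] * df["fmf"]                    # fine-mode AOT
df["pm25"] = 80 * df["faot"] + rng.normal(0, 10, n)   # synthetic PM2.5

r_aot, _ = pearsonr(df["aot"], df["pm25"])
r_faot, _ = pearsonr(df["faot"], df["pm25"])
print(f"R(AOT, PM2.5)  = {r_aot:.2f}")
print(f"R(fAOT, PM2.5) = {r_faot:.2f}")
```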
Acoustic impedance is the product of the density of a material and the speed at which an acoustic wave travels through it. Understanding this relationship is essential because low acoustic impedance values are closely associated with high porosity, which facilitates the accumulation of more hydrocarbons. In this study, we estimate acoustic impedance from nine different seismic attributes, in addition to depth and two-way travel time, using three supervised machine learning models, namely extra tree regression (ETR), random forest regression, and multilayer perceptron regression, implemented with the scikit-learn library. Our results show that the R2 of multilayer perceptron regression is 0.85, which is close to what has been reported in recent studies. However, the ETR method outperformed those reported in the literature in terms of mean absolute error, mean squared error, and root-mean-squared error. The novelty of this study lies in achieving more accurate predictions of acoustic impedance for exploration.
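A minimal sketch of the model comparison described above, assuming scikit-learn's ExtraTreesRegressor, RandomForestRegressor, and MLPRegressor as the three supervised models, with random placeholder features standing in for the nine seismic attributes plus depth and two-way travel time; the metrics printed here apply to the synthetic data only, not the study's results.

```python
# Hedged sketch: comparing ETR, RF, and MLP regressors on synthetic
# stand-ins for seismic attributes, reporting R2, MAE, MSE, and RMSE.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 11))                          # 9 attributes + depth + TWT
y = X @ rng.normal(size=11) + rng.normal(0, 0.5, 1500)   # synthetic impedance target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
models = {
    "ETR": ExtraTreesRegressor(n_estimators=200, random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "MLP": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)
    print(name,
          f"R2={r2_score(y_te, pred):.3f}",
          f"MAE={mean_absolute_error(y_te, pred):.3f}",
          f"MSE={mse:.3f}",
          f"RMSE={np.sqrt(mse):.3f}")
```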