Abstract: In the era of big data, as the volume of streaming data continues to grow, stream processing tasks (SPTs) face serious challenges in real-time scenarios that demand low latency and high throughput. However, much of the current literature on SPT performance focuses on reactive approaches, which cannot reliably prevent system crashes caused by inherent performance volatility. In this paper, a novel throughput prediction method for SPTs based on ExtraTree is proposed…
Table 1: Data stream frequency patterns found in the literature.
Increasing: [6, 7, 18, 21, 30, 32-35]
Wave: [1, 6, 18, 24, 30-32, 36, 37]
Binary: [1, 5, 6, 18, 19, 22-24]
Spike: [32, 37-39]
Section: Data Frequency Strategy Related Work
Latency and throughput are often critical performance metrics in stream processing. Application performance can fluctuate depending on the input stream; this unpredictability is tied to variations in data arrival frequency, data size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios to further evaluate these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generate the frequency patterns most commonly used for benchmarking stream processing in related work, which allows the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow, two libraries that leverage stream parallelism for multi-core architectures. Our results demonstrate that most test cases benefited from micro-batches, especially high-throughput applications with ordering constraints. For different data stream frequency configurations, TBB ensured the lowest latency, while FastFlow achieved higher throughput in shorter pipelines.
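To make the four frequency patterns in Table 1 concrete, the following sketch generates illustrative target input rates over time for each pattern. The shapes, parameter names, and values are assumptions for this example; SPBench's actual pattern-generation algorithms are not reproduced here.

```python
# Hedged sketch of the four data stream frequency patterns from Table 1,
# expressed as target input rates (items/s) over time. Illustrative only.
import numpy as np

def increasing(t, start=100.0, end=1000.0):
    """Linearly rising rate over the whole duration."""
    return np.linspace(start, end, t.size)

def wave(t, base=500.0, amplitude=300.0, period=60.0):
    """Sinusoidal oscillation around a base rate."""
    return base + amplitude * np.sin(2 * np.pi * t / period)

def binary(t, low=100.0, high=1000.0, period=60.0):
    """Square wave alternating between a low and a high rate."""
    return np.where((t // (period / 2)) % 2 == 0, low, high)

def spike(t, base=200.0, peak=2000.0, spike_at=120.0, width=5.0):
    """Mostly flat rate with a short burst around spike_at."""
    return np.where(np.abs(t - spike_at) <= width / 2, peak, base)

t = np.arange(0, 300, 1.0)  # 5-minute workload sampled every second
rates = {f.__name__: f(t) for f in (increasing, wave, binary, spike)}
```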
“…It uses a random value for the split at each node, which leads to more diversified trees and fewer candidate splits to evaluate. Previous studies have used ExtraTree in both prediction [44] and classification [45]. (2) Random Forest (RF) is a supervised ensemble learning model introduced by Ho [46], and its construction is based on ensembles of unpruned classification or regression trees.…”
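The snippet above contrasts ExtraTree's randomized split selection with Random Forest's search for locally optimal splits. The short sketch below compares the two scikit-learn regressors on a synthetic dataset; the dataset and hyperparameters are assumptions for illustration, not taken from the cited studies [44-46].

```python
# Illustrative comparison of ExtraTrees vs. Random Forest regression.
# ExtraTrees draws split thresholds at random for each candidate feature,
# while RandomForest searches for the locally optimal threshold.
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

for Model in (ExtraTreesRegressor, RandomForestRegressor):
    scores = cross_val_score(Model(n_estimators=100, random_state=0),
                             X, y, cv=5, scoring="r2")
    print(f"{Model.__name__}: mean R2 = {scores.mean():.3f}")
```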
Artificial intelligence is widely applied to estimate ground-level fine particulate matter (PM2.5) from satellite data by constructing the relationship between aerosol optical thickness (AOT) and surface PM2.5 concentration. However, aerosol size properties, such as the fine mode fraction (FMF), are rarely considered in satellite-based PM2.5 modeling, especially in machine learning models. This study investigated the linear and non-linear relationships between fine mode AOT (fAOT) and PM2.5 over five AERONET stations in China (Beijing, Baotou, Taihu, Xianghe, and Xuzhou) using AERONET fAOT and five years (2015-2019) of ground-level PM2.5 data. Results showed that the fAOT separated by the FMF (fAOT = AOT × FMF) had significant linear and non-linear relationships with surface PM2.5. Then, the Himawari-8 V3.0 and V2.1 FMF and AOT (FMF&AOT-PM2.5) data were tested as input to a deep learning model and four classical machine learning models. The results showed that FMF&AOT-PM2.5 performed better than AOT alone (AOT-PM2.5) in modeling PM2.5 estimations. The FMF was then applied in satellite-based PM2.5 retrieval over China during 2020, and FMF&AOT-PM2.5 showed better agreement with ground-level PM2.5 than AOT-PM2.5 on dust and haze days. The stronger linear correlation between PM2.5 and fAOT on both haze and dust days (dust days: R = 0.82; haze days: R = 0.56) compared to AOT (dust days: R = 0.72; haze days: R = 0.52) partly explains the superior accuracy of FMF&AOT-PM2.5. This study demonstrates the importance of including the FMF to improve PM2.5 estimations and emphasizes the need for a more accurate FMF product that enables superior PM2.5 retrieval.
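The core feature-engineering step described above, fAOT = AOT × FMF, is simple to reproduce. A minimal sketch is shown below, assuming synthetic AOT and FMF columns and comparing Pearson correlations of AOT and fAOT with PM2.5; the column names and data are placeholders, not the Himawari-8 or AERONET products.

```python
# Minimal sketch of the fine-mode AOT feature: fAOT = AOT * FMF,
# compared against surface PM2.5 via Pearson correlation. Synthetic data only.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "aot": rng.uniform(0.05, 2.0, n),   # aerosol optical thickness
    "fmf": rng.uniform(0.2, 1.0, n),    # fine mode fraction
})
df["faot"] = df["aot"] * df["fmf"]                    # fine-mode AOT
df["pm25"] = 80 * df["faot"] + rng.normal(0, 10, n)   # synthetic PM2.5

r_aot, _ = pearsonr(df["aot"], df["pm25"])
r_faot, _ = pearsonr(df["faot"], df["pm25"])
print(f"R(AOT, PM2.5)  = {r_aot:.2f}")
print(f"R(fAOT, PM2.5) = {r_faot:.2f}")
```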
Acoustic impedance is the product of the density of a material and the speed at which an acoustic wave travels through it. Understanding this relationship is essential because low acoustic impedance values are closely associated with high porosity, which facilitates the accumulation of more hydrocarbons. In this study, we estimate acoustic impedance from nine different seismic attributes, in addition to depth and two-way travel time, using three supervised machine learning models, namely extra tree regression (ETR), random forest regression, and multilayer perceptron regression, implemented with the scikit-learn library. Our results show that the R2 of multilayer perceptron regression is 0.85, which is close to what has been reported in recent studies. However, the ETR method outperformed those reported in the literature in terms of mean absolute error, mean squared error, and root-mean-squared error. The novelty of this study lies in achieving more accurate predictions of acoustic impedance for exploration.
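A minimal sketch of the model comparison described above, assuming scikit-learn's ExtraTreesRegressor, RandomForestRegressor, and MLPRegressor as the three supervised models, with random placeholder features standing in for the nine seismic attributes plus depth and two-way travel time; the metrics printed here apply to the synthetic data only, not the study's results.

```python
# Hedged sketch: comparing ETR, RF, and MLP regressors on synthetic
# stand-ins for seismic attributes, reporting R2, MAE, MSE, and RMSE.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 11))                          # 9 attributes + depth + TWT
y = X @ rng.normal(size=11) + rng.normal(0, 0.5, 1500)   # synthetic impedance target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
models = {
    "ETR": ExtraTreesRegressor(n_estimators=200, random_state=0),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "MLP": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)
    print(name,
          f"R2={r2_score(y_te, pred):.3f}",
          f"MAE={mean_absolute_error(y_te, pred):.3f}",
          f"MSE={mse:.3f}",
          f"RMSE={np.sqrt(mse):.3f}")
```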