2021 IEEE 37th International Conference on Data Engineering (ICDE)
DOI: 10.1109/icde51399.2021.00078

Robust Factorization of Real-world Tensor Streams with Patterns, Missing Values, and Outliers

Abstract: Consider multiple seasonal time series being collected in real-time, in the form of a tensor stream. Real-world tensor streams often include missing entries (e.g., due to network disconnection) and at the same time unexpected outliers (e.g., due to system errors). Given such a real-world tensor stream, how can we estimate missing entries and predict future evolution accurately in real-time? In this work, we answer this question by introducing SOFIA, a robust factorization method for real-world tensor streams. …

Cited by 13 publications (3 citation statements)
References: 40 publications

“…According to [24], medical datasets with missing values may be difficult to impute as the null values are often contained in categorical attributes hence complicating pre-processing stages. However, advanced imputation techniques based on machine learning and decision-tree models are capable of effectively identifying outliers and replacing the missing values through K-NN computations [25]. Outliers pose significant risks of bias during statistical estimation procedures by increasing the likelihood of overstated or understated decision outcomes hence the dependability of imputations techniques is a critical consideration.…”
Section: Related Work (mentioning; confidence: 99%)
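The K-NN imputation mentioned in this statement can be illustrated with a short sketch. It is an illustration under assumptions, not the method of [25] or of SOFIA: each missing entry (None) takes the mean of the same column over the k nearest rows, with distance measured only on the columns both rows observe. The function name `knn_impute` and its parameters are hypothetical.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of `rows` with each None replaced by the mean of that
    column over the k nearest rows that observe it (hypothetical helper).

    Distance between two rows is the Euclidean distance over the columns
    both rows observe, excluding the column being filled."""
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]  # copy so the input is not mutated
    for i, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []  # (distance, donor value) pairs
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common observed column: distance undefined
                d = math.sqrt(sum((row[c] - other[c]) ** 2 for c in shared))
                candidates.append((d, other[j]))
            if candidates:  # otherwise the entry stays missing
                candidates.sort(key=lambda pair: pair[0])
                nearest = candidates[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 3.2],
]
# The two nearest rows to row 1 are rows 0 and 3, so the gap becomes about 3.1.
print(knn_impute(data, k=2)[1])
```

Because the distance here is plain Euclidean on raw values, columns with large magnitudes dominate it; in practice features are usually standardized first.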
“…CP decomposition (CPD) has been a core building block of numerous machine learning (ML) algorithms, which are designed for classification [41], weather forecast [14], recommendation [11], stock price prediction [13], to name a few. Moreover, CPD has proven useful for outlier removal [42], [43], imputation [12], [43], and dimensionality reduction [19], and thus it can be used as a preprocessing step of ML algorithms, many of which are known to be vulnerable to outliers, missings, and the curse of dimensionality. We refer the reader to [44] for more roles of tensor decomposition for ML.…”
Section: Relation to Machine Learning (mentioning; confidence: 99%)
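The statement above notes that CPD can serve as an imputation step. A minimal sketch of that idea, under simplifying assumptions: a rank-1 CP model X[i][j][k] ≈ a[i]·b[j]·c[k] is fitted by alternating least squares on the observed entries only, and each missing entry is read off the fitted model. Practical CPD uses rank R > 1, and the cited works [12], [43] may use different algorithms; `rank1_cp_impute` is a hypothetical name.

```python
def rank1_cp_impute(X, n_iters=100):
    """Fit X[i][j][k] ~= a[i] * b[j] * c[k] on observed entries only, then
    fill each None with the fitted value (hypothetical helper).

    X is an I x J x K nested list; None marks a missing entry. Each step is
    the closed-form least-squares update of one factor with the other two
    held fixed (alternating least squares)."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    observed = [((i, j, k), X[i][j][k])
                for i in range(I) for j in range(J) for k in range(K)
                if X[i][j][k] is not None]
    factors = [[1.0] * I, [1.0] * J, [1.0] * K]
    for _ in range(n_iters):
        for mode in range(3):
            num = [0.0] * len(factors[mode])
            den = [0.0] * len(factors[mode])
            for idx, value in observed:
                # product of the two factors that are held fixed
                w = 1.0
                for other in range(3):
                    if other != mode:
                        w *= factors[other][idx[other]]
                num[idx[mode]] += value * w
                den[idx[mode]] += w * w
            # an index with no observed entries gets factor 0
            factors[mode] = [n / d if d > 0 else 0.0 for n, d in zip(num, den)]
    a, b, c = factors
    completed = [[[X[i][j][k] if X[i][j][k] is not None else a[i] * b[j] * c[k]
                   for k in range(K)] for j in range(J)] for i in range(I)]
    return completed, factors


# Exactly rank-1 example tensor with one hidden entry (true value 2 * 3 * 4 = 24).
a, b, c = [1.0, 2.0], [1.0, 3.0], [2.0, 1.0, 4.0]
X = [[[a[i] * b[j] * c[k] for k in range(3)] for j in range(2)] for i in range(2)]
X[1][1][2] = None
completed, _ = rank1_cp_impute(X, n_iters=200)
print(completed[1][1][2])  # should be close to 24 for this exact rank-1 tensor
```

For rank 1, each factor entry has a closed-form update because the two fixed factors reduce to one scalar weight per observed entry; for rank R > 1, each update becomes a small R × R least-squares solve per row.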
“…The computational complexity of BRST is, however, very high, and thus, the method becomes inefficient when handling high-dimensional and fast-arriving data streams. Lee and Shin [15] proposed another robust streaming CP algorithm called SOFIA, which has the potential to handle real-world data streams with missing values and sparse outliers. Specifically, SOFIA exploits a well-known time-series forecasting model, namely, Holt-Winters, for detecting outliers and temporal patterns and hence factorizing the underlying tensor.…”
Section: Introduction (mentioning; confidence: 99%)
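The statement describes SOFIA as using Holt-Winters to capture temporal (seasonal) patterns and to detect outliers. The sketch below shows only that underlying idea on a single scalar series, not SOFIA itself, which applies it within a tensor factorization. Additive Holt-Winters produces a one-step-ahead forecast; an observation whose residual exceeds k times a running error scale is flagged as an outlier and replaced by the forecast, and a missing value (None) is filled with the forecast. The rejection rule, the error-scale update, and all names and default parameter values are assumptions for illustration.

```python
import math


def robust_holt_winters(series, m, alpha=0.3, beta=0.1, gamma=0.2,
                        k=3.0, phi=0.1, sigma0=1.0, min_sigma=1e-6):
    """Additive Holt-Winters with simple outlier rejection and gap filling
    (hypothetical helper, not SOFIA's algorithm).

    series: list of floats; None marks a missing value. The first m values
    (one full season) are assumed observed and clean; they initialize the
    level and seasonal terms. Returns (cleaned, outlier_flags, next_forecast).
    """
    first = series[:m]
    level = sum(first) / m
    trend = 0.0
    season = [y - level for y in first]
    sigma = sigma0  # running scale of one-step-ahead forecast errors
    cleaned, flags = list(first), [False] * m

    for t in range(m, len(series)):
        s_prev = season[t % m]
        forecast = level + trend + s_prev  # one-step-ahead prediction
        y = series[t]
        is_outlier = False
        if y is None:
            y_clean = forecast  # impute a missing value with the forecast
        else:
            resid = y - forecast
            if abs(resid) > k * sigma:
                is_outlier = True
                y_clean = forecast  # reject the outlier so it cannot corrupt the state
            else:
                y_clean = y
                # exponentially weighted update of the error scale (assumed rule)
                sigma = max(min_sigma,
                            math.sqrt(phi * resid ** 2 + (1 - phi) * sigma ** 2))
        # standard additive Holt-Winters updates, applied to the cleaned value
        new_level = alpha * (y_clean - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (y_clean - new_level) + (1 - gamma) * s_prev
        level = new_level
        cleaned.append(y_clean)
        flags.append(is_outlier)

    next_forecast = level + trend + season[len(series) % m]
    return cleaned, flags, next_forecast


pattern = [10.0, 20.0, 30.0, 20.0]
series = pattern * 5
series[9] = 200.0   # injected outlier (true value 20)
series[14] = None   # missing entry (true value 30)
cleaned, flags, nxt = robust_holt_winters(series, m=4)
print(flags.index(True), cleaned[9], cleaned[14], nxt)
```

The hard reject-and-replace rule keeps outliers out of the level, trend, and seasonal states, but it never adapts to a persistent level shift larger than k·sigma; a softer, Huber-style clipping of the residual is a common alternative.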