Correlation Coefficient Based Cluster Data Preprocessing and LSTM Prediction Model for Time Series Data in Large Aircraft Test Flights

Zhu, Hanlin; Zhu, Yongxin; Wu, Di; Wang, Hui; Tian, Li; Mao, Wei; Feng, Chao; Zha, Xiaowen; Deng, Guobao; Chen, Jiayi; Liu, Tao; Niu, Xinyu; Tsoi, Kuen Hung; Luk, Wayne

doi:10.1007/978-3-030-05755-8_37

Cited by 6 publications (4 citation statements)

References 10 publications

“…Analyzing correlations is a vital step in data analysis and ML tasks. It allows data scientists to understand the possible patterns and connections between two variables or a group of variables and helps in choosing better models (53). This method is widely applied in medical analysis (54).…”

Section: Results (mentioning; confidence: 99%)

Evaluation of nutritional status and clinical depression classification using an explainable machine learning method

et al. 2023

Introduction: Depression is a prevalent disorder worldwide, with potentially severe implications. It contributes significantly to an increased risk of diseases associated with multiple risk factors. Early, accurate diagnosis of depressive symptoms is a critical first step toward management, intervention, and prevention. Various nutritional and dietary compounds have been suggested to be involved in the onset, maintenance, and severity of depressive disorders. Despite the challenges to better understanding the association between nutritional risk factors and the occurrence of depression, assessing the interplay of these markers through supervised machine learning remains to be fully explored.

Methods: This study aimed to determine the ability of machine learning-based decision support methods to identify the presence of depression using publicly available health data from the Korean National Health and Nutrition Examination Survey. Two exploration techniques, namely uniform manifold approximation and projection (UMAP) and Pearson correlation, were used for exploratory analysis of the datasets. A grid search optimization with cross-validation was performed to fine-tune the models for classifying depression with the highest accuracy. Several performance measures, including accuracy, precision, recall, F1 score, the confusion matrix, the areas under the precision-recall and receiver operating characteristic curves, and calibration plots, were used to compare classifier performance. We further investigated feature importance through visualized interpretation using ELI5, partial dependence plots, local interpretable model-agnostic explanations (LIME), and Shapley additive explanations (SHAP) for predictions at both the population and individual levels.

Results: The best models achieved an accuracy of 86.18% (XGBoost) and an area under the curve of 84.96% (random forest) on the original dataset, and XGBoost achieved an accuracy of 86.02% and an area under the curve of 85.34% on the quantile-based dataset. The explainable results revealed complementary observations of the relative changes in feature values, so that the importance of emergent depression risks could be identified.

Discussion: The strength of our approach is the large sample size used to train a fine-tuned model. The machine learning-based analysis showed that the hyper-tuned model achieves empirically higher accuracy in classifying patients with depressive disorder, as evidenced by the set of interpretable experiments, and can be an effective solution for disease control.
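The Methods paragraph lists accuracy, precision, recall, F1 score, and the confusion matrix among the comparison measures. A minimal sketch of how these are derived from binary predictions follows, assuming label 1 marks the positive (depression) class; the labels are made up for illustration and are not data from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion matrix and summary scores for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows: true 0/1, columns: predicted 0/1
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = depression, 0 = no depression.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Precision, recall, and F1 fall back to 0.0 when their denominators are zero, which is one common convention for degenerate cases.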

“…Todorov et al. evaluated different stochastic approaches for estimating the sensitivity indices, allowing the input parameters to be compared with respect to their influence on the points of interest [48]. To evaluate the results of the time series analysis, we used the root mean square error (RMSE) [49] and the correlation coefficient (CC) [50]. The variable "M" represents the values modeled by the recurrent networks (Elman, LSTM, and GRU), the variable "R" represents the actual records, and "n" is the total number of data points.…”

Section: Model Evaluation (mentioning; confidence: 99%)
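The quoted text defines M, R, and n, but the formulas themselves are not reproduced on this page. Assuming the standard definitions (the citing paper's own equations may be written differently), with \bar{M} and \bar{R} denoting the means of the modeled and real values, RMSE and the Pearson correlation coefficient are:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_i - R_i\right)^2}
\qquad
\mathrm{CC} = \frac{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)\left(R_i - \bar{R}\right)}
{\sqrt{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(R_i - \bar{R}\right)^2}}
```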

“…CC takes values between −1 and 1; the closer it is to 1, the more closely the predicted values (M) follow the real values (R). In other words, a CC equal to 1 means that the modeled data follow the real data with a perfect positive linear relationship, while a negative value means that the model reproduced a mirror image of the real data [50]. Equation (14) shows the corresponding calculation.…”

Section: Model Evaluation (mentioning; confidence: 99%)

Airborne Particulate Matter Modeling: A Comparison of Three Methods Using a Topology Performance Approach

et al. 2021

Understanding the behavior of suspended pollutants in the atmosphere has become of paramount importance for determining air quality. For this purpose, a variety of simulation software packages and a large number of algorithms have been used. Among these techniques, recurrent deep neural networks (RNNs) have recently been used. These networks are capable of learning to imitate the chaotic behavior of a continuous data set over time. In the present work, the results obtained from implementing three different RNNs with the same structure are compared. These RNNs are the long short-term memory (LSTM) network, the gated recurrent unit (GRU), and the Elman network, taking as a case study the records of particulate matter PM10 and PM2.5 from 2005 to 2019 in Mexico City, obtained from the Red Automatica de Monitoreo Ambiental (RAMA) database. The results for these three topologies were compared in terms of execution time, root mean square error (RMSE), and correlation coefficient (CC).
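To make the difference between the compared topologies concrete, the sketch below shows one time step of an Elman (simple recurrent) cell and of a GRU cell for a scalar input and state. The weights are hypothetical placeholders, and the GRU follows one common convention (h' = z·h + (1 − z)·c, where z is the update gate); some references swap the roles of z and 1 − z. The LSTM, which adds a separate cell state and three gates, is omitted for brevity.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def elman_step(x, h, w=0.5, u=0.8, b=0.0):
    """Elman (simple RNN) update: h' = tanh(w*x + u*h + b). Weights are placeholders."""
    return math.tanh(w * x + u * h + b)


def gru_step(x, h, p):
    """GRU update for scalar x and h; p maps hypothetical weight names to floats.
    Convention: h' = z*h + (1 - z)*c, where z is the update gate."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])  # reset gate
    c = math.tanh(p["w_c"] * x + p["u_c"] * (r * h) + p["b_c"])  # candidate state
    return z * h + (1.0 - z) * c


def run(step, xs, h=0.0):
    """Apply a one-step cell over a sequence and collect the hidden states."""
    states = []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states


params = {k: 0.5 for k in ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_c", "u_c", "b_c")}
xs = [0.1, 0.4, -0.2, 0.3]
print(run(elman_step, xs))
print(run(lambda x, h: gru_step(x, h, params), xs))
```

The gates let the GRU interpolate between keeping its previous state and adopting the candidate, whereas the Elman cell overwrites its state at every step.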

“…LSTM [18] is a special RNN that maintains a memory state and has proven effective in predicting time series [19]. Like the LSTM model, most current prediction methods are based on single-step data prediction [19,20], which may unavoidably introduce errors when they are applied to multiple-step and cyclic prediction. Multiple-step prediction requires combining single-step predictions, so the errors of each single-step prediction accumulate.…”

Section: Related Work (mentioning; confidence: 99%)
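The error-accumulation argument in this statement can be checked with a toy example. Below, a hypothetical one-step predictor (a linear rule with coefficient 0.85, standing in for a trained LSTM) is applied to a series whose assumed true dynamics use 0.9. When each prediction is fed back as the next input, the error grows with the horizon; when every step starts from the true previous value, the one-step error stays small. All numbers are assumptions chosen for illustration.

```python
TRUE_COEF = 0.9    # assumed true dynamics: x[t+1] = 0.9 * x[t]
MODEL_COEF = 0.85  # hypothetical, slightly wrong one-step model


def one_step(x):
    """Stand-in for a trained single-step predictor such as an LSTM."""
    return MODEL_COEF * x


def recursive_forecast(x0, steps):
    """Multistep forecast built by feeding each prediction back as the next input."""
    preds, x = [], x0
    for _ in range(steps):
        x = one_step(x)
        preds.append(x)
    return preds


steps = 5
truth = [TRUE_COEF ** k for k in range(1, steps + 1)]
recursive = recursive_forecast(1.0, steps)
# One-step-ahead: every prediction starts from the true previous value.
one_step_ahead = [one_step(TRUE_COEF ** (k - 1)) for k in range(1, steps + 1)]

recursive_err = [abs(t - p) for t, p in zip(truth, recursive)]
one_step_err = [abs(t - p) for t, p in zip(truth, one_step_ahead)]
for k, (e_rec, e_one) in enumerate(zip(recursive_err, one_step_err), start=1):
    print(f"step {k}: recursive error {e_rec:.4f}, one-step error {e_one:.4f}")
```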

Multiscale deep network based multistep prediction of high‐dimensional time series from power transmission systems

Zhu, Wang, et al. 2020

Trans Emerging Tel Tech

Self Cite

The Internet of Energy makes future power and energy networks more complicated and intelligent systems. With the development of the energy industry, the sample data of such systems are high-dimensional, dynamic, correlated, and complex. To meet people's needs and reduce power redundancy, predicting future energy demand and production is essential. This requires predicting the data for the coming hours or days, that is, multistep prediction. However, a common one-step prediction model cannot forecast power demand or production far enough ahead to allow adequate preparation, and the data have thousands of dimensions, which makes the problem challenging. In addition, the changeable patterns prevent common prediction algorithms from performing well enough. In this article, we propose a sequence-to-sequence model for multistep prediction with a baseline mean squared error (MSE) of 1.49×10⁻⁵. We then improve the model into a multiscale deep network and decrease the MSE to 1.23×10⁻⁵ by adding extra information to match different patterns. Furthermore, the multitask learning trick decreases the MSE to 1.18×10⁻⁵.
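Multistep prediction with a sequence-to-sequence model is commonly trained on pairs of an input window and the next several values. A minimal sketch of building such pairs from a single series, plus an MSE averaged over the forecast horizon, follows. The window length, horizon, and one-dimensional series are assumptions for illustration (the article's data have thousands of dimensions), and this is not the authors' preprocessing code.

```python
def make_windows(series, input_len, horizon):
    """Split a series into (input window, next `horizon` values) training pairs."""
    if input_len < 1 or horizon < 1:
        raise ValueError("input_len and horizon must be positive")
    last_start = len(series) - input_len - horizon
    return [
        (series[s:s + input_len], series[s + input_len:s + input_len + horizon])
        for s in range(last_start + 1)
    ]


def horizon_mse(pred, target):
    """Mean squared error averaged over all steps of a multistep forecast."""
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
pairs = make_windows(series, input_len=3, horizon=2)
print(len(pairs), pairs[0], pairs[-1])  # 4 ([0.0, 1.0, 2.0], [3.0, 4.0]) ([3.0, 4.0, 5.0], [6.0, 7.0])
print(horizon_mse([3.0, 4.5], [3.0, 4.0]))  # 0.125
```

Predicting the whole target window at once avoids feeding predictions back as inputs, which is the error-accumulation path of iterated single-step forecasting.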
