2022
DOI: 10.1016/j.neunet.2022.05.030
Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean

Cited by 14 publications (5 citation statements)
References 30 publications
“…A major challenge in RNNs is the vanishing gradient problem: the gradients of the loss function with respect to the network's parameters become extremely small as they are back-propagated from the output layer to the earlier layers during training [45]. Long Short-Term Memory (LSTM) networks are RNNs designed to handle the vanishing gradient problem by replacing simple recurrent connections with a gated memory cell [46].…”
Section: Addressing Temporal Dependence and Feature Selection
confidence: 99%
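As a minimal illustration of the vanishing-gradient behaviour described in this statement, the following sketch (assuming PyTorch; the sequence length, batch size and hidden width are arbitrary choices) compares how much gradient reaches the first time step for a plain recurrent cell versus an LSTM cell:

```python
# Minimal sketch (assuming PyTorch): compare the gradient that reaches the
# earliest time step for a plain RNN versus an LSTM on a long sequence.
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, hidden = 100, 4, 32
x = torch.randn(seq_len, batch, hidden, requires_grad=True)

for name, cell in [("RNN", nn.RNN(hidden, hidden)),     # simple recurrent connections
                   ("LSTM", nn.LSTM(hidden, hidden))]:  # gated memory cell
    out, _ = cell(x)
    loss = out[-1].sum()              # loss defined on the final time step only
    loss.backward()
    # Norm of the gradient flowing all the way back to the first time step.
    print(f"{name}: grad norm at t=0 = {x.grad[0].norm().item():.2e}")
    x.grad = None                     # reset before the next cell
```

The printed norms give a rough feel for how strongly each architecture attenuates gradients over the sequence; the exact numbers depend on the random initialization and the chosen sequence length.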
“…We refer to this benchmark strategy as "LR" (Linear Regression). As further benchmark strategies, we use Xavier initialization (Glorot & Bengio, 2010) in all layers, including the last layer, Kaiming initialization (He et al., 2015), LeCun initialization (LeCun et al., 2012), Yilmaz & Poli initialization (Yilmaz & Poli, 2022) and orthogonal initialization (Saxe et al., 2013).…”
Section: Benchmark Strategies
confidence: 99%
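For readers who want to see these strategies side by side, here is a brief hedged sketch (assuming PyTorch, whose torch.nn.init module provides the Xavier, Kaiming and orthogonal initializers; the LeCun and Yilmaz & Poli variants are written out by hand, and the negative mean used for the latter is an illustrative placeholder rather than the value prescribed in Yilmaz & Poli, 2022):

```python
# Sketch (assuming PyTorch) of the benchmark weight-initialization strategies
# named above, applied to a single fully connected layer.
import math
import torch.nn as nn

layer = nn.Linear(256, 128)
fan_in = layer.in_features

# Xavier / Glorot initialization (Glorot & Bengio, 2010)
nn.init.xavier_uniform_(layer.weight)

# Kaiming / He initialization (He et al., 2015)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")

# Orthogonal initialization (Saxe et al., 2013)
nn.init.orthogonal_(layer.weight)

# LeCun initialization (LeCun et al., 2012): zero-mean normal, variance 1/fan_in
nn.init.normal_(layer.weight, mean=0.0, std=math.sqrt(1.0 / fan_in))

# Yilmaz & Poli (2022): normal draw with a negative mean, aimed at deep MLPs
# with logistic activations. The mean of -1.0 here is an assumed placeholder,
# not the paper's prescription.
nn.init.normal_(layer.weight, mean=-1.0, std=math.sqrt(1.0 / fan_in))

nn.init.zeros_(layer.bias)
```

In practice only one of these calls would be applied to a given layer; they are listed sequentially here purely for comparison.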
“…The initial loss of the other three benchmark strategies is large, with a validation RMSE of around 90 after weight initialization. Moreover, just as with the Xavier initialization strategy, the neural network gets stuck in a local optimum with the LeCun (LeCun et al., 2012), orthogonal (Saxe et al., 2013) and Yilmaz & Poli (Yilmaz & Poli, 2022) initialization strategies. In the end, however, the neural network converges to a similar final training, validation and test accuracy for all considered benchmark strategies; Yilmaz & Poli initialization even reaches the same final test accuracy.…”
Section: Table
confidence: 99%
“…For these reasons, LSTM networks stand out from other methods as an alternative machine learning approach. They are especially used for long-term time series analysis and have emerged as a solution method for such problems [20].…”
Section: Introduction
confidence: 99%