A Review of Recurrent Neural Network Architecture for Sequence Learning: Comparison between LSTM and GRU

S, Nosouhian; Nosouhian, F; A, Kazemi Khoshouei

doi:10.20944/preprints202107.0252.v1

Cited by 33 publications

(13 citation statements)

References 19 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…A gated recurrent unit (GRU) is an RNN variant that was originally designed to solve the problem of disappearing gradients in standard RNNs [ 15 ]. The structure is shown in Figure 2 .…”

Section: Methodsmentioning

confidence: 99%

Minimalist Deployment of Neural Network Equalizers in a Bandwidth-Limited Optical Wireless Communication System with Knowledge Distillation

Zhu,

Wei,

Chen

et al. 2024

Sensors

View full text Add to dashboard Cite

An equalizer based on a recurrent neural network (RNN), especially with a bidirectional gated recurrent unit (biGRU) structure, is a good choice to deal with nonlinear damage and inter-symbol interference (ISI) in optical communication systems because of its excellent performance in processing time series information. However, its recursive structure prevents the parallelization of the computation, resulting in a low equalization rate. In order to improve the speed without compromising the equalization performance, we propose a minimalist 1D convolutional neural network (CNN) equalizer, which is reconverted from a biGRU with knowledge distillation (KD). In this work, we applied KD to regression problems and explain how KD helps students learn from teachers in solving regression problems. In addition, we compared the biGRU, 1D-CNN after KD and 1D-CNN without KD in terms of Q-factor and equalization velocity. The experimental data showed that the Q-factor of the 1D-CNN increased by 1 dB after KD learning from the biGRU, and KD increased the RoP sensitivity of the 1D-CNN by 0.89 dB with the HD-FEC threshold of 1 × 10−3. At the same time, compared with the biGRU, the proposed 1D-CNN equalizer reduced the computational time consumption by 97% and the number of trainable parameters by 99.3%, with only a 0.5 dB Q-factor penalty. The results demonstrate that the proposed minimalist 1D-CNN equalizer holds significant promise for future practical deployments in optical wireless communication systems.

show abstract

“…A gated recurrent unit (GRU) is an RNN variant that was originally designed to solve the problem of disappearing gradients in standard RNNs [ 15 ]. The structure is shown in Figure 2 .…”

Section: Methodsmentioning

confidence: 99%

Minimalist Deployment of Neural Network Equalizers in a Bandwidth-Limited Optical Wireless Communication System with Knowledge Distillation

Zhu,

Wei,

Chen

et al. 2024

Sensors

View full text Add to dashboard Cite

show abstract

“…Gated Recurrent Unit (GRU) was introduced in 2014 as a solution to LSTM's complexity and as a solution to the vanishing gradient problem [41,42]. Moreover, by implementing gating mechanisms within their networks, GRU and LSTM can capture and propagate information over long sequences.…”

Section: Gated Neural Network (Gru)mentioning

confidence: 99%

“…The Bidirectional Encoder Representations from Transformers (BERT) model has been under development since Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova first published "All Attention You Need" in 2018 at Google Labs [42]. The model is an embedding layer of pre-trained bidirectional representations from a large collection of unsupervised text corpora, including Wikipedia and BookCorpus [42] As shown in Figure 4, the first step for building the BERT-based embedding method is to import the required library HuggingFACE and define the pre-trained BERT model to be used. In our case we took different considerations, firstly, the majority of resumes are written in French, and secondly, the average length of the sequence.…”

Section: The Bidirectional Encoder Representations From Transformers ...mentioning

confidence: 99%

“…The overall steps used in the majority of models of BERT are typically composed of the following steps: tokenization, padding, numericalization, and embedding [42].For the Tokenization, the [ids_token], mask_token], and other specialized tokens are generated, such as "[CLS]" mentioning the start of the sequence; [SEP] for the end of the sequence; [PAD] for padding when the sequences do not have the same length; and "[UNK]" for unknown words in sequences. Since most resumes are in French, we used the un-cased BERT model architecture with 12 layers, 768 hidden nodes, and 12 attention heads with 110M parameters.…”

Section: The Bidirectional Encoder Representations From Transformers ...mentioning

confidence: 99%

See 1 more Smart Citation

Resumes Classification Using Neural Network Approaches Combined with Bert and Gensim: CVS of Moroccan Engineering Students

Qostal,

Moumen,

Lakhrissi

2024

Preprint

View full text Add to dashboard Cite

Deep Learning (DL) oriented document processing is widely used in different fields for extraction, recognition, and classification processes from raw corpora of data. The article examines the application of deep learning approaches, based on different neural network methods, including Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN). The compared models were combined with two different word embedding techniques, namely: Bidirectional Encoder Representations from Transformers BERT and Gensim Word2Vec. The models are designed to evaluate the performance of architectures based on neural network techniques for the classification of CVs of Moroccan engineering students at ENSAK(National School of Applied Sciences of Kenitra, Ibn Tofail University). The used dataset included resumes collected from engineering students at ENSAK in 2023 for a project on the employability of Moroccan engineers in which new approaches were applied, especially machine learning, deep learning, and big data. Accordingly, 867 resumes were collected from five specialties of study (Electrical Engineering, Networks and Systems Telecommunications, Computer Engineering, Automotive Mechatronics Engineering, Industrial Engineering). The results revealed good performance of the proposed models based on the BERT embedding approach compared to models based on the Gensim Word2Vec embedding approach. Accordingly, the CNN-GRU/Bert model achieved slightly better accuracy with 0.9251 compared to other hybrid models.

show abstract

“…The shortcomings of such a method, are the short memory and vanishing gradient problems [35][36][37]. Moreover, several types of recurrent neural networks have shown success with seizure prediction, such as bidirectional long-short term memory (Bi-LSTM), which solves the problem of short memory by storing sequencing of necessary data and throwing away unneeded data [38][39][40]. Additionally, raw EEG signals are converted into images and used in CNNs, which act as a classifier [41,42], this method is closest to practice by medical practitioners where visual features of seizures could be extracted using various image classifiers such as ImageNet [43] and DenseNet [44].…”

Section: Related Workmentioning

confidence: 99%

General and patient-specific seizure classification using deep neural networks

Massoud

Abdelzaher

Kuhlmann

et al. 2023

Analog Integr Circ Sig Process

View full text Add to dashboard Cite

Seizure prediction algorithms have been central in the field of data analysis for the improvement of epileptic patients’ lives. The most recent advancements of which include the use of deep neural networks to present an optimized, accurate seizure prediction system. This work puts forth deep learning methods to automate the process of epileptic seizure detection with electroencephalogram (EEG) signals as input; both a patient-specific and general approach are followed. EEG signals are time structure series motivating the use of sequence algorithms such as temporal convolutional neural networks (TCNNs), and long short-term memory networks. We then compare this methodology to other prior pre-implemented structures, including our previous work for seizure prediction using machine learning approaches support vector machine and random under-sampling boost. Moreover, patient-specific and general seizure prediction approaches are used to evaluate the performance of the best algorithms. Area under curve (AUC) is used to select the best performing algorithm to account for the imbalanced dataset. The presented TCNN model showed the best patient-specific results than that of the general approach with, AUC of 0.73, while ML model had the best results for general classification with AUC of 0.75.

show abstract

A Review of Recurrent Neural Network Architecture for Sequence Learning: Comparison between LSTM and GRU

Cited by 33 publications

References 19 publications

Minimalist Deployment of Neural Network Equalizers in a Bandwidth-Limited Optical Wireless Communication System with Knowledge Distillation

Minimalist Deployment of Neural Network Equalizers in a Bandwidth-Limited Optical Wireless Communication System with Knowledge Distillation

Resumes Classification Using Neural Network Approaches Combined with Bert and Gensim: CVS of Moroccan Engineering Students

General and patient-specific seizure classification using deep neural networks

Contact Info

Product

Resources

About