Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.43

A Formal Hierarchy of RNN Architectures

Abstract: We develop a formal hierarchy of the expressive capacity of RNN architectures. The hierarchy is based on two formal properties: space complexity, which measures the RNN's memory, and rational recurrence, defined as whether the recurrent update can be described by a weighted finite-state machine. We place several RNN variants within this hierarchy. For example, we prove the LSTM is not rational, which formally separates it from the related QRNN (Bradbury et al., 2016). We also show how these models' expressive …
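To make the notion of rational recurrence concrete, here is a minimal sketch, not the paper's construction: a recurrence is rational when each hidden unit's value over a string can be computed by a weighted finite-state automaton (WFSA), i.e., by multiplying per-symbol transition matrices. The `WFSA` class and the counting example below are illustrative assumptions.

```python
import numpy as np

# Illustrative WFSA (an assumption, not code from the paper): the
# score of a string is init @ A[w1] @ ... @ A[wn] @ final. A hidden
# unit is "rational" if some WFSA computes its value this way.

class WFSA:
    def __init__(self, init, transitions, final):
        self.init = init                # shape (k,): initial state weights
        self.transitions = transitions  # dict: symbol -> (k, k) matrix
        self.final = final              # shape (k,): final weights

    def score(self, string):
        state = self.init.copy()        # the recurrent state: k numbers
        for symbol in string:
            state = state @ self.transitions[symbol]  # one update per symbol
        return state @ self.final

# Example: a 2-state WFSA whose score counts occurrences of "a".
count_a = WFSA(
    init=np.array([1.0, 0.0]),
    transitions={
        "a": np.array([[1.0, 1.0],
                       [0.0, 1.0]]),    # seeing "a" bumps the counter
        "b": np.eye(2),                 # "b" leaves the count unchanged
    },
    final=np.array([0.0, 1.0]),
)

assert count_a.score("abab") == 2.0
```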

Cited by 38 publications (36 citation statements) · References 20 publications
“…But these theoretical complexities do not have a significant effect on real-world applications if parallel processing (e.g., a GPU) is used to run the matrix multiplications. Merrill et al. [122] described a useful range between narrow upper and lower bounds on the space complexity of various neural network models. The space complexity of RNN, CNN, and HAN is O(1) [122].…”
Section: Time-Space Complexities of Algorithms
confidence: 99%
“…Merrill et al. [122] described a useful range between narrow upper and lower bounds on the space complexity of various neural network models. The space complexity of RNN, CNN, and HAN is O(1) [122]. DL algorithms such as RNNs can use the hidden layer as a memory store to learn sequences.…”
Section: Time-Space Complexities of Algorithms
confidence: 99%
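As a hedged illustration of why a finite-precision RNN counts as O(1) space: the hidden state below is a fixed-size vector, independent of input length. The Elman-style cell and its dimensions are illustrative assumptions, not code from either paper.

```python
import numpy as np

# Illustrative Elman-style RNN cell (an assumption): the only memory
# carried across time steps is h, whose size d is fixed in advance,
# so under fixed numeric precision the memory used is O(1) in the
# input length.

d, v = 8, 5                       # hidden size, input (vocab) size
rng = np.random.default_rng(0)
W = rng.normal(size=(d, v))       # input-to-hidden weights
U = rng.normal(size=(d, d))       # hidden-to-hidden weights
b = np.zeros(d)

def run_rnn(inputs):
    h = np.zeros(d)               # fixed-size state: O(1) memory
    for x in inputs:              # inputs: sequence of one-hot vectors
        h = np.tanh(W @ x + U @ h + b)
    return h                      # same size no matter how long the input

seq = [np.eye(v)[i % v] for i in range(1000)]  # a length-1000 input
assert run_rnn(seq).shape == (d,)
```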
“…In our experiment, as illustrated in figure 3, we simulate a putative divergence of a phonotactic grammar into sub-modules by feeding a corpus of Japanese words into a dynamic probabilistic model that is allowed to fork into two submodels. The model follows Mayer (2020), whose one-layer RNN of finite precision has been shown to be unable to learn unattested patterns such as a^n b^n (Weiss et al., 2018; Merrill et al., 2020). Each cell h_i of the RNN is fed (a) a vector encoding of the input segment x_i and (b) the vector output of the previous hidden state h_{i-1}.…”
Section: The Experiments
confidence: 99%
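For context, a^n b^n is the canonical counter language: recognizing it exactly requires unbounded counting, beyond the O(1) memory of a finite-precision RNN. A minimal counter-based recognizer, illustrative rather than taken from the cited works:

```python
def is_anbn(s: str) -> bool:
    """Accept strings of the form a^n b^n (n >= 0) using one counter."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:            # an "a" after a "b" is out of order
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:         # more b's than a's so far
                return False
        else:
            return False
    # the counter holds an unbounded integer: O(log n) bits,
    # which exceeds a finite-precision RNN's O(1) memory
    return count == 0

assert is_anbn("aaabbb") and not is_anbn("aabbb") and not is_anbn("abab")
```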
“…Many recent works have explored the computational power of RNNs in practical settings. Several works (Merrill et al., 2020; Weiss et al., 2018) recently studied the ability of RNNs to recognize counter-like languages. The capability of RNNs to recognize strings of balanced parentheses has also been studied (Sennhauser and Berwick, 2018; Skachkova et al., 2018).…”
Section: Related Work
confidence: 99%
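Balanced-parentheses (Dyck-1) recognition is the same kind of counter problem: track nesting depth, reject if it ever goes negative, accept if it ends at zero. A minimal illustrative sketch, not from the cited studies:

```python
def is_balanced(s: str) -> bool:
    """Dyck-1 membership: one counter tracking nesting depth."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:         # a closer with no matching opener
                return False
        else:
            return False
    return depth == 0             # every opener was closed

assert is_balanced("(()())") and not is_balanced("())(")
```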