2005
DOI: 10.1007/s10994-005-0916-y

A Neural Syntactic Language Model

Abstract: This paper presents a study of using neural probabilistic models in a syntactic-based language model. The neural probabilistic model makes use of a distributed representation of the items in the conditioning history, and is powerful in capturing long dependencies. Employing neural network based models in the syntactic-based language model enables it to efficiently use the large amount of information available in a syntactic parse in estimating the next word in a string. Several scenarios of integrati…
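As a rough illustration of the idea summarized in the abstract, the sketch below conditions a small feed-forward neural model on a distributed representation of items from the history (here, two hypothetical exposed head words from a partial parse). All sizes, names, and the choice of conditioning items are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Minimal sketch: a neural probabilistic LM whose conditioning history contains
# syntactic items (e.g. exposed head words from a partial parse) instead of only
# the preceding n-1 words. Everything here is an illustrative assumption.
rng = np.random.default_rng(0)

vocab = ["<unk>", "the", "dog", "chased", "ball", "red"]
word_to_id = {w: i for i, w in enumerate(vocab)}

embed_dim, hidden_dim, context_size = 4, 8, 2   # two conditioning items

E = rng.normal(size=(len(vocab), embed_dim))            # distributed representations
W1 = rng.normal(size=(hidden_dim, context_size * embed_dim))
W2 = rng.normal(size=(len(vocab), hidden_dim))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def next_word_distribution(conditioning_items):
    """P(next word | conditioning items), items may be head words from a parse."""
    ids = [word_to_id.get(w, word_to_id["<unk>"]) for w in conditioning_items]
    x = np.concatenate([E[i] for i in ids])             # concatenated embeddings
    h = np.tanh(W1 @ x)
    return softmax(W2 @ h)

# Example: condition on the two exposed heads "dog" and "chased".
p = next_word_distribution(["dog", "chased"])
print({w: round(float(p[word_to_id[w]]), 3) for w in vocab})
```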

Cited by 49 publications (36 citation statements). References 31 publications.
“…For example, we could mention exponential models (such as "model M" (Chen, 2009)), syntactic models (Emami and Jelinek, 2005), Bayesian models (Teh, 2006), etc. For the sake of brevity, we omit further discussion of such techniques because they are largely orthogonal to deep learning.…”
Section: Deep Learning for ASR Language Modelling (mentioning, confidence: 99%)
“…A well-established efficiency trick assigns each possible output to a unique class and then uses a two-step process to find the probability of an MTU, instead of computing the probability of all possible outputs (Goodman, 2001; Emami and Jelinek, 2005; Mikolov et al., 2011b). Under this scheme we compute the probability of an MTU by multiplying the probability of its class c_i with the probability of the MTU within that class. This factorization reduces the complexity of computing the output probabilities from O(|V|) to O(|C| + max_i |c_i|), where |C| is the number of classes and |c_i| is the number of minimal units in class c_i.…”
Section: Atomic MTU RNN Model (mentioning, confidence: 99%)
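The class-based factorization referenced in this excerpt (Goodman, 2001; Emami and Jelinek, 2005; Mikolov et al., 2011b) can be sketched as two small softmaxes, one over classes and one over the words within the predicted word's class. The sketch below is a minimal, assumed implementation; the class assignment, parameter names, and sizes are all hypothetical.

```python
import numpy as np

# Minimal sketch of the class-based output factorization:
#   P(word | history) = P(class(word) | history) * P(word | class(word), history)
# All names and shapes are illustrative assumptions, not from the cited papers.
rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]
num_classes = 2
class_of = {"the": 0, "on": 0, "cat": 1, "sat": 1, "mat": 1}   # hypothetical classes
words_in_class = {c: [w for w in vocab if class_of[w] == c] for c in range(num_classes)}

hidden_size = 8
W_class = rng.normal(size=(num_classes, hidden_size))                     # softmax over |C| classes
W_word = {c: rng.normal(size=(len(words_in_class[c]), hidden_size))       # softmax over |c_i| words
          for c in range(num_classes)}

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def word_probability(hidden, word):
    """Two-step probability: class first, then the word within its class."""
    c = class_of[word]
    p_class = softmax(W_class @ hidden)[c]
    within = softmax(W_word[c] @ hidden)
    return p_class * within[words_in_class[c].index(word)]

hidden = rng.normal(size=hidden_size)   # stand-in for the network's hidden state
print(word_probability(hidden, "cat"))
```

This is what reduces the per-prediction cost from O(|V|) to O(|C| + max_i |c_i|): only one class softmax and one within-class softmax are evaluated, rather than a softmax over the full vocabulary.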
“…Many existing formulations of (symbolic and connectionist) language processing models belong to this framework (e.g., [20] and the review therein), in the sense that they are classifiers for input sequences.…”
Section: B. Recursive Temporal Abstraction for Sensory Inputs (mentioning, confidence: 99%)