Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterization regime, where DNNs can easily fit a random labeling of the training data. Very recently, a line of work has shown theoretically that, with over-parameterization and proper random initialization, gradient-based methods can find the global minima of the training loss for DNNs. However, existing generalization error bounds cannot explain the good generalization performance of over-parameterized DNNs. The major limitation of most existing generalization bounds is that they are based on uniform convergence and are independent of the training algorithm. In this work, we derive an algorithm-dependent generalization error bound for deep ReLU networks, and show that under certain assumptions on the data distribution, gradient descent (GD) with proper random initialization can train a sufficiently over-parameterized DNN to achieve arbitrarily small generalization error. Our work sheds light on the good generalization performance of over-parameterized deep neural networks.
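As a concrete illustration of this setting (not the paper's construction), the following sketch trains an over-parameterized deep ReLU network with full-batch gradient descent from Gaussian random initialization on a synthetic, well-separated binary classification task and reports the train/test error gap. The architecture, data distribution, and hyperparameters are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
d, width, depth, n_train, n_test = 20, 1024, 3, 200, 2000

def make_data(n):
    # Illustrative, well-separated data: labels given by a fixed halfspace.
    x = torch.randn(n, d)
    w_star = torch.ones(d) / d ** 0.5
    y = (x @ w_star > 0).float() * 2 - 1
    return x, y

layers, in_dim = [], d
for _ in range(depth):
    lin = torch.nn.Linear(in_dim, width)
    torch.nn.init.normal_(lin.weight, std=(2.0 / in_dim) ** 0.5)  # Gaussian random init
    torch.nn.init.zeros_(lin.bias)
    layers += [lin, torch.nn.ReLU()]
    in_dim = width
layers.append(torch.nn.Linear(in_dim, 1))
net = torch.nn.Sequential(*layers)

x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(n_test)
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for step in range(500):  # full-batch gradient descent
    # Logistic loss on +/-1 labels.
    loss = torch.nn.functional.soft_margin_loss(net(x_tr).squeeze(-1), y_tr)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    def err(x, y):  # 0-1 classification error
        return float(((net(x).squeeze(-1) * y) <= 0).float().mean())
    print(f"train error {err(x_tr, y_tr):.3f}, test error {err(x_te, y_te):.3f}")
```

With these settings the network has far more parameters than training examples, yet on benign data of this kind the gap between train and test error stays small, which is the phenomenon the bound addresses.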
We study the problem of training deep neural networks with the Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that, for a broad family of loss functions, with proper random weight initialization both gradient descent and stochastic gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under mild assumptions on the training data. The key idea of our proof is that Gaussian random initialization followed by (stochastic) gradient descent produces a sequence of iterates that stay inside a small perturbation region centered around the initial weights, in which the empirical loss function of deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of (stochastic) gradient descent. Our theoretical results shed light on the optimization of deep learning, and pave the way for studying the optimization dynamics of training modern deep neural networks.
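The "small perturbation region" at the heart of this proof idea is easy to observe empirically. Below is a minimal sketch (our own illustration, not the paper's experiment) that trains a wide ReLU network from Gaussian initialization with gradient descent on the logistic loss and records how far each weight matrix drifts from its initialization; the architecture and hyperparameters are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
d, width, n = 10, 2048, 128
x = torch.randn(n, d)
y = (torch.rand(n) > 0.5).float() * 2 - 1  # random +/-1 labels, fittable when over-parameterized

net = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(),
    torch.nn.Linear(width, width), torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
for m in net:
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.normal_(m.weight, std=(2.0 / m.in_features) ** 0.5)  # Gaussian init
        torch.nn.init.zeros_(m.bias)

init = {k: v.clone() for k, v in net.state_dict().items()}  # snapshot of the initialization
opt = torch.optim.SGD(net.parameters(), lr=0.1)

for step in range(1001):
    # Logistic loss on +/-1 labels, one member of the loss family considered.
    loss = torch.nn.functional.soft_margin_loss(net(x).squeeze(-1), y)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 200 == 0:
        # Relative Frobenius distance of each weight matrix from its initialization.
        drift = max(float((p - init[k]).norm() / init[k].norm())
                    for k, p in net.state_dict().items() if "weight" in k)
        print(f"step {step:4d}  loss {loss.item():.4f}  max relative drift {drift:.4f}")
```

As the width grows, the training loss is driven toward zero while the relative drift from initialization stays small, which is exactly the regime in which the local curvature argument applies.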
We introduce our efforts towards building a universal neural machine translation (NMT) system capable of translating between any language pair. We set a milestone towards this goal by building a single massively multilingual NMT model handling 103 languages, trained on over 25 billion examples. Our system demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages, while keeping high-resource language translation quality on par with competitive bilingual baselines. We provide an in-depth analysis of various aspects of model building that are crucial to achieving quality and practicality in universal NMT. While we prototype a high-quality universal translation system, our extensive empirical analysis exposes issues that need to be further addressed, and we suggest directions for future research.
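One standard ingredient of such single-model multilingual systems is steering the shared decoder with a target-language token prepended to the source sentence. A minimal sketch of that preprocessing step is below; the `<2xx>` tag format is an illustrative assumption, not necessarily this system's exact convention:

```python
def to_multilingual_example(src: str, tgt_lang: str) -> str:
    """Prefix the source with a tag such as <2fr> so a single shared
    encoder-decoder learns to translate into the requested language."""
    return f"<2{tgt_lang}> {src}"

print(to_multilingual_example("How are you?", "fr"))  # -> "<2fr> How are you?"
print(to_multilingual_example("How are you?", "de"))  # -> "<2de> How are you?"
```

Because every language pair shares one set of parameters, examples tagged this way let low-resource directions benefit from representations learned on high-resource data.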
Estrogen has well-documented neuroprotective effects in a variety of clinical and experimental disorders of the CNS, including autoimmune inflammation, traumatic injury, stroke, and neurodegenerative diseases. The beneficial effects of estrogens in CNS disorders include mitigation of clinical symptoms, as well as attenuation of histopathological signs of neurodegeneration and inflammation. The cellular mechanisms that underlie these CNS effects of estrogens are uncertain, because a number of different cell types express estrogen receptors in the peripheral immune system and the CNS. Here, we investigated the potential roles of two endogenous CNS cell types in estrogen-mediated neuroprotection. We selectively deleted estrogen receptor-α (ERα) from either neurons or astrocytes using well-characterized Cre-loxP systems for conditional gene knockout in mice, and studied the effects of these conditional gene deletions on ERα ligand-mediated neuroprotective effects in a well-characterized model of adoptive experimental autoimmune encephalomyelitis (EAE). We found that the pronounced and significant neuroprotective effects of systemic treatment with ERα ligand on clinical function, CNS inflammation, and axonal loss during EAE were completely prevented by conditional deletion of ERα from astrocytes, whereas conditional deletion of ERα from neurons had no significant effect. These findings show that signaling through ERα in astrocytes, but not through ERα in neurons, is essential for the beneficial effects of ERα ligand in EAE. Our findings reveal a unique cellular mechanism for estrogen-mediated CNS neuroprotective effects by signaling through astrocytes, and have implications for understanding the pathophysiology of sex hormone effects in diverse CNS disorders.

Keywords: multiple sclerosis | astrogliosis | conditional knockout

The female sex hormone, estrogen, is neuroprotective in many clinical and experimental CNS disorders, including autoimmune conditions such as multiple sclerosis (MS), neurodegenerative conditions such as Alzheimer's and Parkinson's diseases, and traumatic injury and stroke (1-4). Estrogen treatment has been shown to ameliorate clinical disease and decrease neuropathology in these disease models (1-4). Pharmacological studies have suggested roles for different estrogen receptors, but the cell types that mediate neuroprotective effects of estrogen are not known for any experimental or clinical condition. Identifying cells that bear specific estrogen receptor subtypes and are essential for specific estrogen-mediated effects is fundamental to elucidating and therapeutically exploiting the mechanisms that underlie estrogen-mediated neuroprotection. Toward this end, we used a genetic loss-of-function strategy. We selectively deleted estrogen receptor-α (ERα) from two different CNS cell types, neurons and astrocytes, and then determined the effects of these conditional gene deletions on the ability of ERα-ligand treatment to ameliorate disease severity of experimental autoimmune encephalomyelitis (EAE) in mice. EAE i...
End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding. However, the quality of end-to-end ST is often limited by a paucity of training data, since it is difficult to collect large parallel corpora of speech and translated transcript pairs. Previous studies have proposed the use of pre-trained components and multi-task learning in order to benefit from weakly supervised training data, such as speech-to-transcript or text-to-foreign-text pairs. In this paper, we demonstrate that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speech-to-translation pairs for ST training can be more effective than multi-task learning. Furthermore, we demonstrate that a high quality end-to-end ST model can be trained using only weakly supervised datasets, and that synthetic data sourced from unlabeled monolingual text or speech can be used to improve performance. Finally, we discuss methods for avoiding overfitting to synthetic speech with a quantitative ablation study.
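A minimal sketch of this data-conversion idea follows: weakly supervised speech-to-transcript pairs are completed with machine translations of their transcripts, and text-to-translation pairs are completed with synthesized speech for their source text. The `mt_translate` and `tts_synthesize` callables are hypothetical stand-ins for whatever pre-trained models are available, not an API from the paper:

```python
from typing import Callable, Iterable, List, Tuple

Audio = bytes  # placeholder for a waveform or feature representation

def augment_from_asr(
    asr_pairs: Iterable[Tuple[Audio, str]],  # (speech, transcript) pairs
    mt_translate: Callable[[str], str],      # pre-trained MT model (hypothetical)
) -> List[Tuple[Audio, str]]:
    """Speech-to-transcript pairs -> synthetic speech-to-translation pairs."""
    return [(speech, mt_translate(transcript)) for speech, transcript in asr_pairs]

def augment_from_mt(
    mt_pairs: Iterable[Tuple[str, str]],     # (source text, translation) pairs
    tts_synthesize: Callable[[str], Audio],  # pre-trained TTS model (hypothetical)
) -> List[Tuple[Audio, str]]:
    """Text-to-translation pairs -> synthetic speech-to-translation pairs."""
    return [(tts_synthesize(src), translation) for src, translation in mt_pairs]
```

Both functions emit the same (speech, translation) format, so real and synthetic examples can be pooled into one ST training set; the overfitting-to-synthetic-speech issue the abstract mentions arises from the second path.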