Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. The main motivation for these architectural innovations is that they capture better domain knowledge, and importantly are easier to optimize than more basic architectures. Recently, more complex architectures such as Neural Turing Machines and Memory Networks have been proposed for tasks including question answering and general computation, creating a new set of optimization challenges. In this paper, we discuss a low-overhead and easy-to-implement technique of adding gradient noise which we find to be surprisingly effective when training these very deep architectures. The technique not only helps to avoid overfitting, but also can result in lower training loss. This method alone allows a fully-connected 20-layer deep network to be trained with standard gradient descent, even starting from a poor initialization. We see consistent improvements for many complex models, including a 72% relative reduction in error rate over a carefully-tuned baseline on a challenging question-answering task, and a doubling of the number of accurate binary multiplication models learned across 7,000 random restarts. We encourage further application of this technique to additional complex modern architectures.
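The gradient-noise technique described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the annealed-variance schedule `eta / (1 + t)**gamma` follows the form described for this technique, but the function name and parameter defaults here are illustrative assumptions.

```python
import numpy as np

def noisy_gradient_step(params, grads, lr, t, eta=0.3, gamma=0.55, rng=None):
    """One SGD step with annealed Gaussian gradient noise.

    Noise with variance eta / (1 + t)**gamma is added to each gradient,
    so early steps explore more and the noise decays as training proceeds.
    `params` and `grads` are dicts of NumPy arrays keyed by parameter name.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(eta / (1.0 + t) ** gamma)
    return {k: p - lr * (grads[k] + rng.normal(0.0, sigma, p.shape))
            for k, p in params.items()}
```

In practice the same perturbation would be applied inside whatever optimizer loop the model already uses; only the added noise term is specific to the technique.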
Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium to learn meaningful representations for downstream tasks. Inspired by recent advances in noise regularization, our pre-training objective is based on denoising. Relying on the well-known link between denoising autoencoders and score-matching, we also show that the objective corresponds to learning a molecular force field (arising from approximating the physical state distribution with a mixture of Gaussians) directly from equilibrium structures. Our experiments demonstrate that using this pre-training objective significantly improves performance on multiple benchmarks, achieving a new state-of-the-art on the majority of targets in the widely used QM9 dataset. Our analysis then provides practical insights into the effects of different factors (dataset sizes, model size and architecture, and the choice of upstream and downstream datasets) on pre-training.
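The denoising objective described above can be sketched as follows. This is a hedged illustration under simple assumptions: Gaussian noise is added to atomic coordinates and a model is trained to regress the noise, which by the denoising/score-matching connection amounts to learning the score of the Gaussian-smoothed structure distribution. The function name, `sigma` default, and the stand-in `model` interface are all assumptions, not the paper's code.

```python
import numpy as np

def denoising_loss(model, coords, sigma=0.1, rng=None):
    """Denoising pre-training loss on 3D equilibrium structures.

    coords : (n_atoms, 3) array of equilibrium atomic positions.
    model  : callable mapping perturbed (n_atoms, 3) coordinates to a
             per-atom 3-vector prediction of the added noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, coords.shape)   # perturb the structure
    pred = model(coords + noise)                    # predict the perturbation
    return np.mean((pred - noise) ** 2)             # regress onto true noise
```

A real setup would use an equivariant graph network for `model` and minimize this loss over a large corpus of equilibrium structures before fine-tuning on the downstream property-prediction task.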
We have deposited YBa2Cu3O7−δ(YBCO) films with low microwave surface resistance (Rs) on 5-cm-diam, oxide-buffered sapphire substrates by planar magnetron sputtering. MgO buffer layers are used on M-plane (101̄0) sapphire, and R-plane (11̄02) sapphire is buffered by CeO2. Rs values of 450–620 μΩ at 77 K and 10 GHz were measured across an entire 5-cm-diam YBCO film on M-plane sapphire. For YBCO on R-plane sapphire, Rs values at 77 K and 10 GHz were 950 μΩ for a 5-cm-diam wafer and 700 μΩ for 1×1 cm2 samples.
Recent progress with tailored growth and post-process sorting enables carbon nanotube (CNT) assemblies with predominantly metallic or semi-conducting concentrations. Cryogenic and microwave measurements performed here show transport dimensionality and overall order increasing with increasing metallic concentration, even in atmospheric doping conditions. By 120 GHz, the conductivity of predominantly semi-conducting assemblies grew to 400% of its DC value at an increasing growth rate, while the other concentrations showed growth rates that tapered off. A generalized Drude model fits the different frequency-dependent behaviors and yields useful quality-control parameters such as plasma frequency, mean free path, and degree of localization. As one of the first demonstrations of waveguides fabricated from this material, sorted CNTs from both as-made and post-process sources were inserted into sections of practical micro-strip. With both sources, sorted CNT micro-strip increasingly outperformed the unsorted with increasing frequency, illustrating that sorted CNT assemblies will be important for high-frequency applications.
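A common form of the generalized Drude model used to extract exactly these parameters (plasma frequency, scattering time/mean free path, degree of localization) is the Drude-Smith expression. The abstract does not specify which generalized form was fit, so treating it as Drude-Smith is an assumption made for illustration:

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def drude_smith_conductivity(omega, omega_p, tau, c):
    """Drude-Smith generalized Drude conductivity.

    sigma(w) = eps0 * omega_p**2 * tau / (1 - i*w*tau) * (1 + c / (1 - i*w*tau))

    omega_p : plasma frequency (rad/s)
    tau     : scattering time (s); mean free path is v_F * tau
    c       : backscattering parameter in [-1, 0]; c = 0 recovers the
              plain Drude model, c -> -1 indicates strong localization.
    """
    d = 1.0 - 1j * omega * tau
    return EPS0 * omega_p**2 * tau / d * (1.0 + c / d)
```

Fitting this expression to the measured frequency-dependent conductivity (e.g. with a least-squares routine) would yield the quality-control parameters the abstract mentions.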