Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency. Non-AutoRegressive Translation (NART) models were proposed to reduce the inference time, but could only achieve inferior translation accuracy. In this paper, we propose a novel approach that leverages hints from hidden states and word alignments to help the training of NART models. The results show significant improvement over previous NART models on the WMT14 En-De and De-En datasets and are even comparable to a strong LSTM-based ART baseline while being one order of magnitude faster in inference.
The Transformer architecture is widely used in natural language processing. Despite its success, the design principle of the Transformer remains elusive. In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system. In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating multiple particles' movement in space using the Lie-Trotter splitting scheme and Euler's method. Given this ODE perspective, the rich literature of numerical analysis can guide us in designing effective structures beyond the Transformer. As an example, we propose to replace the Lie-Trotter splitting scheme with the Strang-Marchuk splitting scheme, which is more commonly used and has much lower local truncation errors. The Strang-Marchuk splitting scheme suggests that the self-attention and position-wise feed-forward network (FFN) sub-layers should not be treated equally. Instead, each layer should use two position-wise FFN sub-layers with the self-attention sub-layer placed in between. This leads to a brand new architecture. Such an FFN-attention-FFN layer is "Macaron-like", and thus we call the network with this new architecture the Macaron Net. Through extensive experiments, we show that the Macaron Net is superior to the Transformer on both supervised and unsupervised learning tasks. Reproducible code and pretrained models can be found at https://github.com/zhuohan123/macaron-net
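The accuracy gap between the two splitting schemes can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a linear system dx/dt = (A + B)x with two hypothetical non-commuting 2x2 matrices, and it uses the exact flow of each sub-problem instead of Euler steps. Lie-Trotter applies a full step of A and then a full step of B, while Strang-Marchuk wraps a full step of B between two half steps of A, which mirrors the FFN-attention-FFN ordering only loosely.

```python
import numpy as np

def expm_taylor(M, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small norms)."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

# Hypothetical sub-operators; A + B generates a plane rotation.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [-1.0, 0.0]])
X0 = np.array([1.0, 0.0])

def integrate(scheme, h, T=1.0):
    """Advance X0 to time T with step h using the chosen splitting scheme."""
    n = int(round(T / h))
    if scheme == "lie":
        # Lie-Trotter: full step of A, then full step of B.
        step = expm_taylor(h * B) @ expm_taylor(h * A)
    else:
        # Strang-Marchuk: half step of A, full step of B, half step of A.
        half_a = expm_taylor(0.5 * h * A)
        step = half_a @ expm_taylor(h * B) @ half_a
    x = X0.copy()
    for _ in range(n):
        x = step @ x
    return x

def error(scheme, h, T=1.0):
    """Distance between the split solution and the exact solution at time T."""
    exact = expm_taylor(T * (A + B)) @ X0
    return float(np.linalg.norm(integrate(scheme, h, T) - exact))

for h in (0.1, 0.05):
    print(h, error("lie", h), error("strang", h))
```

On this toy system, halving the step size should roughly halve the Lie-Trotter error and roughly quarter the Strang-Marchuk error, consistent with first-order and second-order global accuracy respectively.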
An increase of static friction during stationary contacts of two solids due to interfacial chemical bonding has been reported in multiple experiments. However, the physics underlying such frictional aging is still not fully understood because it involves multiple physical and chemical effects coupled with each other, making direct interpretation of experimental results difficult. Here, we develop a multiphysics chemical aging model that combines contact mechanics, mechanochemistry, and interfacial chemical reaction kinetics. Our model predicts that aging is proportional to normal loads in a low-load regime and becomes nonlinear at higher loads. We also discovered a nonmonotonic temperature dependence of aging with a peak near room temperature. In addition, our simulations provide insights into contributions from specific physical/chemical effects on the overall aging. Our model shows quantitative agreement with available single-asperity experiments on silica-silica interfaces, and it provides a framework for building a chemical aging model for other material systems with arbitrary types of physical and chemical effects involved.
Main text
Solid-solid frictional interfaces can undergo significant evolution over the time they are held in a stationary contact prior to sliding. This so-called frictional aging [1-5] is known to play a critical role in nucleation and recurrence of earthquakes [5], and also has a large influence on the performance and durability of microelectromechanical systems [6][7][8]. In general, aging has been attributed either to a change in contact area due to plastic deformation and/or to the change in quality of the interface due to chemical strengthening of the interface. In this study we focus on the role of chemical aging in friction. Possible mechanisms behind this phenomenon discussed previously in literature include formation of covalent bonds [9] and capillary condensation [10].
Chemical aging in friction was isolated for the first time by Li et al. [9] in atomic force microscopy (AFM) experiments. In this work, the authors reported a logarithmic increase of static friction with the hold time between an amorphous silica tip and an amorphous silica substrate. The underlying mechanism was later revealed by a theoretical study [11], which showed that formation of siloxane bonds across the hydroxylated silica-silica interface [12] alone can lead to the logarithmic aging based on the following reaction: Si-OH + Si-OH = Si-O-Si + H₂O. Recently, AFM experiments by Tian et al. [13] revealed that the amount of frictional aging increases linearly with the applied normal load. This linear dependence was attributed to the contact mechanics effect, i.e., to an almost linear relationship between the contact area and the normal load at low loads. This explanation is plausible; however, if contact mechanics truly plays an important role in aging, there should be a non-linear dependence of aging on normal load at high loads, an effect that was not observed within the range of normal loads reported in Ref. [...
Macroscale rate and state friction (RSF) laws include a memory distance, D_c, which is considered to be the distance required for a population of frictional contacts to renew itself via slip, counteracting the effects of aging in slow or static contact. This concept connects static friction and kinetic friction. Here, we use atomic force microscopy to study interfacial chemical bond-induced kinetic friction and the memory distance at the nanoscale for single silica–silica nanocontacts. We observe a logarithmic trend of decreasing friction with sliding velocity (i.e., velocity-weakening) at low velocities and a transition to increasing friction with velocity at higher velocities (i.e., velocity-strengthening). We propose a physically based kinetic model for the nanoscale memory effect, the “activation-passivation loop” model, which accounts for the activation and passivation of chemical reaction sites and the formation of new chemical bonds from dangling bonds during sliding. In the model, we define the memory distance to be the average sliding distance that accrues before an activated reaction site becomes passivated. Results from numerical simulations based on this model match experimental friction data well in the velocity-weakening regime and show that D_c is sensitive to the surface chemistry, and nearly independent of sliding velocity. The simulations also show values of D_c that are consistent with those obtained from the experiments. We propose a semiquantitative physical explanation of the observed logarithmic velocity-weakening behavior based on the conservation of the number of interfacial bonds during sliding. We also extract from the experimental data physically reasonable values of the energy barriers to the activation of reaction sites. Our results provide one possible physical mechanism for the nanoscale memory distance.
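For readers unfamiliar with the macroscale picture, the role of D_c can be sketched with the widely used Dieterich (aging-law) form of rate and state friction. This is an assumption made for illustration: the abstract does not state which RSF form it refers to, this is not the activation-passivation loop model, and every parameter value below (mu0, a, b, v0, the step size, and the dimensionless units) is hypothetical. The state variable theta relaxes to D_c / v during steady sliding, and the steady-state friction falls logarithmically with velocity when a < b.

```python
import math

def steady_state_friction(v, mu0=0.6, a=0.010, b=0.015, v0=1.0):
    """Steady-state RSF friction: mu_ss = mu0 + (a - b) * ln(v / v0)."""
    return mu0 + (a - b) * math.log(v / v0)

def relax_state(v, d_c, theta0, dt=1e-3, t_end=10.0):
    """Integrate the aging law d(theta)/dt = 1 - v * theta / d_c with forward Euler."""
    theta = theta0
    for _ in range(int(round(t_end / dt))):
        theta += dt * (1.0 - v * theta / d_c)
    return theta

# Hypothetical, dimensionless values chosen only to make the behavior visible.
d_c = 1.0
v = 2.0
theta_final = relax_state(v=v, d_c=d_c, theta0=5.0)
print("state after sliding:", theta_final, "expected steady state:", d_c / v)

# With a < b, friction decreases as velocity increases (velocity-weakening).
for speed in (1.0, 10.0, 100.0):
    print(speed, steady_state_friction(speed))
```

In this form the state variable forgets its initial value over a sliding distance of order D_c, which is the sense in which D_c acts as a memory distance.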