Transformer-based QA models use input-wide self-attention, i.e., across both the question and the input passage, at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, reducing runtime compute drastically. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-training weights of a standard transformer and directly fine-tune on the target QA dataset. We show DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x, and with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.
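The decomposition described above can be illustrated with a toy sketch. This is not the released DeFormer code: it assumes single-head attention with identity projections (no learned weights, no feed-forward or normalization sublayers), and the names `decomposed_mask`, `encode`, and `lower_passage_states` are hypothetical. It only shows why blocking question-passage attention in the lower layers makes the lower-layer passage states independent of the question, and therefore cacheable.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, mask=None):
    # Toy single-head attention with identity projections (an assumption
    # made to keep the sketch short; real models use learned Q/K/V weights).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block disallowed pairs
    return softmax(scores) @ x


def decomposed_mask(n_question, n_passage):
    # Block-diagonal mask: question tokens attend only to question tokens,
    # passage tokens attend only to passage tokens.
    n = n_question + n_passage
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_question, :n_question] = True
    mask[n_question:, n_question:] = True
    return mask


def encode(question, passage, n_lower, n_upper):
    # Lower layers use decomposed attention, upper layers use full attention.
    x = np.concatenate([question, passage], axis=0)
    mask = decomposed_mask(len(question), len(passage))
    for _ in range(n_lower):
        x = self_attention(x, mask)
    for _ in range(n_upper):
        x = self_attention(x)
    return x


def lower_passage_states(passage, n_lower):
    # What could be pre-computed offline: the passage alone, run through
    # the lower layers without ever seeing a question.
    x = passage
    for _ in range(n_lower):
        x = self_attention(x)
    return x


rng = np.random.default_rng(0)
question = rng.normal(size=(3, 8))
passage = rng.normal(size=(7, 8))

joint = encode(question, passage, n_lower=2, n_upper=0)[len(question):]
cached = lower_passage_states(passage, n_lower=2)
print(np.allclose(joint, cached))
```

In this toy setting, the passage rows produced by the decomposed lower layers match the passage-only computation, so they could be stored once and reused for any question; only the upper, input-wide layers would need to run at query time.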
Ground-penetrating radar (GPR) has been used for asphalt concrete (AC) pavement density prediction for the past two decades. Recently, it has been considered as a method for pavement quality control and quality assurance. A numerical method to estimate asphalt pavement specific gravity from its dielectric properties was developed and validated. A three-phase numerical model considering aggregate, binder, and air-void components was developed using an AC mixture generation algorithm. A take-and-add algorithm was used to generate the uneven air-void distribution in the three-phase model. The proposed three-phase model is capable of correlating pavement density with bulk and component dielectric properties. The model was validated using field data. Two methods were used to calculate the dielectric constant of the AC mixture: the reflection amplitude method and the two-way travel time method. These were simulated and compared when vertical and longitudinal heterogeneity existed within the AC pavement layers. Results indicate that the reflection amplitude method is more sensitive to thin surface layers than the two-way travel time method. The effects of air-void content, asphalt content, aggregate gradation, and aggregate dielectric constants on the GPR measurements were studied using the numerical model.
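The abstract does not give the formulas behind the two methods. As a rough sketch, assuming the commonly used textbook forms (surface reflection amplitude normalized by a metal-plate reflection, and travel time through a layer of known thickness), the two estimates could be computed as follows. The function names and the example numbers are hypothetical.

```python
C_M_PER_NS = 0.299792458  # speed of light in free space, metres per nanosecond


def dielectric_from_reflection(a_surface, a_plate):
    # Reflection amplitude method (assumed standard form):
    # eps = ((1 + A0/Ap) / (1 - A0/Ap))^2, where A0 is the amplitude of the
    # pavement surface reflection and Ap that of a metal plate (full reflection).
    ratio = a_surface / a_plate
    return ((1 + ratio) / (1 - ratio)) ** 2


def dielectric_from_travel_time(two_way_time_ns, thickness_m):
    # Two-way travel time method (assumed standard form):
    # eps = (c * t / (2 * d))^2 for a layer of known thickness d.
    return (C_M_PER_NS * two_way_time_ns / (2 * thickness_m)) ** 2


# Illustrative values only, not measurements from the study.
print(dielectric_from_reflection(a_surface=0.4, a_plate=1.0))
print(dielectric_from_travel_time(two_way_time_ns=1.4, thickness_m=0.1))
```

The first estimate depends only on the reflection at the surface, while the second averages over the whole layer thickness, which is consistent with the reported finding that the reflection amplitude method is more sensitive to thin surface layers.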
Accurate and reliable measurement of energy consumption is critical for making well-informed design choices when choosing and training large-scale NLP models. In this work, we show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption. We conduct energy measurement experiments with four different models for a question answering task. We quantify the error of existing software-based energy measurements by using a hardware power meter that provides highly accurate energy measurements. Our key takeaway is the need for a more accurate energy estimation model that takes into account hardware variabilities and the non-linear relationship between resource utilization and energy consumption. We release the code and data at https://github.com/csarron/sustainlp2020-energy.
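The abstract does not describe how the error is computed. One plausible way to compare a software estimate against a power-meter trace, sketched here under the assumption that the meter reports timestamped power samples, is to integrate power over time and take the relative difference. The function names and sample values are hypothetical and are not taken from the released code.

```python
def energy_joules(timestamps_s, power_w):
    # Integrate sampled power (watts) over time (seconds) with the
    # trapezoidal rule to obtain energy in joules.
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def relative_error(estimate_j, reference_j):
    # Error of a software-based estimate relative to the meter reading.
    return abs(estimate_j - reference_j) / reference_j


# Illustrative numbers only, not results from the paper.
meter_energy = energy_joules([0.0, 1.0, 2.0, 3.0], [95.0, 180.0, 210.0, 120.0])
software_estimate = 400.0  # hypothetical output of a software-based tool
print(meter_energy, relative_error(software_estimate, meter_energy))
```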