Code prediction, more specifically autocomplete, has become an essential feature in modern IDEs. Autocomplete is most effective when the desired next token is at (or close to) the top of the list of potential completions the IDE offers at the cursor position. This is where the strength of the underlying machine learning system, which produces a ranked list of potential completions, comes into play. We advance the state of the art in the accuracy of code prediction (next-token prediction) used in autocomplete systems. Our work uses Transformers as the base neural architecture. We show that by making the Transformer architecture aware of the syntactic structure of code, we increase the margin by which a Transformer-based system outperforms previous systems; it exceeds the accuracy of several state-of-the-art next-token prediction systems by margins ranging from 14% to 18%. We present several ways of communicating code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset as well as on a company-internal Python corpus. Our code and data-preparation pipeline will be made available as open source.
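The quality criterion described above (the true next token should sit near the top of the ranked completion list) is commonly measured as top-k accuracy. A minimal sketch, with made-up ranked lists rather than real model output:

```python
def top_k_accuracy(ranked_completions, true_tokens, k=5):
    """Fraction of positions where the true next token appears among the
    top-k ranked completions produced by the model."""
    hits = sum(
        1 for ranked, truth in zip(ranked_completions, true_tokens)
        if truth in ranked[:k]
    )
    return hits / len(true_tokens)

# Toy example: two cursor positions, each with a hypothetical ranked list.
ranked = [["self", "cls", "os"], ["return", "raise", "pass"]]
truth = ["self", "pass"]
print(top_k_accuracy(ranked, truth, k=1))  # 0.5  (only "self" is rank 1)
print(top_k_accuracy(ranked, truth, k=3))  # 1.0  (both appear in top 3)
```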
We approach the problem of generalizing pretrained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character n-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on the English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting our model's ability to capture the relationship between words' textual representations and their embeddings.
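The bag-of-character-n-grams view can be sketched in a few lines. This is an illustrative sketch, not the paper's exact implementation; the `ngram_vecs` lookup table of pretrained n-gram vectors is hypothetical:

```python
def char_ngrams(word, n_min=3, n_max=6):
    # Boundary markers '<' and '>' let the model distinguish prefixes
    # and suffixes from word-internal n-grams.
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def compose(word, ngram_vecs, dim):
    # Average the vectors of the word's known n-grams; a rare or unseen
    # word still gets a vector as long as some of its n-grams are known.
    vec, count = [0.0] * dim, 0
    for g in char_ngrams(word):
        if g in ngram_vecs:
            for j, x in enumerate(ngram_vecs[g]):
                vec[j] += x
            count += 1
    return [x / count for x in vec] if count else vec

print(char_ngrams("where", n_min=3, n_max=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```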
We study the problem of approximately counting matchings in hypergraphs of bounded maximum degree and bounded maximum hyperedge size. With an activity parameter λ, each matching M is assigned a weight λ^|M|. The counting problem is formulated as computing a partition function that gives the sum of the weights of all matchings in a hypergraph. This problem unifies two extensively studied statistical-physics models in approximate counting: the hardcore model (graph independent sets) and the monomer-dimer model (graph matchings). For this model, the critical activity λ_c = d^d / (k(d−1)^{d+1}) is the threshold for the uniqueness of Gibbs measures on the infinite (d+1)-uniform (k+1)-regular hypertree. Consider hypergraphs of maximum degree at most k+1 and maximum hyperedge size at most d+1. We show that when λ < λ_c, there is an FPTAS for computing the partition function; and when λ = λ_c, there is a PTAS for computing the log-partition function. These algorithms are based on the decay-of-correlation (strong spatial mixing) property of Gibbs distributions. When λ > 2λ_c, there is no PRAS for the partition function or the log-partition function unless NP = RP. Towards obtaining a sharp transition in the computational complexity of approximate counting, we study the local convergence of a sequence of finite hypergraphs to the infinite lattice with specified symmetry. We show a surprising connection between local convergence and the reversibility of a natural random walk. This leads us to a barrier to the hardness result: the non-uniqueness of the infinite Gibbs measure is not realizable by any finite gadgets.
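The critical activity above is a direct formula in d and k; a small sketch (assuming the reconstructed threshold λ_c = d^d / (k(d−1)^{d+1})) shows that setting k = 1 recovers the classical hardcore-model uniqueness threshold d^d/(d−1)^{d+1} on the (d+1)-regular tree:

```python
def critical_activity(d, k):
    """Uniqueness threshold for hypergraphs of maximum hyperedge size
    d + 1 and maximum degree k + 1, transcribed from the abstract."""
    return d**d / (k * (d - 1)**(d + 1))

# k = 1 is the ordinary-graph (hardcore model) case:
print(critical_activity(3, 1))  # 27/16 = 1.6875
# Larger hyperedge-degree parameter k lowers the threshold:
print(critical_activity(3, 2))  # 27/32 = 0.84375
```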
Understanding the factors influencing tree productivity is central to forest ecology. However, the relative contributions of neighborhood interactions, tree species diversity, and tree size to larch (Larix principis-rupprechtii) productivity require further study. Three plots in the Guandi Mountains, Shanxi Province, were set up for each of the following forest types: natural pure larch forest (PL), mixed larch and birch (Betula platyphylla) forest (LB), and mixed larch and spruce (Picea asperata) forest (LS). Based on a tree size-stratified sampling method, a total of 318 tree core samples were collected. A linear mixed model was used to analyze the effects of tree size, dominance, mingling, and neighborhood competition on larch productivity. Birch and spruce promoted larch growth at the stand and individual tree levels, and birch exhibited the more significant facilitating effect. Intraspecific competition was the main factor affecting larch growth. When the intensity of competition among trees was low, the basal area increment (BAI) of larch in the mixed forests was higher than that in the pure forest. However, with increasing competition, the BAI of larch was lower in the mixed forests than in the pure forest. Tree size, dominance, and mingling were all positively correlated with the BAI of larch. With increasing tree size, the BAI of larch was higher in the mixed forests than in the pure forest, and higher in LB than in LS. When dominance was less than 0.5, the BAI of larch was higher in the pure forest than in the mixed forests, and higher in LS than in LB. With increasing dominance, the BAI of larch was higher in the mixed forests than in the pure forest. The BAI of larch increased with an increasing mingling degree in the mixed forests, and the increasing trend of BAI was larger in LB than in LS. Larch productivity was influenced mainly by neighborhood interactions and stand structure. Improving neighborhood tree diversity and increasing the large-tree proportion and dominance of larch will be helpful for improving larch productivity in mixed forests.
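The response variable throughout is basal area increment (BAI), the gain in stem cross-sectional area between two diameter measurements. A minimal illustrative sketch (standard forestry arithmetic, not the study's analysis code; the diameter values are made up):

```python
import math

def basal_area(dbh_cm):
    # Cross-sectional stem area (cm^2) from diameter at breast height.
    return math.pi * (dbh_cm / 2.0) ** 2

def bai(dbh_start_cm, dbh_end_cm):
    # Basal area increment over a growth period: the difference in
    # cross-sectional area between the two diameter measurements.
    return basal_area(dbh_end_cm) - basal_area(dbh_start_cm)

# A tree growing from 20 cm to 22 cm DBH:
print(round(bai(20.0, 22.0), 2))  # 65.97 (cm^2)
```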