Code prediction, more specifically autocomplete, has become an essential feature in modern IDEs. Autocomplete is most effective when the desired next token is at (or close to) the top of the list of potential completions the IDE offers at the cursor position. This is where the strength of the underlying machine learning system, which produces a ranked list of potential completions, comes into play. We advance the state of the art in the accuracy of code prediction (next-token prediction) used in autocomplete systems. Our work uses Transformers as the base neural architecture. We show that by making the Transformer architecture aware of the syntactic structure of code, we increase the margin by which a Transformer-based system outperforms previous systems; it exceeds the accuracy of several state-of-the-art next-token prediction systems by margins ranging from 14% to 18%. We present several ways of communicating code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset as well as on a company-internal Python corpus. Our code and data-preparation pipeline will be made available as open source.
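The quality criterion described above (the true next token should sit near the top of the ranked completion list) is commonly measured as top-k accuracy. A minimal sketch, with made-up ranked lists rather than real model output:

```python
def top_k_accuracy(ranked_completions, true_tokens, k=5):
    """Fraction of positions where the true next token appears among the
    top-k ranked completions produced by the model."""
    hits = sum(
        1 for ranked, truth in zip(ranked_completions, true_tokens)
        if truth in ranked[:k]
    )
    return hits / len(true_tokens)

# Toy example: two cursor positions, each with a hypothetical ranked list.
ranked = [["self", "cls", "os"], ["return", "raise", "pass"]]
truth = ["self", "pass"]
print(top_k_accuracy(ranked, truth, k=1))  # 0.5  (only "self" is rank 1)
print(top_k_accuracy(ranked, truth, k=3))  # 1.0  (both appear in top 3)
```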
We approach the problem of generalizing pretrained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character n-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on the English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting our model's ability to capture the relationship between words' textual representations and their embeddings.
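The bag-of-character-n-grams view can be sketched in a few lines. This is an illustrative sketch, not the paper's exact implementation; the `ngram_vecs` lookup table of pretrained n-gram vectors is hypothetical:

```python
def char_ngrams(word, n_min=3, n_max=6):
    # Boundary markers '<' and '>' let the model distinguish prefixes
    # and suffixes from word-internal n-grams.
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def compose(word, ngram_vecs, dim):
    # Average the vectors of the word's known n-grams; a rare or unseen
    # word still gets a vector as long as some of its n-grams are known.
    vec, count = [0.0] * dim, 0
    for g in char_ngrams(word):
        if g in ngram_vecs:
            for j, x in enumerate(ngram_vecs[g]):
                vec[j] += x
            count += 1
    return [x / count for x in vec] if count else vec

print(char_ngrams("where", n_min=3, n_max=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```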
We study the problem of approximately counting matchings in hypergraphs of bounded maximum degree and bounded maximum hyperedge size. With an activity parameter λ, each matching M is assigned a weight λ^|M|. The counting problem is formulated as computing a partition function that gives the sum of the weights of all matchings in a hypergraph. This problem unifies two extensively studied statistical-physics models in approximate counting: the hardcore model (graph independent sets) and the monomer-dimer model (graph matchings). For this model, the critical activity λ_c = d^d / (k(d−1)^{d+1}) is the threshold for the uniqueness of Gibbs measures on the infinite (d+1)-uniform (k+1)-regular hypertree. Consider hypergraphs of maximum degree at most k+1 and maximum hyperedge size at most d+1. We show that when λ < λ_c, there is an FPTAS for computing the partition function; and when λ = λ_c, there is a PTAS for computing the log-partition function. These algorithms are based on the decay-of-correlation (strong spatial mixing) property of Gibbs distributions. When λ > 2λ_c, there is no PRAS for the partition function or the log-partition function unless NP = RP. Towards obtaining a sharp transition in the computational complexity of approximate counting, we study the local convergence of a sequence of finite hypergraphs to the infinite lattice with specified symmetry. We show a surprising connection between local convergence and the reversibility of a natural random walk. This leads us to a barrier to the hardness result: the non-uniqueness of the infinite Gibbs measure is not realizable by any finite gadgets.
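The critical activity above is a direct formula in d and k; a small sketch (assuming the reconstructed threshold λ_c = d^d / (k(d−1)^{d+1})) shows that setting k = 1 recovers the classical hardcore-model uniqueness threshold d^d/(d−1)^{d+1} on the (d+1)-regular tree:

```python
def critical_activity(d, k):
    """Uniqueness threshold for hypergraphs of maximum hyperedge size
    d + 1 and maximum degree k + 1, transcribed from the abstract."""
    return d**d / (k * (d - 1)**(d + 1))

# k = 1 is the ordinary-graph (hardcore model) case:
print(critical_activity(3, 1))  # 27/16 = 1.6875
# Larger hyperedge-degree parameter k lowers the threshold:
print(critical_activity(3, 2))  # 27/32 = 0.84375
```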
Understanding the factors influencing tree productivity is central to forest ecology. However, the relative contributions of neighborhood interactions, tree species diversity, and tree size to larch (Larix principis-rupprechtii) productivity require further study. Three plots in the Guandi Mountains, Shanxi Province, were set up for each of the following forest types: natural pure larch forest (PL), mixed larch and birch (Betula platyphylla) forest (LB), and mixed larch and spruce (Picea asperata) forest (LS). Based on a tree size-stratified sampling method, a total of 318 tree core samples were collected. A linear mixed model was used to analyze the effects of tree size, dominance, mingling, and neighborhood competition on larch productivity. Birch and spruce promoted larch growth at the stand and individual tree levels, and birch exhibited the more significant facilitating effect. Intraspecific competition was the main factor affecting larch growth. When the intensity of competition among trees was low, the basal area increment (BAI) of larch in the mixed forests was higher than that in the pure forest. However, with increasing competition, the BAI of larch was lower in the mixed forests than in the pure forest. Tree size, dominance, and mingling were all positively correlated with the BAI of larch. With increasing tree size, the BAI of larch was higher in the mixed forests than in the pure forest, and higher in LB than in LS. When dominance was less than 0.5, the BAI of larch was higher in the pure forest than in the mixed forests, and higher in LS than in LB. With increasing dominance, the BAI of larch was higher in the mixed forests than in the pure forest. The BAI of larch increased with an increasing mingling degree in the mixed forests, and the increasing trend of BAI was larger in LB than in LS. Larch productivity was influenced mainly by neighborhood interactions and stand structure. Improving neighborhood tree diversity and increasing the large-tree proportion and dominance of larch will be helpful for improving larch productivity in mixed forests.
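The response variable throughout is basal area increment (BAI), the gain in stem cross-sectional area between two diameter measurements. A minimal illustrative sketch (standard forestry arithmetic, not the study's analysis code; the diameter values are made up):

```python
import math

def basal_area(dbh_cm):
    # Cross-sectional stem area (cm^2) from diameter at breast height.
    return math.pi * (dbh_cm / 2.0) ** 2

def bai(dbh_start_cm, dbh_end_cm):
    # Basal area increment over a growth period: the difference in
    # cross-sectional area between the two diameter measurements.
    return basal_area(dbh_end_cm) - basal_area(dbh_start_cm)

# A tree growing from 20 cm to 22 cm DBH:
print(round(bai(20.0, 22.0), 2))  # 65.97 (cm^2)
```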