We introduce Codex, a GPT language model finetuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
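As background for the repeated-sampling result above, functional correctness in this line of work is usually reported via an unbiased pass@k estimator. A minimal sketch in Python (the function name and the NumPy-based product form are illustrative choices, not taken from the abstract): n is the number of generated samples per problem, c the number that pass the unit tests, and k the sampling budget being evaluated.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimate of pass@k = 1 - C(n - c, k) / C(n, k),
    # written as a running product to avoid huge binomial coefficients.
    if n - c < k:
        return 1.0  # every size-k subset of the n samples contains a correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: with 100 samples and 35 passing, pass@1 is 0.35 and pass@100 is 1.0.
print(pass_at_k(100, 35, 1), pass_at_k(100, 35, 100))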
We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image↔text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains. The cross-entropy loss has an information-theoretic interpretation as S(True) + D_KL(True || Model), and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an 8 × 8 resolution, and we can forecast the model size needed to achieve any given reducible loss (i.e., D_KL) in nats/image for other resolutions. We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we finetune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks.
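Schematically, the power-law-plus-constant fit and its information-theoretic reading can be written as follows (the symbols N, N_0, and \alpha_N are illustrative placeholders for model size and fit constants, not notation taken from the abstract):

\[
  L(N) \;=\; L_\infty + \left(\frac{N_0}{N}\right)^{\alpha_N},
  \qquad
  L \;=\; \underbrace{S(\mathrm{True})}_{\text{irreducible},\ \approx\, L_\infty}
     \;+\; \underbrace{D_{\mathrm{KL}}\!\left(\mathrm{True} \,\|\, \mathrm{Model}\right)}_{\text{reducible loss}} .
\]

Under this reading, the constant term estimates the entropy of the true data distribution, and the power-law term is the reducible loss that shrinks as model size and compute grow.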
This is the second in a series of papers on rank decompositions of the matrix multiplication tensor. We present new rank 23 decompositions for the 3×3 matrix multiplication tensor M⟨3⟩. All our decompositions have symmetry groups that include the standard cyclic permutation of factors but otherwise exhibit a range of behavior. One of them has 11 cubes as summands and admits an unexpected symmetry group of order 12. We establish basic information regarding symmetry groups of decompositions and outline two approaches for finding new rank decompositions of M⟨n⟩ for larger n.
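For concreteness, here is the standard notion being used (a textbook definition, not a result of this paper): writing the matrix multiplication tensor as the trilinear form M⟨n⟩(A, B, C) = trace(ABC) on n×n matrices, a rank-R decomposition is an expression

\[
  \operatorname{trace}(ABC) \;=\; \sum_{r=1}^{R} a_r(A)\, b_r(B)\, c_r(C)
\]

with linear functionals a_r, b_r, c_r, and the rank of M⟨n⟩ is the smallest such R. The cyclic symmetry of the factors reflects trace(ABC) = trace(BCA) = trace(CAB); the decompositions above realize R = 23 for M⟨3⟩.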
We generalize two main theorems of matching polynomials of undirected simple graphs, namely, real-rootedness and the Heilmann-Lieb root bound. Viewing the matching polynomial of a graph G as the independence polynomial of the line graph of G, we determine conditions for the extension of these theorems to the independence polynomial of any graph. In particular, we show that a stability-like property of the multivariate independence polynomial characterizes claw-freeness. Finally, we give and extend multivariate versions of Godsil's theorems on the divisibility of matching polynomials of trees related to G.
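For reference, the standard definitions behind this statement (not specific to this paper): with m_k(G) the number of k-edge matchings of a graph G on n vertices and i_k(H) the number of size-k independent sets of a graph H,

\[
  \mu(G,x) \;=\; \sum_{k\ge 0} (-1)^k\, m_k(G)\, x^{\,n-2k},
  \qquad
  I(H,x) \;=\; \sum_{k\ge 0} i_k(H)\, x^{k},
\]

and since the matchings of G are exactly the independent sets of its line graph L(G), one has \mu(G,x) = x^{n}\, I(L(G), -x^{-2}). Real-rootedness refers to \mu(G,x), and the Heilmann-Lieb bound places all of its roots in [-2\sqrt{\Delta-1},\, 2\sqrt{\Delta-1}] when G has maximum degree \Delta \ge 2.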
The Generalized Lax Conjecture asks whether every hyperbolicity cone is a section of a semidefinite cone of sufficiently high dimension. We prove that the space of hyperbolicity cones of hyperbolic polynomials of degree d in n variables contains (n/d)^Ω(d) pairwise distant cones in the Hausdorff metric, and therefore that any semidefinite representation of such polynomials must have dimension at least (n/d)^Ω(d) (even allowing a small approximation error). The cones are perturbations of the hyperbolicity cones of elementary symmetric polynomials. Our proof contains several ingredients of independent interest, including the identification of a large subspace in which the elementary symmetric polynomials lie in the relative interior of the set of hyperbolic polynomials, and a quantitative generalization of the fact that a real-rooted polynomial with two consecutive zero coefficients must have a high multiplicity root at zero.
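To fix terminology (standard definitions, not results of this paper): a homogeneous polynomial p of degree d in n real variables is hyperbolic with respect to a direction e if p(e) \neq 0 and, for every x \in \mathbb{R}^n, the univariate polynomial t \mapsto p(x + te) has only real roots. Its open hyperbolicity cone is

\[
  \Lambda_{++}(p, e) \;=\; \bigl\{\, x \in \mathbb{R}^n : \text{every root of } t \mapsto p(x + te) \text{ is strictly negative} \,\bigr\},
\]

the connected component of \{p \neq 0\} containing e; for p = \det on symmetric matrices and e = I this recovers the cone of positive definite matrices, which is why sections of semidefinite cones are the natural comparison. The elementary symmetric polynomial e_d(x) = \sum_{|S| = d} \prod_{i \in S} x_i is hyperbolic with respect to the all-ones vector, and the cones constructed above are perturbations of its hyperbolicity cone.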