PyTorrent: A Python Library Corpus for Large-scale Language Models

Bahrami, Mehdi; Liu, Lei; Mizobuchi, Yuji; Fukuyori, Masahiro; Munakata, Kazuki; Menzies, Tim

doi:10.5281/zenodo.4451357


Cited by 1 publication

(1 citation statement)

References 19 publications


“…We also previously published PyTorrent [4] which includes 218,814 Python software package with more than 655M Line of Codes (LoC). PyTorrent is made available public here [2], [3] and a Python language model [1] 6 .…”

Section: How; citation type: mentioning

confidence: 99%

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Bahrami, Shrikanth, Mizobuchi, et al. 2021

Preprint

Self Cite


Code retrieval lets software engineers search code with a natural language query; it relies on both natural language processing and software engineering techniques. There have been several attempts at code retrieval, ranging from searching code snippets to searching whole functions. In this paper, we introduce Augmented Code (AugmentedCode) retrieval, which takes advantage of existing information within the code and constructs an augmented programming language to improve the performance of code retrieval models. We curated a large corpus of Python and showcase the framework and the results of the augmented programming language, which outperforms on CodeSearchNet and CodeBERT with a Mean Reciprocal Rank (MRR) of 0.73 and 0.96, respectively. The fine-tuned augmented code retrieval model is published on HuggingFace at https://huggingface.co/Fujitsu/AugCode and a demonstration video is available at https://youtu.be/mnZrUTANjGs.
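For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how Mean Reciprocal Rank is commonly computed for a retrieval benchmark. It is not taken from the AugmentedCode paper: the function name `mean_reciprocal_rank` and the example ranks are hypothetical, and the sketch assumes that each query has one correct answer whose 1-based rank in the result list is known (or `None` if it was not retrieved).

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the correct result for query i,
    or None when the correct result was not retrieved (contributes 0).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # a miss adds nothing to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


# Hypothetical ranks for four queries: hits at positions 1, 2 and 4, one miss.
example_ranks = [1, 2, None, 4]
print(mean_reciprocal_rank(example_ranks))
```

An MRR close to 1 means the correct code is usually the first result; a value near 0.5 means it typically appears around the second position.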

