2021
DOI: 10.5281/zenodo.4451357

PyTorrent: A Python Library Corpus for Large-scale Language Models

Mehdi Bahrami,
Lei Liu,
Yuji Mizobuchi
et al.

Abstract: A large scale collection of both semantic and natural language resources is essential to leverage active Software Engineering research areas such as code reuse and code comprehensibility. Existing machine learning models ingest data from Open Source repositories (like GitHub projects) and forum discussions (like Stackoverflow.com), whereas, in this showcase, we took a step backward to orchestrate a corpus titled PyTorrent that contains 218,814 Python package libraries from PyPI and Anaconda environment. This i…

Citations: cited by 1 publication (1 citation statement)
References: 19 publications
“…We also previously published PyTorrent [4] which includes 218,814 Python software package with more than 655M Line of Codes (LoC). PyTorrent is made available public here [2], [3] and a Python language model [1] 6 .…”
Section: How
mentioning
confidence: 99%