Proceedings of the 15th International Conference on Mining Software Repositories 2018
DOI: 10.1145/3196398.3196408

Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow

Abstract: For tasks like code synthesis from natural language, code retrieval, and code summarization, data-driven models have shown great promise. However, creating these models requires parallel data between natural language (NL) and code with fine-grained alignments. Stack Overflow (SO) is a promising source for such a dataset: the questions are diverse, and most of them have corresponding answers with high-quality code snippets. However, existing heuristic methods (e.g., pairing the title of a post with the cod…
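As a concrete illustration of the heuristic pairing the abstract describes, here is a minimal sketch that couples a question title with each code block in the accepted answer's HTML body. The regex-based extraction, the function name, and the sample body are assumptions for illustration, not the paper's actual pipeline:

    # Minimal sketch of the heuristic baseline: pair a question's title with
    # each <pre><code> block in its accepted answer (HTML body as in the
    # Stack Exchange data dump). Illustrative only, not the paper's pipeline.
    import html
    import re

    CODE_BLOCK = re.compile(r"<pre><code>(.*?)</code></pre>", re.DOTALL)

    def title_code_pairs(title, accepted_answer_body):
        """Yield (NL, code) pairs from one question/accepted-answer pair."""
        for match in CODE_BLOCK.finditer(accepted_answer_body):
            snippet = html.unescape(match.group(1)).strip()
            if snippet:  # skip empty code blocks
                yield (title, snippet)

    body = "<p>Use a conditional:</p><pre><code>a = min(max(x, 1), 10)</code></pre>"
    print(list(title_code_pairs("How do I clamp a value in Python?", body)))

As the paper argues, such title/answer pairing is limited in both coverage and correctness, which motivates learning the alignment instead.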

Cited by 163 publications (138 citation statements) | References 42 publications

Citation statements (ordered by relevance):
“…Although various large-scale datasets to study code generation have been created from GitHub (Allamanis and Sutton, 2013, 2014; Allamanis et al., 2016), their development and test sets are randomly drawn from the same dataset, since human curation is prohibitively expensive. Similarly, Yin et al. (2018) collect a large dataset from Stackoverflow.com (CoNaLa) for training, but only manage to curate a small portion (∼2,900 examples) of single-line NL and code snippets for evaluation. We take advantage of nbgrader assignment notebooks to create an inexpensive, high-quality, human-curated test set of 3,725 NL statements with interactive history.…”
Section: Related Work
confidence: 99%
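For readers who want to inspect the curated pairs this excerpt mentions, a minimal loading sketch follows. The field names ("intent", "rewritten_intent", "snippet") match the released CoNaLa JSON files, but the file path is a placeholder:

    # Minimal sketch: read CoNaLa examples and print NL -> code pairs.
    # Field names follow the released corpus; the path is a placeholder.
    import json

    with open("conala-test.json") as f:  # placeholder path
        examples = json.load(f)

    for ex in examples[:3]:
        # Curators rewrote many intents; fall back to the raw intent otherwise.
        nl = ex.get("rewritten_intent") or ex["intent"]
        print(nl, "->", ex["snippet"])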
“…Existing tasks for mapping NL to source code primarily use a single NL utterance (Zettlemoyer and Collins, 2005; Iyer et al., 2017) to generate database queries (semantic parsing), single-line Python code (Yin et al., 2018; Oda et al., 2015), multi-line domain-specific code (Ling et al., 2016; Rabinovich et al., 2017), or sequences of API calls (Gu et al., 2016b). A recent task on the CONCODE dataset maps a single utterance to an entire method, conditioned on environment variables and methods.…”
Section: Introduction
confidence: 99%
“…It uses the Floating Parser architecture, a grammar-based approach that provides more flexibility without requiring the hand-engineering of lexicalized rules needed by synchronous CFG- or CCG-based semantic parsers [42]. This approach also produces more interpretable results and requires less training data than neural network approaches (e.g., [51, 52]). The parser parses user utterances into expressions in a simple functional DSL we created for PUMICE.…”
Section: Semantic Parsing
confidence: 99%
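To make the grammar-based idea concrete, here is a toy, rule-based parser in the same spirit: a small lexicon maps words to DSL primitives, and a single composition rule builds a functional expression. The lexicon and DSL are invented for illustration and are not PUMICE's actual grammar:

    # Toy rule-based semantic parser: lexical rules plus one composition
    # rule yield expressions in a tiny functional DSL (invented here).
    LEXICON = {"cappuccino": "Cappuccino", "espresso": "Espresso"}

    def parse(utterance):
        """Map an utterance to a DSL expression such as order(Cappuccino)."""
        tokens = utterance.lower().split()
        if "order" not in tokens:
            raise ValueError(f"no parse for {utterance!r}")
        items = [LEXICON[t] for t in tokens if t in LEXICON]
        if not items:
            raise ValueError(f"no known item in {utterance!r}")
        return f"order({items[0]})"

    print(parse("please order a cappuccino"))  # -> order(Cappuccino)

Because every output is built from explicit lexical rules, a parse can be traced back to the words that produced it, which is the interpretability advantage the excerpt contrasts with neural approaches.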
“…[Figure residue; recoverable examples: DJANGO y: self.max_entries = int(max_entries); CONALA x: "more pythonic alternative for getting a value in range not using min and max", y: a = 1 if x < 1 else 10 if x > 10 else x.] Figure 2: Sample natural language utterances and meaning representations from datasets used in this work: ATIS for dialogue management; DJANGO (Oda et al., 2015) and CONALA (Yin et al., 2018a) for code generation and summarization.…”
Section: ATIS
confidence: 99%
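The CONALA pair recovered above is runnable as-is; the conditional expression clamps x into [1, 10] without min()/max():

    # The recovered CONALA snippet, wrapped in a function and sanity-checked:
    # clamp x into [1, 10] with a conditional expression instead of min/max.
    def clamp(x):
        return 1 if x < 1 else 10 if x > 10 else x

    assert clamp(-5) == 1 and clamp(42) == 10 and clamp(7) == 7
    print(clamp(0.5), clamp(3), clamp(99))  # -> 1 3 10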
“…is reported for parser evaluation based on exact match, and BLEU-4 is adopted for generator evaluation. [Table residue: SNM (Yin and Neubig, 2017): 71.6; COARSE2FINE (Dong and Lapata, 2018): 74; (Hu et al., 2018): 65.9; 62.3.] For the code generation task in CONALA, we use BLEU-4 following the setup in Yin et al. (2018a).…”
Section: Experimental Setups
confidence: 99%
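As a sketch of the BLEU-4 setup this excerpt describes, the snippet below scores one hypothesis against one reference using NLTK's smoothed sentence-level BLEU. The whitespace tokenization is an assumption for brevity; the official CoNaLa evaluation applies its own code tokenizer:

    # Sketch of BLEU-4 scoring for generated code (NLTK, smoothed).
    # Whitespace tokenization is a simplification of CoNaLa's tokenizer.
    from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

    reference = "a = 1 if x < 1 else 10 if x > 10 else x".split()
    hypothesis = "a = min(max(x, 1), 10)".split()

    score = sentence_bleu([reference], hypothesis,
                          weights=(0.25, 0.25, 0.25, 0.25),
                          smoothing_function=SmoothingFunction().method3)
    print(f"BLEU-4: {score:.3f}")

Smoothing matters here because short code snippets often have zero higher-order n-gram overlap, which would otherwise zero out the geometric mean.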