Abstract: Nowadays, software developers are increasingly active on GitHub and StackOverflow, generating a large amount of valuable data in the two communities. Researchers mine the information in these software communities to understand developer behaviors, but previous work has mainly focused on mining data within a single community. In this paper, we propose a novel approach to mining developer behaviors across GitHub and StackOverflow. The approach links accounts across the two communities using a CART decision tree that leverages features from usernames, user behaviors, and writing styles. It then explores cross-site developer behaviors through T-graph analysis, LDA-based topic clustering, and cross-site tagging. We conducted several experiments to evaluate this approach. The results show that the precision and F-score of our identity linkage method are higher than those of previous methods in software communities. In particular, we found that (1) active issue committers are also active question askers; (2) for most developers, the topics of their GitHub content are similar to those of their questions and answers on StackOverflow; (3) developers' concerns on StackOverflow shift over time along with the projects they are currently participating in on GitHub; and (4) developers' concerns on GitHub are more closely related to their StackOverflow answers than to their questions and comments.
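The account-linkage step can be illustrated with a small sketch. The abstract does not give the exact features or tree settings, so everything below is an assumption: three hypothetical pairwise features (username similarity via Python's `difflib.SequenceMatcher`, Jaccard overlap of active hours, and cosine similarity of character-trigram counts as a stand-in for writing style) feed a minimal CART-style binary tree that chooses splits by weighted Gini impurity. The names `pair_features`, `build_tree`, and `predict`, and the dictionary keys `login`, `name`, `active_hours`, and `text`, are illustrative only.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# --- Hypothetical pairwise features (illustrative, not the paper's exact set) ---

def username_similarity(gh_login: str, so_name: str) -> float:
    """Case-insensitive string similarity of two usernames, in [0, 1]."""
    return SequenceMatcher(None, gh_login.lower(), so_name.lower()).ratio()

def jaccard(a: set, b: set) -> float:
    """Overlap of two activity sets, e.g. hours of day with commits or posts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def char_trigrams(text: str) -> Counter:
    """Character trigram counts as a crude writing-style profile."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(c1: Counter, c2: Counter) -> float:
    dot = sum(v * c2[k] for k, v in c1.items())  # missing keys count as 0
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def pair_features(gh: dict, so: dict) -> list[float]:
    """Feature vector for one candidate (GitHub account, StackOverflow account) pair."""
    return [
        username_similarity(gh["login"], so["name"]),
        jaccard(set(gh["active_hours"]), set(so["active_hours"])),
        cosine(char_trigrams(gh["text"]), char_trigrams(so["text"])),
    ]

# --- Minimal CART-style tree: binary splits chosen by weighted Gini impurity ---

def gini(labels: list[int]) -> float:
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_split(X, y):
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, f, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Return a leaf label (int) or a node tuple (feature, threshold, left, right)."""
    majority = int(sum(y) * 2 >= len(y))  # ties go to the positive (linked) class
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))

def predict(tree, x) -> int:
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree
```

In practice one would more likely use a library implementation such as scikit-learn's `DecisionTreeClassifier`, which uses an optimized version of CART; the hand-rolled tree here only makes the Gini-based splitting explicit.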
NL2SQL aims to help engineers and/or end users generate SQL statements from natural language queries. However, improving its precision and scalability remains a major challenge. This paper introduces MultiSQL, a multitask deep learning approach to NL2SQL. MultiSQL unifies the representations of multiple tasks, such as NL2SQL and machine translation, and trains a single model on them in parallel. It employs a multitask question-answering network to learn all tasks jointly and transfer knowledge among them. We evaluated MultiSQL on two query datasets: WikiSQL (an open-source dataset) and CnSQL (a Chinese dataset we created). The evaluation results demonstrate the effectiveness of MultiSQL. In particular, its accuracies on WikiSQL are close to those of state-of-the-art NL2SQL methods, and its accuracy on CnSQL is 78%, which is 17% higher than that of the "Chinese2English + NL2SQL" method.
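One way to read "unifies the task representations" is the decaNLP-style framing, in which every task is cast as a (question, context, answer) triple so that a single question-answering network can be trained on all of them. The sketch below assumes that framing; the prompt wording, the use of table column names as context, and the helper names `QAExample`, `nl2sql_example`, `translation_example`, and `round_robin` are hypothetical and are not taken from MultiSQL.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterable, Iterator

@dataclass(frozen=True)
class QAExample:
    """One training example in a shared (question, context, answer) format."""
    task: str
    question: str
    context: str
    answer: str

def nl2sql_example(nl_query: str, columns: list[str], sql: str) -> QAExample:
    # Hypothetical encoding: the table schema is the context, the NL query is the question.
    return QAExample(
        task="nl2sql",
        question=f"What is the SQL for: {nl_query}",
        context="columns: " + ", ".join(columns),
        answer=sql,
    )

def translation_example(src: str, tgt: str, src_lang: str, tgt_lang: str) -> QAExample:
    # Hypothetical encoding: the source sentence is the context, the instruction is the question.
    return QAExample(
        task="translation",
        question=f"Translate from {src_lang} to {tgt_lang}.",
        context=src,
        answer=tgt,
    )

def round_robin(*task_streams: Iterable[QAExample]) -> Iterator[QAExample]:
    """Interleave examples from several tasks so one model sees all of them."""
    for group in zip_longest(*task_streams):
        for ex in group:
            if ex is not None:  # shorter streams are padded with None by zip_longest
                yield ex
```

Round-robin interleaving is only one simple mixing strategy; the abstract does not say how MultiSQL schedules or weights its tasks during training.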
Nowadays, software developers are increasingly active on GitHub and StackOverflow, generating a large amount of valuable data in the two communities. Researchers mine the information in these software communities to understand developer behaviors, but previous work has mainly focused on mining data within a single community. In this paper, we propose a novel approach to developer identity linkage and behavior mining across GitHub and StackOverflow. The approach links accounts across the two communities using a CART decision tree that leverages features from usernames, user behaviors, and writing styles. It then explores cross-site developer behaviors through T-graph analysis, LDA-based topic clustering, and cross-site tagging. We conducted several experiments to evaluate this approach. The results show that the precision and F-score of our identity linkage method are higher than those of previous methods in software communities. In particular, we found that (1) active issue committers are also active question askers; (2) for most developers, the topics of their GitHub content are similar to those of their questions and answers on StackOverflow; (3) developers' concerns on StackOverflow shift over time along with the projects they are currently participating in on GitHub; and (4) developers' concerns on GitHub are more closely related to their StackOverflow answers than to their questions and comments.
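The precision and F-score reported for identity linkage can be computed over sets of predicted and ground-truth account pairs. The sketch below assumes the balanced F1 score (the harmonic mean of precision and recall), since the abstract does not state which F-measure is used; the function name `linkage_scores` and the (GitHub login, StackOverflow user) tuple format are hypothetical.

```python
def linkage_scores(
    predicted: set[tuple[str, str]], gold: set[tuple[str, str]]
) -> tuple[float, float, float]:
    """Precision, recall, and F1 of predicted account pairs against ground truth."""
    tp = len(predicted & gold)  # correctly linked (GitHub, StackOverflow) pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if 3 pairs are predicted, 2 of them are correct, and the ground truth contains 4 pairs, then precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571).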