Abstract. This paper discusses the motivations and methods for collecting quantitative data about free, libre, and open source software (FLOSS) projects. It also describes the current state of the art in collecting this data and some of the problems with the process. Finally, the paper outlines the challenges data miners should expect when trying to improve the usefulness of their quantitative data streams.
Much of the data about free, libre, and open source software (FLOSS) development comes from studies of the code repositories used for managing projects. This paper presents a method for integrating data about open source projects by matching projects (entities) and deleting duplicates across multiple code repositories. After a review of the relevant literature, a few of the methods are chosen and applied to the FLOSS domain, including a simple scoring system for confidence in pairwise project matches. Finally, the paper describes limitations of this approach and recommendations for future work.
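As an illustration of what a simple confidence score for pairwise project matches could look like, here is a minimal Python sketch. It is not the scoring system from the paper: the record fields (`name`, `homepage`, `developers`), the point weights, and the 0.85 name-similarity threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two case-normalized project names."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(p: dict, q: dict) -> int:
    """Return a hypothetical 0-100 confidence that p and q are the same project.

    The weights below are illustrative assumptions, not values from the paper.
    """
    score = 0

    # Name evidence: an exact (case-insensitive) match counts more than a near match.
    if p["name"].strip().lower() == q["name"].strip().lower():
        score += 50
    elif name_similarity(p["name"], q["name"]) >= 0.85:
        score += 30

    # Homepage evidence: only counts when both records list the same URL.
    if p.get("homepage") and p.get("homepage") == q.get("homepage"):
        score += 30

    # Developer evidence: at least one developer identifier in common.
    if set(p.get("developers", ())) & set(q.get("developers", ())):
        score += 20

    return score


# Hypothetical records as they might appear in two different repositories.
repo_a = {"name": "Gaim", "homepage": "http://example.org/gaim",
          "developers": {"alice", "bob"}}
repo_b = {"name": "gaim", "homepage": "http://example.org/gaim",
          "developers": {"bob"}}
repo_c = {"name": "apache", "homepage": None, "developers": {"carol"}}

print(match_score(repo_a, repo_b))
print(match_score(repo_a, repo_c))
```

In a scheme like this, pairs scoring above some cutoff would be treated as duplicates and merged, while mid-range pairs would be flagged for manual review. The cutoff itself would need to be tuned against known matches.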
This paper introduces a collaborative project, OSSmole, which collects, shares, and stores comparable data and analyses of free, libre, and open source software (FLOSS) development for research purposes. The project is a clearinghouse for data from the ongoing collection and analysis efforts of many disparate research groups. A collaborative data repository reduces duplication and promotes compatibility both across sources of FLOSS data and across research groups and analyses. The primary objective of OSSmole is to mine FLOSS source code repositories and provide the resulting data and summary analyses as open source products. However, the OSSmole data model also supports donated raw and summary data from a variety of open source researchers and other software repositories. The paper first outlines current difficulties with the typical quantitative FLOSS research process and uses these to develop requirements for such a collaborative data repository. Finally, the design of the OSSmole system is presented, as well as examples of current research and analyses using OSSmole.
This chapter explores the motivations and methods for mining (collecting, aggregating, distributing, and analyzing) data about free/libre open source software (FLOSS) projects. It first explores why there is a need for this type of data. Then the chapter outlines the current state of the art in collecting and using quantitative data about FLOSS projects, focusing especially on the three main types of FLOSS data that have been gathered to date: data from large forges, data from small project sets, and survey data. Finally, the chapter describes some possible areas for improvement and recommendations for the future of FLOSS data collection.
This article introduces and expands on previous work on a collaborative project, called FLOSSmole (formerly OSSmole), designed to gather, share, and store comparable data and analyses of free, libre, and open source software (FLOSS) development for academic research. The project draws on the ongoing collection and analysis efforts of many research groups, reducing duplication and promoting compatibility both across sources of FLOSS data and across research groups and analyses. The article outlines difficulties with the typical quantitative FLOSS research process, uses these to develop requirements, and presents the design of the system.