2021
DOI: 10.7717/peerj-cs.601

GrimoireLab: A toolset for software development analytics

Abstract: Background: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge. Goal: To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex a…

Citations: cited by 24 publications (7 citation statements)
References: 56 publications (90 reference statements)
“…The former use case is supported by platforms such as Sonarqube and the source{d} Community Edition. The latter use case is supported by research platforms such as MetricMiner [52] and GrimoireLab [53]. For metrics already measured by GitHub, there is also Google BigQuery for Github (https://cloud.google.com/blog/topics/publicdatasets/github-on-bigquery-analyze-all-the-open-source-code, accessed on 16 January 2024), which allows for access to data using an SQL interface.…”
Section: Software Analytics Systems (mentioning)
confidence: 99%
“…The process is based on the ETL plugin that can be either API based, or executed on the locally cloned repositories. • Data Analysis: enables to integrate data-analysis plugins that will be executed in Apache Spark 13, each using a specific methodology (Machine Learning/Statistical Analysis) to solve a specific task. • Dashboard: visualization tool based on Apache Superset 14, used for inspecting and visualizing the data and the results of the analysis performed in the Data Analysis block.…”
Section: Pandora (mentioning)
confidence: 99%
“…Data Analysis. The data analysis process can be executed as python script in Apache Spark 13. Spark enables to execute analysis on a distributed cluster and it is used to prepare data, train, test and validate a range of Machine Learning and statistical models.…”
Section: Pandora (mentioning)
confidence: 99%
“…Five items (6%) were published in other venues, including magazines and newsletters. We note that our initial query did retrieve numerous computer science research papers related to database migration, but after abstract review we determined that many of these were not relevant to this study, as they focused on topics like "software analytics toolset development" (Dueñas et al, 2021), or the description of domain-specific databases without any discussion of sustainability or curation (Tabakmakher et al, 2019). Similarly, most papers described database curation work taking place within an academic library (n=16) or institutional repository (n=23).…”
Section: The Corpus (mentioning)
confidence: 99%