Redundancy-free analysis of multi-revision software artifacts

Alexandru, Carol V.; Panichella, Sebastiano; Proksch, Sebastian; Gall, Harald C.

doi:10.1007/s10664-018-9630-9

Cited by 17 publications (21 citation statements)

References 80 publications

“…Magic methods are naturally detected by looking for function definition nodes with the appropriate name. To make these detections, we utilized LISA, a framework for performing large-scale software analysis on abstract syntax trees [2]. We analyzed the most recent revision of all 1,000 projects, totalling 178,735 files containing 38,505,577 lines of Python code.…”

Section: Measuring the Prevalence Of Idioms In (mentioning)

confidence: 99%
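
The detection step in the quote above (finding function definition nodes whose name marks them as a magic method) can be illustrated with Python's standard `ast` module. This is a minimal sketch and does not use LISA, which is a separate framework. The helper name `find_magic_methods` and the rule "the name starts and ends with a double underscore" are assumptions made for illustration, not the citing paper's exact detection criteria.

```python
import ast


def find_magic_methods(source: str) -> list[str]:
    """Return the names of magic (dunder) functions defined in `source`."""
    tree = ast.parse(source)
    names = []
    # ast.walk visits every node, so methods nested inside classes are included.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            name = node.name
            # Assumed rule: a magic method name starts and ends with "__".
            if len(name) > 4 and name.startswith("__") and name.endswith("__"):
                names.append(name)
    return names


example = '''
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5
'''

print(sorted(find_magic_methods(example)))  # ['__add__', '__init__']
```

A real detector would more likely compare names against the special method names documented in Python's data model, since not every dunder-shaped name is one of them.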

On the usage of pythonic idioms

Alexandru, Merchante, Panichella et al. 2018

Proceedings of the 2018 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Sof

Self Cite

Developers discuss software architecture and concrete source code implementations on a regular basis, be it on question-answering sites, online chats, mailing lists or face to face. In many cases, there is more than one way of solving a programming task. Which way is best may be decided based on case-specific circumstances and constraints, but also based on convention. Having strong conventions, and a common vocabulary to express them, simplifies communication and strengthens common understanding of software development problems and their solutions. While many programming ecosystems have a common vocabulary, Python's relationship to conventions and common language is particularly pronounced. The "Zen of Python", a famous set of high-level coding conventions authored by Tim Peters, states "There should be one, and preferably only one, obvious way to do it". This 'one way to do it' is often referred to as the 'Pythonic' way: the ideal solution to a particular problem. Few other programming languages have coined a unique term to label the quality of craftsmanship that has gone into a software artifact. In this paper, we explore how Python developers understand the term 'Pythonic' by means of structured interviews, build a catalogue of 'pythonic idioms' gathered from literature, and conjecture on the effects of having a language-specific term for quality code, considering the potential it could hold for other programming languages and ecosystems. We find that while the term means different things to novice versus experienced Python developers, it encompasses not only concrete implementation, but a way of thinking (a culture) in general.

“…Depending on the intended application and field of use, provenance can be looked at various granularities [4,13]. On the finest granularity end of the spectrum, tracking the origin of programming building blocks like functions, methods or classes, code snippets, or even individual lines of code (SLOC) and abstract syntax trees (AST) [4], is useful when studying coding patterns across repositories [5,17]. On the opposite end, at the coarsest granularity, tracking the origin of whole repositories is useful when looking at the evolution of forks [7,28,42,53] or project popularity [8].…”

Section: Software Provenance Tracking (mentioning)

confidence: 99%

Software provenance tracking at the scale of public source code

2020

We study the possibilities of tracking the provenance of software source code artifacts within the largest publicly accessible corpus of publicly available source code, the Software Heritage archive, with over 4 billion unique source code files and 1 billion commits capturing their development histories across 50 million software projects. We perform a systematic and generic estimate of the replication factor across the different layers of this corpus, analysing how much the same artifacts (e.g., SLOC, files or commits) appear in different contexts (e.g., files, commits or source code repositories). We observe a combinatorial explosion in the number of identical source code files across different commits. To discuss the implication of these findings, we benchmark different data models for capturing software provenance information at this scale, and we identify a viable solution, based on the properties of isochrone subgraphs, that is deployable on commodity hardware, is incremental and appears to be maintainable for the foreseeable future. Using these properties, we quantify, at a scale never achieved previously, the growth rate of original, i.e. never-seen-before, source code files and commits, and find it to be exponential over a period of more than 40 years.
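
The replication factor in this abstract measures how often the same artifact reappears in different contexts. The sketch below shows one simplified reading of that idea at file granularity: each file is identified by a SHA-1 hash of its contents (similar in spirit to how Git identifies blobs, although Git also hashes a short header), and the factor is taken as the average number of commits in which each unique file content appears. The input format, the helper names `content_id` and `replication_factor`, and this exact definition are assumptions, not the paper's methodology.

```python
import hashlib
from collections import defaultdict


def content_id(data: bytes) -> str:
    """Identify a file by its content only, so identical copies share an id."""
    return hashlib.sha1(data).hexdigest()


def replication_factor(commits: dict[str, list[bytes]]) -> float:
    """Average number of commits in which each unique file content appears.

    `commits` maps a commit id to the contents of the files in that
    commit's snapshot (a simplified, hypothetical input format).
    """
    contexts: dict[str, set[str]] = defaultdict(set)  # content id -> commit ids
    for commit_id, files in commits.items():
        for data in files:
            contexts[content_id(data)].add(commit_id)
    if not contexts:
        return 0.0
    occurrences = sum(len(ids) for ids in contexts.values())
    return occurrences / len(contexts)


commits = {
    "c1": [b"print('a')\n", b"x = 1\n"],
    "c2": [b"print('a')\n", b"x = 2\n"],
    "c3": [b"print('a')\n", b"x = 2\n"],
}
print(replication_factor(commits))  # 2.0
```

Here the unchanged file `print('a')` is one unique content but occurs in all three commits; because every commit snapshot re-lists unchanged files, such repeated occurrences accumulate quickly over long histories.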

“…Underlying our method for visualizing graph evolution is an idea that originally relates primarily to the efficient computation of metrics and other analyses over the entire history of large software projects, although the technique applies to any kind of versioned graph, as we discuss in section V-D. To this end, we present a graph compression algorithm and numerous techniques for reducing redundancies when analyzing multiple revisions of the same project in previous work [5]. A central concept in that work is the idea of a "revision range".…”

Section: Background and Research Goal (mentioning)

confidence: 99%

“…Consequently, each revision to be measured is analyzed individually in a laborious, resource-intensive, and slow process. Research shows that more than 95% of data is redundantly analyzed when discounting the multi-revision nature of software and that, all other factors being equal, analyzing revisions individually can be over 50 times slower [5]. Short of expensively parallelizing the workload, a fairly simple improvement is to analyze revisions incrementally, recomputing values only in artifacts where changes occur.…”

Section: Background and Research Goal (mentioning)

confidence: 99%
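
The incremental strategy named in this quote, recomputing values only for artifacts that changed, can be sketched by memoizing a per-file metric on a hash of the file's content, so unchanged files reuse earlier results. This approximates change detection by content hashing rather than diffing; the function name `analyze_incrementally`, the dictionary-per-revision input, and the line-count metric in the usage are hypothetical, and the cited work's own mechanism may differ.

```python
import hashlib
from typing import Callable


def analyze_incrementally(
    revisions: list[dict[str, str]],
    metric: Callable[[str], int],
) -> tuple[list[dict[str, int]], int]:
    """Compute `metric` for every file of every revision, reusing results
    for file contents that were already analyzed.

    Returns the per-revision results and the number of metric evaluations.
    """
    cache: dict[str, int] = {}  # content hash -> metric value
    results: list[dict[str, int]] = []
    evaluations = 0
    for files in revisions:
        snapshot: dict[str, int] = {}
        for path, text in files.items():
            key = hashlib.sha1(text.encode("utf-8")).hexdigest()
            if key not in cache:
                # Content not seen before: the only case that costs work.
                cache[key] = metric(text)
                evaluations += 1
            snapshot[path] = cache[key]
        results.append(snapshot)
    return results, evaluations


revisions = [
    {"a.py": "x = 1\n", "b.py": "def f():\n    pass\n"},
    {"a.py": "x = 1\ny = 2\n", "b.py": "def f():\n    pass\n"},
    {"a.py": "x = 1\ny = 2\n", "b.py": "def f():\n    pass\n"},
]
# Hypothetical metric: number of lines in the file.
results, evaluations = analyze_incrementally(revisions, lambda text: text.count("\n"))
print(results[-1], evaluations)  # {'a.py': 2, 'b.py': 2} 3
```

Six file analyses collapse to three. The granularity is still whole files, though: a one-line edit forces the value for the entire file to be recomputed, which is the residual redundancy the cited work highlights for small, localized changes.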

“…But given that changes are usually small and localized, for example, at the method level, this still involves significant redundant computation. (5) shows how a method count is computed and stored. The top-level class, for example, counts two methods in the first revision, and then five in the second and third revisions, for which this value is stored only once.…”

Section: Background and Research Goal (mentioning)

confidence: 99%
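
The method-count example in this quote (two methods in the first revision, five in the second and third, with the second value stored only once) amounts to storing each value together with the range of revisions over which it holds. The sketch below shows this as a run-length encoding over a linear sequence of revisions. The function names are hypothetical, and real histories with branches and merges are outside its scope.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def to_revision_ranges(values: Sequence[T], first_revision: int = 1) -> list[tuple[int, int, T]]:
    """Collapse per-revision values into inclusive (start, end, value) ranges."""
    ranges: list[tuple[int, int, T]] = []
    for offset, value in enumerate(values):
        revision = first_revision + offset
        if ranges and ranges[-1][2] == value:
            # Same value as the previous revision: extend the open range.
            start, _, _ = ranges[-1]
            ranges[-1] = (start, revision, value)
        else:
            # Value changed (or first revision): start a new range.
            ranges.append((revision, revision, value))
    return ranges


def value_at(ranges: list[tuple[int, int, T]], revision: int) -> T:
    """Look up the value that holds at `revision`."""
    # Linear scan for clarity; a bisect on the start revisions would scale better.
    for start, end, value in ranges:
        if start <= revision <= end:
            return value
    raise KeyError(revision)


method_counts = [2, 5, 5]  # revisions 1, 2 and 3, as in the quoted example
ranges = to_revision_ranges(method_counts)
print(ranges)               # [(1, 1, 2), (2, 3, 5)]
print(value_at(ranges, 3))  # 5
```

Three per-revision values shrink to two stored ranges; over long histories where most artifacts stay unchanged between revisions, the saving grows accordingly.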

Evo-Clocks: Software Evolution at a Glance

Alexandru, Proksch, Behnamghader et al. 2019

2019 Working Conference on Software Visualization (VISSOFT)

Self Cite

Understanding the evolution of a project is crucial in reverse-engineering, auditing and otherwise understanding existing software. Visualizing how software evolves can be challenging, as it typically abstracts a multi-dimensional graph structure where individual components undergo frequent but localized changes. Existing approaches typically consider either only a small number of revisions or they focus on one particular aspect, such as the evolution of code metrics or architecture. Approaches using a static view with a time axis (such as line charts) are limited in their expressiveness regarding structure, and approaches visualizing structure quickly become cluttered with an increasing number of revisions and components. We propose a novel trade-off between displaying global structure over a large time period with reduced accuracy and visualizing fine-grained changes of individual components with absolute accuracy. We demonstrate how our approach displays changes by blending redundant visual features (such as scales or repeating data points) where they are not expressive. We show how using this approach to explore software evolution can reveal ephemeral information when familiarizing oneself with a new project. We provide a working implementation as an extension to our open-source library for fine-grained evolution analysis, LISA.
