High-Performance Computing with TeraStat

Bompiani, E.; Petrillo, Umberto Ferraro; Lasinio, Giovanna Jona; Palini, Francesco

doi:10.1109/dasc-picom-cbdcom-cyberscitech49142.2020.00092

Cited by 5 publications

(1 citation statement)

References 22 publications


“…N_I has 116,641 nodes and 768,993 edges, N_S has 19,354 nodes and 11,759,454 edges. The tests have been performed on an HPC infrastructure equipped with 8 compute nodes running Linux, each equipped with 2 AMD Epyc 7452 processors and 256 GB RAM, for a total of 512 compute cores (see [18] for more details).…”

Section: Results (mentioning)

confidence: 99%

DIAMIN: a software library for the distributed analysis of large-scale molecular interaction networks

2022


Background: Huge amounts of molecular interaction data are continuously produced and stored in public databases. Although many bioinformatics tools have been proposed in the literature for their analysis, based on their modeling through different types of biological networks, several problems still remain unsolved when the problem turns on a large scale. Results: We propose DIAMIN, that is, a high-level software library to facilitate the development of applications for the efficient analysis of large-scale molecular interaction networks. DIAMIN relies on distributed computing, and it is implemented in Java upon the framework Apache Spark. It delivers a set of functionalities implementing different tasks on an abstract representation of very large graphs, providing a built-in support for methods and algorithms commonly used to analyze these networks. DIAMIN has been tested on data retrieved from two of the most used molecular interactions databases, resulting to be highly efficient and scalable. As shown by different provided examples, DIAMIN can be exploited by users without any distributed programming experience, in order to perform various types of data analysis, and to implement new algorithms based on its primitives. Conclusions: The proposed DIAMIN has been proved to be successful in allowing users to solve specific biological problems that can be modeled relying on biological networks, by using its functionalities. The software is freely available and this will hopefully allow its rapid diffusion through the scientific community, to solve both specific data analysis and more complex tasks.


A distributed approach for persistent homology computation on a large scale

Ceccaroni, Di Rocco, Ferraro Petrillo et al. 2024

J Supercomput


Persistent homology (PH) is a powerful mathematical method to automatically extract relevant insights from images, such as those obtained by high-resolution imaging devices like electron microscopes or new-generation telescopes. However, the application of this method comes at a very high computational cost that is bound to explode more because new imaging devices generate an ever-growing amount of data. In this paper, we present PixHomology, a novel algorithm for efficiently computing zero-dimensional PH on images, optimizing memory and processing time. By leveraging the Apache Spark framework, we also present a distributed version of our algorithm with several optimized variants, able to concurrently process large batches of astronomical images. Finally, we present the results of an experimental analysis showing that our algorithm and its distributed version are efficient in terms of required memory, execution time, and scalability, consistently outperforming existing state-of-the-art PH computation tools when used to process large datasets.
