2011 International Conference on High Performance Computing & Simulation (HPCS 2011)
DOI: 10.1109/hpcsim.2011.5999896
Optimisation and parallelisation of the partitioning around medoids function in R

Cited by 3 publications (4 citation statements) | References 11 publications
“…We clustered this distance matrix using the Partitioning Around Medoids (PAM) clustering algorithm (Park and Jun, 2009). We implemented clustering in R using the ConsensusClusterPlus package (Wilkerson and Hayes, 2010) from Bioconductor, with the ppam function from the SPRINT package to perform parallel PAM (Piotrowski et al., 2011). We set the number of clusters to match the individual ADAGE model (e.g.…”
Section: STAR Methods (mentioning)
confidence: 99%
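The workflow quoted above (PAM clustering of a precomputed distance matrix, with SPRINT's ppam as a parallel drop-in for the serial pam function) can be sketched in R roughly as follows. This is a minimal illustration, not the cited authors' code: cluster::pam() is the standard serial implementation, while the ppam() call is shown commented out with an assumed pam-like signature, since SPRINT requires an MPI launch and its exact interface should be checked against the package documentation.

# Illustrative sketch only, not the cited authors' code. It uses the
# standard cluster::pam() interface; SPRINT's ppam() is shown in comments
# with an assumed pam-like signature and must be run under MPI.
library(cluster)

set.seed(1)
x <- matrix(rnorm(200), nrow = 50)    # toy data: 50 samples, 4 features
d <- dist(x, method = "euclidean")    # precomputed distance matrix

k <- 3                                # number of clusters (medoids)
fit <- pam(d, k = k, diss = TRUE)     # serial PAM on the dissimilarity object

table(fit$clustering)                 # cluster sizes
fit$id.med                            # row indices of the medoid samples

# Parallel variant via SPRINT (assumed usage; launch with mpiexec):
# library(sprint)
# pfit <- ppam(d, k)                  # parallel PAM (signature assumed)
# pterminate()                        # shut down SPRINT's MPI layer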
“…Clearly Figure 3 shows that the solution adopted with the relatively fast EBS exposed to all nodes by the NFS on the master node is probably not efficient enough and is the main performance bottleneck. This can be seen when comparing these results with previous benchmarks conducted on a local cluster that achieved much better scaling [11]. These were performed on a shared memory cluster with 8 dual-core processors, where each process had simultaneous access to data stored on a disk.…”
Section: Results (mentioning)
confidence: 67%
“…Here all the functions show relatively good speedup as the number of the processes increases. Petrou [35] and Piotrowski et al [11] have previously shown that, with a suitably sized dataset, pcor and ppam exhibit near linear scaling on up to 32 processes on a Cray XT supercomputer consisting of 1416 compute blades each with four quad core processor sockets where the CPUs were AMD 2.3 GHz Opteron chips with 8 GB of memory. These blades were connected via the proprietary CRAY SEASTAR2 interconnect with access to high performance parallel I/O disk storage as opposed to network attached storage.…”
Section: Results (mentioning)
confidence: 99%