2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)
DOI: 10.1109/mlhpc54614.2021.00010
High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function

Cited by 11 publications (7 citation statements)
References 70 publications
“…Multimer-AF2 dataset. Drawing loose inspiration from [58], for this work, we assembled the new Multimer-AF2 (MAF2) dataset comprised of multimeric structures predicted by AlphaFold 2 [8] and AlphaFold-Multimer [7], using the latest structure prediction pipeline for the Summit supercomputer [59,60]. Originating from the EVCoupling [50] and DeepHomo [51] datasets, the proteins for which we predicted structures using a combination of both AlphaFold methods consist of heteromers and homomers, respectively.…”
Section: A. Additional Results
Mentioning confidence: 99%
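The excerpt above describes assembling the Multimer-AF2 (MAF2) dataset by collecting multimer structures predicted with AlphaFold 2 and AlphaFold-Multimer for targets drawn from EVCoupling (heteromers) and DeepHomo (homomers). As a rough illustration only, the sketch below shows one way such predictions could be gathered into a single labeled index; the directory layout, file names, and CSV format are hypothetical and are not taken from the cited work.

```python
"""Minimal sketch of indexing predicted multimer structures from two
AlphaFold variants into one labeled dataset. All paths and file names
are hypothetical; the actual MAF2 assembly pipeline is not shown in
the excerpt."""
import csv
from pathlib import Path

# Hypothetical output directories of the two prediction runs.
SOURCES = {
    "alphafold2": Path("predictions/af2"),                    # AlphaFold 2 [8]
    "alphafold_multimer": Path("predictions/af2_multimer"),   # AlphaFold-Multimer [7]
}
# Oligomeric state by source dataset, following the excerpt:
# EVCoupling targets are heteromers, DeepHomo targets are homomers.
ORIGIN_LABELS = {"evcoupling": "heteromer", "deephomo": "homomer"}


def collect_entries():
    """Yield (target_id, origin, oligomer_type, method, pdb_path) records."""
    for method, root in SOURCES.items():
        for origin, oligomer in ORIGIN_LABELS.items():
            for pdb in sorted((root / origin).glob("*.pdb")):
                yield pdb.stem, origin, oligomer, method, str(pdb)


def write_index(out_csv="maf2_index.csv"):
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["target", "origin", "oligomer", "method", "path"])
        writer.writerows(collect_entries())


if __name__ == "__main__":
    write_index()
```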
“…We use the GPU computing resource on the Summit supercomputer provided by Oak Ridge National Laboratory (ORNL) to train the deep learning network above. The Summit cluster [40, 41] provides many compute nodes, each with 6 GPUs and 16 GB of memory, which enables distributed deep learning training. We train the 2D U-Net, 1D U-Net, and multi-head attention layer on three separate GPU nodes.…”
Section: Methods
Mentioning confidence: 99%
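The excerpt reports training the 2D U-Net, 1D U-Net, and multi-head attention components on separate multi-GPU nodes of Summit. The following is a minimal data-parallel training sketch with PyTorch DistributedDataParallel on one multi-GPU node; the actual model definitions, hyperparameters, and Summit launch mechanics (LSF/jsrun) are not given in the excerpt, so a toy model and torchrun-style environment variables are assumed.

```python
"""Minimal sketch of data-parallel training on one multi-GPU node (e.g., a
6-GPU Summit node). The real models (2D/1D U-Nets, attention head) are not
shown in the excerpt; a placeholder model is used instead."""
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun (or an equivalent launcher) sets RANK/LOCAL_RANK/WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    # Placeholder model standing in for a U-Net or attention module.
    model = torch.nn.Sequential(
        torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 1)
    ).to(device)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for step in range(100):
        x = torch.randn(32, 128, device=device)   # dummy batch
        y = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()   # gradients are all-reduced across GPUs by DDP
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With these assumptions, a single 6-GPU node could be driven with `torchrun --nproc_per_node=6 train.py`; on Summit the equivalent process placement would normally be handled by jsrun, whose exact invocation is not given in the excerpt.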
“…We employed the ESM-2 pretrained model, which comprises 33 layers and 650M parameters, to generate embeddings with a dimension of 1,280. This computation was performed on the CPU of the Andes supercomputer [39], utilizing its 256 GB RAM. To accommodate memory constraints, we utilized a sequence cutoff of 2,750 residues for each protein sequence.…”
Section: Training and Testing Deep Transformer Models on Cryo2Struct Data
Mentioning confidence: 99%
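The excerpt describes generating 1,280-dimensional embeddings with the 33-layer, 650M-parameter ESM-2 model on CPU, truncating sequences to 2,750 residues. A minimal sketch using the public fair-esm package (esm2_t33_650M_UR50D) is shown below; the input sequence is a placeholder, and the cited work's exact preprocessing is not reproduced here.

```python
"""Minimal sketch of CPU-based embedding generation with ESM-2 (650M, 33
layers, 1,280-dim representations). The 2,750-residue cutoff follows the
excerpt; the input sequence is a made-up placeholder."""
import torch
import esm  # pip install fair-esm

MAX_LEN = 2750  # sequence cutoff from the excerpt

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()  # inference on CPU; no .cuda() call
batch_converter = alphabet.get_batch_converter()

# Hypothetical input; real sequences would come from the dataset's FASTA files.
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"[:MAX_LEN])]
labels, strs, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33], return_contacts=False)

# Per-residue embeddings from the final (33rd) layer: shape (L, 1280),
# dropping the BOS/EOS tokens added by the batch converter.
reps = out["representations"][33]
per_residue = reps[0, 1 : len(strs[0]) + 1]
print(per_residue.shape)  # torch.Size([33, 1280]) for this 33-residue example
```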