2020
DOI: 10.1101/2020.11.03.365932
Preprint

Protein Structural Alignments From Sequence

Abstract: Computing sequence similarity is a fundamental task in biology, with alignment forming the basis for the annotation of genes and genomes and providing the core data structures for evolutionary analysis. Standard approaches are a mainstay of modern molecular biology and rely on variations of edit distance to obtain explicit alignments between pairs of biological sequences. However, sequence alignment algorithms struggle with remote homology tasks and cannot identify similarities between many pairs of proteins w…
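For context on the "variations of edit distance" the abstract refers to, here is a minimal Needleman-Wunsch dynamic program for global pairwise alignment. It is an illustrative baseline only, not the method proposed in the preprint, and the scoring values and example sequences are arbitrary.

```python
import numpy as np

def needleman_wunsch_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Classic global-alignment dynamic program; returns the optimal score.

    A traceback over H would recover the explicit alignment itself."""
    n, m = len(a), len(b)
    H = np.zeros((n + 1, m + 1))
    H[0, :] = gap * np.arange(m + 1)      # prefix of b aligned against gaps
    H[:, 0] = gap * np.arange(n + 1)      # prefix of a aligned against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(H[i - 1, j - 1] + s,   # pair a[i-1] with b[j-1]
                          H[i - 1, j] + gap,     # gap in b
                          H[i, j - 1] + gap)     # gap in a
    return H[n, m]

print(needleman_wunsch_score("HEAGAWGHEE", "PAWHEAE"))
```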

Cited by 21 publications (39 citation statements) | References 54 publications

“…For researchers contributing new protein LMs, bio_embeddings can provide a unified interface to distribute their work to the community, requiring minimal changes for pipeline consumers to make use of new protein LMs. For researchers contributing downstream uses of protein LMs [e.g., for the visualization of attention maps (Vig et al., 2020), which are most closely related to protein contact maps, or for the alignment of protein sequences (Morton et al., 2020)], the bio_embeddings pipeline provides a flexible approach to incorporate their work and directly extends it to all the LMs supported by bio_embeddings. In the future, as we expect more protein LMs to be developed, the bio_embeddings pipeline could be combined with the TAPE (Rao et al., 2019) evaluation system to provide an intuition for protein LM researchers about the best use of their new representations.…”
Section: Commentary (mentioning)
Confidence: 99%
“…On the other hand, difficulty identifying distant homologs that share low sequence similarity could obscure functional conservation. Advancement in sequence alignment methods, such as incorporating structural information, could help to discover these distant homologs (Morton et al., 2020). This will probably be an iterative process, where increased resolution of gene expression at the cellular or subcellular level could allow inferences about functional orthology across species in large gene families.…”
Section: Technical Challenges Faced in the Development of the PCA (mentioning)
Confidence: 99%
“…The latter is relevant when using the method in a neural network pipeline requiring backpropagation. [figure caption fragment:] …[31] our JAX implementations of smooth Smith-Waterman (green), smooth Needleman-Wunsch (orange), and a naive non-vectorized Needleman-Wunsch (blue). Top plots report time for a forward pass, and the bottom plots report time for a forward and backward pass.…”
Section: B Smooth Smith-Waterman, B.1 Speed Test (mentioning)
Confidence: 99%
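As a companion to the quoted speed test, here is a minimal, non-vectorized smooth Needleman-Wunsch in JAX: the hard max of the classic recursion is replaced by a temperature-scaled logsumexp so that jax.grad supplies the backward pass the statement refers to. This is a sketch under simple assumptions (linear gap penalty, global alignment, random similarity matrix), not the benchmarked implementation from either paper.

```python
import jax
import jax.numpy as jnp

def smooth_nw(S, gap=-1.0, temperature=1.0):
    """Smoothed global-alignment score for a similarity matrix S of shape (n, m)."""
    n, m = S.shape
    H = jnp.zeros((n + 1, m + 1))
    H = H.at[0, :].set(gap * jnp.arange(m + 1))   # leading gaps in the first sequence
    H = H.at[:, 0].set(gap * jnp.arange(n + 1))   # leading gaps in the second sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choices = jnp.stack([H[i - 1, j - 1] + S[i - 1, j - 1],  # match/mismatch
                                 H[i - 1, j] + gap,                  # gap in one sequence
                                 H[i, j - 1] + gap])                 # gap in the other
            # Soft maximum: recovers the hard Needleman-Wunsch max as temperature -> 0.
            H = H.at[i, j].set(temperature * jax.nn.logsumexp(choices / temperature))
    return H[n, m]

S = jax.random.normal(jax.random.PRNGKey(0), (5, 6))   # stand-in similarity matrix
score = smooth_nw(S)                                   # forward pass
grad_S = jax.grad(smooth_nw)(S)                        # backward pass: d(score)/dS
print(score, grad_S.shape)
```

Because the recursion is written with Python loops and per-cell updates, it differentiates easily but runs slowly; the vectorization discussed in the surrounding citation statements targets exactly this bottleneck.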
“…Later, a differentiable kernel-based method for alignment was introduced [38]. More recently, Morton et al. implemented a differentiable version of the Needleman-Wunsch algorithm for global pairwise alignment [34, 31]. Our implementation has several advantages: (i) vectorization makes our code faster (Appendices B.1 and B.4), (ii) we implemented local alignment and an affine gap penalty (Appendix B.5), and (iii) due to the way gaps are parameterized, the output of [31] cannot be interpreted as an expected alignment (Appendix B.3).…”
Section: Introduction (mentioning)
Confidence: 99%
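To unpack the "expected alignment" remark, the toy sketch below enumerates all monotone alignments of two length-2 sequences and shows the general identity behind that interpretation: the gradient of the log-partition (smoothed) score with respect to the match-score matrix equals the posterior probability that each residue pair is aligned. Whether a given smoothed DP admits this reading depends on how gaps are parameterized, which is precisely the caveat raised in the quoted statement; the scores and gap penalty here are made up for illustration.

```python
import jax
import jax.numpy as jnp

# All monotone alignments of two length-2 sequences, encoded as the set of
# matched (i, j) cells plus the number of gap characters the alignment uses.
ALIGNMENTS = [
    (((0, 0), (1, 1)), 0),  # a1-b1 and a2-b2, no gaps
    (((0, 0),), 2),         # a1-b1 only
    (((1, 1),), 2),         # a2-b2 only
    (((0, 1),), 2),         # a1-b2 only
    (((1, 0),), 2),         # a2-b1 only
    ((), 4),                # no matches, four gaps
]

def alignment_score(S, cells, n_gaps, gap=-1.0):
    score = jnp.asarray(n_gaps * gap)            # total gap penalty
    for i, j in cells:
        score = score + S[i, j]                  # add matched-pair scores
    return score

def log_partition(S):
    scores = jnp.stack([alignment_score(S, cells, n_gaps) for cells, n_gaps in ALIGNMENTS])
    return jax.nn.logsumexp(scores)              # log of the summed Gibbs weights

S = jnp.array([[2.0, 0.1], [0.3, 1.5]])          # toy match-score matrix
expected = jax.grad(log_partition)(S)            # P((i, j) aligned) under the Gibbs distribution
print(expected, expected.sum())                  # sum = expected number of matched pairs
```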