2020
DOI: 10.1101/2020.11.03.365932
Preprint

Protein Structural Alignments From Sequence

Abstract: Computing sequence similarity is a fundamental task in biology, with alignment forming the basis for the annotation of genes and genomes and providing the core data structures for evolutionary analysis. Standard approaches are a mainstay of modern molecular biology and rely on variations of edit distance to obtain explicit alignments between pairs of biological sequences. However, sequence alignment algorithms struggle with remote homology tasks and cannot identify similarities between many pairs of proteins w…
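For context on the "variations of edit distance" the abstract refers to, here is a minimal Needleman-Wunsch dynamic program for global pairwise alignment. It is an illustrative baseline only, not the method proposed in the preprint, and the scoring values and example sequences are arbitrary.

```python
import numpy as np

def needleman_wunsch_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Classic global-alignment dynamic program; returns the optimal score.

    A traceback over H would recover the explicit alignment itself."""
    n, m = len(a), len(b)
    H = np.zeros((n + 1, m + 1))
    H[0, :] = gap * np.arange(m + 1)      # prefix of b aligned against gaps
    H[:, 0] = gap * np.arange(n + 1)      # prefix of a aligned against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(H[i - 1, j - 1] + s,   # pair a[i-1] with b[j-1]
                          H[i - 1, j] + gap,     # gap in b
                          H[i, j - 1] + gap)     # gap in a
    return H[n, m]

print(needleman_wunsch_score("HEAGAWGHEE", "PAWHEAE"))
```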

Cited by 21 publications (39 citation statements) | References 54 publications

“…For researchers contributing new protein LMs, bio_embeddings can provide a unified interface to distribute their work to the community, requiring minimal changes for pipeline consumers to make use of new protein LMs. For researchers contributing downstream uses of protein LMs [e.g., for the visualization of attention maps (Vig et al., 2020), which are most closely related to protein contact maps, or for the alignment of protein sequences (Morton et al., 2020)], the bio_embeddings pipeline provides a flexible approach to incorporate their work and directly extends it to all the LMs supported by bio_embeddings. In the future, as we expect more protein LMs to be developed, the bio_embeddings pipeline could be combined with the TAPE (Rao et al., 2019) evaluation system to provide an intuition for protein LM researchers about the best use of their new representations.…”
Section: Commentary (mentioning)
Confidence: 99%
“…On the other hand, difficulty identifying distant homologs that share low sequence similarity could obscure functional conservation. Advancement in sequence alignment methods, such as incorporating structural information, could help to discover these distant homologs (Morton et al., 2020). This will probably be an iterative process, where increased resolution of gene expression at the cellular or subcellular level could allow inferences about functional orthology across species in large gene families.…”
Section: Technical Challenges Faced in the Development of the PCA (mentioning)
Confidence: 99%
“…The latter is relevant when using the method in a neural network pipeline requiring backpropagation. [figure caption fragment:] …[31] our JAX implementations of smooth Smith-Waterman (green), smooth Needleman-Wunsch (orange), and a naive non-vectorized Needleman-Wunsch (blue). Top plots report time for a forward pass, and the bottom plots report time for a forward and backward pass.…”
Section: B Smooth Smith-Waterman, B.1 Speed Test (mentioning)
Confidence: 99%
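As a companion to the quoted speed test, here is a minimal, non-vectorized smooth Needleman-Wunsch in JAX: the hard max of the classic recursion is replaced by a temperature-scaled logsumexp so that jax.grad supplies the backward pass the statement refers to. This is a sketch under simple assumptions (linear gap penalty, global alignment, random similarity matrix), not the benchmarked implementation from either paper.

```python
import jax
import jax.numpy as jnp

def smooth_nw(S, gap=-1.0, temperature=1.0):
    """Smoothed global-alignment score for a similarity matrix S of shape (n, m)."""
    n, m = S.shape
    H = jnp.zeros((n + 1, m + 1))
    H = H.at[0, :].set(gap * jnp.arange(m + 1))   # leading gaps in the first sequence
    H = H.at[:, 0].set(gap * jnp.arange(n + 1))   # leading gaps in the second sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choices = jnp.stack([H[i - 1, j - 1] + S[i - 1, j - 1],  # match/mismatch
                                 H[i - 1, j] + gap,                  # gap in one sequence
                                 H[i, j - 1] + gap])                 # gap in the other
            # Soft maximum: recovers the hard Needleman-Wunsch max as temperature -> 0.
            H = H.at[i, j].set(temperature * jax.nn.logsumexp(choices / temperature))
    return H[n, m]

S = jax.random.normal(jax.random.PRNGKey(0), (5, 6))   # stand-in similarity matrix
score = smooth_nw(S)                                   # forward pass
grad_S = jax.grad(smooth_nw)(S)                        # backward pass: d(score)/dS
print(score, grad_S.shape)
```

Because the recursion is written with Python loops and per-cell updates, it differentiates easily but runs slowly; the vectorization discussed in the surrounding citation statements targets exactly this bottleneck.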
“…Later, a differentiable kernel-based method for alignment was introduced [38]. More recently, Morton et al. implemented a differentiable version of the Needleman-Wunsch algorithm for global pairwise alignment [34, 31]. Our implementation has several advantages: (i) vectorization makes our code faster (Appendices B.1 and B.4), (ii) we implemented local alignment and an affine gap penalty (Appendix B.5), and (iii) due to the way gaps are parameterized, the output of [31] cannot be interpreted as an expected alignment (Appendix B.3).…”
Section: Introduction (mentioning)
Confidence: 99%
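To unpack the "expected alignment" remark, the toy sketch below enumerates all monotone alignments of two length-2 sequences and shows the general identity behind that interpretation: the gradient of the log-partition (smoothed) score with respect to the match-score matrix equals the posterior probability that each residue pair is aligned. Whether a given smoothed DP admits this reading depends on how gaps are parameterized, which is precisely the caveat raised in the quoted statement; the scores and gap penalty here are made up for illustration.

```python
import jax
import jax.numpy as jnp

# All monotone alignments of two length-2 sequences, encoded as the set of
# matched (i, j) cells plus the number of gap characters the alignment uses.
ALIGNMENTS = [
    (((0, 0), (1, 1)), 0),  # a1-b1 and a2-b2, no gaps
    (((0, 0),), 2),         # a1-b1 only
    (((1, 1),), 2),         # a2-b2 only
    (((0, 1),), 2),         # a1-b2 only
    (((1, 0),), 2),         # a2-b1 only
    ((), 4),                # no matches, four gaps
]

def alignment_score(S, cells, n_gaps, gap=-1.0):
    score = jnp.asarray(n_gaps * gap)            # total gap penalty
    for i, j in cells:
        score = score + S[i, j]                  # add matched-pair scores
    return score

def log_partition(S):
    scores = jnp.stack([alignment_score(S, cells, n_gaps) for cells, n_gaps in ALIGNMENTS])
    return jax.nn.logsumexp(scores)              # log of the summed Gibbs weights

S = jnp.array([[2.0, 0.1], [0.3, 1.5]])          # toy match-score matrix
expected = jax.grad(log_partition)(S)            # P((i, j) aligned) under the Gibbs distribution
print(expected, expected.sum())                  # sum = expected number of matched pairs
```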