Accurately predicting the impact of point mutations on protein stability plays a crucial role in protein design and engineering. In this study, we proposed a novel method, BoostDDG, to predict stability changes upon point mutations from protein sequences based on extreme gradient boosting (XGBoost). We extracted features comprehensively from evolutionary information and predicted structures, and performed feature selection using sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. The final model, built on 14 features from six groups, achieved the highest cross-validated Pearson correlation coefficient (PCC) of 0.535, consistent with the PCC of 0.540 obtained on an independent test set. Our method consistently outperformed other sequence-based methods on three precompiled test sets and on 7363 variants of two proteins (PTEN and TPMT). These results highlight BoostDDG as a powerful tool for predicting stability changes upon point mutations from protein sequences.
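The feature-selection and validation steps described above can be illustrated with a minimal sketch. It assumes that "homologue-based cross-validation" means whole clusters of homologous proteins are held out together, so that no test protein has a homologue in the training folds; the cluster labels (`groups`) are taken as given input. Features are added greedily, one at a time, for as long as the pooled out-of-fold PCC keeps improving. Ridge regression (`fit_predict`) is swapped in for extreme gradient boosting purely to keep the example dependency-free, and all function names are hypothetical, not BoostDDG's code.

```python
import numpy as np


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (PCC) between two 1-D arrays."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Stand-in regressor: ridge regression on standardized features.

    Assumption: BoostDDG uses extreme gradient boosting; ridge is swapped in
    here only so the sketch needs nothing beyond NumPy.
    """
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    A, B = (X_train - mu) / sd, (X_test - mu) / sd
    y_mean = y_train.mean()
    w = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_train - y_mean))
    return B @ w + y_mean


def grouped_cv_pcc(X, y, groups, n_folds=5):
    """Out-of-fold PCC where each homologue group (assumed labels) stays in one fold."""
    unique = np.unique(groups)
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    folds = np.array([fold_of_group[g] for g in groups])
    preds = np.empty_like(y, dtype=float)
    for k in range(n_folds):
        test = folds == k
        preds[test] = fit_predict(X[~test], y[~test], X[test])
    return pearson(y, preds)  # pooled over all held-out predictions


def sequential_forward_selection(X, y, groups, max_features=None):
    """Greedily add the feature that most improves the grouped-CV PCC."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    max_features = max_features or X.shape[1]
    while remaining and len(selected) < max_features:
        scores = {j: grouped_cv_pcc(X[:, selected + [j]], y, groups) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break  # no remaining feature improves the cross-validated PCC
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score
```

On synthetic data where only two columns carry signal, the procedure picks those columns first; with real stability data, the regressor would be a gradient-boosting model and `groups` would come from an actual homologue clustering of the proteins.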
Oxford Nanopore sequencing is rapidly becoming an active field in genomics, and accurately basecalling nucleotide sequences from the complex electrical signals is critical. Many efforts have been devoted to developing new basecalling tools over the years. However, basecalled reads still suffer from high error rates, and basecalling remains slow. Here, we developed an open-source basecalling method, CATCaller, which simultaneously captures global context through attention and models local dependencies through dynamic convolution. The method consistently outperformed the ONT basecallers Albacore and Guppy, as well as a recently developed attention-based method, SACall, in read accuracy. More importantly, our method is fast, owing to a heterogeneous computational model that integrates both CPUs and GPUs. Compared with SACall, the method is nearly four times faster on a single GPU and scales well in parallel, achieving a further 3.3-fold speedup on a four-GPU node.
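The two mechanisms named above can be contrasted in a small NumPy sketch. Scaled dot-product self-attention lets every position weigh the entire signal (global context), while dynamic convolution predicts a small, softmax-normalized kernel from each position's own features and applies it only to a local window (local dependencies). The `global_local_block` function, which splits channels between the two branches and adds a residual connection, is a hypothetical toy combination that may differ from CATCaller's published architecture; all weights and names are illustrative assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the whole sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (T, T)
    return weights @ v


def dynamic_conv(x, W_kernel, kernel_size):
    """Dynamic convolution: a width-`kernel_size` kernel is predicted from each
    time step's features, softmax-normalized, and applied to a local window.
    One kernel is shared by all channels here (a single head, for brevity)."""
    T, _ = x.shape
    kernels = softmax(x @ W_kernel, axis=-1)  # (T, kernel_size)
    pad = kernel_size // 2
    xp = np.pad(x, ((pad, kernel_size - 1 - pad), (0, 0)))  # zero-pad the time axis
    out = np.empty_like(x)
    for t in range(T):
        window = xp[t:t + kernel_size]  # local window centred on step t
        out[t] = kernels[t] @ window    # weighted sum over the window
    return out


def global_local_block(x, params, kernel_size=3):
    """Hypothetical toy block: half of the channels go through attention (global),
    the other half through dynamic convolution (local); outputs are concatenated."""
    d = x.shape[1] // 2
    g = self_attention(x[:, :d], params["Wq"], params["Wk"], params["Wv"])
    l = dynamic_conv(x[:, d:], params["Wkernel"], kernel_size)
    return x + np.concatenate([g, l], axis=1)  # residual connection
```

One design consideration: attention costs O(T²) in sequence length T, whereas dynamic convolution costs O(T·k) for kernel width k, which makes the local branch cheap on long raw-signal chunks while attention supplies the long-range context.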