Homology modeling is a method for building protein 3D structures from the protein primary sequence, using prior knowledge gained from structural similarities with other proteins. The homology modeling process proceeds in sequential steps: the sequence/structure alignment is optimized, a backbone is built, and side-chains are then added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. In the past three decades, a few collective and collaborative initiatives allowed for continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives in which the community shares computational resources, whereas RosettaCommons is an example of an initiative in which a community shares a codebase for the development of computational algorithms. Foldit is another initiative, in which participants compete with each other in a protein folding video game to predict 3D structure. In the past few years, contact-map-based deep machine learning was introduced to the 3D structure prediction process, adding more information and increasing the accuracy of models significantly. In this review, we take the reader on a journey of exploration from the beginnings to the most recent turnabouts, which have revolutionized the field of homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
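
To make the notion of a contact map concrete, the following is a minimal sketch of how a binary residue-residue contact map might be derived from a known structure. It is an illustration only: the function name `contact_map`, the 8 Å distance cutoff, the use of one representative atom per residue, and the `min_separation` parameter are assumptions made here, not details taken from the review.

```python
import math

def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix of residue-residue contacts.

    coords: one (x, y, z) tuple per residue, e.g. a representative atom
            position in angstroms (assumption for this sketch).
    cutoff: distance threshold in angstroms (8.0 is an assumed default).
    min_separation: minimum sequence separation |i - j| to count a contact.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = 1
                cmap[j][i] = 1  # contacts are symmetric
    return cmap

# Hypothetical coordinates for a four-residue toy chain
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(toy)
```

In a learning setting, a matrix of this kind would serve as the prediction target, with the predicted contacts then used as restraints during 3D model building.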
Side-chain rotamer prediction is one of the most critical late
stages in protein 3D structure building. Highly advanced and specialized
algorithms (e.g., FASPR, RASP, SCWRL4, and SCWRL4v) optimize this
process by use of rotamer libraries, combinatorial searches, and scoring
functions. We seek to identify the sources of key rotamer errors as
a basis for correcting and improving the accuracy of protein modeling
going forward. To evaluate the aforementioned programs, we
process 2496 high-quality, single-chain, all-atom protein 3D structures
filtered at 30% homology and use discretized rotamer analysis to compare
original with calculated structures. Among 513,024 filtered residue
records, increased amino acid residue-dependent rotamer errors, associated
in particular with polar and charged amino acid residues (ARG, LYS,
and GLN), clearly correlate with increased amino acid residue
solvent accessibility and an increased residue tendency toward the
adoption of non-canonical off rotamers, which modeling programs struggle
to predict accurately. Understanding the impact of solvent accessibility
now appears key to improved side-chain prediction accuracies.
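
As an illustration of what a discretized rotamer comparison could look like, the sketch below bins side-chain chi angles into three staggered wells and counts residues whose predicted bins differ from the original ones. The binning boundaries, the well labels (`g+`, `t`, `g-`, whose naming convention varies between authors), the 30° tolerance used to flag an "off rotamer", and all function names are assumptions made for this example; the abstract does not specify the scheme actually used.

```python
def wrap_angle(deg):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def rotamer_bin(chi_deg):
    """Assign a chi angle to an assumed well: g+ (~+60), t (~180), g- (~-60)."""
    a = wrap_angle(chi_deg)
    if 0.0 <= a < 120.0:
        return "g+"
    if -120.0 <= a < 0.0:
        return "g-"
    return "t"

def rotamer_label(chis):
    """Discretize all chi angles of one residue into a tuple of well labels."""
    return tuple(rotamer_bin(c) for c in chis)

def is_off_rotamer(chi_deg, tolerance=30.0):
    """True if chi is farther than `tolerance` degrees from every well center.

    The tolerance is a hypothetical choice for this sketch.
    """
    centers = (60.0, 180.0, -60.0)
    return all(abs(wrap_angle(chi_deg - c)) > tolerance for c in centers)

def rotamer_error_rate(pairs):
    """Fraction of residues whose predicted label differs from the original.

    pairs: list of (original_chis, predicted_chis), angles in degrees.
    """
    mismatches = sum(1 for orig, pred in pairs
                     if rotamer_label(orig) != rotamer_label(pred))
    return mismatches / len(pairs)

# Hypothetical chi angles for two residues: (original, predicted)
pairs = [((62.0, 178.0), (58.0, -175.0)), ((-65.0,), (170.0,))]
rate = rotamer_error_rate(pairs)
```

Grouping such error rates by residue type and by a solvent-accessibility measure would be one way to look for the kind of correlation the abstract reports.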