Predicting structure-dependent functionalities of biomolecules is crucial for accelerating a wide variety of applications in drug screening, biosensing, disease diagnosis, and therapy. Although the commonly used structural "fingerprints" work for biomolecules in traditional informatics implementations, they remain impractical in a wide range of machine learning approaches where the model is restricted to making data-driven decisions. Although peptides, proteins, and oligonucleotides have sequence-related propensities, representing them as sequences of letters, e.g., in bioinformatics studies, causes a loss of most of their structure-related functionalities. Biomolecules lacking sequence, such as polysaccharides, lipids, and their peptide conjugates, cannot be screened with models using the letter-based fingerprints. Here we introduce a new fingerprint derived from valence shell electron pair repulsion structures for small peptides that enables construction of structural feature-maps for a given biomolecule, regardless of the sequence or conformation. The feature-map introduced here uses a simple encoding derived from the molecular graph (atoms, bonds, distances, bond angles, etc.) that makes up each of the amino acids in the sequence, allowing a residual neural network model to take greater advantage of information in molecular structure. We make use of short peptides binding to Major Histocompatibility Class I protein alleles, encoded in terms of their extended structures, to predict allele-specific binding affinities of test peptides. Predictions are consistent, without appreciable loss in accuracy between models for different length sequences, marking an improvement over the current models. Biological processes are heterogeneous interactions, which justifies encoding all biomolecules universally in terms of structures and relating them to their functionality. The capabilities facilitated by the model expand the paradigm in establishing structure-function correlations among small molecules, short and longer sequences including large biomolecules, and genetic conjugates that may include polypeptides, polynucleotides, RNAs, lipids, peptidoglycans, peptido-lipids, and other biomolecules that could be implemented in a wide range of medical and nanobiotechnological applications in the future.
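As a rough illustration of the kind of graph-derived encoding described above, the following Python sketch turns atoms, bonds, and 3D coordinates into rows of bond lengths and bond angles. It is not the fingerprint from the paper: the function names (`bond_length`, `bond_angle`, `feature_rows`), the row layout, and the water-like test geometry are all assumptions chosen only to make the idea concrete.

```python
import math

def bond_length(a, b):
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)

def bond_angle(a, center, b):
    """Angle a-center-b in degrees."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def feature_rows(atoms, coords, bonds):
    """Hypothetical encoding: one row per bond, then one row per bond angle.

    atoms  : list of element symbols
    coords : list of (x, y, z) tuples, same order as atoms
    bonds  : list of (i, j) index pairs
    """
    neighbors = {i: [] for i in range(len(atoms))}
    rows = []
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
        rows.append(("bond", atoms[i], atoms[j],
                     bond_length(coords[i], coords[j])))
    # Every pair of neighbors around a shared center defines an angle.
    for c, nbrs in neighbors.items():
        for k in range(len(nbrs)):
            for m in range(k + 1, len(nbrs)):
                a, b = nbrs[k], nbrs[m]
                rows.append(("angle", atoms[a], atoms[c], atoms[b],
                             bond_angle(coords[a], coords[c], coords[b])))
    return rows

# Illustrative geometry only: a bent three-atom molecule with an
# assumed 0.96 bond length and 104.5 degree angle.
theta = math.radians(104.5)
water_rows = feature_rows(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0),
     (0.96, 0.0, 0.0),
     (0.96 * math.cos(theta), 0.96 * math.sin(theta), 0.0)],
    [(0, 1), (0, 2)],
)
```

Because the rows depend only on atoms, connectivity, and geometry, the same routine applies to molecules that have no letter sequence at all, which is the property the abstract emphasizes.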
[...] among the peptides with different sequences and lengths in a wide range of biotechnology, nanomedicine and bioinformatics applications.
Proteins have evolved over millions of years to mediate and carry out biological processes efficiently. Directed evolution approaches have been used to genetically engineer proteins with desirable functions such as catalysis, mineralization, and target-specific binding. Next-generation sequencing technology offers the capability to discover a massive combinatorial sequence space that is costly to sample experimentally through traditional approaches. Since the permutation space of protein sequence is virtually infinite, and evolution dynamics are poorly understood, experimental verifications have been limited. Recently, machine-learning approaches have been introduced to guide the evolution process, facilitating a deeper and denser search of the sequence space. Despite these developments, however, frequently used high-fidelity models depend on massive amounts of properly labeled, high-quality data, which have so far been largely lacking in the literature. Here, we provide a preliminary high-throughput peptide-selection protocol with functional scoring to enhance the quality of the data. Solid-binding dodecapeptides have been selected against a molybdenum disulfide substrate, a two-dimensional, atomically thick semiconductor solid. The survival rate of the phage clones, upon successively stringent washes, quantifies the binding affinity of the peptides onto the solid material. By avoiding amplification, the method suggested here rapidly generates a preliminary data pool of ∼2 million unique peptides with 12 amino acids per sequence. Our results demonstrate the importance of data cleaning and proper conditioning of massive datasets in guiding experiments iteratively. The extensive groundwork established here provides unique opportunities to further iterate and modify the technique to suit a wide variety of needs and generate various peptide and protein datasets. 
Prospective statistical models developed on the datasets to efficiently explore the sequence-function space will guide the intelligent design of proteins and peptides through deep directed evolution. Future technological applications based on bio/nano soft interfaces between peptides and single-layer solids, such as biosensors, bioelectronics, and logic devices, are expected to benefit from the solid-binding peptide dataset alone. Furthermore, the protocols described herein will also benefit efforts in medical applications, such as vaccine development, that could significantly accelerate a global response to future pandemics.
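The survival-rate scoring and data-cleaning steps mentioned above might look roughly like the following Python sketch. It assumes, without confirmation from the source, that each peptide has a sequencing read count per wash and that survival is the fraction of reads remaining relative to the first count. The function names, the `min_initial` threshold, and the sequences and counts in the example are hypothetical.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids

def is_clean(peptide, length=12):
    """Keep only peptides of the expected length made of standard residues."""
    return len(peptide) == length and set(peptide) <= VALID_RESIDUES

def survival_scores(counts_by_wash, min_initial=5):
    """Fraction of reads surviving each wash, relative to the first count.

    counts_by_wash : dict mapping peptide -> list of read counts, ordered
                     from the initial pool to the most stringent wash.
    min_initial    : assumed cutoff; peptides with fewer initial reads are
                     dropped because their ratios would be too noisy.
    """
    scores = {}
    for peptide, counts in counts_by_wash.items():
        if not is_clean(peptide):
            continue  # data cleaning: wrong length or unknown residue
        initial = counts[0]
        if initial < min_initial:
            continue  # too few reads to estimate a survival rate
        scores[peptide] = [c / initial for c in counts[1:]]
    return scores

# Made-up counts for illustration; these are not data from the study.
example_counts = {
    "ACDEFGHIKLMN": [100, 50, 25],  # survives washes well
    "WYVTSRQPNMLK": [40, 4, 0],     # washed off quickly
    "ACDEFGHIKLMX": [90, 80, 70],   # dropped: 'X' is not a standard residue
    "ACDEF": [30, 20, 10],          # dropped: not a dodecapeptide
    "GGGGGGGGGGGG": [2, 1, 1],      # dropped: too few initial reads
}
example_scores = survival_scores(example_counts)
```

A peptide whose surviving fraction stays high through the later washes would be read as a stronger binder under this assumed scoring; the filters show why cleaning matters before any model is trained on such a pool.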