Newly Born proteins, also known as orphan proteins, have no detectable homology to any other proteins and occur in a single species or within a taxonomically restricted gene family. They are generated by expression of novel open reading frames, and appear throughout evolution. We used two recently developed programs for predicting protein structures, RoseTTAFold and AlphaFold2, to compare such Newly Born proteins to random polypeptides generated by shuffling sequences of native proteins, which have been called ‘Never Born’ proteins. The two programs were used to compare the structures of two sets of four Never Born proteins, one set that had been expressed and shown to be intrinsically disordered (Group 1), and a second set that had been shown experimentally to possess substantial secondary structure (Group 2). Since the programs rely to a large extent on multiple sequence alignment, the models generated were scored as being of low quality. However, a significant pattern emerged when the models generated by RoseTTAFold were examined. Specifically, all four members of Group 1 were predicted to be very extended, as would be expected for intrinsically disordered proteins. In contrast, all four members of Group 2 appeared to be compact, and possessed substantial secondary structure. As a further control, both programs predicted unfolded structures for three well-characterized intrinsically disordered proteins. The two programs were then used to predict the structures of two orphan proteins whose crystal structures have been solved, both of which display novel folds. RoseTTAFold predicted both structures very well, whereas AlphaFold2 predicted only one well. Finally, the two programs were used to predict the structures of five orphan proteins with well-identified biological functions, one of which is predicted to be intrinsically disordered, and four to be folded. Both programs displayed the intrinsically disordered protein as an unfolded structure. RoseTTAFold displayed all four of those predicted to be folded as compact folded structures, with apparent novel folds, as determined by Dali and Foldseek. It is plausible that new biological functions may be implemented by orphan proteins due to their novel folds.
“Newly Born” proteins, also known as orphan proteins, have no detectable homology to any other proteins and occur in a single species or within a taxonomically restricted gene family. They are generated by the expression of novel open reading frames, and appear throughout evolution. We asked whether three recently developed programs for predicting protein structures, namely AlphaFold2, RoseTTAFold, and ESMFold, might be of value for comparing such “Newly Born” proteins to random polypeptides with amino acid content similar to that of native proteins, which have been called “Never Born” proteins. The programs were used to compare the structures of two sets of “Never Born” proteins that had been expressed: Group 1, which had been shown experimentally to possess substantial secondary structure, and Group 3, which had been shown to be intrinsically disordered. Overall, although the models generated were scored as being of low quality, they nevertheless revealed some general principles. Specifically, all four members of Group 1 were predicted to be compact by all three algorithms, in agreement with the experimental data, whereas the members of Group 3 were predicted to be very extended, as would be expected for intrinsically disordered proteins, again consistent with the experimental data. These predicted differences were shown to be statistically significant by comparing their accessible surface areas. The three programs were then used to predict the structures of three orphan proteins whose crystal structures had been solved, two of which display novel folds. Surprisingly, all three algorithms predicted very similar, high-quality structures closely resembling the crystal structure only for the one protein that did not have a novel fold and was taxonomically restricted rather than a true orphan. Finally, the programs were used to predict the structures of seven orphan proteins with well-identified biological functions, whose 3D structures are not known. Two proteins, which were predicted to be disordered based on their sequences, were predicted by all three structure algorithms to be extended structures. The other five were predicted to be compact structures, with only two exceptions in the case of AlphaFold2. All three prediction algorithms made remarkably similar and high-quality predictions for one large protein, HCO_11565, from a nematode. It is conjectured that this is due to the many homologs in the taxonomically restricted family of which it is a member, and to the fact that the Dali server revealed several unrelated proteins with similar folds. An animated Interactive 3D Complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:Proteins:3
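The compact-versus-extended distinction drawn in the abstract above can be made concrete with a simple geometric measure. The sketch below is illustrative only and is not the authors' analysis: the abstract states that accessible surface areas were compared, whereas this example swaps in the radius of gyration of C-alpha positions as an easier-to-compute proxy for compactness. The function names, the two toy chains (a fully stretched chain and an idealized helix), and the geometric parameters are assumptions introduced here; no values are taken from the paper.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid.

    coords: sequence of (x, y, z) tuples, e.g. C-alpha positions in angstroms.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(sq / n)


def extended_chain(n, spacing=3.8):
    """Toy model (assumption): n C-alpha atoms on a straight line, 3.8 A apart."""
    return [(i * spacing, 0.0, 0.0) for i in range(n)]


def helical_chain(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Toy model (assumption): n C-alpha atoms on an idealized alpha helix."""
    step = math.radians(turn_deg)
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step), i * rise)
        for i in range(n)
    ]


if __name__ == "__main__":
    n_residues = 60  # hypothetical chain length, chosen only for illustration
    rg_extended = radius_of_gyration(extended_chain(n_residues))
    rg_helical = radius_of_gyration(helical_chain(n_residues))
    print(f"extended toy chain Rg: {rg_extended:.1f} A")
    print(f"helical toy chain Rg:  {rg_helical:.1f} A")
```

For real predictions, the same function could be applied to C-alpha coordinates read from the model files. A per-group comparison would still need a statistical test, and the abstract does not say which test the authors used.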