To investigate the factors responsible for protein thermostability, we performed a comparative analysis. For this study, we prepared a new dataset composed of 47 homologous pairs of thermophilic and mesophilic proteins. It is the largest dataset presented so far for a comparative study of this kind. The frequency and substitution preference of each amino acid type in the dataset were analyzed. Two structural states of residues were considered: surface (solvent-exposed) and core (buried) regions. On the surface of thermophilic proteins, higher frequencies were observed for Arg, Glu, and Tyr. Analysis of substitution preference also suggests that these residues often arise through replacement of other amino acid types. The results indicate that Arg, Glu, and Tyr are well suited to the surface of thermophilic proteins. In the core of thermophilic proteins, on the other hand, Ala appears frequently. In addition, our t-test analysis provides the first quantitative information about trends in the frequencies and substitution preferences for Cys, Gln, Met, and Ser. The results indicate that Gln and Met on the surface and Cys and Ser in the core are disadvantageous for protein thermostability.
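The frequency comparison described above can be illustrated with a minimal sketch. It assumes a paired t-test over per-pair frequency differences, which the abstract does not confirm as the exact test used. The function names and the toy sequences are hypothetical and are not taken from the 47-pair dataset.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def residue_frequency(sequence, residue):
    """Fraction of positions in `sequence` occupied by `residue` (one-letter code)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return Counter(sequence)[residue] / len(sequence)


def paired_t_statistic(pairs, residue):
    """Paired t statistic and degrees of freedom for one residue type.

    `pairs` is a list of (thermophilic, mesophilic) sequences, each restricted
    to one structural state (for example, surface residues only).
    Assumption: a paired test is appropriate because the proteins are homologous.
    """
    diffs = [
        residue_frequency(thermo, residue) - residue_frequency(meso, residue)
        for thermo, meso in pairs
    ]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical toy data: surface residues of three homologous pairs.
toy_pairs = [("RREK", "RKEK"), ("RRRE", "RKKE"), ("REEK", "KEEK")]
t_value, dof = paired_t_statistic(toy_pairs, "R")
print(f"Arg: t = {t_value:.3f}, degrees of freedom = {dof}")
```

A positive t value here means the residue is more frequent in the thermophilic member of each pair. Converting it to a p-value requires the t distribution for the given degrees of freedom, for example from a statistics library.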