Assessing the binding affinity between ligands and their target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have been proposed for fast prediction of binding affinity, with promising results. Most of them, however, were developed as all-purpose models that ignore the specific functions of different protein families, even though proteins from different functional families generally differ in structure and physicochemical features. In this study, we propose a random forest method to predict protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were carried out separately for each protein family dataset; the results indicate that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. For comparison, two generic models covering diverse protein families were also built. The evaluation results show that models trained on family-specific datasets outperform those trained on the generic datasets. On the test sets, the Pearson correlation coefficients (Rp) are 0.740, 0.874 and 0.735, and the Spearman correlation coefficients (Rs) are 0.697, 0.853 and 0.723, for HIV-1 protease, trypsin and carbonic anhydrase respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way to predict the affinity of one particular protein family.
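The two evaluation metrics named above, Rp and Rs, can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names and the sample affinity values are hypothetical and chosen only for illustration. Spearman's coefficient is computed here as the Pearson coefficient of the average ranks, which is the standard definition when ties are present.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (Rp) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Rs): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical measured vs. predicted affinities, not data from the study.
    measured = [4.2, 5.1, 6.3, 7.0, 8.4]
    predicted = [4.8, 5.0, 6.9, 6.5, 8.1]
    print("Rp =", round(pearson(measured, predicted), 3))
    print("Rs =", round(spearman(measured, predicted), 3))
```

Rp measures how linearly the predictions track the measured affinities, while Rs only measures whether the ranking of the complexes is preserved, which is why studies of this kind usually report both.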
Knowledge about the impact of altitude and ethnicity on the human oral microbiota is currently limited. To obtain a baseline of the normal salivary microbiota, we analyzed the bacterial and fungal composition in Tibetan (HY group) and Han (CD group) populations living at different altitudes, using next-generation sequencing (NGS) technology combined with PICRUSt and FUNGuild analyses. There were significant differences in oral microbiota composition between the two groups at the phylum and genus levels. At the phylum level, the HY group had higher relative abundances of Firmicutes and Ascomycota, whereas Bacteroidetes and Basidiomycota were more abundant in the CD group. These differences at the phylum level reflected different dominant genus compositions. Compared with the Han population, Candida, Fusarium, Zopfiella, Streptococcus, Veillonella and Rothia were more abundant in the Tibetan population. Surprisingly, Zopfiella was found almost exclusively in the Tibetan population. The PICRUSt and FUNGuild analyses also indicated that the functions of the bacterial and fungal communities differed between the two groups. In conclusion, our results suggest that there are significant differences in oral microbial structure, metabolic characteristics and trophic modes between Tibetan and Han populations living at different altitudes. This work provides a first oral microbiota framework and represents a critical step toward determining the diversity of the oral microbiota in the Tibetan and Han populations.
With the development of network technology, more and more data are transmitted over the network, and privacy issues have become a research focus. In this paper, we study privacy in the collection of preschool children's health data and present a new identity-based encryption protocol for privacy protection. The background of the protocol is as follows. Preschool children need a physical examination every year for the sake of their health. After the examination, the data are transmitted over the Internet to the education authorities for analysis. During data collection, the education authorities do not need to know the identities of the children. Based on this, we designed a privacy-preserving protocol that delinks the children's identities from the examination data, so that the privacy of the children is preserved during data collection. We present the protocol in detail and prove its correctness.
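The idea of delinking identities from examination data can be illustrated with a minimal sketch. This is not the identity-based encryption protocol presented in the paper, whose details are not given here. It swaps in a much simpler technique, keyed pseudonyms computed with HMAC-SHA256, purely to show what delinking means in practice. The function names, the record fields (`child_id`, `exam`) and the sample records are all hypothetical, and the sketch assumes the collecting site keeps `secret_key` private from the education authorities.

```python
import hashlib
import hmac
import secrets


def make_pseudonym(secret_key: bytes, child_id: str) -> str:
    """Derive a stable pseudonym that cannot be reversed without secret_key."""
    digest = hmac.new(secret_key, child_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def delink(records, secret_key: bytes):
    """Replace the identity field of each record with a keyed pseudonym.

    Assumption: each record is a dict with hypothetical keys "child_id"
    and "exam". Only the pseudonym and the exam data are passed on.
    """
    return [
        {"pseudonym": make_pseudonym(secret_key, r["child_id"]), "exam": r["exam"]}
        for r in records
    ]


if __name__ == "__main__":
    # The key stays with the examining site and is never transmitted.
    key = secrets.token_bytes(32)
    # Made-up sample records for illustration only.
    records = [
        {"child_id": "child-001", "exam": {"height_cm": 104, "weight_kg": 17.2}},
        {"child_id": "child-002", "exam": {"height_cm": 110, "weight_kg": 19.0}},
    ]
    for row in delink(records, key):
        print(row["pseudonym"][:12], row["exam"])
```

Because the same identity and key always yield the same pseudonym, records for one child can still be grouped across years, while the receiving party never sees the identity itself.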