The thermostability of proteins is a key factor in enzyme engineering, and a method that can distinguish thermophilic from non-thermophilic proteins would be helpful for enzyme design. In this study, we established a novel method that combines mixed features and machine learning to perform this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods were trained and evaluated on a dataset of 500 proteins sampled at random from 915 thermophilic proteins and 500 sampled at random from 793 non-thermophilic proteins. In 10-fold cross-validation, 98.2% of thermophilic and non-thermophilic proteins were correctly identified. Moreover, our analysis of the retained and removed features revealed which elements are crucial, unimportant, or insensitive, and it also provided essential information for enzyme design.
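The recoding step described above can be illustrated with a minimal sketch. The abstract does not state which reduction scheme was used, so the five groups in `EXAMPLE_SCHEME` are purely hypothetical, as are the function names and the choice to map unknown residues to `X`.

```python
# Hypothetical 5-letter reduction scheme. The groups are illustrative only
# and are NOT the scheme used in the study.
EXAMPLE_SCHEME = {
    "A": "AVLIMC",  # hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KRDE",    # charged
    "G": "GP",      # small / conformationally special
}


def build_lookup(scheme):
    """Map each residue to the representative letter of its group."""
    lookup = {}
    for representative, members in scheme.items():
        for residue in members:
            if residue in lookup:
                raise ValueError(f"residue {residue!r} appears in two groups")
            lookup[residue] = representative
    return lookup


def recode(sequence, scheme):
    """Recode a protein sequence with a reduced alphabet.

    Residues that the scheme does not cover are written as 'X'
    (an assumption made for this sketch).
    """
    lookup = build_lookup(scheme)
    return "".join(lookup.get(residue, "X") for residue in sequence.upper())


if __name__ == "__main__":
    print(recode("MKVLAF", EXAMPLE_SCHEME))
```

With a reduced alphabet of k letters, the dipeptide feature space shrinks from 400 to k × k dimensions, which is one reason recoding is commonly applied before dipeptide features are computed.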
To extract the lung region from CT images more accurately, a model that combines lung
region extraction and edge boundary correction is proposed. Firstly, a new edge detection function is presented, based on
the classic structure tensor theory. Secondly, the initial lung mask is automatically extracted by an improved
active contour model that combines global intensity information, local intensity information, the new edge
information, and an adaptive weight. Notably, the objective function of the improved model is converted
to a convex form, which allows the proposed model to reach the global minimum. The central airway is then excluded
according to the spatial context information and the positional relationship between each segmented region and the
rib. Thirdly, a mesh and fractal theory are used to detect the boundary that surrounds the juxtapleural nodule.
Finally, the geometric active contour model is employed to correct the detected boundary and re-include juxtapleural
nodules. We also evaluated the performance of the proposed segmentation and correction model by comparing it with
popular counterparts. Its computational efficiency and robustness indicate that the model can correct
the lung boundary reliably and reproducibly.
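The abstract does not give the form of its new edge detection function. As background only, the following sketch shows the classic structure tensor it builds on, together with a conventional edge indicator of the form 1 / (1 + λ₁). The box smoothing, the function names, and the indicator formula are assumptions chosen for illustration, not the authors' definition.

```python
import numpy as np


def box_smooth(a, radius=1):
    """Average over a (2r+1) x (2r+1) window, using edge padding."""
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / size ** 2


def structure_tensor_eigenvalues(image, radius=1):
    """Return the larger and smaller eigenvalue of the smoothed structure tensor."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)  # derivatives along rows, then columns
    jxx = box_smooth(gx * gx, radius)
    jxy = box_smooth(gx * gy, radius)
    jyy = box_smooth(gy * gy, radius)
    trace = jxx + jyy
    diff = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
    return 0.5 * (trace + diff), 0.5 * (trace - diff)


def edge_indicator(image, radius=1):
    """Close to 1 in flat regions, smaller where the local gradient energy is high."""
    lam1, _ = structure_tensor_eigenvalues(image, radius)
    return 1.0 / (1.0 + lam1)


if __name__ == "__main__":
    # Synthetic image with a vertical step edge between columns 3 and 4.
    step = np.zeros((8, 8))
    step[:, 4:] = 1.0
    print(np.round(edge_indicator(step)[4], 3))
```

An indicator of this kind is typically used as a stopping term in an active contour energy, so that the contour slows down where the indicator is small.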
Antioxidant proteins can terminate chain reactions caused by free radicals and protect cells from damage. To identify antioxidant proteins rapidly, a computational model was proposed based on an optimized recoding scheme, sequence information and machine learning methods. First, over 600 recoding schemes were collected to build a scheme set. Then, the original sequence was recoded as a reduced expression whose g-gap dipeptides (g = 0, 1, 2) were used as the features of proteins. Next, a random forest (RF) method was used to evaluate the classification ability of the obtained dipeptide features. After all schemes had been evaluated, the scheme with the best predictive performance was chosen as the optimized reduction scheme. For the RF method, a grid search strategy was then used to select a better parameter combination for identifying antioxidant proteins. In the experiment, the present method correctly recognized 90.13–99.87% of the antioxidant samples. Other experimental results also indicated that the present method was effective in identifying antioxidant proteins. We also developed a web server that was freely accessible to researchers.
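The g-gap dipeptide features mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the input is taken to be an already recoded sequence, and normalizing counts by the number of residue pairs is one common convention that the abstract does not confirm.

```python
from itertools import product


def g_gap_dipeptide_features(reduced_sequence, alphabet, g=0):
    """Frequencies of residue pairs separated by g intervening positions.

    g = 0 gives ordinary adjacent dipeptides. The output has
    len(alphabet) ** 2 entries, ordered as itertools.product(alphabet, repeat=2).
    """
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_pairs = max(len(reduced_sequence) - g - 1, 0)
    for i in range(n_pairs):
        pair = reduced_sequence[i] + reduced_sequence[i + g + 1]
        if pair in counts:  # pairs with letters outside the alphabet are skipped
            counts[pair] += 1
    total = max(n_pairs, 1)  # avoid division by zero for very short sequences
    return [counts[p] / total for p in pairs]


def feature_vector(reduced_sequence, alphabet, gaps=(0, 1, 2)):
    """Concatenate the g-gap features for each requested gap."""
    features = []
    for g in gaps:
        features.extend(g_gap_dipeptide_features(reduced_sequence, alphabet, g))
    return features


if __name__ == "__main__":
    print(feature_vector("ABAB", "AB"))
```

For a reduced alphabet of k letters and g = 0, 1, 2, the concatenated vector has 3 × k × k entries, which can be passed to any standard classifier.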