Bioluminescent proteins (BLPs) are a class of proteins that are widely distributed in many living organisms and emit light through various mechanisms, including bioluminescence and chemiluminescence. Bioluminescence has been commonly used in analytical methods for studying cellular processes, such as gene expression analysis, drug discovery, cellular imaging, and toxicity determination. However, identifying bioluminescent proteins is challenging because they share poor sequence similarity. In this paper, we briefly review the development of the computational identification of BLPs and then propose a novel framework for identifying BLPs based on the eXtreme gradient boosting algorithm (XGBoost) and sequence-derived features. To train the models, we collected BLP data from bacteria, eukaryotes, and archaea. To obtain more effective prediction models, we then examined the performance of different feature extraction methods and their combinations, as well as of different classification algorithms. Finally, based on the optimal model, a novel predictor named iBLP was constructed to identify BLPs. The robustness of iBLP was demonstrated by experiments on training and independent datasets. Comparison with other published methods further demonstrated that the proposed method is powerful and could provide good performance for BLP identification. The webserver and software package for BLP identification are freely available at http://lin-group.cn/server/iBLP.
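As an illustration of what a sequence-derived feature can look like, the sketch below computes amino acid composition, a common fixed-length encoding of a protein sequence. The abstract does not specify which features iBLP uses, so the choice of encoding, the function name `amino_acid_composition`, and the example sequence are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Hypothetical helper: one possible sequence-derived feature, not
    necessarily the encoding used by iBLP.
    """
    seq = sequence.upper()
    # Non-standard symbols (e.g. X, B, Z) are ignored.
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up toy sequence, used only to show the call.
features = amino_acid_composition("MKTAYIAKQR")
```

A vector of this kind, one per protein, could serve as a row in the feature matrix passed to a gradient boosting classifier.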
Purpose
To investigate the use of radiomics in the in‐depth identification of epidermal growth factor receptor (EGFR) mutation status in patients with lung adenocarcinoma.
Methods
Computed tomography images of 438 patients with lung adenocarcinoma were collected at two different institutions, and 496 radiomic features were extracted. In the training set, lasso logistic regression was used to establish radiomic signatures. The radiomic index was then combined with clinical features, and five machine learning methods with a tenfold cross‐validation strategy were used to establish combined models for the EGFR+ vs EGFR− and 19Del vs L858R groups. The predictive power of the models was then evaluated using an independent external validation cohort.
Results
In the EGFR+ vs EGFR− and 19Del vs L858R groups, radiomic signatures consisting of 12 and 7 radiomic features were established, respectively; the areas under the curve (AUCs) of the lasso logistic regression model on the validation set were 0.76 and 0.71, respectively. After inclusion of the clinical features, the maximum AUCs of the combined models on the validation set were 0.79 and 0.74, respectively. Logistic regression analysis showed good performance in the two groups, with AUCs of 0.79 and 0.71 on the validation set. Additionally, the AUC of the combined models in the EGFR+ vs EGFR− group was higher than that in the 19Del vs L858R group.
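For readers less familiar with the metric reported above, the following minimal sketch computes an AUC directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the labels and scores are made-up illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: 1 = mutation present, 0 = mutation absent.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
auc = roc_auc(labels, scores)  # 5 of the 6 pairs are ranked correctly
```

The quadratic pairwise loop is chosen for clarity; rank-based formulations give the same value more efficiently on large cohorts.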
Conclusions
Our study shows the potential of radiomics to predict EGFR mutation status. There are imaging phenotypic differences between EGFR+ and EGFR−, and between 19Del and L858R; these can be used to allow patients with lung adenocarcinoma to choose more appropriate and personalized treatment options.
A DNase I hypersensitive site (DHS) is a region of chromatin that is hypersensitive to cleavage by the DNase I enzyme. DHSs are an important part of the noncoding genome and contain a variety of regulatory elements, such as promoters, enhancers, and transcription factor-binding sites. Moreover, disease- or trait-associated loci are usually enriched in DHS regions. Therefore, the detection of DHS regions is of great significance. In this study, we develop a deep learning-based algorithm to identify whether an unknown sequence region is a potential DHS. The proposed method showed high prediction performance on both training datasets and independent datasets across different cell types and developmental stages, demonstrating that the method performs well in the identification of DHSs. Furthermore, for the convenience of wet-lab researchers, the user-friendly web server iDHS-Deep was established at http://lin-group.cn/server/iDHS-Deep/, with which users can easily distinguish DHSs from non-DHSs and obtain the corresponding developmental stage of a DHS.
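Deep learning models for DNA sequences typically take a numeric encoding of the bases as input. The sketch below shows one-hot encoding, a widely used choice; the abstract does not state how iDHS-Deep encodes its input, so this encoding, the function name `one_hot_encode`, and the example sequence are assumptions for illustration.

```python
# Channel order A, C, G, T; any other symbol (e.g. N) maps to all zeros.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}
UNKNOWN = [0, 0, 0, 0]


def one_hot_encode(sequence):
    """Encode a DNA sequence as a list of 4-element one-hot rows.

    Hypothetical helper: a common input encoding, not necessarily
    the one used by iDHS-Deep.
    """
    # list(...) copies each row so callers can modify the result safely.
    return [list(ONE_HOT.get(base, UNKNOWN)) for base in sequence.upper()]


# Made-up toy sequence, used only to show the call.
encoded = one_hot_encode("acgtn")
```

The result has one row per base and four columns, a shape that can be passed to a convolutional or recurrent layer after conversion to an array.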