Feature selection from DNA microarray data is a major challenge because expression data is high-dimensional. The number of samples in a microarray data set is much smaller than the number of genes, so the raw data is poorly suited for use as the training set of a classifier. It is therefore important to select features before training the classifier. Only a small subset of genes in the data set exhibits a strong correlation with the class, and finding these relevant genes is often non-trivial. Thus there is a need to develop robust and reliable methods for gene finding in expression data. We describe the use of several hybrid feature selection approaches for gene finding in expression data. These approaches combine a filter phase (ranking genes and keeping the best individual genes from the data set) with a wrapper phase (searching for the best subset of genes from the data set). The methods use information gain (IG) and the Pearson product-moment correlation (PPMC) as the filtering criteria and biogeography-based optimization (BBO) as the wrapper approach. The k-nearest neighbour (KNN) algorithm and a back-propagation neural network are used to evaluate the fitness of gene subsets during feature selection. Our analysis shows that the IG-BBO-KNN combination performs impressively across different data sets, with high accuracy (>90%) and a low error rate.
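The filter phase and the fitness evaluation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the median-split discretization used to compute information gain, the leave-one-out scoring, and the toy data are all hypothetical, and the BBO search over gene subsets is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """IG of splitting one gene at its median expression (assumed discretization)."""
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    if not low or not high:  # a constant gene carries no information
        return 0.0
    n = len(labels)
    return (entropy(labels)
            - (len(low) / n) * entropy(low)
            - (len(high) / n) * entropy(high))


def filter_genes(X, y, k):
    """Filter phase: indices of the k genes with the highest information gain."""
    n_genes = len(X[0])
    scores = [information_gain([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def knn_loo_accuracy(X, y, genes, k=1):
    """Fitness of a gene subset: leave-one-out KNN accuracy on those genes only."""
    correct = 0
    for i, row in enumerate(X):
        # Squared Euclidean distance to every other sample, nearest first.
        dists = sorted(
            (sum((row[g] - other[g]) ** 2 for g in genes), y[j])
            for j, other in enumerate(X) if j != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


# Toy data (hypothetical): 6 samples x 3 genes; only gene 0 tracks the class.
X = [[0.1, 5.0, 1.0], [0.2, 1.0, 1.0], [0.3, 4.0, 1.0],
     [0.9, 1.5, 1.0], [1.0, 4.5, 1.0], [1.1, 2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

selected = filter_genes(X, y, k=1)
fitness = knn_loo_accuracy(X, y, selected, k=1)
```

In a full hybrid scheme, a wrapper such as BBO would call a fitness function like `knn_loo_accuracy` repeatedly on candidate subsets drawn from the filtered genes.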
Rab11 is an important protein subfamily in the Rab GTPase family. These proteins physiologically function as key regulators of intracellular membrane trafficking processes. Pathologically, Rab11 proteins are implicated in many diseases, including cancers, neurodegenerative diseases and type 2 diabetes. Although they are medically important, no previous study has found Rab11 allosteric binding sites to which potential drug candidates can bind. In this study, by employing multiple clustering approaches integrating principal component analysis, independent component analysis and locally linear embedding, we performed structural analyses of Rab11 and identified eight representative structures. Using these representatives to perform binding site mapping and virtual screening, we identified two novel binding sites in Rab11 and small molecules that can preferentially bind to different conformations of these sites with high affinities. After identifying the binding sites and the residue interaction networks in the representatives, we computationally showed that these binding sites may allosterically regulate Rab11, as these sites communicate with the switch 2 region that binds to GTP/GDP. These two allosteric binding sites in Rab11 are also similar to two allosteric pockets in Ras that we discovered previously.
Rab proteins represent the largest family of the Ras superfamily of guanosine triphosphatases (GTPases). Aberrant human Rab proteins are associated with multiple diseases, including cancers and neurological disorders. Rab subfamily members display subtle conformational variations that render specificity in their physiological functions and can be targeted for subfamily-specific drug design. However, drug discovery efforts have not focused much on targeting Rab allosteric non-nucleotide binding sites, which are subject to less evolutionary pressure to be conserved and hence are likely to offer subfamily specificity and may be less prone to undesirable off-target interactions and side effects. To discover druggable allosteric binding sites, Rab structural dynamics first need to be incorporated using multiple experimentally and computationally obtained structures. The high-dimensional structural data may necessitate feature extraction methods to identify a manageable number of representative structures for subsequent analyses. We have detailed state-of-the-art computational methods to (i) identify binding sites using data on sequence, shape, energy, etc., (ii) determine the allosteric nature of these binding sites based on structural ensembles, residue networks and correlated motions and (iii) identify small molecule binders through structure- and ligand-based virtual screening. To benefit future studies targeting Rab allosteric sites, we herein detail a refined workflow comprising multiple available computational methods, which have been successfully used alone or in combination. This workflow is also applicable to drug discovery efforts targeting other medically important proteins. Depending on the structural dynamics of the proteins of interest, researchers can select suitable strategies for allosteric drug discovery and design from the computational methods and tools listed in the workflow.
Catalytic proteins such as human protein tyrosine phosphatase 1B (PTP1B), with conserved and highly polar active sites, warrant the discovery of druggable nonactive sites, such as allosteric sites, and potentially, therapeutic small molecules that can bind to these sites. Catalyzing the dephosphorylation of numerous substrates, PTP1B is physiologically important in intracellular signal transduction pathways in diverse cell types and tissues. Aberrant PTP1B is associated with obesity, diabetes, cancers, and neurodegenerative disorders. Utilizing clustering methods (based on root mean square deviation, principal component analysis, nonnegative matrix factorization, and independent component analysis), we have examined multiple PTP1B structures. Using the resulting representative structures in different conformational states, we determined consensus clustroids and used them to identify both known and novel binding sites, some of which are potentially allosteric. We report several lead compounds that could potentially bind to the novel PTP1B binding sites and can be further optimized. Considering the possibility for drug repurposing, we discovered homologous binding sites in other proteins, with ligands that could potentially bind to the novel PTP1B binding sites.
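As a rough illustration of the RMSD-based step above, the following Python sketch groups structures by pairwise root mean square deviation and picks a clustroid (the member closest to all others) for each group. It is a sketch under stated assumptions, not the authors' pipeline: greedy leader clustering is swapped in for simplicity, the structures are assumed to be already superposed with matching atom order, the function names and toy coordinates are hypothetical, and the consensus across PCA, NMF and ICA clusterings is not shown.

```python
import math


def rmsd(a, b):
    """RMSD between two pre-aligned structures given as lists of (x, y, z) atoms."""
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(a, b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(a))


def cluster_by_rmsd(structures, cutoff):
    """Greedy leader clustering: join the first cluster whose leader is within cutoff."""
    clusters = []
    for idx, structure in enumerate(structures):
        for members in clusters:
            if rmsd(structures[members[0]], structure) <= cutoff:
                members.append(idx)
                break
        else:
            clusters.append([idx])  # no leader close enough: start a new cluster
    return clusters


def clustroid(structures, members):
    """Index of the member with the smallest summed RMSD to its cluster mates."""
    return min(members,
               key=lambda i: sum(rmsd(structures[i], structures[j])
                                 for j in members))


# Toy ensemble (hypothetical): five two-atom "structures" in two conformational states.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(5.1, 0.0, 0.0), (6.1, 0.0, 0.0)],
]

clusters = cluster_by_rmsd(ensemble, cutoff=0.5)
representatives = [clustroid(ensemble, members) for members in clusters]
```

With real protein structures, the coordinates would first be superposed, and the resulting representatives would then be passed on to binding site mapping.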