A novel approach to computer-aided "rational" drug design, TOMOCOMD-CARDD, is presented. It is based on computing the non-stochastic and stochastic linear indices of the atom-adjacency matrix of the molecular pseudograph, which represents the molecular structure. These TOMOCOMD-CARDD descriptors are used, together with linear discriminant analysis, for the computational (virtual) screening and "rational" selection of new lead antibacterial agents. The two structure-based antibacterial-activity classification models, built on non-stochastic and stochastic indices, correctly classify 91.61% and 90.75%, respectively, of the 1525 chemicals in the training sets, with high Matthews correlation coefficients (MCC = 0.84 and 0.82). An external validation was carried out to assess the robustness and predictive power of the models obtained. These QSAR models correctly classify 91.49% and 89.31% of the 505 compounds in an external test set, yielding MCCs of 0.84 and 0.79, respectively. The TOMOCOMD-CARDD approach compares satisfactorily with nine of the most useful models for antimicrobial selection reported to date. Finally, an in silico screening of 87 new chemicals with antibacterial activity reported in the anti-infective field was performed, showing the ability of the TOMOCOMD-CARDD models to identify new lead antibacterial compounds.
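As a rough illustration of what a linear index is, the sketch below computes total linear indices of orders 0 to K as the sum of the entries of M^k x, where M is a hydrogen-suppressed atom-adjacency matrix and x is a vector with one numeric property per atom. The stochastic variant replaces M by its row-normalized form. The exact TOMOCOMD-CARDD definitions (choice of atomic weights, how multiple bonds are encoded in the pseudograph, local versus total indices) are assumptions here, and the function names and electronegativity weights are hypothetical. The mcc helper uses the standard Matthews correlation coefficient formula for a 2x2 confusion matrix.

```python
from math import sqrt


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def row_stochastic(adj):
    """Divide each row by its sum (the atom's degree); assumes no isolated atoms."""
    out = []
    for row in adj:
        deg = sum(row)
        out.append([a / deg for a in row])
    return out


def total_linear_indices(adj, x, max_order=3, stochastic=False):
    """Return [f_0, ..., f_max_order], where f_k is the sum of the entries of M^k x.

    Assumption: M is the plain atom-adjacency matrix (a multiple bond could be
    encoded as an entry > 1) or, if stochastic=True, its row-stochastic version.
    """
    m = row_stochastic(adj) if stochastic else adj
    v = list(x)
    indices = []
    for _ in range(max_order + 1):
        indices.append(sum(v))  # f_k = sum of the entries of M^k x
        v = mat_vec(m, v)       # advance to M^(k+1) x
    return indices


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical example: hydrogen-suppressed ethanol (C1-C2-O),
# atoms weighted by Pauling electronegativity (C = 2.55, O = 3.44).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [2.55, 2.55, 3.44]
print(total_linear_indices(adj, x, max_order=2))
print(total_linear_indices(adj, x, max_order=2, stochastic=True))
```

In a workflow like the one described, such indices would form the descriptor matrix fed to a linear discriminant classifier, and the resulting confusion matrix would be summarized with accuracy and MCC.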
Digital Signal Processing (DSP) applications in Bioinformatics have received considerable attention in recent years, leading to effective new methods for genomic sequence analysis, such as the detection of coding regions. Applying DSP principles to genomic sequences requires an adequate mapping of the nucleotide bases to numerical values, which converts nucleotide sequences into time series. Once this mapping is defined, the standard mathematical tools of DSP can be applied to tasks such as identifying protein-coding DNA regions and reading frames, among others. In this article we present an overview of the most relevant applications of DSP algorithms in the analysis of genomic sequences, showing the main results obtained with these techniques, analyzing their relative advantages and drawbacks, and providing relevant examples. Finally, we discuss some perspectives of DSP in Bioinformatics, considering recent research results on algebraic structures of the genetic code, which suggest new DSP applications in this field, as well as the emerging field of Genomic Signal Processing.
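The numeric-mapping step can be illustrated with the Voss indicator representation, one widely used choice that turns a DNA string into four binary sequences, one per base. Once mapped, ordinary DSP tools apply; the sketch computes the total power spectrum S[k] as the sum over bases of |U_b[k]|^2 with a naive DFT. Function names are hypothetical, and characters other than A, C, G, T (e.g. N) are assumed to map to 0 in every indicator.

```python
import cmath

BASES = "ACGT"


def voss_indicators(seq):
    """Map a DNA string to four 0/1 indicator sequences (Voss representation)."""
    seq = seq.upper()
    return {b: [1 if ch == b else 0 for ch in seq] for b in BASES}


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short examples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_spectrum(seq):
    """S[k] = sum over the four bases of |U_b[k]|^2."""
    spectra = [dft(u) for u in voss_indicators(seq).values()]
    return [sum(abs(s[k]) ** 2 for s in spectra) for k in range(len(seq))]


# A perfectly period-3 toy sequence concentrates its non-DC power at k = N/3.
spectrum = power_spectrum("ATGATGATG")
print([round(v, 6) for v in spectrum])
```

For real genomes an FFT library would replace the naive DFT; the direct sum is used only to keep the example dependency-free.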
Because codons are non-uniformly distributed in coding regions, most coding regions of genomes exhibit a three-periodicity: after the nucleotide sequence has been converted to numbers, its Fourier transform shows a notable peak at frequency component N/3, where N is the length of the analyzed sequence. Exploiting this property, the Short-Time Fourier Transform (STFT) has been applied to long DNA sequences to predict coding regions. This paper presents a new approach that reduces the computational burden of STFT computation for coding-region detection. Experimental results show significant savings in computation time when the proposed algorithm is used.
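To illustrate why a single frequency bin can suffice, the sketch below tracks S[N/3], the sum over bases of |U_b[N/3]|^2, along a sliding window of length N (a multiple of 3) on the Voss indicator sequences. At k = N/3 the DFT kernel is e^{-j2πn/3}, which depends only on n mod 3, so each window's value follows from the counts of each base in the three codon phases. These counts update in O(1) per one-base shift instead of requiring a full FFT per window. Counting phases by absolute position only multiplies each U_b by a unit-modulus constant, so the magnitudes are unchanged. This is a generic illustration under these assumptions, not necessarily the algorithm proposed in the paper; period3_track is a hypothetical name.

```python
import cmath

W = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel step at k = N/3


def period3_track(seq, window):
    """Return S[N/3] for every window start (N = window, a multiple of 3)."""
    seq = seq.upper()
    if window % 3 or window > len(seq):
        raise ValueError("window must be a multiple of 3 and fit in the sequence")
    # counts[b][p]: occurrences of base b at positions i with i % 3 == p in the window
    counts = {b: [0, 0, 0] for b in "ACGT"}
    for i in range(window):
        if seq[i] in counts:
            counts[seq[i]][i % 3] += 1

    def power():
        # |c0 + c1*W + c2*W^2|^2 summed over the four indicator sequences
        return sum(abs(c[0] + c[1] * W + c[2] * W ** 2) ** 2 for c in counts.values())

    out = [power()]
    for start in range(1, len(seq) - window + 1):
        old, new = seq[start - 1], seq[start + window - 1]
        if old in counts:  # base leaving the window
            counts[old][(start - 1) % 3] -= 1
        if new in counts:  # base entering the window
            counts[new][(start + window - 1) % 3] += 1
        out.append(power())
    return out


track = period3_track("ATGATGATGATGATG", window=9)
print([round(v, 6) for v in track])
```

Windows whose S[N/3] exceeds a chosen threshold would then be flagged as candidate coding regions; the threshold itself is application-specific.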
In this paper we investigate the use of a clustering algorithm as a feature extraction technique to find new features for representing protein sequences. In particular, our work focuses on predicting HIV protease resistance to drugs. We use a biologically motivated similarity function based on the contact energy of the amino acids and their position in the sequence. The performance measure was computed taking into account both the clustering reliability and the classification validity. The k-means algorithm was used for clustering, and an SVM evaluated with 10-fold cross-validation was used for classification. The best results were obtained by reducing an initial set of 99 features to a lower-dimensional set of 36 to 66 features.
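The abstract does not state exactly how clusters become features. One plausible reading, sketched below purely as an assumption, is to cluster the sequence positions with k-means. Each position is described by its contact-energy values across the training sequences plus a weighted position coordinate. Each sequence is then represented by the mean contact energy of its residues within each cluster, which gives k features instead of one per position. The energy table, pos_weight, and all function names are hypothetical. The sketch uses plain squared Euclidean distance rather than the paper's similarity function, and a deterministic evenly spaced initialization instead of random restarts.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    n = len(points)
    # Deterministic init: k evenly spaced points (may coincide on duplicate data)
    centers = [list(points[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def position_vectors(seqs, energy, pos_weight):
    """Describe each aligned position by its energies across sequences plus its index."""
    return [[energy[s[p]] for s in seqs] + [pos_weight * p] for p in range(len(seqs[0]))]


def cluster_features(seqs, energy, k, pos_weight=1.0):
    """Replace per-position energies with k per-cluster mean energies per sequence."""
    labels = kmeans(position_vectors(seqs, energy, pos_weight), k)
    feats = []
    for s in seqs:
        row = []
        for c in range(k):
            vals = [energy[s[p]] for p in range(len(s)) if labels[p] == c]
            row.append(sum(vals) / len(vals) if vals else 0.0)
        feats.append(row)
    return labels, feats


# Toy example with made-up per-residue energies (not real contact energies).
toy_energy = {"A": 0.0, "K": 1.0}
print(cluster_features(["AAAKKK", "AAAKKK"], toy_energy, k=2))
```

The resulting feature rows could then be passed to an SVM, for example scikit-learn's SVC scored with cross_val_score(..., cv=10), to mirror the 10-fold cross-validation setup described in the abstract.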