The development of genome sequencing and DNA microarray analysis of gene expression gives rise to the demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. The significance of each motif found is judged based on a motif score distribution estimated by a Monte Carlo method. In addition, BioProspector modifies the motif model used in the earlier Gibbs samplers to allow for the modeling of gapped motifs and motifs with palindromic patterns. All these modifications greatly improve the performance of the program. Although testing and development are still in progress, the program has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP. We are currently working on combining BioProspector with a clustering program to explore gene expression networks and regulatory mechanisms. For a copy of the program and documentation for UNIX systems, please contact xliu@smi.stanford.edu.
Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-array) has become a popular procedure for studying genome-wide protein-DNA interactions and transcription regulation. However, it can only map the probable protein-DNA interaction loci within 1-2 kilobases resolution. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP-array-selected sequences and searches for DNA sequence motifs representing the protein-DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP-array ranking information to accelerate searches and enhance their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP-array experiments in yeast (STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms. MDscan can be used to find DNA motifs not only in ChIP-array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large numbers of mutation patterns associated with crossresistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. Learning methods were trained and tested on a public data set of genotype-phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations in >2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low͞intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype-phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, crossresistance patterns differ for different mutation subsets and that cross-resistance has been underestimated. antiviral therapy ͉ HIV ͉ linear regression ͉ machine learning T wenty antiretroviral drugs are approved for treating HIV-1 infection: eight protease inhibitors (PIs), seven nucleoside and one nucleotide reverse transcriptase (RT) inhibitors (NRTIs), three nonnucleoside RT inhibitors (NNRTIs), and one fusion inhibitor. Resistance to these drugs is caused by mutations in their molecular targets. Understanding the genetic basis of cross-resistance is essential for designing new antiviral drugs and for using genotypic drug resistance testing to select optimal therapy. Despite the large number of PIs and RT inhibitors, therapy is challenging because drug resistance arises from complex patterns of mutations and because of the high degree of cross-resistance within each drug class.Approaches for using HIV-1 drug resistance mutations to predict changes in drug susceptibility have included decision trees (1), linear regression (2), linear discriminant analysis (3), neural networks (4), and support vector regression (SVR) (5). Here, we compare five statistical learning methods each using four different sets of input mutations to develop quantitative models associating HIV-1 protease and RT mutations with changes in susce...
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.