Modern biology has moved from a science of individual measurements to a science where data are collected on an industrial scale. Foremost among the new tools for biochemistry are chip arrays which, in one operation, measure hundreds of thousands or even millions of DNA sequences or RNA transcripts. While this is impressive, increasingly sophisticated analysis tools have been required to convert gene array data into gene expression levels. Despite the assumption that noise levels are low, since the number of measurements for an individual gene is small, identifying which signals are affected by noise is a priority. High-density oligonucleotide arrays (HDONAs) from NCBI GEO show that, even in the best Human GeneChips, 1/4 percent of data are affected by spatial noise. Earlier designs are noisier, and spatial defects may affect more than 25 percent of probes. BioConductor R code is available as supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.108 and via http://bioinformatics.essex.ac.uk/users/wlangdon/TCBB-2007-11-0161.tar.gz.
Abstract: Protein families can be used to reconstruct evolutionary histories of organisms. The accuracy of protein assignment to such families is critical for the success of such studies. Here we investigate the automatic aggregation of motif-defined homologous protein families for further reconstruction of their evolutionary histories. We propose a method that utilises only parameters that can be adjusted by using the data. The building blocks of the method include: (a) a majority rule for combining protein homologous neighbourhood lists into that for a family, and (b) a robust clustering procedure whose only parameter, the similarity shift, can be estimated from information on proteins with known function. The method is applied to a herpesvirus protein dataset, leading to insights into the composition of ancestors of herpesvirus superfamilies. Comparison of the computational reconstructions with more comprehensive analyses also shows how alignment-based between-protein similarity scoring can be improved by using data on gene arrangements.
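Building block (a) can be illustrated with a minimal Python sketch. The abstract does not define the rule precisely, so this assumes that a protein enters the family's neighbourhood list when it appears in the lists of a strict majority of family members. The function name `majority_neighbourhood` and the toy identifiers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_neighbourhood(family, neighbours):
    """Combine per-protein neighbourhood lists into one list for a family.

    Assumption (not from the source): a protein is kept when it occurs in
    the neighbourhood lists of more than half of the family members.
    """
    counts = Counter()
    for member in family:
        # Use a set so a protein is counted at most once per member.
        counts.update(set(neighbours.get(member, ())))
    threshold = len(family) / 2
    return {protein for protein, n in counts.items() if n > threshold}


# Toy data: three family members with hypothetical neighbour identifiers.
family = ["a", "b", "c"]
neighbours = {
    "a": ["x", "y"],
    "b": ["x", "z"],
    "c": ["x", "y", "w"],
}
family_list = majority_neighbourhood(family, neighbours)
```

In this toy case `x` is listed by all three members and `y` by two, so both pass the threshold, while `z` and `w` are each listed by one member and are dropped.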
Summary. Background: A chimeric transcript is a single RNA sequence which results from the transcription of two adjacent genes. Recent studies estimate that at least 4% of tandem human gene pairs may form chimeric transcripts. Affymetrix GeneChip data are used to study the expression patterns of tens of thousands of genes and the probe sequences used in these microarrays can potentially map to exotic RNA sequences such as chimeras. Results: We have studied human chimeras and investigated their expression patterns using large surveys of Affymetrix microarray data obtained from the Gene Expression Omnibus. We show that for six probe sets, a unique probe mapping to a transcript produced by one of the adjacent genes can be used to identify the expression patterns of readthrough transcripts. Furthermore, unique probes mapping to an intergenic exon present only in the MASK-BP3 chimera can be used directly to study the expression levels of this transcript.
Conclusions: We have attempted to implement a new method for identifying tandem chimerism. In this analysis, unambiguous probes are needed to measure run-off transcription, and probes that map to intergenic exons are particularly valuable for identifying the expression of chimeras.
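The role of unambiguous probes can be sketched in Python under simple assumptions. Here a probe counts as unambiguous when it maps to exactly one transcript. The mapping format, the function name `unambiguous_probes`, and the identifiers are hypothetical, and the sketch does not reproduce the authors' pipeline.

```python
def unambiguous_probes(probe_to_transcripts):
    """Return {probe: transcript} for probes that map to exactly one transcript.

    Assumption (not from the source): the input maps each probe identifier
    to the collection of transcript identifiers its sequence matches.
    """
    unique = {}
    for probe, transcripts in probe_to_transcripts.items():
        distinct = set(transcripts)
        if len(distinct) == 1:
            # Exactly one target, so the signal can be attributed to it.
            unique[probe] = next(iter(distinct))
    return unique


# Toy data with hypothetical probe and transcript identifiers.
mapping = {
    "probe_1": ["gene_A"],                # maps only to the upstream gene
    "probe_2": ["gene_A", "chimera_AB"],  # ambiguous: shared exon
    "probe_3": ["chimera_AB"],            # maps only to the chimera
    "probe_4": [],                        # no match at all
}
selected = unambiguous_probes(mapping)
```

Here only `probe_1` and `probe_3` are retained, because `probe_2` matches two transcripts and `probe_4` matches none.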