Differences between individual DNA sequences provide the basis for human genetic variability. Forms of genetic variation include single-nucleotide polymorphisms, insertions/duplications, deletions, and inversions/translocations. The genome of human embryonic stem cells (hESCs) has been characterized mainly by karyotyping and comparative genomic hybridization (CGH), techniques whose relatively low resolution at 2-10 megabases (Mb) cannot accurately determine most copy number variability, which is estimated to involve 10%-20% of the genome. In this brief technical study, we examined HSF1 and HSF6 hESCs using array comparative genomic hybridization (aCGH) to determine copy number variants (CNVs) as a higher-resolution method for characterizing hESCs. Our approach used five samples for each hESC line and showed four consistent CNVs for HSF1 and five consistent CNVs for HSF6. These consistent CNVs included amplifications and deletions that ranged in size from 20 kilobases to 1.48 megabases, involved seven different chromosomes, were both shared and unique between hESCs, and were maintained during neuronal stem/progenitor cell differentiation or drug selection. We also identified 30 HSF1 and 40 HSF6 candidate CNVs that were scored less consistently but were still highly significant. Overall, aCGH provides a promising approach for uniquely identifying hESCs and their derivatives and highlights a potential genomic source for distinct differentiation and functional potentials that lower-resolution karyotype and CGH techniques could miss.
We discuss a novel atomic force microscope-based method for identifying individual short DNA molecules (<5000 bp) within a complex mixture by measuring the intra-molecular spacing of a few sequence-specific topographical labels in each molecule. Using this method, we accurately determined the relative abundance of individual DNA species in a 15-species mixture, with fewer than 100 copies per species sampled. To assess the scalability of our approach, we conducted a computer simulation, with realistic parameters, of the hypothetical problem of detecting abundance changes in individual gene transcripts between two single-cell human messenger RNA samples, each containing roughly 9000 species. We found that this approach can distinguish transcript species abundance changes accurately in most cases, including transcript isoforms which would be challenging to quantitate with traditional methods. Given its sensitivity and procedural simplicity, our approach could be used to identify transcript-derived complementary DNAs, where it would have substantial technical and practical advantages over established techniques in situations where sample material is scarce.
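The idea of identifying a molecule from the spacing between its labels can be illustrated with a minimal sketch. This is not the authors' method: the reference library, the function names `spacings`, `identify`, and `abundance`, and the 25 bp matching tolerance are all assumptions made for illustration. Because a molecule imaged on a surface can be read from either end, the sketch compares the measured spacings against each reference in both orientations.

```python
from collections import Counter

import numpy as np


def spacings(label_positions):
    """Return the distances between consecutive labels (hypothetical helper)."""
    p = np.sort(np.asarray(label_positions, dtype=float))
    if p.size < 2:
        raise ValueError("at least two labels are needed to define a spacing")
    return np.diff(p)


def identify(measured_positions, library, tol_bp=25.0):
    """Match one molecule to the closest reference species, or None.

    `library` maps a species name to its expected label positions in bp.
    `tol_bp` is an assumed tolerance, not a value taken from the abstract.
    """
    m = spacings(measured_positions)
    best_name, best_err = None, float("inf")
    for name, ref_positions in library.items():
        ref = spacings(ref_positions)
        if len(ref) != len(m):
            continue  # a different number of labels cannot match
        # The molecule's orientation is unknown, so try both read directions.
        err = min(np.max(np.abs(m - ref)), np.max(np.abs(m - ref[::-1])))
        if err < best_err:
            best_name, best_err = name, err
    return best_name if best_err <= tol_bp else None


def abundance(molecules, library, tol_bp=25.0):
    """Count how many measured molecules were assigned to each species."""
    hits = (identify(m, library, tol_bp) for m in molecules)
    return Counter(h for h in hits if h is not None)
```

Calling `abundance` on a list of measured molecules gives per-species counts, from which relative abundance follows by dividing by the total. Molecules that match no reference within the tolerance are left out of the counts rather than forced onto the nearest species.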
Many problems in pattern analysis admit systematic characterizations when a small number of useful features or parameters of the image are known a priori or can be estimated reasonably well. Often, the relevant features of a particular pattern analysis problem are easy to enumerate, as when the statistical structures of the patterns are well understood from knowledge of the domain. We study a problem from molecular image analysis, where such a domain-dependent understanding may be lacking to some degree and the features must be inferred via machine-learning techniques. In this paper, we propose a rigorous, fully automated technique for this problem. We are motivated by an application of atomic force microscopy (AFM) image processing needed to solve a central problem in molecular biology, aimed at obtaining the complete transcription profile of a single cell, a snapshot that shows which genes are being expressed and to what degree. Reed et al. (“Single molecule transcription profiling with AFM,” Nanotechnology, vol. 18, no. 4, 2007) showed that the transcription profiling problem reduces to making high-precision measurements of biomolecule backbone lengths, correct to within 20–25 bp (6–7.5 nm). Here, we present an image processing and length estimation pipeline using AFM that comes close to achieving these measurement tolerances. In particular, we develop a biased length estimator on trained coefficients of a simple linear regression model, biweighted by a Beaton–Tukey function, whose feature universe is constrained by James–Stein shrinkage to avoid overfitting. In terms of extensibility and addressing the model selection problem, this formulation subsumes the models we studied.
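A linear regression weighted by the Beaton–Tukey biweight can be sketched with iteratively reweighted least squares. This is a generic illustration under stated assumptions, not the paper's pipeline: the function names, the tuning constant `c = 4.685`, and the MAD-based scale estimate are conventional choices assumed here, and the James–Stein shrinkage step is not included.

```python
import numpy as np


def tukey_biweight(u, c=4.685):
    """Beaton-Tukey biweight: (1 - (u/c)^2)^2 for |u| < c, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) >= c] = 0.0
    return w


def robust_linear_fit(X, y, n_iter=50, tol=1e-10):
    """Fit y ~ intercept + X @ coef by iteratively reweighted least squares.

    X holds the (hypothetical) image features, one row per molecule, and y
    the known lengths used for training. Returns [intercept, coef...].
    """
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        r = y - X1 @ beta
        # Robust residual scale from the median absolute deviation (assumed choice).
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale < 1e-12:
            break  # residuals are essentially zero for most points
        sw = np.sqrt(tukey_biweight(r / scale))
        new_beta = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

Points whose scaled residual exceeds `c` receive zero weight, so gross outliers, such as mis-traced molecules in an image, stop influencing the fitted coefficients after the first reweighting.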