A single nucleotide polymorphism (SNP) is the most frequent form of DNA variation. The set of SNPs present in a chromosome (called the haplotype) is of interest in a wide range of applications in molecular biology and biomedicine, including diagnostics and medical therapy. In this paper we propose a new heuristic method for the problem of haplotype reconstruction for (portions of) a pair of homologous human chromosomes from a single individual (SIH). The problem is well known in the literature, and exact algorithms have been proposed for the case when no (or few) gaps are allowed in the input fragments. These algorithms, though exact and of polynomial complexity, are slow in practice. When gaps are considered, no exact method of polynomial complexity is known. The problem is also hard to approximate with guarantees. Therefore, fast heuristics have been proposed. Our method, SpeedHap, is able to tackle the case of many gapped fragments and retains its effectiveness even when the input fragments have a high rate of reading errors (up to 20%) and low coverage (as low as 3). We test SpeedHap on real data from the HapMap Project.
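To make the problem setting concrete, the sketch below scores a candidate haplotype pair against a set of gapped fragments. It uses the minimum error correction (MEC) count, a standard objective for this problem; the abstract does not state which objective SpeedHap optimizes, so this is an assumption for illustration only. The encoding (strings over `0`, `1`, and `-` for a gap) and the function names `mismatches` and `mec_score` are hypothetical.

```python
def mismatches(fragment: str, haplotype: str) -> int:
    """Count SNP positions where the fragment disagrees with the haplotype.

    Gap positions ('-') carry no information and are skipped.
    """
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def mec_score(fragments: list[str], h1: str, h2: str) -> int:
    """Assign each fragment to the closer haplotype and sum the mismatches.

    The total is the number of fragment entries that would have to be
    corrected for every fragment to agree with one of the two haplotypes.
    """
    return sum(min(mismatches(f, h1), mismatches(f, h2)) for f in fragments)


# Toy input (invented values): four SNP sites, three gapped fragments.
h1, h2 = "0101", "1010"
fragments = ["01--", "--10", "0-11"]
score = mec_score(fragments, h1, h2)
```

A heuristic for this problem searches for the pair `(h1, h2)` that explains the fragments well; the function above only evaluates one given candidate and does not perform that search.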
Source code and datasets will be available upon paper acceptance at the URL: bioalgo.iit.cnr.it/Dot2dot.
Polymorphic Tandem Repeat (PTR) is a common form of polymorphism in the human genome. A PTR is a variation, found in an individual (or in a population), in the number of repeating units at a Tandem Repeat (TR) locus relative to the reference genome. Several phenotypic traits and diseases have been discovered to be strongly associated with or caused by specific PTR loci. PTR are further divided into two main classes: Short Tandem Repeats (STR), when the repeating unit has size up to 6 base pairs, and Variable Number Tandem Repeats (VNTR), for repeating units of size above 6 base pairs. As larger and larger populations are screened via high-throughput sequencing projects, it becomes technically feasible and desirable to explore the association between PTR and a panoply of such traits and conditions. In order to facilitate these studies, we have devised a method for compiling catalogs of PTR from assembled genomes, and we have produced a catalog of PTR for genic regions (exons, introns, UTR and adjacent regions) of the human genome (GRCh38). In the first phase, we applied four different TR discovery software tools to uncover 55,223,485 TR (after duplicate removal) in GRCh38; in the second phase, 373,173 of these were determined to be PTR by comparison with five assembled human genomes. Of these, 263,266 are not included in state-of-the-art PTR catalogs. The new methodology is mainly based on a hierarchical and systematic application of alignment-based sequence comparisons to identify and measure the polymorphism of TR. While previous catalogs focus on the class of STR of small total size, we remove any size restrictions, aiming at the more general class of PTR, and we also target fuzzy TR by using specific detection tools. Like previous catalogs of human polymorphic loci, we orient our catalog toward applications in the discovery of disease-associated loci.
Validation by cross-referencing with existing catalogs on common clinically-relevant loci shows good concordance. Overall, this proposed census of human PTR in genic regions is a shared resource (web accessible), complementary to existing catalogs, facilitating future genome-wide studies involving PTR.
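The definitions above (STR versus VNTR by unit size, and polymorphism as a change in copy number relative to the reference) can be illustrated with a minimal sketch. It counts exact consecutive copies only, whereas the abstract describes alignment-based comparisons and fuzzy TR, so this is a simplification and not the authors' method. The names `repeat_class`, `copy_number`, and `is_polymorphic` are hypothetical.

```python
STR_MAX_UNIT = 6  # from the text: STR units are up to 6 bp, VNTR units are longer


def repeat_class(unit: str) -> str:
    """Classify a tandem repeat by the size of its repeating unit."""
    return "STR" if len(unit) <= STR_MAX_UNIT else "VNTR"


def copy_number(unit: str, tract: str) -> int:
    """Count exact consecutive copies of `unit` at the start of `tract`.

    Simplifying assumption: no mismatches or indels inside the repeat.
    """
    if not unit:
        raise ValueError("repeating unit must be non-empty")
    n = 0
    while tract.startswith(unit, n * len(unit)):
        n += 1
    return n


def is_polymorphic(ref_copies: int, sample_copies: list[int]) -> bool:
    """A locus is polymorphic if any compared genome differs from the reference."""
    return any(c != ref_copies for c in sample_copies)


# Toy example (invented sequences): a CAG repeat with 3 reference copies.
ref = copy_number("CAG", "CAGCAGCAGT")
samples = [copy_number("CAG", s) for s in ("CAGCAGCAGT", "CAGCAGCAGCAGT")]
```

In this toy case the second sample carries four copies against three in the reference, so the locus would be flagged as polymorphic.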
MicroRNAs are small non-coding RNAs that influence gene expression by binding to the 3’ UTR of target mRNAs in order to repress protein synthesis. Soon after their discovery, microRNA dysregulation was associated with several pathologies. In particular, microRNAs have often been reported as differentially expressed in healthy and tumor samples. This suggests that microRNAs are likely to be good candidate biomarkers for cancer diagnosis and personalized medicine. With the advent of Next-Generation Sequencing (NGS), measuring the expression level of the whole miRNAome at once is now routine. Moreover, the collaborative effort of sharing data opens up the possibility of population analyses. This context motivated us to perform an in-silico study to distill cancer-specific panels of microRNAs that can serve as biomarkers. We observed that the problem of finding biomarkers can be modeled as a two-class classification task where, given the miRNAomes of a population of healthy and cancerous samples, we want to find the subset of microRNAs that leads to the highest classification accuracy. We accomplish this task by combining several data mining tools: differential evolution for candidate selection, component analysis to preserve the relationships among miRNAs, and SVM for sample classification. We identified 10 cancer-specific panels whose classification accuracy is always higher than 92%. These panels have very little overlap, suggesting that miRNAs are not only predictive of the onset of cancer, but can be used for classification purposes as well. We experimentally validated the contribution of each of the employed tools to the selection of discriminating miRNAs. Moreover, we tested the significance of each panel for the corresponding cancer type. In particular, enrichment analysis showed that the selected miRNAs are involved in oncogenesis pathways, while survival analysis indicated that miRNAs can be used to evaluate cancer severity.
In summary, the results demonstrate that our method is able to produce cancer-specific panels that are promising candidates for subsequent in vitro validation.
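The two-class formulation can be sketched as a function that scores one candidate panel (a subset of miRNA indices) by its classification accuracy. To stay dependency-free, the sketch swaps in a nearest-centroid classifier with leave-one-out evaluation in place of the SVM named in the abstract, and it omits the differential evolution search and the component analysis step entirely. All names and the expression values are hypothetical.

```python
from statistics import mean


def centroid(rows: list[list[float]], panel: list[int]) -> list[float]:
    """Mean expression of each panel miRNA over a group of samples."""
    return [mean(r[i] for r in rows) for i in panel]


def predict(sample, panel, c_healthy, c_tumor) -> str:
    """Label a sample by the nearer class centroid (squared Euclidean distance)."""
    point = [sample[i] for i in panel]
    d_h = sum((a - b) ** 2 for a, b in zip(point, c_healthy))
    d_t = sum((a - b) ** 2 for a, b in zip(point, c_tumor))
    return "healthy" if d_h <= d_t else "tumor"


def panel_accuracy(samples, labels, panel) -> float:
    """Leave-one-out accuracy of a panel; needs at least 2 samples per class."""
    correct = 0
    for k, (held_out, truth) in enumerate(zip(samples, labels)):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != k]
        c_h = centroid([s for s, l in train if l == "healthy"], panel)
        c_t = centroid([s for s, l in train if l == "tumor"], panel)
        correct += predict(held_out, panel, c_h, c_t) == truth
    return correct / len(samples)


# Invented toy data: miRNA 0 separates the classes, miRNA 2 does not.
samples = [[1.0, 5.0, 3.0], [1.2, 5.1, 9.0], [0.9, 4.9, 6.0],
           [4.0, 5.0, 3.1], [4.2, 5.2, 8.9], [3.9, 4.8, 6.1]]
labels = ["healthy"] * 3 + ["tumor"] * 3
good = panel_accuracy(samples, labels, [0])
poor = panel_accuracy(samples, labels, [2])
```

A subset search such as differential evolution would call a scoring function of this kind repeatedly, keeping the panels with the highest accuracy.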