2019
DOI: 10.1093/bioinformatics/btz942

A haplotype-aware de novo assembly of related individuals using pedigree sequence graph

Abstract: Motivation: Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP) and the Genome in a Bottle project (GIAB), a variety of sequencing datasets from trios of diploid genomes are becoming available. Current trio assembly approaches are not designed to incorporate long- and short-read…

citations
Cited by 24 publications
(14 citation statements)
references
References 28 publications
“…The separate assembly of maternal and paternal haplotypes into distinct haplotigs can occur when divergence between similar assembly fragments is mistakenly inferred to represent paralogy (duplication) rather than heterozygosity and if not recognized can inflate the size of a haploid reference assembly. However, haplotype-aware graph-based assembly algorithms have improved tolerance of heterozygosity and are being applied to new anopheline assembly projects 69,70. Further, the DNA input requirements for long-read sequencing have become smaller in recent years, enabling long-read sequencing libraries to be made from a single mosquito instead of a pool of mosquitoes, reducing genetic variation in the sequencing template 71.…”
Section: Challenges In Generating Genomic Data
mentioning
confidence: 99%
“…It is possible to switch to a noisy read assembler and to add Illumina data for SNP calling, but assembly accuracy may be reduced due to the elevated sequencing error rate. Second, starting with an unphased assembly, we may miss highly heterozygous regions involving long SVs, as demonstrated in our previous works on small genomes 5,8. A potential solution is to retain heterozygous events in the initial assembly graph and to scaffold and dissect these events later to generate a phased assembly.…”
Section: Nature Biotechnology
mentioning
confidence: 95%
“…FALCON-Phase 6, which extends FALCON-Unzip, uses Hi-C to connect phased sequence blocks and can generate longer haplotypes, but it cannot achieve chromosome-long phasing. Trio binning 7,8 is the only published method that can do this, plus the assembly and phasing of entire chromosomes. It uses sequence reads from both parents to partition the offspring's long reads and then assemble each partition separately.…”
mentioning
confidence: 99%
“…Yet, centromeres play an important role in cancer genomics [8] while short tandem repeat (STR) expansions associate with a number of genetic diseases [9]. LRS technologies have also enabled de novo haplotype-resolved assemblies with very few contig breaks [10,11]. Finally, LRS technologies overcome chemistry limitations of SRS, in particular GC bias [12] and PCR amplification artifacts [13] causing uneven coverages for reads produced by Illumina platforms.…”
Section: Introduction
mentioning
confidence: 99%