2018
DOI: 10.1093/gigascience/giy029

Improving draft genome contiguity with reference-derived in silico mate-pair libraries

Abstract: Background: Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. Findings: In order to improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information direct…

Cited by 22 publications
(32 citation statements)
References 40 publications
“…We assembled 2,350,959,615 base pairs (bp) of the narwhal nuclear genome (2,156,711,577 bp excluding missing data [N's]) with a scaffold N50 of 1,483,363 bp and contig N50 of 10,481 bp, in a total of 21,007 scaffolds. We used ∼100× coverage of multiple short insert and ∼32× of mate paired Illumina libraries and in silico mate paired libraries (Grau et al., 2018) constructed using the beluga reference genome (Jones et al., 2017) (Genbank: GCA_002288925.2) (Table S1). Investigations into the completeness of the assembly using Benchmarking Universal Single-Copy Orthologs (BUSCO) analyses and the mammalian BUSCO gene set showed a high level of complete BUSCO scores (93%) (Table S2), indicating a fairly complete and high-quality genome.…”
Section: Results (mentioning)
confidence: 99%
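
The scaffold N50 and contig N50 figures quoted above follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal sketch of that calculation, assuming a plain list of sequence lengths (the function name `n50` and the toy lengths are illustrative and do not come from the cited paper):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy example (hypothetical scaffold lengths, not real data).
toy_scaffolds = [100, 80, 50, 30, 20, 10]
print(n50(toy_scaffolds))
```

For the toy list the total is 290, so the threshold is 145; the two longest sequences sum to 180, which makes the second one (80) the N50.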
“…However, as the assembly was scaffolded using in silico mate-pair libraries generated by a closely related species, it also has some limitations. The assembly could contain some mis-assemblies caused by changes in the genomic architecture of the fin and minke whale after they diverged ~10.5Ma [2] meaning it may be inadequate for the study of gene copy number variation, chromosomal structural variation, and synteny between species [13].…”
Section: Discussion (mentioning)
confidence: 99%
“…We trimmed Illumina adapter sequences from reads and removed reads shorter than 30bp using skewer [12]. From these trimmed reads we constructed in-silico mate paired library reads with insert sizes 1kb, 2kb, 5kb, 10kb, and 20kb using the repeatmasked minke whale genome as a reference (GCA_000493695.1) [11] and Cross-Species Scaffolding, specifying default parameters (100bp reads of ~10x coverage) [13]. Specific read numbers and information can be found in S1 Table.…”
Section: Methods (mentioning)
confidence: 99%
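
The idea of an in silico mate-pair library with a fixed insert size can be illustrated by cutting read pairs out of a sequence at regular intervals. The sketch below is a simplified illustration only and is not the Cross-Species Scaffolding implementation: the function names, the forward/reverse read orientation, and the rule of skipping pairs that contain masked `N` bases are all assumptions made for this example. Under these assumptions, coverage is roughly `2 * read_len / step`, so 100 bp reads at about 10x would correspond to a step of 20 bp.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def in_silico_mate_pairs(sequence, insert_size, read_len, step):
    """Cut read pairs whose outer ends are `insert_size` bases apart.

    Assumptions (hypothetical, for illustration only):
    - read 1 is the forward strand at the left end of the fragment,
      read 2 is the reverse complement of the right end;
    - pairs touching masked bases ('N') are skipped.
    """
    if insert_size < 2 * read_len:
        raise ValueError("insert_size must be at least 2 * read_len")
    pairs = []
    for start in range(0, len(sequence) - insert_size + 1, step):
        fragment = sequence[start:start + insert_size]
        left = fragment[:read_len]
        right = fragment[-read_len:]
        if "N" in left or "N" in right:
            continue  # skip pairs overlapping masked or missing sequence
        pairs.append((left, reverse_complement(right)))
    return pairs


# Toy example on a 40 bp sequence (real inserts would be 1 kb to 20 kb).
toy = "ACGT" * 10
pairs = in_silico_mate_pairs(toy, insert_size=20, read_len=5, step=5)
print(len(pairs), pairs[0])
```

A real scaffolder expects a specific pair orientation and insert-size distribution, so both would need to match the tool that consumes the library.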
“…For example, the reliability of analyses such as operon structure identification, gene regulation, and comparative genomic studies is enhanced by the availability of complete genomes. Additionally, the finishing process can substantially improve the quality of data available to the community by identifying and correcting incorrect assemblies and low coverage regions [13, 14].…”
Section: Introduction (mentioning)
confidence: 99%
“…Currently, several features of sequencing data, particularly increased depth of coverage and error reduction in sequencing libraries, are useful for genome finishing steps. Thus, draft genomes can be combined with additional information from new sequencing and mapping studies to reduce the effort of the finalization process [13].…”
Section: Introduction (mentioning)
confidence: 99%