2020
DOI: 10.21203/rs.3.rs-32139/v1
Preprint

Impact of short-read sequencing on the misassembly of a plant genome

Abstract: Background: Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short read sequencing technologies with highly uneven read coverages indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of short-read based assembly had significantly higher and lower coverage compared to backgr…

Citations: cited by 2 publications (5 citation statements)
References: 18 publications
“…Repetitive elements made up nearly 60% of all three Fraxinus genome assemblies (Table S3), while they made up about only about 45% of the F. pennsylvanica reference assembly, which was based on Illumina short reads. Since short-read assemblies have the propensity to collapse duplicated sequences, this difference was likely due to better read-through of repetitive DNA on long ONT single molecules [24]. The largest group of retroelements discovered were Ty1/Copia and Gypsy/DIRS LTR elements, each comprising about 15% of all repetitive elements characterized.…”
Section: Results | Citation type: mentioning
confidence: 99%
“…The assemblies from this study were also closer to genome sizes estimated by flow cytometry [22], with the reference-guided assemblies being over 100 MB smaller than estimated genome sizes. Again, these disparities may be partly due to better repeat detection in long-read genome assemblies, which likely contain fewer collapses of identical or near-identical sequences [24].…”
Section: Results | Citation type: mentioning
confidence: 99%
“…Alternatively, the observed differences may represent a methodological artifact. For example, repeat regions are known to affect the bioinformatic process of mapping sequence reads to a reference genome (Schbath et al, 2012; Thankaswamy-Kosalai et al, 2017; Wang et al, 2021). In Bowtie2, for example, reads are mapped to a reference genome based on a mapping score that is identical across exact repeats, but pseudo-random numbers are used to break ties between repeat locations (Langmead and Salzberg, 2012).…”
Section: Discussion | Citation type: mentioning
confidence: 99%
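
The tie-breaking behaviour described in the statement above can be illustrated with a toy sketch. This is not Bowtie2's implementation: the function name `pick_alignment`, the `(position, score)` tuples, the convention that a higher score is better, and the example values are all assumptions made for illustration. It only shows why a read that matches several exact repeat copies equally well ends up scattered across those copies.

```python
import random

def pick_alignment(candidates, rng):
    """Return one (position, score) pair with the best score.

    Ties between equally scored positions are broken pseudo-randomly.
    Assumes a higher score means a better alignment (illustrative only).
    """
    best = max(score for _, score in candidates)
    tied = [c for c in candidates if c[1] == best]
    return rng.choice(tied)

# Seeded generator so the sketch is reproducible.
rng = random.Random(0)

# Hypothetical read that matches three exact repeat copies equally well,
# plus one clearly worse location.
candidates = [(100, 42), (5100, 42), (9100, 42), (200, 17)]

# Place the same read many times and count where it lands.
counts = {}
for _ in range(3000):
    pos, _score = pick_alignment(candidates, rng)
    counts[pos] = counts.get(pos, 0) + 1

print(counts)
```

Under these assumptions the read is never placed at the lower-scoring position, and the three repeat copies each receive a share of the placements, so per-copy coverage in repeats reflects the random draw rather than the true origin of the read.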
“…For example, equality in length and sequence of the IRs of a plastid genome can be used as an indicator of its overall assembly quality (Gruenstaeudl and Jenke, 2020; Zheng et al, 2020), as IR equality is maintained naturally and rarely deviated from (Goulding et al, 1996; Ruhlman et al, 2017). Third, researchers can measure the depth of sequence coverage of a genome assembly (Sims et al, 2014), with both the average sequencing depth and the evenness of coverage indicative of assembly quality (Chen et al, 2013; Wang et al, 2021). Average sequencing depth can assist in identifying the overall adequacy of sequence coverage, whereas the precise distribution of sequencing depth can help to identify regions of inadequate coverage.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
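
The distinction drawn in the statement above, between average depth and evenness of coverage, can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the function name `coverage_summary`, the 0.5x and 2x thresholds, the use of the coefficient of variation as an evenness measure, and the per-position depth values are all assumptions.

```python
from statistics import mean, pstdev

def coverage_summary(depths, low_factor=0.5, high_factor=2.0):
    """Summarise per-position sequencing depth.

    Returns the mean depth, the coefficient of variation (lower means
    more even coverage), and the indices of positions whose depth falls
    below low_factor * mean or above high_factor * mean.
    The thresholds are illustrative assumptions.
    """
    avg = mean(depths)
    cv = pstdev(depths) / avg if avg else float("nan")
    low = [i for i, d in enumerate(depths) if d < low_factor * avg]
    high = [i for i, d in enumerate(depths) if d > high_factor * avg]
    return {"mean": avg, "cv": cv, "low": low, "high": high}

# Hypothetical per-position depths: mostly ~30x, with a low-coverage
# stretch (positions 4-5) and one high-coverage position (7).
depths = [30, 31, 29, 30, 2, 3, 30, 95, 30, 30]
summary = coverage_summary(depths)
print(summary)
```

In this toy input the mean depth alone (31x) looks adequate, while the per-position view flags positions 4 and 5 as under-covered and position 7 as over-covered, which is the kind of unevenness the abstract associates with assembly problems.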