Optimization of enzymatic fragmentation is crucial to maximize genome coverage: a comparison of library preparation methods for Illumina sequencing

Ribarska, Teodora; Bjørnstad, Pål Marius; Sundaram, Arvind; Gilfillan, Gregor D.

doi:10.1186/s12864-022-08316-y

Cited by 11 publications

(6 citation statements)

References 39 publications

(31 reference statements)


“…PCR amplification has long been an important step of sample preparation, ensuring the adequate amount of DNA fragments for sequencing. However, recently adapted techniques offer the possibility for PCR-free library preparation, which might help to alleviate the problem of PCR-related errors, such as duplicates and erroneous fragments [12, 26].…”

Section: Discussion (mentioning)

confidence: 99%

An overlooked phenomenon: complex interactions of potential error sources on the quality of bacterial de novo genome assemblies

Rádai, Váradi, Takács et al. 2024

BMC Genomics


Background Parameters adversely affecting the contiguity and accuracy of the assemblies from Illumina next-generation sequencing (NGS) are well described. However, past studies generally focused on their additive effects, overlooking their potential interactions possibly exacerbating one another’s effects in a multiplicative manner. To investigate whether or not they act interactively on de novo genome assembly quality, we simulated sequencing data for 13 bacterial reference genomes, with varying levels of error rate, sequencing depth, PCR and optical duplicate ratios. Results We assessed the quality of assemblies from the simulated sequencing data with a number of contiguity and accuracy metrics, which we used to quantify both additive and multiplicative effects of the four parameters. We found that the tested parameters are engaged in complex interactions, exerting multiplicative, rather than additive, effects on assembly quality. Also, the ratio of non-repeated regions and GC% of the original genomes can shape how the four parameters affect assembly quality. Conclusions We provide a framework for consideration in future studies using de novo genome assembly of bacterial genomes, e.g. in choosing the optimal sequencing depth, balancing between its positive effect on contiguity and negative effect on accuracy due to its interaction with error rate. Furthermore, the properties of the genomes to be sequenced also should be taken into account, as they might influence the effects of error sources themselves.


“…Tagmentation-based libraries are cost-effective for large numbers of samples, while ligation-based PCR-free workflows are free from transposase binding site bias and PCR artefacts, and so are more suitable for obtaining a high-quality validation set. While this differential processing could introduce systematic differences in sensitivity and precision in detecting variants (Ribarska et al 2022), we think it does not substantially influence the parameter selection and coverage/sample size assessment performed in this study because: (1) a predetermined set of variants is used for imputation, and so false positives in the low-coverage samples have no impact on the imputation process; (2) the high-coverage samples are only used to validate the imputation process, and are not directly compared to the low-coverage samples; (3) the relative imputation accuracy between two imputation runs depends on the respective ancestral haplotype reconstructions, which are the same for both sets of samples. However, it must be noted that the r² values reported in this work refer to the high-coverage samples only and may not be fully representative of the imputation performance in samples prepared with a different methodology.…”

Section: Discussion (mentioning)

confidence: 99%

Genotype imputation in F2 crosses of inbred lines

Pierotti, Welz, Osuna-López et al. 2024

Bioinformatics Advances


Motivation Crosses among inbred lines are a fundamental tool for the discovery of genetic loci associated with phenotypes of interest. In organisms for which large reference panels or SNP chips are not available, imputation from low-pass whole-genome sequencing is an effective method for obtaining genotype data from a large number of individuals. To date, a structured analysis of the conditions required for optimal genotype imputation has not been performed. Results We report a systematic exploration of the effect of several design variables on imputation performance in F2 crosses of inbred medaka lines using the imputation software STITCH. We determined that, depending on the number of samples, imputation performance reaches a plateau when increasing the per-sample sequencing coverage. We also systematically explored the trade-offs between cost, imputation accuracy, and sample numbers. We developed a computational pipeline to streamline the process, enabling other researchers to perform a similar cost-benefit analysis on their population of interest. Availability The source code for the pipeline is available at https://github.com/birneylab/stitchimpute. While our pipeline has been developed and tested for an F2 population, the software can also be used to analyse populations with a different structure. Supplementary information Supplementary data are available at Bioinformatics Advances online.


“…Tagmentation-based libraries are cost-effective for large numbers of samples, while ligation-based PCR-free workflows are free from transposase binding site bias and PCR artefacts, and so are more suitable for obtaining a high-quality validation set. While this differential processing could introduce systematic differences in sensitivity and precision in detecting variants (Ribarska et al., 2022), we think it does not substantially influence the parameter selection and coverage/sample size assessment performed in this study because: 1) a predetermined set of variants is used for imputation, and so false positives in the low-coverage samples have no impact on the imputation process; 2) the high-coverage samples are only used to validate the imputation process, and are not directly compared to the low-coverage samples; 3) the relative imputation accuracy between two imputation runs depends on the respective ancestral haplotype reconstructions, which are the same for both sets of samples. However, it must be noted that the r² values reported in this work refer to the high-coverage samples only and may not be fully representative of the imputation performance in samples prepared with a different methodology.…”

Section: Discussion (mentioning)

confidence: 99%

Genotype imputation in F2 crosses of inbred lines

Pierotti, Welz, Lopez et al. 2023

Preprint


Motivation Crosses among inbred lines are a fundamental tool for the discovery of genetic loci associated with phenotypes of interest. In organisms for which large reference panels or SNP chips are not available, imputation from low-pass whole-genome sequencing is an effective method for obtaining genotype data from a large number of individuals. To date, a structured analysis of the conditions required for optimal genotype imputation has not been performed. Results We report a systematic exploration of the effect of several design variables on imputation performance in F2 crosses of inbred medaka lines. We determined that, depending on the number of samples, imputation performance reaches a plateau when increasing the per-sample sequencing coverage. We also systematically explored the tradeoffs between cost, imputation accuracy, and sample numbers. We developed a computational pipeline to streamline the process, enabling other researchers to perform a similar cost-benefit analysis on their population of interest. Availability and implementation The source code for the pipeline is available at https://github.com/birneylab/stitchimpute.
