The emergence of third-generation sequencing (3GS; long reads) is bringing closer the goal of chromosome-size fragments in de novo genome assemblies. This allows the exploration of new and broader questions on genome evolution for a number of non-model organisms. However, long-read technologies have higher sequencing error rates and therefore require higher coverage, at elevated cost, to achieve sufficient quality. In this context, hybrid assemblies, which combine short reads and long reads, provide an efficient and cost-effective alternative for generating de novo, chromosome-level genome assemblies. The array of available software programs for hybrid genome assembly, sequence correction and manipulation is constantly being expanded and improved. This makes it difficult for non-experts to find efficient, fast and tractable computational solutions for genome assembly, especially for non-model organisms lacking a reference genome or one from a closely related species. In this study, we review and test the most recent pipelines for hybrid assemblies, comparing the model organism Drosophila melanogaster to a non-model cactophilic Drosophila, D. mojavensis. We show that it is possible to achieve excellent contiguity on this non-model organism using the DBG2OLC pipeline.
Identifying the pipeline best suited to one's needs is one obstacle; applying it to actual data is another, especially in the absence of bioinformatic expertise, since guidelines and practical implementations remain limited. In addition, many of these pipelines have not been tested on non-model organisms and assume that samples come from model organisms, for which extreme inbreeding and high homozygosity are commonly feasible. In the present study, we reviewed the most recent whole-genome assembly pipelines and selected a promising pipeline relying on hybrid technology (Chakraborty, Baldwin-Brown, Long & Emerson, 2016). We tested it thoroughly with the aim of optimizing the assembly, using DNA data from both D. melanogaster as a model species and D. mojavensis from the Sonora, Mexico population as a non-model species. Ultimately, this new D. mojavensis assembly from Sonora will be used in a much larger upcoming genomic study using de novo assemblies of multiple cactophilic species and populations (Matzkin, unpublished data). We provide here an analysis of the effects of different parameters on the quality of the final assembly, assessed with a combination of universal metrics (contig length and N50 as measures of contiguity; BUSCO score as a measure of quality and completeness (Waterhouse et al., 2017)) and a reference-based tool, Quast (Gurevich, Saveliev, Vyahhi & Tesler, 2013), which compares the assembly to a reference genome. We show a significant improvement in assembly quality compared with the results of Chakraborty et al. (2016) simply by tuning parameters, and we provide guide parameters for assemblies with similar...
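As an illustration of the contiguity metric mentioned above, the N50 of an assembly can be computed from its contig lengths alone. The following is a minimal sketch assuming the standard definition (the length L such that contigs of length at least L together cover at least half of the total assembly length); it is not the implementation used by Quast or by the pipelines tested here, and the function name and example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Hypothetical helper: return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Accumulate lengths until half of the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths (bp); total = 280, half = 140.
    example = [100, 80, 50, 30, 20]
    print(n50(example))  # 100 + 80 = 180 >= 140, so N50 = 80
```

Because N50 depends only on the length distribution, a higher N50 indicates a more contiguous assembly but says nothing about correctness, which is why it is paired with BUSCO and reference-based comparison.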