Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. Numerous methods exist for annotating each class of elements, but their relative performance is unknown. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.

Keywords
Transposable element; Annotation; Genome; Benchmarking; Pipeline

Long-read sequencing (e.g., PacBio and Oxford Nanopore) and assembly scaffolding (e.g., Hi-C and BioNano) techniques have progressed rapidly within the last few years. These innovations have been critical for high-quality assembly of the repetitive fraction of genomes. In fact, Ou et al. [8] demonstrated that the assembly contiguity of repetitive sequences in recent long-read assemblies is even better than that of traditional BAC-based reference genomes. With these developments, inexpensive and high-quality assembly of an entire genome is now possible.

Knowing where features (e.g., genes and TEs) exist in a genome assembly is essential for using these assemblies to make biological findings. However, unlike the relatively straightforward and comprehensive pipelines established for gene annotation [9][10][11], current methods for TE annotation can be piecemeal, inaccurate, and highly specific to classes of transposable elements.

Transposable elements fall into two major classes. Class I elements, also known as retrotransposons, use an RNA intermediate in their "copy and paste" mechanism of transposition [12].
Class I elements can be further divided into long terminal repeat (LTR) retrotransposons and those that lack LTRs (non-LTRs), which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). Structural features of these elements can facilitate automated de novo annotation in a genome assembly. For example, LTR elements have a 5-bp target site duplication (TSD), while non-LTRs either have variable-length TSDs or lack TSDs entirely, being instead associated with deletion of flanking sequences upon insertion [13]. There are also standard terminal sequences associated with LTR elements (i.e., 5'-TG…C/G/TA-3' for LTR-Copia and 5'-TG…CA-3' for LTR-Gypsy elements), and non-LTRs often have a terminal poly-A tail at the 3' end of the element (see [14] for a complete description of structural features of each superfamily).

The second major class of TEs, Class II elements, also known as DNA transposons, use a DNA intermediate in their "cut and paste" mechanism of transposition [15]. As with Class I ...