Endogenous viral elements (EVEs) are remnants of viral genetic material endogenized into the host genome. In recent decades, they have attracted attention as potential contributors to pathogenesis, drivers of selective advantage for the host, and genomic remnants of ancient viruses. EVEs have a nuanced and complex influence on both host health and evolution, and can offer insights into the deep evolutionary history of viruses. Because EVE research is an emerging field, several factors limit a comprehensive understanding of EVEs: they are currently underestimated and periodically overlooked in studies of the host genome, transcriptome, and virome. The absence of standardized guidelines for ensuring that EVE-related data are available and accessible in line with the FAIR (‘findable, accessible, interoperable, and reusable’) principles obstructs our ability to gather and connect information.
Here, we discuss challenges to the availability and accessibility of EVE-related data and propose potential solutions. We identify two such challenges, the imbalance in biological and research focus among different types of EVEs and the overall biological complexity of EVEs as genomic loci with viral ancestry, both of which could be addressed by developing a user-oriented identification tool. In addition, reports of EVE identification are scattered across different subfields under different keywords, and EVE sequences and associated data are not systematically gathered in databases. While developing an open and dedicated database might be ideal, targeted improvements to generalist databases might provide a pragmatic solution to the accessibility of EVE data and metadata.
The implementation of these solutions, together with a collective effort by the EVE scientific community to discuss and set guidelines, is now urgently needed to guide the development of EVE research and to offer insights into host-virus interactions and their evolutionary history.