Our ability to reconstruct genomes from metagenomic datasets has rapidly evolved over the past decade, leading to publications presenting thousands, and in some cases more than 100,000, metagenome-assembled genomes (MAGs) from thousands of samples. While this wealth of genomic data is critical to expanding our understanding of microbial diversity, evolution, and ecology, various issues have been observed in some of these datasets that risk obfuscating scientific inquiry. In this perspective we focus on the issue of identical or highly similar genomes assembled from independent datasets. While obtaining multiple genomic representatives for a species is highly valuable, multiple copies of the same or highly similar genomes complicate downstream analysis. We analyzed data from recent studies to show the levels of redundancy within these datasets and the highly variable performance of commonly used dereplication tools, and to point to existing approaches that account for and leverage repeated sampling of the same or similar populations.

While the reconstruction of MAGs was initially only achievable in lower-diversity or highly uneven communities (1), in the past five years reports on the reconstruction of hundreds to thousands of MAGs have become routine (2-5). In the past year, highly automated assembly and binning pipelines have accelerated this trend (6, 7). While these advances open up exciting prospects for addressing questions regarding the physiology, ecology, and evolution of microbial life, MAGs are inherently less reliable than isolate genomes because they are assembled and binned from DNA sequences originating from a mixed community. Various reports have highlighted issues associated with MAGs, including how misassemblies and/or incorrect binning can produce composite genomes (8, 9) and how fragmented assembly due to strain variation can yield incomplete genomes that, in turn, lead to wrong conclusions (10, 11). The latter is one reason why independent assembly of each individual sample is often preferable: it avoids assembly fragmentation caused by genomic variation between conspecific populations in different samples.

However, this approach often leads to highly similar or identical MAGs being generated across the sample dataset. Multiple tools have been developed to remove redundant MAGs, mainly based on the average nucleotide identity (ANI) between MAGs after sequence alignment with blastn (e.g., pyANI (12)), or on faster algorithms combining Mash (13) with gANI (14) or ANIm (15) (e.g., as implemented in dRep (16)).

Why dereplicate?

Dereplication is the reduction of a set of genomes, typically assembled from metagenomic data, based on high sequence similarity between these genomes. The main reason to do so is that when redundancy in a database of genomes is maintained, the subsequent step of mapping sequencing reads back to this database yields reads with multiple high-quality alignments, which, depending on the software used and the parameters chosen, leads to reads being randomly distributed across the redundant genomes with one random alignment report...
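To make that consequence concrete, below is a minimal Python sketch, purely illustrative and not drawn from any of the cited tools, of a mapper assigning each multi-mapping read to one of two identical genomes at random: the reads from a single population are split across the redundant copies, diluting the apparent coverage of each.

```python
import random

random.seed(0)

# Two identical genomes retained in the mapping database (an assumed toy setup).
genomes = ["genome_1", "genome_2"]
counts = {g: 0 for g in genomes}

# 10,000 reads that all truly derive from one underlying population; each read
# aligns equally well to both copies, so one alignment is reported at random.
for _ in range(10_000):
    counts[random.choice(genomes)] += 1

print(counts)  # roughly {'genome_1': 5000, 'genome_2': 5000}: coverage of each is halved
```

In practice, mappers such as Bowtie2 or BWA typically report such ambiguous alignments with a mapping quality near zero, so these reads may additionally be discarded by common downstream quality filters, compounding the loss of apparent coverage.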
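At its core, the Mash-plus-ANI strategy described above reduces to clustering genomes at an ANI cutoff and keeping one representative per cluster. The greedy sketch below is a simplified illustration of that logic, not dRep's actual implementation; the `ani()` callable, the genome IDs, and the 95% threshold are assumptions standing in for the pairwise comparisons and cutoffs these tools compute.

```python
def dereplicate(genomes, ani, threshold=0.95):
    """Greedy dereplication sketch.

    `genomes` is assumed to be sorted by decreasing quality (e.g., completeness
    minus contamination); `ani(a, b)` is a user-supplied function returning the
    average nucleotide identity between two genomes. Returns one representative
    per cluster of genomes sharing >= `threshold` ANI with that representative.
    """
    representatives = []
    for g in genomes:  # highest-quality genomes are considered first
        if all(ani(g, rep) < threshold for rep in representatives):
            representatives.append(g)  # g founds a new cluster
        # otherwise g is redundant with an existing, higher-quality representative
    return representatives


# Toy usage with a hypothetical precomputed ANI table:
pairwise = {frozenset(("A", "B")): 0.99,
            frozenset(("A", "C")): 0.80,
            frozenset(("B", "C")): 0.81}
reps = dereplicate(["A", "B", "C"], lambda a, b: pairwise[frozenset((a, b))])
print(reps)  # ['A', 'C'] -- B collapses into the higher-quality A at 95% ANI
```

A 95% ANI cutoff is a commonly used operational boundary for bacterial species, while stricter cutoffs (e.g., 99%) are often chosen to dereplicate at roughly strain level; the appropriate threshold depends on the downstream analysis.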