Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.

metagenomics | compression

De novo assembly of shotgun sequencing reads into longer contiguous sequences plays an important role in virtually all genomic research (1). However, current computational methods for sequence assembly do not scale well to the volume of sequencing data now readily available from next-generation sequencing machines (1, 2). In particular, the deep sequencing required to sample complex microbial environments easily results in datasets that surpass the working memory of available computers (3, 4).

Deep sequencing and assembly of short reads is particularly important for the sequencing and analysis of complex microbial ecosystems, which can contain millions of different microbial species (5, 6).
These ecosystems mediate important biogeochemical processes but are still poorly understood at a molecular level, in large part because they consist of many microbes that cannot be cultured or studied individually in the lab (5, 7). Ensemble sequencing ("metagenomics") of these complex environments is one of the few ways to render them accessible, and has resulted in substantial early progress in understanding the microbial composition and function of the ocean, human gut, cow rumen, and permafrost soil (3, 4, 8, 9). However, as sequencing capacity grows, the assembly of sequences from these complex samples has become increasingly computationally challenging. Current methods for short-read assembly rely on inexact data reduction in which reads from low-abundance organisms are discarded, biasing analyses towards high-abundance organisms (3, 4, 9).

The predominant assembly formalism applied to short-read sequencing datasets is a de Bruijn graph (10-12). In a de Bruijn graph approach, sequencing reads are decomposed into fixed-length words, or k-mers, and used to build a connectivity graph. This graph is then traversed to determine contiguous...