Population-level diversity of natural microbiomes represents a biotechnological resource for biomining, biorefining and synthetic biology, but exploiting it requires the recovery of the exact DNA sequence (or "haplotype") of the genes and genomes of every individual present. Computational haplotype reconstruction is extremely difficult, and is further complicated by environmental sequencing data (metagenomics). Current approaches cannot choose between alternative haplotype reconstructions and do not provide biological evidence that their predictions are correct. To overcome this, we present Hansel and Gretel: a novel probabilistic framework that reconstructs the most likely haplotypes from complex microbiomes, is robust to sequencing error and uses all available evidence from aligned reads, without altering or discarding observed variation. We provide the first formalisation of this problem and propose "metahaplome" as a definition for the set of haplotypes for any genomic region of interest within a metagenomic dataset. Finally, using long-read sequencing, we demonstrate biological evidence of novel haplotypes of industrially important enzymes that were computationally predicted from a natural microbiome.

Keywords: metagenome, haplotypes, long read sequencing, algorithm
Running Title: Haplotype recovery from natural microbiomes
Contact: msn@aber.ac.uk (+441970 622 424)

bioRxiv preprint first posted online Nov. 22, 2017; doi: http://dx.doi.org/10.1101/223404. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.

It has become clear that population-level genetic variation drives competitiveness and niche specialisation in microbial communities [1]. Novel combinations of variants in individuals (haplotypes) are filtered by natural selection so that those that confer an advantage are retained [2].
Recovering the haplotypes of enzyme isoforms for a given gene across all organisms in a microbiome (the "metahaplome") would offer great biotechnological potential [3,4] and allow unprecedented insights into microbial ecosystems [5].

Similar goals in humans are being pursued by the International HapMap Project, which aims to describe the common patterns of human genetic variation that affect health, disease, and responses to drugs and environmental factors [6]. However, microbial research has so far focused on higher-level characterisations of diversity, for example: the gene-set of all strains of a species (the pangenome) [7]; the quantification of individual SNPs found in microbial communities (the variome) [8]; or, in viruses, the strains related by mutations in a highly mutagenic environment (the quasispecies) [9].

Reconstructing population-level variation in microbial communities is limited by our inability to culture many environmental microbes in vitro. Researchers must instead rely on DNA isolated and sequenced directly from an environment (metagenomics), which generally results in highly fragmented and incomplete data containing sequencing errors. This...