Sequence simulators are fundamental tools in bioinformatics: they allow us to test data processing and inference tools, and they are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which is beyond the processing power of many methods; simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach and implements a multi-layered search tree structure that achieves high computational efficiency by exploiting the fact that only a small proportion of the genome is likely to mutate on each branch of the considered phylogeny. Our open-source software allows easy integration with other Python packages and supports a variety of evolutionary models, including indel models and new hypermutability models that we developed to represent SARS-CoV-2 genome evolution more realistically.
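To make the Gillespie approach mentioned above concrete, here is a minimal Python sketch of substitution events along a single short branch. It is an illustration under stated assumptions, not the authors' software: the function name `simulate_branch`, the per-site rate list `site_rates`, and the choice of a uniformly random replacement nucleotide (a Jukes-Cantor-like simplification in which a site's rate does not depend on its current state) are all hypothetical. The site is chosen by a plain linear weighted draw; the multi-layered search tree described in the abstract is not reproduced here.

```python
import random

NUCLEOTIDES = "ACGT"


def simulate_branch(sequence, branch_length, site_rates, rng):
    """Gillespie-style simulation of substitutions along one branch.

    `sequence` is a list of nucleotides and is modified in place.
    `site_rates` holds one substitution rate per site (assumed to be
    independent of the current nucleotide, so the total rate is constant).
    Returns the list of (position, old, new) events in time order.
    """
    total_rate = sum(site_rates)
    events = []
    if total_rate <= 0:
        return events  # nothing can mutate

    # Waiting time to the first event is exponential with the total rate.
    t = rng.expovariate(total_rate)
    while t < branch_length:
        # Choose the mutating site with probability proportional to its rate.
        pos = rng.choices(range(len(sequence)), weights=site_rates, k=1)[0]
        old = sequence[pos]
        # Assumption: every other nucleotide is equally likely.
        new = rng.choice([n for n in NUCLEOTIDES if n != old])
        sequence[pos] = new
        events.append((pos, old, new))
        # Draw the waiting time to the next event.
        t += rng.expovariate(total_rate)
    return events


if __name__ == "__main__":
    rng = random.Random(2025)
    genome = list("ACGT" * 250)  # toy 1,000-site genome
    events = simulate_branch(genome, 0.001, [1.0] * len(genome), rng)
    print(f"{len(events)} substitution(s) on this branch")
```

Because the expected number of events is the total rate times the branch length, short branches produce only a handful of events, so the work per branch is driven by the number of mutations rather than by genome length. In this sketch the weighted draw is still linear in the number of sites; replacing it with a tree of partial rate sums is the kind of optimisation the abstract refers to.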
The Arctic Ocean is experiencing unprecedented changes because of climate warming, necessitating detailed analyses of the ecology and dynamics of biological communities to understand current and future ecosystem shifts. Here, we generated a four-year, high-resolution amplicon dataset along with one annual cycle of PacBio HiFi read metagenomes from the East Greenland Current (EGC), and combined this with datasets spanning different spatiotemporal scales (Tara Arctic and MOSAiC) to assess the impact of Atlantic water influx and sea-ice cover on bacterial communities in the Arctic Ocean. Densely ice-covered polar waters harboured a temporally stable, resident microbiome. Atlantic water influx and reduced sea-ice cover resulted in the dominance of seasonally fluctuating populations, resembling a process of “replacement” through advection, mixing and environmental sorting. We identified bacterial signature populations of distinct environmental regimes, including polar night and high-ice cover, and assessed their ecological roles. Dynamics of signature populations were consistent across the wider Arctic; e.g. those associated with dense ice cover and winter in the EGC were abundant in the central Arctic Ocean in winter. Population- and community-level analyses revealed metabolic distinctions between bacteria affiliated with Arctic and Atlantic conditions, the former having increased potential to use bacterial- and terrestrial-derived substrates or inorganic compounds. Our evidence on bacterial dynamics over spatiotemporal scales provides novel insights into Arctic ecology and indicates a progressing Biological Atlantification of the warming Arctic Ocean, with consequences for food webs and biogeochemical cycles.
Ice-binding proteins (IBPs) are a group of ecologically and biotechnologically relevant enzymes produced by psychrophilic organisms. Although putative IBPs containing the domain of unknown function (DUF) 3494 have been identified in many taxa of polar microbes, our knowledge of their genetic and structural diversity in natural microbial communities is limited. Here, we used samples from sea ice and sea water collected in the central Arctic Ocean as part of the MOSAiC expedition for metagenome sequencing and the subsequent analyses of metagenome-assembled genomes (MAGs). By linking structurally diverse IBPs to particular environments and potential functions, we reveal that IBP sequences are enriched in interior ice, have diverse genomic contexts and cluster taxonomically. Their diverse protein structures may be a consequence of domain shuffling, leading to variable combinations of protein domains in IBPs and probably reflecting the functional versatility required to thrive in the extreme and variable environment of the central Arctic Ocean.