The ongoing COVID-19 pandemic is leading to the discovery of hundreds of novel SARS-CoV-2 variants on a daily basis. While most variants do not impact the course of the pandemic, some pose a significantly increased risk when the acquired mutations allow better evasion of antibody neutralisation in previously infected or vaccinated subjects, or increased transmissibility. Early detection of such high-risk variants (HRVs) is paramount for the proper management of the pandemic. However, experimental assays to determine the immune evasion and transmissibility characteristics of new variants are resource-intensive and time-consuming, which can delay appropriate responses by decision makers. Here we present a novel in silico approach combining Spike protein structure modelling and large protein transformer language models on Spike protein sequences to accurately rank SARS-CoV-2 variants for immune escape and fitness potential. We validate our immune escape and fitness metrics with in vitro pVNT and binding assays. These metrics can be combined into an automated Early Warning System (EWS) capable of evaluating new variants in minutes and risk-monitoring variant lineages in near real-time. The EWS flagged 12 of the 13 variants designated by the World Health Organisation (WHO) as potentially dangerous (Alpha to Omicron), on average two months before their designation, demonstrating its ability to help increase preparedness against future variants. Omicron was flagged by the EWS on the day its sequence was made available, with immune evasion and binding metrics subsequently confirmed through our in vitro experiments.
Combinatorial optimisation problems framed as mixed integer linear programmes (MILPs) are ubiquitous across a range of real-world applications. The canonical branch-and-bound (B&B) algorithm seeks to solve MILPs exactly by constructing a search tree of increasingly constrained sub-problems. In practice, its solving time depends on heuristics, such as the choice of the next variable to constrain ('branching'). Recently, machine learning (ML) has emerged as a promising paradigm for branching. However, prior works have struggled to apply reinforcement learning (RL), citing sparse rewards, difficult exploration, and partial observability as significant challenges. Instead, leading ML methodologies resort to approximating high-quality handcrafted heuristics with imitation learning (IL), which precludes the discovery of novel policies and requires expensive data labelling. In this work, we propose retro branching: a simple yet effective approach to RL for branching. By retrospectively deconstructing the search tree into multiple paths, each contained within a sub-tree, we enable the agent to learn from shorter trajectories with more predictable next states. In experiments on four combinatorial tasks, our approach enables learning-to-branch without any expert guidance or pre-training. We outperform the current state-of-the-art RL branching algorithm by 3-5× and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables, with ablations verifying that our retrospectively constructed trajectories are essential to achieving these results.
* Work completed during internship at InstaDeep. Preprint. Under review.
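The retrospective deconstruction of a search tree into paths can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tree representation (a dict mapping each node to its children), the function names, and the rule for extending a path (follow the child with the deepest sub-tree) are all assumptions made for illustration. The abstract does not specify how paths are chosen.

```python
from collections import deque


def subtree_depth(tree, node):
    """Number of nodes on the longest downward path starting at `node`."""
    children = tree.get(node, [])
    if not children:
        return 1
    return 1 + max(subtree_depth(tree, child) for child in children)


def retrospective_paths(tree, root):
    """Split a finished search tree into node-disjoint root-to-leaf paths.

    Each path starts at the root of a sub-tree and stays inside it.
    Assumed heuristic: extend a path through the child whose sub-tree
    is deepest; the other children become roots of later paths.
    """
    paths = []
    pending_roots = deque([root])
    while pending_roots:
        node = pending_roots.popleft()
        path = [node]
        while tree.get(node):
            children = tree[node]
            chosen = max(children, key=lambda c: subtree_depth(tree, c))
            # Siblings that were not chosen start their own sub-tree paths.
            pending_roots.extend(c for c in children if c is not chosen)
            path.append(chosen)
            node = chosen
        paths.append(path)
    return paths


# Hypothetical search tree: node 0 is the root, nodes 2, 4 and 5 are leaves.
example_tree = {0: [1, 2], 1: [3, 4], 3: [5]}
paths = retrospective_paths(example_tree, 0)
print(paths)  # [[0, 1, 3, 5], [2], [4]]
```

Under these assumptions, each returned path could be treated as a separate, shorter training trajectory in place of one long episode covering the whole tree.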