Motivation: Constraint-based modeling of metabolic networks helps researchers gain insight into the metabolic processes of many organisms, both prokaryotic and eukaryotic. Minimal Cut Sets (MCSs) are minimal sets of reactions whose inhibition blocks a target reaction in a metabolic network. Most approaches for finding the MCSs in constraint-based models require, either as an intermediate step or as a byproduct of the calculation, the computation of the set of elementary flux modes (EFMs), a convex basis for the valid flux vectors in the network. Recently, Ballerstein et al. [BvKKH11] proposed a method for computing the MCSs of a network without first computing its EFMs, by creating a dual network whose EFMs are a superset of the MCSs of the original network. However, their dual network is always larger than the original network and depends on the target reaction. Here we propose the construction of a different dual network for the same purpose, one that is typically smaller than the original network and is independent of the target reaction. We prove the correctness of our approach, MCS², and describe how it can be modified to compute the few smallest MCSs for a given target reaction. Results: We compare MCS² to the method of Ballerstein et al. and two other existing methods. We show that MCS² succeeds in calculating the full set of MCSs in many models where other approaches cannot finish within a reasonable amount of time. Thus, in addition to its theoretical novelty, our approach provides a practical advantage over existing methods. Availability: MCS² is freely available at https://github.com/RezaMash/MCS under the GNU 3.0 license.
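To make the relationship between EFMs and MCSs concrete, here is a minimal sketch of the classical EFM-based route that the abstract contrasts itself with. It is not the MCS² method. It assumes the standard characterization of MCSs as the minimal hitting sets of those EFMs that use the target reaction, and it uses brute force, so it is only practical for toy inputs. The reaction names and the three EFMs are hypothetical.

```python
from itertools import combinations

def minimal_cut_sets(efms, target):
    """Brute-force MCSs as minimal hitting sets of the EFMs that use `target`."""
    # Only EFMs that carry flux through the target need to be disabled.
    target_efms = [set(e) for e in efms if target in e]
    if not target_efms:
        return []
    candidates = sorted(set().union(*target_efms))
    found = []
    # Enumerate by increasing size so every cut set we keep is minimal.
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            cut = set(combo)
            if any(prev <= cut for prev in found):
                continue  # a subset already blocks the target
            if all(cut & efm for efm in target_efms):
                found.append(cut)
    return found

# Hypothetical toy network: each EFM is given by the reactions it uses.
toy_efms = [{"R1", "R2", "T"}, {"R1", "R3", "T"}, {"R3", "R4"}]
cut_sets = minimal_cut_sets(toy_efms, target="T")
print(cut_sets)
```

The target reaction itself always appears as a trivial cut set in this sketch. The exponential enumeration is the kind of cost that motivates methods which avoid computing the full set of EFMs first.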
Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time.
Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of commonly used antibiotics in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified association to drug resistance.
Availability: https://github.com/WGS-TB/DrugResistance/tree/RB_learning
Contact: hooman_zabeti@sfu.ca
2012 ACM Subject Classification Applied computing - Life and medical sciences - Computational biology - Molecular sequence analysis
1 Introduction
Drug resistance is the phenomenon by which an infectious organism (also known as pathogen) develops resistance to one or more drugs that are commonly used in treatment [36]. In this paper we focus our attention on Mycobacterium tuberculosis, the etiological agent of tuberculosis, which is the largest infectious killer in the world today, responsible for over 10 million new cases and 2 million deaths every year [37].
The development of resistance to common drugs used in treatment is a serious public health threat, not only in low and middle-income countries, but also in high-income countries where it is particularly problematic in hospital settings [39]. It is estimated that, without the urgent development of novel antimicrobial drugs, the total mortality due to drug resistance will exceed 10 million people a year by 2050, a number exceeding the annual mortality due to cancer today [35].
Existing models for predicting drug resistance from whole-genome sequence (WGS) data broadly fall into two classes. The first, which we refer to as "catalogue methods," involves testing the WGS data of an isolate for the presence of point mutations (typically single-nucleotide polymorphisms, or SNPs) associated with known drug resistance. If one or more such mutations is identified,...
Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit lower accuracy or lack the flexibility needed to generalize to previously unseen data. Contribution: In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which simultaneously yields highly accurate predictions, produces interpretable results, and is flexible enough to be optimized for various evaluation metrics. Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that its accuracy is higher than or comparable to that of commonly used machine learning models, and that it is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via the Python Package Index (PyPI) under ingotdr. This package is also compatible with most of the tools in the scikit-learn machine learning library.
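The group-testing idea behind this line of work can be illustrated with a minimal, noise-free sketch. It is not the authors' formulation and does not use the ingotdr package. It assumes the basic Boolean compressed sensing rule, in which an isolate is predicted resistant if it carries at least one of the selected variants, and it finds the smallest such variant set by brute force rather than by integer linear programming. The matrix `X` and the labels `y` are hypothetical.

```python
from itertools import combinations

def smallest_explaining_variants(X, y):
    """Smallest column subset S with y[i] == OR of X[i][j] for j in S.

    X: list of 0/1 rows (isolates) over variant columns.
    y: list of 0/1 resistance labels, one per isolate.
    Returns a list of column indices, or None if no subset fits exactly.
    """
    n_variants = len(X[0])
    for size in range(n_variants + 1):
        for subset in combinations(range(n_variants), size):
            predicted = [int(any(row[j] for j in subset)) for row in X]
            if predicted == list(y):
                return list(subset)
    return None

# Hypothetical toy data: 5 isolates, 3 candidate variants.
X = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
]
y = [1, 1, 0, 1, 0]
selected = smallest_explaining_variants(X, y)
print(selected)
```

The selected columns form the interpretable rule directly: the listed variants are the ones the model associates with resistance. Real data are noisy, so an exact fit usually does not exist, which is one reason a practical formulation needs to tolerate misclassified isolates.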
Abstract. Min hash is a probabilistic method for estimating the similarity of two sets in terms of their Jaccard index, defined as the ratio of the size of their intersection to the size of their union. We demonstrate that this method performs best when the sets under consideration are of similar size and that its performance degrades considerably when the sets are of very different sizes. We introduce a new and efficient approach, called the containment min hash approach, that is more suitable for estimating the Jaccard index of sets of very different sizes. We accomplish this by leveraging another probabilistic method (in particular, Bloom filters) for fast membership queries. We derive bounds on the probability of estimate errors for the containment min hash approach and show that it significantly improves upon the classical min hash approach. We also show significant improvements in terms of time and space complexity. As an application, we use this method to detect the presence/absence of organisms in a metagenomic data set, showing that it can detect the presence of very small, low abundance microorganisms.
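The two estimators can be contrasted in a short sketch. This is an illustration under simplifying assumptions, not the authors' implementation: a plain Python set stands in for the Bloom filter (so there are no false positives to correct for), both set sizes are assumed to be known exactly, and the hash function, the function names, and the toy sets are all illustrative choices.

```python
import hashlib

def h(item, seed):
    """Deterministic 64-bit hash of a string under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def minhash_estimate(a, b, k=100):
    """Classical min hash: fraction of k hash functions whose minima agree."""
    matches = sum(
        min(a, key=lambda x: h(x, s)) == min(b, key=lambda x: h(x, s))
        for s in range(k)
    )
    return matches / k

def containment_estimate(small, large_filter, size_small, size_large, k=100):
    """Estimate the Jaccard index from the containment of `small` in `large`."""
    # Each hash minimum is a uniformly random element of `small`;
    # the hit rate estimates |A ∩ B| / |A|.
    hits = sum(min(small, key=lambda x: h(x, s)) in large_filter for s in range(k))
    intersection = (hits / k) * size_small
    return intersection / (size_small + size_large - intersection)

# Toy sets of very different sizes sharing 50 elements.
A = {f"x{i}" for i in range(100)}
B = {f"x{i}" for i in range(50, 5050)}
true_j = jaccard(A, B)
classical = minhash_estimate(A, B)
contained = containment_estimate(A, B, len(A), len(B))
print(true_j, classical, contained)
```

With only 100 hash functions, the classical estimate can take values only in steps of 0.01, which is as large as the true index here. The containment estimate instead resolves a proportion near one half and then rescales it by the set sizes.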