We introduce a new dissimilarity measure between a pair of “clonal trees”, each representing the progression and mutational heterogeneity of a tumor sample and constructed from single-cell or bulk high-throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone and is labeled with one or more mutations, such that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is defined as the minimum number of mutation (label) deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, needed to convert each of the two trees into their maximum common tree. We show that MLTD can be computed efficiently in polynomial time and that it captures well the similarity between trees of different clonal granularity.
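The following Python sketch illustrates the labeling convention described above and one easily checked consequence of the MLTD definition. It does not compute MLTD itself, which requires the authors' polynomial-time algorithm. `ClonalTree` and `label_deletion_lower_bound` are hypothetical names introduced only for illustration. The sketch assumes that vertex expansions and empty-leaf deletions never remove a mutation label. Under that assumption, every mutation present in only one of the two trees must be removed by a label deletion, so the size of the symmetric difference of the two mutation sets is a lower bound on MLTD.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ClonalTree:
    """Hypothetical minimal clonal-tree representation (not the authors' data structure)."""

    # parent[v] is the parent clone of v; the root clone maps to None.
    parent: dict[str, Optional[str]]
    # labels[v] is the set of mutations first acquired at clone v.
    labels: dict[str, frozenset[str]]

    def mutations(self) -> set[str]:
        """All mutations that label some vertex of the tree."""
        result: set[str] = set()
        for muts in self.labels.values():
            result |= muts
        return result

    def has_unique_labels(self) -> bool:
        """True if no mutation labels more than one clone (each sits at its oldest clone)."""
        seen: set[str] = set()
        for muts in self.labels.values():
            if seen & muts:
                return False
            seen |= muts
        return True

    def genotype(self, clone: str) -> set[str]:
        """Mutations harbored by a clone: its own labels plus those of all its ancestors."""
        result: set[str] = set()
        current: Optional[str] = clone
        while current is not None:
            result |= self.labels[current]
            current = self.parent[current]
        return result


def label_deletion_lower_bound(t1: ClonalTree, t2: ClonalTree) -> int:
    """Count mutations found in only one tree.

    Assuming expansions and empty-leaf deletions never remove labels, each such
    mutation costs at least one label deletion. This is a lower bound on MLTD,
    not MLTD itself.
    """
    return len(t1.mutations() ^ t2.mutations())


# Toy example: the trees share mutations a, b, c; d and e are unique to one tree.
t1 = ClonalTree(
    parent={"A": None, "B": "A", "C": "A"},
    labels={"A": frozenset({"a", "b"}), "B": frozenset({"c"}), "C": frozenset({"d"})},
)
t2 = ClonalTree(
    parent={"A": None, "B": "A", "C": "B"},
    labels={"A": frozenset({"a"}), "B": frozenset({"b", "c"}), "C": frozenset({"e"})},
)
print(label_deletion_lower_bound(t1, t2))  # → 2
print(sorted(t1.genotype("B")))            # → ['a', 'b', 'c']
```

Note that `t1` and `t2` also place mutation `b` at different clones. Reconciling such differences is what the expansion and deletion operations in the MLTD definition are for, and this bound does not capture it.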
The CRISPR/Cas9-sgRNA system has recently become a popular tool for genome editing and a very active topic in medical research. In this system, the Cas9 protein is directed to a desired location for gene engineering and cleaves the target DNA sequence that is complementary to a 20-nucleotide guide sequence within the sgRNA. Considerable effort, ranging from in vivo selection to in silico modeling, has gone into designing efficient sgRNAs for the CRISPR/Cas9 system. In this article, we present a novel tool, called CRISPRpred, for efficient in silico prediction of sgRNA on-target activity based on a Support Vector Machine (SVM) model. For our experiments, we used a benchmark dataset of 17 genes and 5310 guide sequences, of which only 20% are true (positive) examples. CRISPRpred achieves an Area Under the Receiver Operating Characteristic Curve (AUROC-Curve), an Area Under the Precision-Recall Curve (AUPR-Curve), and a maximum Matthews Correlation Coefficient (MCC) of 0.85, 0.56, and 0.48, respectively. Our tool shows an improvement of approximately 5% in AUPR-Curve, and after analyzing all evaluation metrics, we find that CRISPRpred outperforms the current state of the art. CRISPRpred is flexible enough to extract relevant features and use them in a learning algorithm. The source code of the entire software, together with the relevant dataset, is available at https://github.com/khaled-buet/CRISPRpred.
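To make two of the reported metrics concrete, here is a minimal pure-Python sketch of how AUROC and the maximum MCC can be computed from a classifier's predicted scores and binary labels (1 = active guide, 0 = inactive). The function names are illustrative. Taking the maximum MCC over thresholds at the observed scores is one common convention and is an assumption here; the CRISPRpred code may compute these metrics differently. The sketch computes AUROC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, counting ties as one half.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is 0 when any row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def max_mcc(scores: list[float], labels: list[int]) -> float:
    """Best MCC over thresholds t, predicting positive when score >= t."""
    best = -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        best = max(best, mcc(tp, tn, fp, fn))
    return best


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC via pairwise comparison of positive and negative scores (O(P*N))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores in which both positives outrank all negatives.
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 0, 0]
print(auroc(scores, labels))    # → 1.0
print(max_mcc(scores, labels))  # → 1.0
```

The pairwise AUROC loop is quadratic. That is fine for a few thousand guides, but a rank-based formula scales better. With only 20% positives, as in the benchmark above, AUPR and MCC are more sensitive than AUROC to errors on the minority class, which is why all three are reported.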