Mai Alzamel scite author profile

Motivation Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging. However, it is crucial for many biological applications such as the design of guide RNA for CRISPR experiments. More formally, the uniqueness or (k, e)-mappability can be described for every position as the reciprocal value of how often this k-mer occurs approximately in the genome, i.e. with up to e mismatches. Results We present a fast method GenMap to compute the (k, e)-mappability. We extend the mappability algorithm, such that it can also be computed across multiple genomes where a k-mer occurrence is only counted once per genome. This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. GenMap supports different formats such as binary output, wig and bed files as well as csv files to export the location of all approximate k-mers for each genomic position. Availability and implementation GenMap can be installed via bioconda. Binaries and C++ source code are available on https://github.com/cpockrandt/genmap.

show abstract

Faster algorithms for 1-mappability of a sequence

Alzamel

Charalampopoulos

Iliopoulos

et al. 2020

Theoretical Computer Science

View full text Add to dashboard Cite

In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. There exists an algorithm to solve this problem for k = 1 requiring time O(mn log n/ log log n) using space O(n). Here we present two new algorithms that require worst-case time O(mn) and O(n log n log log n), respectively, and space O(n), thus greatly improving the previous result. Moreover, we present another algorithm that requires average-case time and space O(n) for integer alphabets of size σ if m = (log σ n). Notably, we show that this algorithm is generalizable for arbitrary k, requiring average-case time O(kn) and space O(n) if m = (k log σ n), assuming that the letters are independent and uniformly distributed random variables. Finally, we provide an experimental evaluation of our average-case algorithm demonstrating its competitiveness to the state-of-the-art implementation.

show abstract

How to Answer a Small Batch of RMQs or LCA Queries in Practice

Alzamel

Charalampopoulos

Iliopoulos

et al. 2018

View full text Add to dashboard Cite

In the Range Minimum Query (RMQ) problem, we are given an array A of n numbers and we are asked to answer queries of the following type: for indices i and j between 0 and n − 1, query RMQ A (i, j) returns the index of a minimum element in the subarray A[i . . j]. Answering a small batch of RMQs is a core computational task in many realworld applications, in particular due to the connection with the Lowest Common Ancestor (LCA) problem. With small batch, we mean that the number q of queries is o(n) and we have them all at hand. It is therefore not relevant to build an Ω(n)-sized data structure or spend Ω(n) time to build a more succinct one. It is well-known, among practitioners and elsewhere, that these data structures for online querying carry high constants in their pre-processing and querying time. We would thus like to answer this batch efficiently in practice. With efficiently in practice, we mean that we (ultimately) want to spend n + O(q) time and O(q) space. We write n to stress that the number of operations per entry of A should be a very small constant. Here we show how existing algorithms can be easily modified to satisfy these conditions. The presented experimental results highlight the practicality of this new scheme. The most significant improvement obtained is for answering a small batch of LCA queries. A library implementation of the presented algorithms is made available.

show abstract

Quasi-Linear-Time Algorithm for Longest Common Circular Factor

Alzamel

Crochemore

Iliopoulos

et al. 2019

Preprint

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mai Alzamel

GenMap: ultra-fast computation of genome mappability

Faster algorithms for 1-mappability of a sequence

How to Answer a Small Batch of RMQs or LCA Queries in Practice

Quasi-Linear-Time Algorithm for Longest Common Circular Factor

Contact Info

Product

Resources

About