The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Srivastava et al. define a new and improved way to predict immunoprotective cancer neoepitopes based in part on the difference in MHC-binding scores between the mutant epitope and its wild-type counterpart. Remarkably, all neoepitopes that elicited tumor regression bound to class I MHC molecules with very low affinity.
In this paper we study the problem of assigning transmission ranges to the nodes of a static ad hoc wireless network so as to minimize the total power consumed under the constraint that enough power is provided to the nodes to ensure that the network is connected. We focus on the MIN-POWER SYMMETRIC CONNECTIVITY problem, in which the bidirectional links established by the transmission ranges are required to form a connected graph. Implicit in previous work on transmission range assignment under asymmetric connectivity requirements is the proof that MIN-POWER SYMMETRIC CONNECTIVITY is NP-hard and that the MST algorithm has a performance ratio of 2. In this paper we make the following contributions: (1) we show that the related MIN-POWER SYMMETRIC UNICAST problem can be solved efficiently by a shortest-path computation in an appropriately constructed auxiliary graph; (2) we give an exact branch and cut algorithm based on a new integer linear program formulation solving instances with up to 35-40 nodes in 1 hour; (3) we establish the similarity between MIN-POWER SYMMETRIC CONNECTIVITY and the classic STEINER TREE problem in graphs, and use this similarity to give a polynomial-time approximation scheme with performance ratio approaching 5/3 as well as a more practical approximation algorithm with approximation factor 11/6; and (4) we give the results of a comprehensive experimental study comparing new and previously proposed heuristics with the above exact and approximation algorithms.
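The MST algorithm mentioned in the abstract above can be illustrated with a minimal sketch. It assumes nodes are points in the plane and that the power needed to sustain a link of length d is d^kappa; the abstract specifies neither, so both the cost model and the function name `mst_power_assignment` are hypothetical. Each node receives the power of its most expensive incident tree edge, which keeps every tree link bidirectional.

```python
import math


def mst_power_assignment(points, kappa=2.0):
    """MST-based range assignment (illustrative sketch, not the paper's code).

    points: list of (x, y) node coordinates.
    kappa:  assumed path-loss exponent; link cost is distance ** kappa.
    Returns (powers, tree_edges), where powers[i] is the power given to node i.
    """
    n = len(points)
    if n == 0:
        return [], []

    def cost(u, v):
        # Assumed power requirement for a bidirectional link between u and v.
        return math.dist(points[u], points[v]) ** kappa

    # Prim's algorithm on the complete graph, O(n^2).
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    tree_edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            tree_edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                c = cost(u, v)
                if c < best[v]:
                    best[v] = c
                    parent[v] = u

    # Each node must reach its farthest tree neighbour.
    powers = [0.0] * n
    for u, v in tree_edges:
        c = cost(u, v)
        powers[u] = max(powers[u], c)
        powers[v] = max(powers[v], c)
    return powers, tree_edges


powers, edges = mst_power_assignment([(0, 0), (1, 0), (3, 0)])
total_power = sum(powers)
```

For the three collinear points in the usage lines, the tree uses the links of length 1 and 2, so the middle and right nodes both pay for the longer link. The total assigned power is at most twice the tree's edge cost, since each edge is charged to at most its two endpoints.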
Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
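The general idea of expectation-maximization for ambiguous reads can be sketched as follows. This is a deliberately simplified illustration, not the IsoEM implementation: the function name, the input format, and the per-read compatibility weights are assumptions. In IsoEM such weights would reflect insert sizes, base quality scores, and strand and pairing information; here they are supplied directly. The sketch estimates the fraction of reads originating from each isoform and omits length normalization.

```python
def em_isoform_frequencies(read_compat, n_isoforms, n_iter=200, tol=1e-9):
    """Generic EM for read-to-isoform assignment (illustrative sketch only).

    read_compat: one dict per read, mapping isoform index -> compatibility
                 weight (hypothetical input; any non-negative number).
    Returns the estimated fraction of reads coming from each isoform.
    """
    freqs = [1.0 / n_isoforms] * n_isoforms  # uniform starting point
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms in
        # proportion to (current frequency * compatibility weight).
        counts = [0.0] * n_isoforms
        for compat in read_compat:
            total = sum(freqs[j] * w for j, w in compat.items())
            if total == 0.0:
                continue  # read is not explained by any isoform
            for j, w in compat.items():
                counts[j] += freqs[j] * w / total

        # M-step: new frequencies are the normalized fractional counts.
        s = sum(counts)
        if s == 0.0:
            break
        new_freqs = [c / s for c in counts]

        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


# Three reads unique to isoform 0, one unique to isoform 1,
# and four reads compatible with both.
reads = [{0: 1.0}] * 3 + [{1: 1.0}] + [{0: 1.0, 1: 1.0}] * 4
estimated = em_isoform_frequencies(reads, n_isoforms=2)
```

In the usage lines, the uniquely mapping reads favor isoform 0 three to one, and the iterations gradually allocate the ambiguous reads in the same proportion.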