Phenotypic variability in a population of cells can work as bet-hedging under an unpredictably changing environment; a typical example is bacterial persistence. To understand the strategy for controlling such phenomena, it is indispensable to identify the phenotype of each cell and its inheritance. Although recent advancements in microfluidic technology offer useful lineage data, these data are insufficient to directly identify the phenotypes of the cells. An alternative approach is to infer the phenotype from the lineage data by latent-variable estimation. To this end, however, we must resolve a bias problem in inference from lineages called survivorship bias. In this work, we clarify how survivorship bias distorts statistical estimations. We then propose a latent-variable estimation algorithm for lineage trees that is free of survivorship bias, based on an expectation-maximization (EM) algorithm, which we call the Lineage EM algorithm (LEM). LEM provides a statistical method to identify the traits of cells that is applicable to various kinds of lineage data.

generations (1-3). Such phenotypic variety leads to behavioral individuality of each cell, which, in turn, generates complicated population phenomena. One example is bacterial persistence, in which a fraction of cells in a population survives an antibiotic exposure even though the rest die out (4-8). Persistence has also recently been recognized as relevant to the drug resistance of cancers (9, 10). While the survivors, also called persisters, were originally conjectured to be dormant and thereby drug-insensitive cells in a population, recent bioimaging analysis revealed that persistence is a more intricate phenomenon, which involves resistant but still growing cells (5, 6).
Because drug resistance is tightly related to how the drug is incorporated into a cell and interferes with the self-replicating process, quantitative analysis of persistence requires a characterization of the growth states of individual cells and their competition in a population. More generally, the heterogeneities in the growth speed and the death rate, as well as their inheritance from a mother to daughter cells, constitute Darwinian natural selection among the cells. Such natural selection at the cellular level is also highly relevant to the drug resistance of pathogens and cancers, immunological memories, cell competition in tissues, and induction of iPS cells (9-13). Moreover, the population dynamics under selection are integratively determined by the growth states of cells, their statistical properties, and their inheritance dynamics over generations. Therefore, identification of the growth states of cells from data is crucial for predicting and controlling those selection-driven phenomena (14, 15).
To determine the growth states and their dynamics, we can take advantage of recent bioimaging and microfluidic technology, which provide abundant but incomplete data of the popul...