Identification of genes associated with specific biological phenotypes is a fundamental step toward understanding the molecular basis underlying development and pathogenesis. Although RNAi-based high-throughput screens are routinely used for this task, false discovery and sensitivity remain a challenge. Here we describe a computational framework for systematic integration of published gene expression data to identify genes defining a phenotype of interest. We applied our approach to rank-order all genes based on their likelihood of determining ES cell (ESC) identity. RNAi-mediated loss-of-function experiments on top-ranked genes unearthed many novel determinants of ESC identity, thus validating the derived gene ranks to serve as a rich and valuable resource for those working to uncover novel ESC regulators. Underscoring the value of our gene ranks, functional studies of our top-hit Nucleolin (Ncl), abundant in stem and cancer cells, revealed Ncl's essential role in the maintenance of ESC homeostasis by shielding against differentiation-inducing oxidative stress arising from redox imbalance. Notably, we report a conceptually novel mechanism involving a Nucleolin-dependent Nanog-p53 bistable switch regulating the homeostatic balance between self-renewal and differentiation in ESCs. Our findings connect the dots on a previously unknown regulatory circuitry involving genes associated with traits in both ESCs and cancer and might have profound implications for understanding cell fate decisions in cancer stem cells. The proposed computational framework, by helping to prioritize and preselect candidate genes for tests using complex and expensive genetic screens, provides a powerful yet inexpensive means for identification of key cell identity genes.
Cell identity is governed by a set of key regulators, which maintain the gene expression program characteristic of that cell state while restricting the induction of alternate programs that could lead to a new cell state.
Identification of cell identity genes is a fundamental step toward understanding the mechanisms that underlie cellular homeostasis, differentiation, development, and pathogenesis. RNAi-based high-throughput screening has become a widely used method for identification of new components of diverse biological processes, including signal transduction, cancer, and host cell responses to infection (1, 2). Genome-scale RNAi screens have led to identification of tumor suppressors (3), oncogenes (4), therapeutic targets (5), and regulators of ES cell (ESC) maintenance (6-10), tissue regeneration (11), viral infection (12), and antiviral response (13).
Despite the success of RNAi screens, false discovery and sensitivity remain significant and difficult problems to address, with surprisingly small overlap among screen hits from independent but related screens (1). For example, multiple genome-scale RNAi screens for host proteins required for HIV infection/replication resulted in a limited overlap among screen hits at the gene level (1). Similarly, screens performed in mous...