We formulate a statistical model for the regulation of global gene expression by multiple regulatory programs and propose a thresholding singular value decomposition (T-SVD) regression method for learning such a model from data. Extensive simulations demonstrate that this method offers improved computational speed and higher sensitivity and specificity over competing approaches. The method is used to analyze microRNA (miRNA) and long noncoding RNA (lncRNA) data from The Cancer Genome Atlas (TCGA) consortium. The analysis yields previously unidentified insights into the combinatorial regulation of gene expression by noncoding RNAs, as well as findings that are supported by evidence from the literature.regulatory program | SVD | sparse | multivariate | regression T he development of microarray and next-generation sequencing technologies has enabled rapid quantification of various genome-wide features (DNA sequences, gene expressions, noncoding RNA expressions, methylation, etc.) in a population of samples (1, 2). Large consortia have compiled genetic and molecular profiling data in an enormous number of tumors across hundreds of samples (3,4). A common challenge arising from these large-scale genomic studies is the inference of regulatory relationships between different genome-wide measurements from the complex biological systems where the number of predictors and responses often far exceeds the sample size.To formulate a statistical model for such regulatory relations, consider the situation depicted in Fig. 1 (see Fig. S1 for more detailed illustration of the model schema), where p regulators x = ðx 1 ; . . . ; x p Þ regulate q responses y = ðy 1 ; . . . ; y q Þ through r regulatory programs that are represented by hidden nodes, e.g., h 1 ; . . . ; h r . The activity h j of the jth program depends on the regulators connected to hidden node j, and h j in turn affects the level of the responses that are connected to node j. To express this model mathematically, we denote by u j and v j the unit vectors corresponding respectively to the input weights fa ij , i = 1; . . . ; pg and the output weights fb jk , k = 1; . . . ; qg of the jth program. Then the regulatory relations are represented as h j = σðxu j Þ and y = P r j=1 d j h j v′ j , where x, y are regarded as row vectors, u, v are regarded as column vectors, and σ() is a sigmoidal function. The aforementioned is a standard single-layer neural network model that is widely used in predictive modeling but could be impossible to learn in biological studies where sample size n is much smaller than p or q. Thus, we first simplify the model by taking σ to be the identity function. Then our model becomes h j = xu j , y = P r j=1 d j h j v′ j . We make the biologically plausible assumption that only a small subset of regulators is contributing to any program and that each program regulates only a small subset of responses. Under this assumption, u j and v j are sparse vectors in R p and R q , respectively. The magnitude of the output weight vector (denoted by d j ) repr...