RNA-Seq is a recently developed technology for transcriptome profiling. Its numerous advantages suggest that it will become the platform of choice for genome-wide expression studies. RNA-Seq generates large volumes of data that require statistical methods for processing and accurate inference. This article reviews RNA-Seq technologies and then discusses in detail the current statistical methods for normalization and differential expression analysis.
Key words: RNA-Seq, Normalization, Biological counts, Differential expression
Transcriptome profiling
Over 99.9% of genome sequences are the same in all humans [1,2], yet individuals differ greatly from one another. In a multicellular organism nearly all cells contain the same genome, but they develop into different tissues. A major source of many of these variations is differing gene expression patterns [3]. The transcriptome is the complete set of ribonucleic acid (RNA) transcripts in a given cell type, including messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), and other non-coding RNA. It is the connection between genes and phenotype. The composition of an organism's transcriptome varies across developmental stages and physiological conditions. Because a quantitatively cataloged transcriptome provides information on the underlying genetic mechanisms, the transcriptome is studied in relation to many diseases, e.g., cancers. One of the main goals of a cancer transcriptome study is to quantify the changes in expression levels of all transcripts in tumor cells. In this manuscript, we discuss cutting-edge methods for quantifying the transcriptome and how the resulting data are used to identify significantly differentially expressed transcripts.
Microarrays
Microarrays have been the primary technology for quantitative transcriptome analysis since the mid-1990s [4], and they have produced many findings in cancer research [5][6][7][8]. Although expression studies using microarrays have been very successful over the last decade, this hybridization-based technology has at least three intrinsic limitations. First, the background noise in microarrays is large because of cross-hybridization between closely related genes. Second, the microarray signal often reaches a limit of detection or saturation, so microarrays have a limited dynamic range (a few hundredfold) [9].