Abstract. DNA microarrays, which are among the most popular genomic tools, are widely applied in biology and medicine. Boutique arrays, which are small, spotted, dedicated microarrays, constitute an inexpensive alternative to whole-genome screening methods. The data extracted from each microarray-based experiment must be transformed and processed prior to further analysis to eliminate technical bias. Normalization is the most crucial step of microarray data pre-processing and must be carefully considered, as it has a profound effect on the results of the analysis. Several normalization algorithms have been developed and implemented in data analysis software packages. However, most of these methods were designed for whole-genome analysis. In this study, we tested 13 normalization strategies (ten for double-channel data and three for single-channel data) available in R/Bioconductor and compared their effectiveness in the normalization of four boutique array datasets. The results revealed that boutique arrays can be successfully normalized using standard methods, but not every method is suitable for every dataset. We also suggest a universal seven-step workflow that can be applied to select the optimal normalization procedure for any boutique array dataset. The described workflow enables the evaluation of the investigated normalization methods based on the bias and variance values for the control probes, a differential expression analysis and a receiver operating characteristic curve analysis. The analysis of each component results in a separate ranking of the normalization methods. Combining the ranks obtained for all the normalization procedures facilitates the selection of the most appropriate normalization method for the studied dataset and determines which methods can be used interchangeably.
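The rank-combination idea summarized above can be illustrated with a minimal sketch. It assumes that each evaluation component yields one numeric score per normalization method and that the per-component ranks are combined by a simple mean; the abstract does not state the aggregation rule, so the mean is an assumption. The method names, scores and function names (`rank_methods`, `combine_ranks`) are hypothetical placeholders, not values or code from the study.

```python
from statistics import mean


def rank_methods(scores, lower_is_better=True):
    """Rank methods for one criterion (1 = best); tied scores share the average rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1],
                     reverse=not lower_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of methods tied with position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = avg_rank
        i = j + 1
    return ranks


def combine_ranks(criteria):
    """Average the per-criterion ranks (an assumed rule) and sort best first.

    criteria maps a criterion name to (scores, lower_is_better).
    """
    per_criterion = [rank_methods(s, low) for s, low in criteria.values()]
    methods = list(per_criterion[0])
    combined = {m: mean(r[m] for r in per_criterion) for m in methods}
    return sorted(combined.items(), key=lambda kv: kv[1])


# Hypothetical scores for three placeholder methods (illustration only).
criteria = {
    "control_probe_bias": ({"method_A": 0.10, "method_B": 0.25, "method_C": 0.40}, True),
    "control_probe_variance": ({"method_A": 0.30, "method_B": 0.20, "method_C": 0.50}, True),
    "roc_auc": ({"method_A": 0.90, "method_B": 0.85, "method_C": 0.70}, False),
}

for method, mean_rank in combine_ranks(criteria):
    print(f"{method}: mean rank {mean_rank:.2f}")
```

Methods whose combined ranks are close to one another would be candidates for interchangeable use under this scheme, whereas a method ranked last on every component is clearly unsuitable for that dataset.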
Introduction

Despite the dynamic development of deep sequencing technologies, microarrays are still commonly used in genomic research (1-5). Currently, DNA microarrays are mainly used for genotyping (6-9), gene expression profiling (10-12) and microRNA screening (13-15). In medicine, microarrays are used to determine the complexity and heterogeneity of diseases, to facilitate disease classification and to predict therapeutic outcomes (8,16-22).

Microarrays provide a large amount of useful information, but are accompanied by inherent noise and systematic errors (23-26). No microarray experiment is free from variation introduced during sample preparation, hybridization, washing and scanning (24,27,28). Spotted arrays are burdened with technical defects that occur during their printing; these defects manifest as differences in spot size and shape and/or shifts of spots, rows or whole print-tips (24,28). In two-color assays, additional bias is introduced by uneven dye incorporation and by differences in the signal dynamic range and the sensitivity of dyes to photobleaching (23,24,29). Therefore, the major challenge in microarray analysis is data pr...