Cytometry by time-of-flight (CyTOF) has emerged as a high-throughput single-cell technology able to provide large samples of protein readouts. There already exists a large pool of advanced high-dimensional analysis algorithms that explore the observed heterogeneous distributions and draw intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline on the distributions of the measured quantities. In this article, we focus on randomization, a transformation used to improve data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes and not in conjunction with high-dimensional analytical tools.
Single-cell cytometry allows the detection of cell components in a high-throughput fashion. One of its latest versions is Cytometry by Time Of Flight (CyTOF) (1). The advantage of CyTOF over traditional flow cytometry is that cells are tagged with high-atomic-weight metal reporters that are typically not found in biological samples, which allows more than 40 cell parameters to be quantified simultaneously. Such a large number of parameters enables this technology to provide multivariate data sets with emerging properties that are well suited to advanced computational analysis (2,3). For example, unsupervised learning techniques such as clustering and dimensionality reduction are typically used for cell phenotyping (4,5). In combination with statistical tests or supervised learning approaches, these methods are also employed to associate phenotypes or clinical outcomes with relevant cell subsets or protein markers (6,7).
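To make the randomization transformation concrete, the following is a minimal sketch that jitters integer ion counts with uniform noise. The noise interval (each count c mapped into (c - 1, c]) and the helper name `randomize_counts` are assumptions for illustration, based on how randomization is commonly described for CyTOF exports; they are not details taken from this article, and the toy counts are not real data.

```python
import random


def randomize_counts(counts, rng=None):
    """Hypothetical helper: jitter integer ion counts with uniform noise.

    Assumes randomization subtracts a value drawn uniformly from [0, 1),
    so each count c is mapped into the interval (c - 1, c].
    """
    rng = rng if rng is not None else random.Random()
    return [c - rng.random() for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the example is reproducible
    raw = [0, 0, 0, 0, 3, 5, 0, 12]  # toy counts for one marker, not real data
    noisy = randomize_counts(raw, rng)
    # Raw counts contain many ties (five zeros); randomized values are all distinct.
    print("distinct raw values:", len(set(raw)))
    print("distinct randomized values:", len(set(noisy)))
    print([round(x, 3) for x in noisy])
```

Because the offsets are drawn independently of the measured signal, randomization turns the block of identical zero counts into a spread of distinct small values. This smooths density plots, but it also means that distances, cluster assignments, or dependence estimates computed on the randomized values can differ from those computed on the raw counts.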
Clustering and dimensionality reduction are also commonly employed to visualize patterns in the data, marker relationships in the high-dimensional space, or the phenotypic progression trajectories of cell subsets (8). Very recently, network-based methods have also been applied to CyTOF data for automatic cell population identification (9) and for the prediction of protein signaling networks using automated causal discovery algorithms (10).
Despite the accelerated development of CyTOF-dedicated analysis methods, there is still no well-established consensus on data preprocessing. This is mainly because the standardization of the preceding experimental procedures is still in its infancy (11,12). In general, there are at least three distinct sources of technical variation in CyTOF. The first is the drop in instrument sensitivity and the change in oxidation rate over long sample running times, which cause signal fluctuations (13). Second,