Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques and discussing their use, variety and potential in a number of both common and upcoming bioinformatics applications.
The number of markers measured in both flow and mass cytometry keeps increasing steadily. Although this provides a wealth of information, it becomes infeasible to analyze these datasets manually. When using 2D scatter plots, the number of possible plots increases exponentially with the number of markers, and relevant information that is present in the data might therefore be missed. In this article, we introduce a new visualization technique, called FlowSOM, which analyzes flow or mass cytometry data using a Self-Organizing Map. Using a two-level clustering and star charts, our algorithm helps to obtain a clear overview of how all markers are behaving on all cells, and to detect subsets that might be missed otherwise. R code is available at https://github.com/SofieVG/FlowSOM and will be made available at Bioconductor.
Key terms: polychromatic flow cytometry; mass cytometry; exploratory data analysis; visualization method; self-organizing map; bioinformatics
At the moment, many flow cytometry experiments are performed with seven colors or more. For mass cytometry experiments, this number is even higher. Analyzing these high-dimensional datasets is not always easy, as traditional gating relies on selection of defined cell populations. It is difficult and time-consuming to keep an overview of how markers are behaving for all these defined cell types. In practice, not all combinations of markers are examined, and valuable information can therefore remain unexamined and unnoticed.
A solution to this problem is the use of advanced visualization techniques that provide more information than the traditionally used scatter plots. Examples of new visualization techniques developed specifically for this purpose are viSNE (1) and SPADE (2). Whereas viSNE plots all cells in a transformed two-dimensional space, SPADE clusters cells into many groups and visualizes the results in a minimal spanning tree. SPADE is, however, quite slow, especially for larger datasets. For both viSNE and SPADE, many plots need to be investigated to get a correct annotation of cluster boundaries and cell types.
Completely automatic clustering algorithms like flowMeans, SWIFT and others (3-10) are another solution that might be considered. Yet, even when using these algorithms, it is necessary to visualize the results clearly to interpret them correctly. The problems we described before are intrinsic to using scatter plots, so the same problems remain as with traditional gating if these automatic techniques are not combined with new visualization algorithms.
A self-organizing map (SOM) is an unsupervised technique for clustering and dimensionality reduction, in which a discretized representation of the input space is trained. This technique has already been used on flow cytometry data by the Flow-
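To make the idea of training a discretized representation of the input space concrete, the following is a minimal sketch of a generic self-organizing map in plain Python. It is not the FlowSOM implementation. The function names (`train_som`, `best_matching_unit`), the rectangular grid, the Gaussian neighbourhood, and the linearly decaying learning rate and radius are all illustrative assumptions.

```python
import math
import random


def best_matching_unit(nodes, point):
    """Return the grid position whose weight vector is closest to `point`."""
    return min(
        nodes,
        key=lambda pos: sum((w - p) ** 2 for w, p in zip(nodes[pos], point)),
    )


def train_som(data, grid_w=3, grid_h=3, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM (hypothetical helper, for illustration).

    `data` is a list of equal-length numeric lists (one row per cell,
    one column per marker). Returns {(x, y): weight_vector}.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: initialise every grid node with a randomly chosen data point.
    nodes = {
        (x, y): list(rng.choice(data))
        for x in range(grid_w)
        for y in range(grid_h)
    }
    radius0 = max(grid_w, grid_h) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for point in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = best_matching_unit(nodes, point)
            for pos, weights in nodes.items():
                # Distance is measured on the grid, not in marker space.
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    weights[i] += lr * influence * (point[i] - weights[i])
            step += 1
    return nodes
```

After training, each cell can be assigned to a node with `best_matching_unit(nodes, cell)`, so the grid nodes act as clusters while neighbouring nodes tend to hold similar marker profiles.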