An alternative form to multidimensional projections for the visual analysis of data represented in multidimensional spaces is the deployment of similarity trees, such as Neighbor Joining trees. They organize data objects on the visual plane emphasizing their levels of similarity with high capability of detecting and separating groups and subgroups of objects. Besides this similarity-based hierarchical data organization, some of their advantages include the ability to decrease point clutter; high precision; and a consistent view of the data set during focusing, offering a very intuitive way to view the general structure of the data set as well as to drill down to groups and subgroups of interest. Disadvantages of similarity trees based on neighbor joining strategies include their computational cost and the presence of virtual nodes that utilize too much of the visual space. This paper presents a highly improved version of the similarity tree technique. The improvements in the technique are given by two procedures. The first is a strategy that replaces virtual nodes by promoting real leaf nodes to their place, saving large portions of space in the display and maintaining the expressiveness and precision of the technique. The second improvement is an implementation that significantly accelerates the algorithm, impacting its use for larger data sets. We also illustrate the applicability of the technique in visual data mining, showing its advantages to support visual classification of data sets, with special attention to the case of image classification. We demonstrate the capabilities of the tree for analysis and iterative manipulation and employ those capabilities to support evolving to a satisfactory data organization and classification.
Abstract-Automatic data classification is a computationally intensive task that presents variable precision and is considerably sensitive to the classifier configuration and to data representation, particularly for evolving data sets. Some of these issues can best be handled by methods that support users' control over the classification steps. In this paper, we propose a visual data classification methodology that supports users in tasks related to categorization such as training set selection; model creation, application and verification; and classifier tuning. The approach is then well suited for incremental classification, present in many applications with evolving data sets. Data set visualization is accomplished by means of point placement strategies, and we exemplify the method through multidimensional projections and Neighbor Joining trees. The same methodology can be employed by a user who wishes to create his or her own ground truth (or perspective) from a previously unlabeled data set. We validate the methodology through its application to categorization scenarios of image and text data sets, involving the creation, application, verification, and adjustment of classification models.
Abstract-Similarity-based layouts generated by multidimensional projections or other dimension reduction techniques are commonly used to visualize high-dimensional data. Many projection techniques have been recently proposed addressing different objectives and application domains. Nonetheless, very little is known about the effectiveness of the generated layouts from a user's perspective, how distinct layouts from the same data compare regarding the typical visualization tasks they support, or how domain-specific issues affect the outcome of the techniques. Learning more about projection usage is an important step towards both consolidating their role in high-dimensional data analysis and taking informed decisions when choosing techniques. This work provides a contribution towards this goal. We describe the results of an investigation on the performance of layouts generated by projection techniques as perceived by their users. We conducted a controlled user study to test against the following hypotheses: (1) projection performance is task-dependent; (2) certain projections perform better on certain types of tasks; (3) projection performance depends on the nature of the data; and (4) subjects prefer projections with good segregation capability. We generated layouts of high-dimensional data with five techniques representative of different projection approaches. As application domains we investigated image and document data. We identified eight typical tasks, three of them related to segregation capability of the projection, three related to projection precision, and two related to incurred visual cluttering. Answers to questions were compared for correctness against 'ground truth' computed directly from the data. We also looked at subject confidence and task completion times. Statistical analysis of the collected data resulted in Hypotheses 1 and 3 being confirmed, Hypothesis 2 being confirmed partially and Hypotheses 4 could not be confirmed. We discuss our findings in comparison with some numerical measures of projection layout quality. Our results offer interesting insight on the use of projection layouts in data visualization tasks and provide a departing point for further systematic investigations.
Dimensionality reduction is employed for visual data analysis as a way to obtaining reduced spaces for high dimensional data or to mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation on reduced or visual spaces, they have limited capabilities for adjusting the results according to user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high dimensional data, taking into account user's input. It employs Partial Least Squares (PLS), a statistical tool to perform retrieval of latent spaces focusing on the discriminability of the data. The method employs a training set for building a highly precise model that can then be applied to a much larger data set very effectively. The reduced data set can be exhibited using various existing visualization techniques. The training data is important to code user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge and is capable of working with small and unbalanced training sets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.