We introduce a distance (similarity)-based mapping for the visualization of high-dimensional patterns and their relative relationships. The mapping preserves exactly the original distances between points with respect to any two reference patterns in a special two-dimensional coordinate system, the relative distance plane (RDP). As only a single calculation of a distance matrix is required, this method is computationally efficient, an essential requirement for any exploratory data analysis. The data visualization afforded by this representation permits a rapid assessment of class pattern distributions. In particular, we can determine with a simple statistical test whether both training and validation sets of a 2-class, high-dimensional dataset derive from the same class distributions. We can explore any dataset in detail by identifying the subset of reference pairs whose members belong to different classes, cycling through this subset, and, for each pair, mapping the remaining patterns. These multiple viewpoints facilitate the identification and confirmation of outliers. We demonstrate the effectiveness of this method on several complex biomedical datasets. Because of its efficiency, effectiveness, and versatility, one may use the RDP representation as an initial data-mining exploration that precedes classification. Once final enhancements to the RDP mapping software are completed, we plan to make it freely available to researchers.
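The distance-preserving idea can be illustrated with a minimal Python sketch. It is not the authors' software: the function names (`distance_matrix`, `rdp_map`, `mixed_class_pairs`), the Euclidean metric, and the choice to place the first reference at the origin, the second on the positive x-axis, and every other pattern in the upper half-plane are all assumptions made for illustration. The coordinates follow from elementary triangulation: if a pattern lies at distances d1 and d2 from references that are d12 apart, then x = (d1² − d2² + d12²) / (2·d12) and y = √(d1² − x²).

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance (an assumed metric; any other could be substituted)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(patterns, metric=euclidean):
    """Compute the full pairwise distance matrix once, up front."""
    n = len(patterns)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = metric(patterns[i], patterns[j])
    return d


def rdp_map(d, r1, r2):
    """Map every pattern to 2-D so its distances to references r1 and r2 are kept.

    Assumed layout: r1 sits at (0, 0), r2 at (d12, 0), and all patterns
    are drawn with y >= 0.
    """
    d12 = d[r1][r2]
    if d12 == 0:
        raise ValueError("reference patterns must be distinct")
    coords = []
    for k in range(len(d)):
        d1, d2 = d[k][r1], d[k][r2]
        x = (d1 * d1 - d2 * d2 + d12 * d12) / (2.0 * d12)
        # Clamp at zero to absorb floating-point rounding error.
        y = math.sqrt(max(d1 * d1 - x * x, 0.0))
        coords.append((x, y))
    return coords


def mixed_class_pairs(labels):
    """Index pairs whose members carry different class labels."""
    return [(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] != labels[j]]


if __name__ == "__main__":
    # Hypothetical toy data: four 4-dimensional patterns in two classes.
    patterns = [(0, 0, 0, 0), (3, 4, 0, 0), (1, 1, 1, 1), (0, 0, 5, 0)]
    labels = [0, 1, 0, 1]
    d = distance_matrix(patterns)  # computed a single time
    for r1, r2 in mixed_class_pairs(labels):  # cycle through viewpoints
        print((r1, r2), rdp_map(d, r1, r2))
```

Each reference pair yields a different two-dimensional view built from the same precomputed matrix, which is why cycling through the mixed-class pairs adds no further distance calculations. The clamp in `rdp_map` assumes the dissimilarity obeys the triangle inequality; for a measure that does not, the clamped coordinates would no longer preserve both distances exactly.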
In many biomedical research laboratories, data analysis and visualization algorithms are typically prototyped using an interpreted programming language. If performance becomes an issue, they are ported to C and integrated with interpreted systems, not fully utilizing object-oriented software development. This paper presents an overview of Scopira, an open source C++ framework suitable for biomedical data analysis and visualization. Scopira provides high-performance end-to-end application development features, in the form of an extensible C++ library. This library provides general programming utilities, numerical matrices and algorithms, parallelization facilities, and graphical user interface elements.
A. B. Demko and N. J. Pizzi
... FORTRAN. Although well suited for algorithm prototyping and ad hoc data visualization, interpreted languages are simply not suitable for application development. Conversely, C and FORTRAN, although efficient, lack basic and expected language features such as object orientation or basic memory management required for building large-scale applications. C++ was chosen to straddle the two extremes, and even though it has been somewhat overshadowed by newer languages such as Java or C#, it is still the only language with features such as generics and object orientation that compile into efficient machine code.
Our motivation behind the design of Scopira was to satisfy the needs of three categories of users within the biomedical research community: developers, scientists/technologists, and data analysts. With the design, implementation, and validation of new biomedical data analysis software, developers typically need to incorporate legacy systems often written in interpreted languages. When this is coupled with the facts that, in a research environment, user requirements often change (sometimes radically) and that biomedical data are becoming ever more complex and voluminous, a development framework must be versatile and extensible, and must exploit distributed, generic, and object-oriented programming paradigms. For the biomedical scientist or technologist, data analysis tools must be intuitive with responsive interfaces that operate both effectively and efficiently. Finally, the biomedical data analyst has requirements straddling those of the developer and the scientist. With an intermediate level of programming competence, analysts require a relatively intuitive development environment that can hide some of the low-level programming details, while at the same time allowing them to easily set up and conduct numerical experiments that involve parameter tuning and high-level looping/decision constructs.
As a result of this motivation, the emphasis with Scopira [6] has been on high-performance, open-source development and the ability to easily integrate other C/C++ libraries used in the biomedical data analysis field by providing a common OOP API for applications. This library provides a large breadth of services that fall into the following four component categories: Scopira Tools provide extensive program...