Daniel A. Keim scite author profile

In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics which are used to measure the similarity between objects. We specifically examine the behavior of the commonly used L_k norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. For example, this means that the Manhattan distance metric (L_1 norm) is consistently preferable to the Euclidean distance metric (L_2 norm) for high dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the L_k norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results from both the theoretical and the empirical perspective. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.
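The role of k in the abstract above can be illustrated with a small experiment. The following is a minimal sketch, not the authors' code: the function names, the uniform random data, the choice of the origin as query point, and the use of the ratio (Dmax - Dmin) / Dmin as a measure of contrast are all assumptions made for illustration. For 0 < k < 1 the expression is the fractional distance, which does not satisfy the triangle inequality.

```python
import random


def minkowski(x, y, k):
    """L_k distance between two equal-length vectors.

    For 0 < k < 1 this is a fractional distance (not a true metric).
    """
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def relative_contrast(points, query, k):
    """(Dmax - Dmin) / Dmin over the distances from query to all points.

    Values near zero mean the nearest and farthest neighbors are
    almost equally far away, so "nearest" carries little information.
    """
    dists = [minkowski(p, query, k) for p in points]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def demo(dim=100, n=200, seed=0):
    """Contrast for k = 0.5, 1, 2 on uniform random data (illustrative only)."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [0.0] * dim  # assumption: the origin serves as the query point
    return {k: relative_contrast(points, query, k) for k in (0.5, 1.0, 2.0)}


if __name__ == "__main__":
    for k, contrast in demo().items():
        print(f"k={k}: relative contrast = {contrast:.4f}")
```

Running the script for several values of `dim` gives a rough feel for how the contrast changes with dimensionality and with k; the numbers depend on the random seed and data distribution and are not a substitute for the analysis in the paper.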

Information visualization and visual data mining

Keim

2002

IEEE Trans. Visual. Comput. Graphics

1,302

748

Visual Analytics: Definition, Process, and Challenges

et al.

A General Approach to Clustering in Large Databases with Noise

Hinneburg¹,

Keim

2003

Knowledge and Information Systems

507

538

Searching in high-dimensional spaces

Böhm¹,

Berchtold²,

2001

During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, or molecular biology. An important research issue in the field of multimedia databases is the content-based retrieval of similar multimedia objects such as images, text, and videos. However, in contrast to searching data in a relational database, content-based retrieval requires the search for similar objects as a basic functionality of the database system. Most of the approaches addressing similarity search use a so-called feature transformation which transforms important properties of the multimedia objects into high-dimensional points (feature vectors). Thus, the similarity search is transformed into a search for points in the high-dimensional feature space which are close to a given query point. Query processing in high-dimensional spaces has therefore been a very active research area over the last few years. A number of new index structures and algorithms have been proposed. It has been shown that the new index structures considerably improve the performance in querying large multimedia databases. Based on recent tutorials [BK 98, BK 00], in this survey we provide an overview of the current state of the art in querying multimedia databases, describing the index structures and algorithms for efficient query processing in high-dimensional spaces. We identify the problems of processing queries in high-dimensional space, and we provide an overview of the proposed approaches to overcome these problems.

Indexing Multimedia Databases

Multimedia databases are of high importance in many application areas such as geography, CAD, medicine, or molecular biology. Depending on the application, the multimedia databases need to have different properties and need to support different types of queries. In contrast to traditional database applications, where point, range, and partial match queries are very important, multimedia databases require a search for all objects in the database which are similar (or complementary) to a given search object. In the following, we describe the notion of similarity queries and the feature-based approach to process those queries in multimedia databases in more detail.
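The feature-based approach described above can be sketched as a nearest-neighbor query over feature vectors. This is only an illustrative baseline under stated assumptions: the function name `knn_query`, the dictionary layout, and the use of Euclidean distance are hypothetical choices, and the linear scan shown here is the naive method that index structures of the kind surveyed aim to improve on.

```python
import heapq
import math


def knn_query(features, query, k):
    """Return the k (distance, object_id) pairs closest to the query point.

    features: dict mapping an object id to its feature vector (a tuple of floats).
    This is a linear scan over all objects, with no index structure.
    """
    scored = ((math.dist(vec, query), oid) for oid, vec in features.items())
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Hypothetical 2-d feature vectors; real feature spaces have many more dimensions.
    features = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 0.0)}
    print(knn_query(features, (0.0, 0.0), 2))
```

The scan touches every object for every query, which is why its cost grows linearly with the size of the database.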

On the Surprising Behavior of Distance Metrics in High Dimensional Space
