André Skupin scite author profile

Background: We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.

Methodology: We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE.

Conclusions: PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
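Three of the building blocks named in the abstract (tf-idf cosine similarity, top-n filtering per document, and the Jensen-Shannon divergence) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the raw-count tf with log(N/df) idf weighting, the base-2 logarithm, and the toy documents are all assumptions made for illustration, and the abstract does not specify how the study's coherence measure was derived from the divergence.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight) from tokenized documents.
    Assumed weighting: raw term count times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_n_similarities(vectors, n):
    """For each document, keep only its n most similar other documents."""
    result = {}
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        sims.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first
        result[i] = [(j, s) for s, j in sims[:n]]
    return result

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions."""
    terms = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl(a):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical toy corpus: two biomedical documents and one unrelated one.
docs = [
    ["gene", "expression", "cancer"],
    ["gene", "expression", "tumor"],
    ["map", "cartography", "space"],
]
neighbors = top_n_similarities(tfidf_vectors(docs), n=1)
```

With this toy corpus, document 0's single retained neighbor is document 1, since they share two terms, while document 2 has zero similarity to both. On a corpus of millions of records the all-pairs loop above would be impractical; it is written for readability only.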

Spatialization Methods: A Cartographic Research Agenda for Non-geographic Information Visualization

Skupin

Fabrikant

2003

Cartography and Geographic Information Science

157

106

A cartographic approach to visualizing conference abstracts

Skupin

2002

IEEE Comput. Grap. Appl.

106

The world of geography: Visualizing a knowledge domain with cartographic means

Skupin

2004

Proc. Natl. Acad. Sci. U.S.A.

107

From an informed critique of existing methods to the development of original tools, cartographic engagement can provide a unique perspective on knowledge domain visualization. Along with a discussion of some principles underlying a cartographically informed visualization methodology, results of experiments involving several thousand conference abstracts will be sketched and their plausibility reflected on.

Visualizing Demographic Trajectories with Self-Organizing Maps

Skupin

Hagelman

2005

Geoinformatica

In recent years, the proliferation of multi-temporal census data products and the increased capabilities of geospatial analysis and visualization techniques have encouraged longitudinal analyses of socioeconomic census data. Traditional cartographic methods for illustrating socioeconomic change tend to rely either on comparison of multiple temporal snapshots or on explicit representation of the magnitude of change occurring between different time periods. This paper proposes to add another perspective to the visualization of temporal change, by linking multi-temporal observations to a geometric configuration that is not based on geographic space, but on a spatialized representation of n-dimensional attribute space. The presented methodology aims at providing a cognitively plausible representation of changes occurring inside census areas by representing their attribute space trajectories as line features traversing a two-dimensional display space. First, the self-organizing map (SOM) method is used to transform n-dimensional data such that the resulting two-dimensional configuration can be represented with standard GIS data structures. Then, individual census observations are mapped onto the neural network and linked as temporal vertices to represent attribute space trajectories as directed graphs. This method is demonstrated for a data set containing 254 counties and 32 demographic variables. Various transformations and visual results are presented and discussed in the paper, from the visualization of individual component planes and trajectory clusters to the mapping of different attributes onto temporal trajectories.
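The second step described in the abstract, mapping observations onto a trained self-organizing map and linking them over time, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a SOM has already been trained, represents it as a hypothetical `codebook` dict from grid position to weight vector, uses squared Euclidean distance to find the best-matching unit, and uses made-up two-variable observations in place of the 32 demographic variables.

```python
def best_matching_unit(codebook, x):
    """Return the grid position of the SOM node whose weights are closest to x."""
    best_pos, best_dist = None, float("inf")
    for pos, weights in codebook.items():
        dist = sum((a - b) ** 2 for a, b in zip(weights, x))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos

def trajectory(codebook, observations):
    """Map time-ordered observations of one area to SOM nodes and
    return the directed edges linking consecutive time steps."""
    vertices = [best_matching_unit(codebook, x) for x in observations]
    return list(zip(vertices, vertices[1:]))

# Hypothetical 2x2 SOM over two normalized attributes.
codebook = {
    (0, 0): (0.0, 0.0),
    (0, 1): (0.0, 1.0),
    (1, 0): (1.0, 0.0),
    (1, 1): (1.0, 1.0),
}
# One area observed at three points in time (values are made up).
county = [(0.1, 0.2), (0.2, 0.9), (0.9, 0.8)]
edges = trajectory(codebook, county)
```

Here `edges` links node (0, 0) to (0, 1) and then (0, 1) to (1, 1). Because the vertices are plain grid coordinates, each edge could be stored as a line segment in a standard GIS data structure, which is the representation the abstract describes.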

Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches
