• Purpose: The goal of this research is to explore whether higher-level semantic features can help build better self-organizing map (SOM) representations, as measured from a human-centered perspective. We also explore an automatic evaluation method that uses the human expert knowledge encapsulated in the structure of traditional textbooks to determine the quality of a map representation.
• Design/methodology/approach: Two types of document representation involving semantic features were explored: 1) using a single semantic feature on its own, and 2) combining a semantic feature with keywords. A set of experiments was conducted to investigate the impact of semantic representation quality on the map. The experiments were performed on data collections that included a single-book corpus and a multiple-book corpus.
• Findings: Combining keywords with certain semantic features significantly improves representation quality over the keywords-only approach in a relatively homogeneous single-book corpus. Changing the ratio in which the different features are combined also affects performance. While semantic mixtures can work well in a single-book corpus, they lose their advantage over keywords in the multiple-book corpus. This raises a concern about whether the semantic representations in the multiple-book corpus are homogeneous and coherent enough for semantic features to be applied. Differences in terminology among textbooks negatively affect the ability of the SOM to generate a high-quality map for heterogeneous collections.
• Originality/value: We explored the use of higher-level document representation features for the development of better-quality SOMs. In addition, we piloted a specific method for evaluating SOM quality based on the organization of information content in the map.
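To make the idea of combining a semantic feature with keywords at a given ratio concrete, the following is a minimal sketch and not the procedure used in this study. It assumes that each document already has a keyword vector and a semantic-feature vector, and that the two are normalized, weighted, and concatenated before being passed to the SOM. The function names, the `keyword_ratio` parameter, and the weighting scheme itself are hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_features(keyword_vec, semantic_vec, keyword_ratio=0.5):
    """Build one document vector from keyword and semantic features.

    ASSUMPTION: weighted concatenation is only one plausible way to mix
    the two feature types; the source does not specify the exact scheme.
    keyword_ratio is the hypothetical share of weight given to keywords,
    and the remainder (1 - keyword_ratio) goes to the semantic feature.
    """
    if not 0.0 <= keyword_ratio <= 1.0:
        raise ValueError("keyword_ratio must lie in [0, 1]")
    # Normalize each part first so that neither dominates purely
    # because of its scale or dimensionality.
    keywords = l2_normalize(keyword_vec)
    semantics = l2_normalize(semantic_vec)
    weighted_keywords = [keyword_ratio * x for x in keywords]
    weighted_semantics = [(1.0 - keyword_ratio) * x for x in semantics]
    return weighted_keywords + weighted_semantics


# Toy example: a 2-term keyword vector and a 2-dimensional semantic vector.
combined = combine_features([3.0, 4.0], [0.0, 2.0], keyword_ratio=0.75)
print(combined)
```

Under these assumptions, a `keyword_ratio` of 1.0 leaves only the keyword part non-zero and 0.0 leaves only the semantic part non-zero, so sweeping the value between the two extremes is one way to study how the mixing ratio affects map quality.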
Introduction

Information maps (Kohonen, 1982) are becoming popular as interfaces for viewing and accessing large data collections such as digital libraries (DL). Unlike traditional search-based access, which provides selective and fragmented access to information, information maps allow users to comprehend large collections, to focus on the most interesting parts, and to explore specific resources in the context of their relationships to other resources and to the library as a whole. These properties make information maps an excellent complement to search and browsing interfaces for DL. A recent study comparing student use of search, browsing, and information map interfaces in an educational DL found that information maps were the access method students preferred most; they were four times more popular than traditional search-based access methods. Several kinds of maps have been explored as interfaces for accessing large collections of resources (Börner and Chen, 2002; Yang et al., 2003; Dang et al., 2009; Perugini et al., 2004). Among these approaches, the self-organizing map (SOM) (Kohonen, 1982) is frequently considered to be the most promising mapping approach for large document collections. While being most popular as a tool for two-dimensional cl...