Purpose: The purpose of this paper is to explore query expansion via conceptual distance in thesaurus‐indexed collections.
Design/methodology/approach: An extract of the National Museum of Science and Industry's collections database, indexed with the Getty Art and Architecture Thesaurus (AAT), was the dataset for the research. The system architecture and algorithms for semantic closeness and the matching function are outlined. Standalone and web interfaces are described and formative qualitative user studies are discussed. One user session is discussed in detail, together with a scenario based on a related public inquiry. Findings are set in the context of the literature on thesaurus‐based query expansion. This paper discusses the potential of query expansion techniques using the semantic relationships in a faceted thesaurus.
Findings: Thesaurus‐assisted retrieval systems have potential for multi‐concept descriptors, permitting very precise queries and indexing. However, indexer and searcher may differ in terminology judgments and there may not be any exactly matching results. The integration of semantic closeness in the matching function permits ranked results for multi‐concept queries in thesaurus‐indexed applications. An in‐memory representation of the thesaurus semantic network allows a combination of automatic and interactive control of expansion and control of expansion on individual query terms.
Originality/value: The application of semantic expansion to browsing may be useful in interface options where thesaurus structure is hidden.
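The idea of expanding a query term over an in‐memory thesaurus network, with the degree of expansion controlled per term, can be illustrated with a minimal sketch. This is not the algorithm from the paper: the relationship weights, the multiplicative decay, the `threshold` parameter and the toy thesaurus fragment are all assumptions chosen for illustration.

```python
import heapq

# Hypothetical traversal weights per relationship type (not taken from the paper).
WEIGHTS = {"narrower": 0.9, "broader": 0.7, "related": 0.6}

# Toy in-memory thesaurus fragment; terms and links are illustrative only.
THESAURUS = {
    "chairs": [("armchairs", "narrower"), ("seating", "broader")],
    "armchairs": [("chairs", "broader")],
    "seating": [("chairs", "narrower"), ("benches", "narrower")],
    "benches": [("seating", "broader")],
}


def expand(term, threshold=0.5):
    """Return {term: closeness} for terms whose closeness to `term` is >= threshold.

    Closeness is assumed to be the product of the weights along the best path,
    so it decays with every relationship traversed.
    """
    best = {term: 1.0}
    heap = [(-1.0, term)]  # max-heap via negated closeness
    while heap:
        negated, current = heapq.heappop(heap)
        closeness = -negated
        if closeness < best.get(current, 0.0):
            continue  # stale heap entry; a better path was already found
        for neighbour, relationship in THESAURUS.get(current, []):
            candidate = closeness * WEIGHTS[relationship]
            if candidate >= threshold and candidate > best.get(neighbour, 0.0):
                best[neighbour] = candidate
                heapq.heappush(heap, (-candidate, neighbour))
    return best
```

Calling `expand("chairs")` reaches the neighbouring and second‐step terms, while a stricter call such as `expand("chairs", threshold=0.8)` keeps only the closest ones. Passing a different threshold for each query term is one simple way to model control of expansion on individual terms.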
when interacting with a search system augmented with a thesaurus. A basic search scenario illustrates this process through the model. Graphical and textual depictions of the model are complemented by a concise matrix representation for evaluation purposes. Potential problems at different stages of the search process are discussed, together with possibilities for system developers. The aim is to set out a framework of processes, decisions, and risks involved in thesaurus-based search, within which system developers can consider potential avenues for support.
There are many advantages for Digital Libraries in indexing with classifications or thesauri, but the current lack of flexible retrieval tools that deal with compound descriptors is a disincentive. This paper discusses a matching function for compound descriptors, or multi-concept subject headings, that does not rely on exact matching but incorporates term expansion via thesaurus semantic relationships to produce ranked results that take account of missing and partially matching terms. The matching function is based on a measure of semantic closeness between terms, which has the potential to help with recall problems. The work reported is part of the ongoing FACET project in collaboration with the National Museum of Science and Industry and its collections database. The architecture of the prototype system and its interface are outlined. The matching problem for compound descriptors is reviewed and the FACET implementation described. Results are discussed from scenarios using the faceted Getty Art and Architecture Thesaurus. We argue that automatic traversal of thesaurus relationships can augment the user's browsing possibilities. The techniques can be applied both to unstructured multi-concept subject headings and potentially to more syntactically structured strings. The notion of a focus term is used by the matching function to model AAT modified descriptors (noun phrases). The relevance of the approach to precoordinated indexing and matching faceted strings is discussed.
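A matching function of the kind described, one that ranks records by semantic closeness rather than requiring exact matches, might be sketched as follows. This is an illustration under stated assumptions, not the FACET implementation: the closeness values, the averaging rule, the record identifiers and the function names are all hypothetical, and the focus term is not modelled.

```python
# Hypothetical precomputed closeness between term pairs (illustrative values only).
CLOSENESS = {
    ("mahogany", "wood"): 0.7,
    ("armchairs", "chairs"): 0.9,
}


def closeness(a, b):
    """Symmetric closeness lookup: 1.0 for identical terms, 0.0 if unrelated."""
    if a == b:
        return 1.0
    return CLOSENESS.get((a, b), CLOSENESS.get((b, a), 0.0))


def match_score(query_terms, index_terms):
    """Average, over query terms, of the best closeness to any index term.

    A query term with no related index term contributes 0, so records with
    missing terms still score but rank lower than fuller matches.
    """
    if not query_terms:
        return 0.0
    total = sum(
        max((closeness(q, t) for t in index_terms), default=0.0)
        for q in query_terms
    )
    return total / len(query_terms)


def rank(query_terms, records):
    """Return (score, record_id) pairs with score > 0, best first."""
    scored = [(match_score(query_terms, terms), rec_id)
              for rec_id, terms in records.items()]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))


# Toy records indexed with compound descriptors (hypothetical data).
RECORDS = {
    "rec1": ["mahogany", "armchairs"],
    "rec2": ["wood", "chairs"],
    "rec3": ["chairs"],
    "rec4": ["brass", "lamps"],
}
```

For the query `["mahogany", "chairs"]`, `rank` places `rec1` (one exact and one partial match) above `rec2` (a weaker partial match), then `rec3` (one term missing), and omits `rec4`, which shares nothing with the query.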