In this thesis, we present models for semantic search: Information Retrieval (IR) models that elicit the meaning behind the words found in documents and queries rather than simply matching keywords. This is achieved by the integration of structured domain knowledge and data-driven information retrieval methods. The research is set within health informatics to tackle the unique challenges within this domain; specifically, how to bridge the 'semantic gap'; that is, how to overcome the mismatch between raw medical data and the way human beings interpret it. Bridging the semantic gap involves addressing two issues: semantics; that is, aligning the meaning or concepts behind words found in documents and queries; and leveraging inference, which utilises semantics to infer relevant information. Three semantic search models, all utilising concept-based rather than term-based representations, are developed; these include: the Bag-of-concepts model, which utilises concepts from the SNOMED CT medical ontology as its underlying representation; the Graph-based Concept Weighting model, which captures concept dependence and importance in a novel weighting function; and the core contribution of the thesis, the Graph INference model (GIN): a unified theoretical model of semantic search as inference, achieved by the integration of structured domain knowledge (ontologies) and statistical information retrieval methods. It is the GIN that provides the necessary mechanism for inference to bridge the semantic gap. All three models are empirically evaluated using clinical queries and a real-world collection of clinical records taken from the TREC Medical Records Track (MedTrack). Our evaluation shows that the use of concept-based representations in the Bag-of-concepts model leads to improved retrieval effectiveness. When concepts are combined within the Graph-based Concept Weighting model, further improvements are possible.
The evaluation of GIN highlighted that its inference mechanism is suited to hard queries -- those that perform poorly on a term-based system. In-depth analysis also revealed that the GIN returned many new documents not retrieved by term-based systems and therefore never evaluated for relevance as part of the TREC MedTrack. This highlights that using current IR test collections, where semantic search systems did not contribute to the pool, may underestimate the effectiveness of semantic search systems. This work represents a significant step forward in the integration of structured domain knowledge and data-driven information retrieval methods. Furthermore, the thesis provides an understanding of inference -- when and how it should be applied for effective semantic search. It shows that queries with certain characteristics benefit from inference, while others do not. The detailed investigation into the evaluation of semantic search systems shows how current IR test collections may underestimate effectiveness of such systems and new techniques for evaluation are suggested. The Graph Inference model, although developed within the medical domain, is generally defined and has implications in other areas, including web search, where an emerging research trend is to utilise structured knowledge resources for more effective semantic search.
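The difference between term-based and concept-based matching described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the lexicon, the concept identifiers (placeholders, not real SNOMED CT codes), and the function names are all assumptions made for illustration, and the phrase matcher is a simple greedy lookup rather than a real concept extractor.

```python
from collections import Counter

# Hypothetical term-to-concept lexicon. The identifiers are placeholders,
# not real SNOMED CT codes.
LEXICON = {
    "heart attack": "CONCEPT_MI",
    "myocardial infarction": "CONCEPT_MI",
    "high blood pressure": "CONCEPT_HTN",
    "hypertension": "CONCEPT_HTN",
}


def bag_of_terms(text):
    """Term-based representation: counts of lowercased whitespace tokens."""
    return Counter(text.lower().split())


def bag_of_concepts(text):
    """Concept-based representation: counts of lexicon concepts,
    found by greedy longest-phrase matching over the tokens."""
    tokens = text.lower().split()
    max_len = max(len(phrase.split()) for phrase in LEXICON)
    bag = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                bag[LEXICON[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no phrase starts here; skip this token
    return bag


def overlap(a, b):
    """Number of shared items between two bags (multiset intersection)."""
    return sum((a & b).values())


query = "heart attack"
document = "patient admitted with myocardial infarction and hypertension"

term_score = overlap(bag_of_terms(query), bag_of_terms(document))
concept_score = overlap(bag_of_concepts(query), bag_of_concepts(document))
print(term_score, concept_score)
```

In this toy case the query and document share no words, so the term-based overlap is zero, while both map to the same placeholder concept and therefore match under the concept-based representation.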
Background: The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2021 marks its 30th anniversary. The UMLS brings together many broadly used vocabularies and standards in the biomedical field to facilitate interoperability among different computer systems and applications. Objective: Despite its longevity, there is no comprehensive publication analysis of the use of the UMLS. Thus, this review and analysis is conducted to provide an overview of the UMLS and its use in English-language peer-reviewed publications, with the objective of providing a comprehensive understanding of how the UMLS has been used in English-language peer-reviewed publications over the last 30 years. Methods: PubMed, ACM Digital Library, and the Nursing & Allied Health Database were used to search for studies. The primary search strategy was as follows: UMLS was used as a Medical Subject Headings term or a keyword or appeared in the title or abstract. Only English-language publications were considered. The publications were screened first, then coded and categorized iteratively, following the grounded theory. The review process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Results: A total of 943 publications were included in the final analysis. Moreover, 32 publications were categorized into 2 categories; hence the total number of publications before duplicates are removed is 975.
After analysis and categorization of the publications, UMLS was found to be used in the following emerging themes or areas (the number of publications and their respective percentages are given in parentheses): natural language processing (230/975, 23.6%), information retrieval (125/975, 12.8%), terminology study (90/975, 9.2%), ontology and modeling (80/975, 8.2%), medical subdomains (76/975, 7.8%), other language studies (53/975, 5.4%), artificial intelligence tools and applications (46/975, 4.7%), patient care (35/975, 3.6%), data mining and knowledge discovery (25/975, 2.6%), medical education (20/975, 2.1%), degree-related theses (13/975, 1.3%), digital library (5/975, 0.5%), and the UMLS itself (150/975, 15.4%), as well as the UMLS for other purposes (27/975, 2.8%). Conclusions: The UMLS has been used successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, the building of artificial intelligence tools, data mining and knowledge discovery, foundational work in methodology, and middle layers that may lead to advanced products. Natural language processing, the UMLS itself, and information retrieval are the 3 most common themes that emerged among the included publications. The results, although largely related to academia, demonstrate that UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.
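The arithmetic in the Results can be checked with a short script. This is only a sketch that re-enters the counts reported in the abstract above; the variable names are illustrative, and nothing here comes from the authors' own analysis code.

```python
# Per-theme publication counts as reported in the abstract.
theme_counts = {
    "natural language processing": 230,
    "information retrieval": 125,
    "terminology study": 90,
    "ontology and modeling": 80,
    "medical subdomains": 76,
    "other language studies": 53,
    "artificial intelligence tools and applications": 46,
    "patient care": 35,
    "data mining and knowledge discovery": 25,
    "medical education": 20,
    "degree-related theses": 13,
    "digital library": 5,
    "the UMLS itself": 150,
    "UMLS for other purposes": 27,
}

unique_publications = 943  # publications in the final analysis
double_categorized = 32    # publications assigned to 2 categories

# Each double-categorized publication is counted twice across themes.
total_assignments = unique_publications + double_categorized

# Percentage share of each theme, rounded to one decimal place.
percentages = {
    theme: round(100 * count / total_assignments, 1)
    for theme, count in theme_counts.items()
}

for theme, count in theme_counts.items():
    print(f"{theme}: {count}/{total_assignments} ({percentages[theme]}%)")
```

The theme counts sum to 975, which matches 943 unique publications plus the 32 that were placed in two categories, and the rounded shares agree with the percentages quoted in the abstract.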
Background: The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2020 marks the 30th anniversary of UMLS. Despite its longevity, there is no systematic review on UMLS, in general. Thus, this systematic review was conducted to provide an overview of UMLS and its usage in English-language publications in the last 30 years. Objectives: The objective is twofold: to provide a comprehensive and systematic picture of the themes, their subtopics, and the publications under each category and to document systematic evidence of UMLS and how it has been used in English-language publications in the last 30 years. Methods: PubMed, ACM Digital Library, and Nursing & Allied Health Database were used to search for literature. The primary literature search strategy was as follows: UMLS was used as a MeSH term or a keyword or appeared in the title or abstract. Only English-language publications were considered. Results: A total of 943 publications were included in the final analysis. After analysis and categorization of publications, UMLS was found to be used in the following emerging themes: natural language processing (NLP) (230 publications), information retrieval (125 publications), terminology study (90 publications), ontology and modeling (80 publications), medical subdomains (76 publications), other language studies (53 publications), artificial intelligence tools and applications (46 publications), patient care (35 publications), data mining and knowledge discovery (25 publications), medical education (20 publications), degree-related theses (13 publications), and digital library (5 publications) as well as UMLS itself (150 publications).
Conclusions: UMLS has been used and published successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, building artificial intelligence tools, data mining and knowledge discovery and more foundational work in methodology and middle layers that may lead to advanced products. NLP, UMLS itself, and information retrieval are the three themes with the most publications. The review provides systematic evidence of UMLS in English-language peer-reviewed publications in the last 30 years.