Background Patients and families need to be provided with trusted information more than ever with the abundance of online information. Several organizations aim to build databases that can be searched based on the needs of target groups. One such group is individuals with neurodevelopmental disorders (NDDs) and their families. NDDs affect up to 18% of the population and have major social and economic impacts. The current limitations in communicating information for individuals with NDDs include the absence of shared terminology and the lack of efficient labeling processes for web resources. Because of these limitations, health professionals, support groups, and families are unable to share, combine, and access resources. Objective We aimed to develop a natural language–based pipeline to label resources by leveraging standard and free-text vocabularies obtained through text analysis, and then represent those resources as a weighted knowledge graph. Methods Using a combination of experts and service/organization databases, we created a data set of web resources for NDDs. Text from these websites was scraped and collected into a corpus of textual data on NDDs. This corpus was used to construct a knowledge graph suitable for use by both experts and nonexperts. Named entity recognition, topic modeling, document classification, and location detection were used to extract knowledge from the corpus. Results We developed a resource annotation pipeline using diverse natural language processing algorithms to annotate web resources and stored them in a structured knowledge graph. The graph contained 78,181 annotations obtained from the combination of standard terminologies and a free-text vocabulary obtained using topic modeling. An application of the constructed knowledge graph is a resource search interface using the ordered weighted averaging operator to rank resources based on a user query. Conclusions We developed an automated labeling pipeline for web resources on NDDs. 
This work showcases how artificial intelligence–based methods, such as natural language processing and knowledge graphs for information representation, can enhance knowledge extraction and mobilization, and could be used in other fields of medicine.
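The ordered weighted averaging (OWA) ranking mentioned in the Results can be illustrated with a minimal sketch. The abstract does not say which scores are aggregated or which weights are used, so the per-resource match scores, the weight vector, and the function names `owa` and `rank_resources` below are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights attach to rank positions, not to sources."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(scores, reverse=True)  # largest score gets the first weight
    return sum(w * s for w, s in zip(weights, ordered))


def rank_resources(
    match_scores: Dict[str, List[float]], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Rank resources by the OWA of their (hypothetical) per-term match scores."""
    ranked = [(name, owa(s, weights)) for name, s in match_scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores: how well each resource matches three query terms.
    scores = {"resource_a": [0.9, 0.1, 0.1], "resource_b": [0.5, 0.5, 0.5]}
    # Weights leaning toward the lowest scores reward resources that match all terms.
    print(rank_resources(scores, [0.2, 0.3, 0.5]))
```

With weights `[1, 0, 0]` the operator reduces to the maximum and with `[0, 0, 1]` to the minimum, so the weight vector controls how strictly a resource must match every query term.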
BACKGROUND Providing patients and families with trusted information is needed more than ever given the abundance of online information. Several organizations aim to build databases that target groups can search based on their needs. One such group is individuals with neurodevelopmental disabilities (NDDs) and their families. NDDs affect up to 18% of the population and have major social and economic impacts. Current limitations in communicating information for individuals with NDDs include the absence of shared terminology and the lack of efficient labeling processes for web resources. As a result, health professionals, support groups, and families are unable to share, combine, and access resources. OBJECTIVE We aim to develop a natural language-based pipeline that labels resources by leveraging standard vocabularies and a free-text vocabulary obtained through text analysis, and then represents those resources as a weighted knowledge graph. METHODS Using a combination of experience-experts and service/organization databases, we created a dataset of web resources for NDDs. Text from these websites is scraped and collected into a corpus of textual data on neurodevelopmental disorders. This corpus is used to construct a knowledge graph suitable for use by both experts and non-experts. Named entity recognition, topic modelling, document classification, and location detection are used to extract knowledge from the corpus. RESULTS We developed a resource annotation pipeline that uses diverse natural language processing algorithms to annotate web resources and store them in a structured knowledge graph containing 78,181 annotations obtained from the combination of standard terminologies and a free-text vocabulary obtained using topic modelling. An application of the constructed knowledge graph is illustrated: a resource search interface using the ordered weighted averaging operator to rank resources based on a user query. 
CONCLUSIONS This automated labeling pipeline for web resources on NDDs, together with the use of a knowledge graph, showcases how AI can enhance knowledge extraction and mobilization in NDDs and, in the future, in other fields of medicine.
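As a rough illustration of what storing annotations in a weighted knowledge graph could look like, the sketch below keeps resource-to-concept edges with a numeric weight. The class name, the method names, the example resources, and the rule of keeping the strongest weight per edge are all assumptions; the abstract does not describe the actual graph schema or storage backend.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WeightedKnowledgeGraph:
    # edges[resource][concept] = weight of the annotation linking the two nodes
    edges: Dict[str, Dict[str, float]] = field(
        default_factory=lambda: defaultdict(dict)
    )

    def annotate(self, resource: str, concept: str, weight: float) -> None:
        """Add an annotation edge; on repeats keep the strongest weight (assumption)."""
        current = self.edges[resource].get(concept, 0.0)
        self.edges[resource][concept] = max(current, weight)

    def resources_for(self, concept: str) -> List[Tuple[str, float]]:
        """Return resources annotated with a concept, strongest edge first."""
        hits = [(r, c[concept]) for r, c in self.edges.items() if concept in c]
        return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    graph = WeightedKnowledgeGraph()
    graph.annotate("site_a", "autism", 0.4)   # hypothetical annotations
    graph.annotate("site_b", "autism", 0.9)
    graph.annotate("site_b", "respite care", 0.6)
    print(graph.resources_for("autism"))
```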
Background Understanding how individuals think about a topic, known as the mental model, can significantly improve communication, especially in the medical domain where emotions and implications are high. Neurodevelopmental disorders (NDDs) represent a group of diagnoses, affecting up to 18% of the global population, involving differences in the development of cognitive or social functions. In this study, we focus on 2 NDDs, attention deficit hyperactivity disorder (ADHD) and autism spectrum disorder (ASD), which involve multiple symptoms and interventions requiring interactions between 2 important stakeholders: parents and health professionals. There is a gap in our understanding of differences between mental models for each stakeholder, making communication between stakeholders more difficult than it could be. Objective We aim to build knowledge graphs (KGs) from web-based information relevant to each stakeholder as proxies of mental models. These KGs will accelerate the identification of shared and divergent concerns between stakeholders. The developed KGs can help improve knowledge mobilization, communication, and care for individuals with ADHD and ASD. Methods We created 2 data sets by collecting the posts from web-based forums and PubMed abstracts related to ADHD and ASD. We utilized the Unified Medical Language System (UMLS) to detect biomedical concepts and applied Positive Pointwise Mutual Information followed by truncated Singular Value Decomposition to obtain corpus-based concept embeddings for each data set. Each data set is represented as a KG using a property graph model. Semantic relatedness between concepts is calculated to rank the relation strength of concepts and stored in the KG as relation weights. UMLS disorder-relevant semantic types are used to provide additional categorical information about each concept’s domain. 
Results The developed KGs contain concepts from both data sets, with node sizes representing the co-occurrence frequency of concepts and edge sizes representing relevance between concepts. ADHD- and ASD-related concepts from different semantic types show the diverse areas of concern and complex needs associated with the conditions. The KGs identify converging and diverging concepts between health professional literature (PubMed) and parental concerns (web-based forums), which may correspond to the differences between mental models for each stakeholder. Conclusions We show for the first time that generating KGs from web-based data can capture the complex needs of families dealing with ADHD or ASD. Moreover, we showed points of convergence between families’ and health professionals’ KGs. Natural language processing–based KGs provide access to a large sample size, which is often a limiting factor for traditional in-person mental model mapping. Our work offers high-throughput access to mental model maps, which could be used for further in-person validation, for knowledge mobilization projects, and as a basis for communication about potential blind spots of stakeholders in interactions about NDDs. Future research will be needed to identify how concepts could interact together differently for each stakeholder.
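The embedding step named in the Methods (Positive Pointwise Mutual Information followed by truncated Singular Value Decomposition, then a relatedness score between concepts) can be sketched as follows. This is a minimal version assuming a small concept-by-concept co-occurrence count matrix and cosine similarity as the relatedness measure; the toy counts, the embedding size `k`, and the function names are illustrative and not taken from the study.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive PMI of a co-occurrence count matrix (rows and columns are concepts)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected if concepts were independent
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)   # clip negative associations


def truncated_svd_embeddings(matrix: np.ndarray, k: int) -> np.ndarray:
    """Keep the top-k singular directions; each row is one concept embedding."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, used here as a stand-in relatedness score."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


if __name__ == "__main__":
    # Toy symmetric co-occurrence counts for three hypothetical concepts.
    counts = np.array([[0, 4, 1], [4, 0, 1], [1, 1, 0]])
    emb = truncated_svd_embeddings(ppmi(counts), k=2)
    print(cosine(emb[0], emb[1]))  # could be stored as an edge weight in the KG
```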
BACKGROUND Understanding how individuals think about a topic can help to significantly improve communication. This is especially true when it comes to the medical domain where emotion and implications are high. Neurodevelopmental disorders (NDDs) represent a group of diagnoses, affecting up to 18% of the population, involving differences in the development of cognitive or social functions and including attention deficit hyperactivity disorder (ADHD) as well as autism spectrum disorders (ASD). Both are complex conditions involving multiple symptoms and interventions where parents and health professionals interact. There is a gap in our global understanding of how each of those stakeholders differs in their preoccupations, making it difficult to address needs in knowledge mobilization. OBJECTIVE We aim to use natural language processing techniques to build knowledge graphs from online information related to each stakeholder to help accelerate the identification of shared concerns and points of divergence between them. Ultimately, online information could be used to target knowledge mobilization and improve communication and care for individuals with ADHD and ASD. METHODS We created two datasets by collecting posts from ASD- and ADHD-related online forums and PubMed abstracts, and utilized the Unified Medical Language System (UMLS) to detect biomedical concepts. Positive Pointwise Mutual Information (PPMI) followed by truncated Singular Value Decomposition (SVD) was applied to obtain corpus-based UMLS concept embeddings for forums and PubMed. Property graph models were used for the knowledge graph representation of forums and PubMed. Semantic relatedness between concepts and the ASD or ADHD condition was calculated to rank the related concepts and stored as edge weights. Additionally, UMLS semantic types were used to group concepts as well as to provide additional categorical information about each concept’s domain. 
RESULTS Public forums on ADHD and ASD provide us with a wide range of concepts across multiple domains. Using knowledge graphs allows us to illustrate overlapping concepts between health professional literature (PubMed) and parental concerns (forums) with similar relevance scores, shown as the edge weight, and different co-occurrence frequencies with the condition in each corpus, shown as the node size. Further, knowledge graphs also identify concepts with significantly different relevance scores between the stakeholders. CONCLUSIONS Understanding the complex needs of families dealing with ASD or ADHD plays an important role in better communication between health professionals and families. Online public data, which is a source of information from large numbers of individuals, can provide significant insights into a condition. Moreover, it allows us to capture diversity in preoccupations and identify the most relevant concepts for each stakeholder. Future research will be needed to identify how overlapping concepts may interact with each other differently for each stakeholder.
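The comparison of relevance scores between the two corpora can be pictured with a small sketch. It assumes each corpus has already been reduced to a mapping from concept to relevance score for one condition. The concept names, the scores, and the `threshold` that separates "similar" from "different" are hypothetical; the abstract does not state how a significant difference was determined.

```python
from typing import Dict, List, Tuple


def compare_relevance(
    forum: Dict[str, float], pubmed: Dict[str, float], threshold: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Split concepts found in both corpora into converging and diverging sets."""
    shared = forum.keys() & pubmed.keys()
    converging = sorted(c for c in shared if abs(forum[c] - pubmed[c]) <= threshold)
    diverging = sorted(c for c in shared if abs(forum[c] - pubmed[c]) > threshold)
    return converging, diverging


if __name__ == "__main__":
    # Hypothetical relevance scores of concepts with respect to one condition.
    forum_scores = {"sleep": 0.8, "medication": 0.7, "school": 0.9}
    pubmed_scores = {"sleep": 0.3, "medication": 0.75, "genetics": 0.9}
    print(compare_relevance(forum_scores, pubmed_scores))
```

Concepts that appear in only one corpus (here `school` and `genetics`) are left out of both lists and would need separate handling.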