Background Word embedding technologies, a set of language modeling and feature learning techniques in natural language processing (NLP), are now used in a wide range of applications. However, no formal evaluation or comparison has been made of the ability of the 3 most widely known unsupervised implementations (Word2Vec, GloVe, and FastText) to preserve the semantic similarities between words when trained on the same dataset. Objective The aim of this study was to compare embedding methods trained on a corpus of French health-related documents produced in a professional context. The best method will then help us develop a new semantic annotator. Methods Unsupervised embedding models were trained on 641,279 documents originating from the Rouen University Hospital. These data are unstructured and cover a wide range of documents produced in a clinical setting (discharge summaries, procedure reports, and prescriptions). In total, 4 rated evaluation tasks were defined (cosine similarity, odd one, analogy-based operations, and human formal evaluation) and applied to each model, along with embedding visualization. Results Word2Vec had the highest score on 3 out of 4 rated tasks (analogy-based operations, odd one similarity, and human validation), particularly with the skip-gram architecture. Conclusions Although this implementation had the best rate of semantic property conservation, each model has its own strengths and weaknesses, such as the training time, which is very short for GloVe, or the conservation of morphological similarity observed with FastText. Models and test sets produced by this study will be the first to be publicly available through a graphical interface to help advance French biomedical research.
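The three automatic tasks named in the abstract above (cosine similarity, odd one, and analogy-based operations) can be illustrated with a minimal sketch. This is not the study's evaluation code: the vocabulary, the 3-dimensional vectors, and the function names `cosine`, `odd_one_out`, and `analogy` are all invented for illustration, and the odd-one rule used here (lowest mean similarity to the other words) is only one common way to define that task.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embedding models learn vectors with hundreds of dimensions.
EMB = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "table": [-0.5, -0.2, -0.6],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def odd_one_out(words):
    """Return the word with the lowest mean similarity to the others."""
    def mean_similarity(word):
        others = [o for o in words if o != word]
        return sum(cosine(EMB[word], EMB[o]) for o in others) / len(others)
    return min(words, key=mean_similarity)


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(EMB[a], EMB[b], EMB[c])]
    # Exclude the query words so they cannot be returned as the answer.
    candidates = [w for w in EMB if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, EMB[w]))


if __name__ == "__main__":
    print(odd_one_out(["king", "queen", "man", "table"]))
    print(analogy("man", "king", "woman"))
```

With trained models, the same operations would be run over the full vocabulary, and the returned words compared against a reference test set to produce a score per model.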
Background The huge amount of clinical, administrative, and demographic data recorded and maintained by hospitals can be consistently aggregated into health data warehouses with a uniform data model. In 2017, Rouen University Hospital (RUH) initiated the design of a semantic health data warehouse enabling both semantic description and retrieval of health information. Objective This study aimed to present a proof of concept of this semantic health data warehouse, based on the data of 250,000 patients from RUH, and to assess its ability to assist health professionals in prescreening eligible patients in a clinical trials context. Methods The semantic health data warehouse relies on 3 distinct semantic layers: (1) a terminology and ontology portal, (2) a semantic annotator, and (3) a semantic search engine and NoSQL (not only structured query language) layer to enhance data access performance. The system adopts an entity-centered vision that provides generic search capabilities able to express data requirements in terms of the whole set of interconnected conceptual entities that compose health information. Results We assessed the ability of the system to assist the search for 95 inclusion and exclusion criteria originating from 5 randomly chosen clinical trials from RUH. The system succeeded in fully automating 39% (29/74) of the criteria and was efficiently used as a prescreening tool for 73% (54/74) of them. Furthermore, the targeted sources of information and the search engine–related or data-related limitations that could explain the results for each criterion were also observed. Conclusions The entity-centered vision contrasts with the usual patient-centered vision adopted by existing systems. It enables more genericity in the information retrieval process. It also allows the semantic description of health information to be fully exploited. Despite the semantic annotation of clinical narratives, searching within them remained the major challenge of the system.
A finer annotation of the clinical texts and the addition of specific functionalities would significantly improve the results. The semantic aspect of the system combined with its generic entity-centered vision enables the processing of a large range of clinical questions. However, an important part of health information remains in clinical narratives, and we are currently investigating novel approaches (deep learning) to enhance the semantic annotation of those unstructured data.
We present two digital serious games aiming to engage students and the general public with battery sciences. The first one is a multiscale simulator in Mixed Reality of a battery-powered Electric Vehicle (EV) interacting with an Electrical Grid. One of the players drives the EV in a Virtual Reality (VR) environment where the EV can be recharged, and the other players control the electricity produced, distributed, consumed and stored by interacting with 3D-printed devices. The second serious game is a digital twin of a lithium-ion battery manufacturing pilot line, which can be played from a web browser or by using VR hardware. The key steps of the manufacturing process of cylindrical cells are represented in an interactive way. We discuss our games' working principles, their implications for motivation, engagement and learning, and why they pave the way towards new ways of collaborative R&D in the battery field.
Background: Unstructured data from electronic health records are a gold mine. Doc’EDS is a pre-screening tool based on textual and semantic analysis. The system provides an easy-to-use interface to search documents in French. The aim of this study is to present the tool and to provide a formal evaluation of its semantic features. Material & Methods: Doc’EDS is a search tool built on top of the clinical data warehouse (CDW) developed at the Rouen University Hospital. This tool is a multilevel search engine combining structured and unstructured data. It also provides basic analytics features and semantic utilities. A formal evaluation was conducted to measure the performance of the implemented Natural Language Processing algorithms. Results: About 17.3 million narrative documents are contained in this CDW. The formal evaluation was conducted over 5,000 clinical concepts that were manually collected. The F-measure was 0.89 for negation concept detection and 0.57 for hypothesis concept detection. Conclusion: We hereby present Doc’EDS, a semantic search tool which deals with language subtleties to enhance an advanced full-text search engine dedicated to French health documents. This tool is currently used on a daily basis to help researchers identify patients using unstructured data.
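For readers unfamiliar with the metric, the F-measure reported above is usually the harmonic mean of precision and recall. The sketch below shows that standard calculation; the abstract does not give the underlying counts, so the numbers in the usage example are hypothetical, and the function name `f_measure` is an assumption made for illustration.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1) from detection counts.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```

A low score such as the 0.57 reported for hypothesis detection can come from poor precision, poor recall, or both; the F-measure alone does not say which.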