Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework, which reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing under-specified claims such as correlations, comparisons, and observations. Results from 29 full-text articles show that authors report fewer than 7.84% of their scientific claims in the abstract, revealing an urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%), or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims, and we measure the degree to which each feature contributes to overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance.
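The abstract evaluates explicit-claim identification by precision and recall. As a minimal sketch, assuming each claim is identified by a sentence ID and that system output and manual annotations are compared as sets, these metrics (and their harmonic mean, F1) could be computed as below. The helper name `precision_recall_f1` and the sentence IDs are hypothetical and do not come from the paper; the same scorer could be rerun for each feature configuration (syntax only, semantics only, both) to compare their contributions.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Score predicted explicit-claim sentence IDs against manually annotated ones.

    Hypothetical helper: assumes claims are matched at the sentence level by ID.
    """
    true_pos = len(predicted & gold)  # sentences marked by both the system and annotators
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, hypothetical sentence IDs (not data from the study).
gold = {"s1", "s4", "s7", "s9"}
system = {"s1", "s4", "s5", "s9"}
print(precision_recall_f1(system, gold))  # (0.75, 0.75, 0.75)
```

Scoring at the sentence level is a simplifying choice; a finer-grained evaluation (for example, matching claim spans) would need a different matching rule.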
Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. One activity that consumes much of a scientist's time is developing models that balance contradictory and redundant evidence. Driven by our desire to understand the information behaviors of this important user group, and the behaviors of scientific discovery in general, we conducted an observational study of academic research scientists as they resolved different experimental results reported in the biomedical literature. This article is the first of two that report our findings. In this article, we introduce the Collaborative Information Synthesis (CIS) model, which reflects the salient information behaviors that we observed. The CIS model emerges from a rich collection of qualitative data, including interviews, electronic recordings of meetings, meeting minutes, e-mail communications, and extraction worksheets. Our findings suggest that scientists provide two information constructs: a hypothesis projection and context information. They also engage in four critical tasks: retrieval, extraction, verification, and analysis. The findings further suggest that science is a collaborative rather than an individual activity, and that scientists use the results of one analysis to inform new analyses. In Part 2, we compare and contrast existing information and cognitive models that have inadvertently reported synthesis, and then provide five recommendations that will enable designers to build information systems that support the important synthesis activity.

Introduction

Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. Even a scientist who actively makes discoveries in one discipline can find the activities conducted in a related field a mystery.
Regardless of their specific discipline, the role of a good scientist is to develop a model of the world that accurately explains the available evidence. The development of accurate models often requires that a scientist resolve conflicting evidence.

One activity that consumes much of a scientist's time is synthesis, "the dialectic combination of thesis and antithesis into a higher stage of truth" (Merriam-Webster's Collegiate Dictionary, 2004). This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena. The synthesis activity results in an overall finding, a higher stage of truth, which scientists achieve by resolving conflicting evidence. Thus, the synthesis activity requires accurately weighing a body of evidence that includes the contradictions (when study results differ) and redundancies (when study results concur) that are inevitable when multiple studies explore the same natural phenomena. In this article, we consider synthesis activities that involve evidence reported in the existing literature rather than synthesis activities that require additional data collection through experimentation.

New te...
Background: Simultaneous or sequential exposure to multiple environmental stressors can affect chemical toxicity. Cumulative risk assessments consider multiple stressors, but it is impractical to test every chemical combination to which people are exposed. New methods are needed to prioritize chemical combinations based on their prevalence and possible health impacts.

Objectives: We introduce an informatics approach that uses publicly available data to identify chemicals that co-occur in consumer products, which account for a significant proportion of overall chemical load.

Methods: Fifty-five asthma-associated and endocrine disrupting chemicals (target chemicals) were selected. A database of 38,975 distinct consumer products and 32,231 distinct ingredient names was created from online sources, and PubChem and the Unified Medical Language System were used to resolve synonymous ingredient names. Synonymous ingredient names are different names for the same chemical (e.g., vitamin E and tocopherol).

Results: Nearly one-third of the products (11,688 products, 30%) contained ≥ 1 target chemical, and 5,229 products (13%) contained > 1. Of the 55 target chemicals, 31 (56%) appear in ≥ 1 product and 19 (35%) appear under more than one name. The most frequent three-way chemical combination (2-phenoxyethanol, methyl paraben, and ethyl paraben) appears in 1,059 products. Further work is needed to assess combined chemical exposures related to the use of multiple products.

Conclusions: The informatics approach increased the number of products considered in a traditional analysis by two orders of magnitude, but missing or incomplete product labels can limit the effectiveness of this approach. Such an approach must resolve synonymy to ensure that chemicals of interest are not missed. Commonly occurring chemical combinations can be used to prioritize cumulative toxicology risk assessments.

Citation: Gabb HA, Blake C. 2016.
An informatics approach to evaluating combined chemical exposures from consumer products: a case study of asthma-associated chemicals and potential endocrine disruptors. Environ Health Perspect 124:1155–1165; http://dx.doi.org/10.1289/ehp.1510529
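The Methods and Results describe two computational steps: mapping synonymous ingredient names to one chemical, then counting which target chemicals co-occur in the same product. The sketch below illustrates both under simplifying assumptions. A small hand-written `SYNONYMS` dictionary stands in for the PubChem and UMLS lookups the study actually used, and the product lists, target set, and function names (`canonical`, `target_hits`, `combination_counts`) are hypothetical. It is not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical synonym table (lower-cased name -> canonical name). The study
# resolved synonyms with PubChem and the UMLS; this dict is only a stand-in.
SYNONYMS: Dict[str, str] = {
    "vitamin e": "tocopherol",
    "phenoxyethanol": "2-phenoxyethanol",
    "methylparaben": "methyl paraben",
    "ethylparaben": "ethyl paraben",
}


def canonical(name: str) -> str:
    """Normalize case and whitespace, then map a known synonym to its canonical name."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


def target_hits(products: Dict[str, List[str]], targets: Set[str]) -> Dict[str, Set[str]]:
    """For each product, the set of target chemicals it lists after synonym resolution."""
    return {
        pid: {canonical(name) for name in ingredients} & targets
        for pid, ingredients in products.items()
    }


def combination_counts(hits: Dict[str, Set[str]], k: int) -> Counter:
    """Count the products that contain each k-way combination of target chemicals."""
    counts: Counter = Counter()
    for chemicals in hits.values():
        # Sorting makes each combination a stable, order-independent key.
        for combo in combinations(sorted(chemicals), k):
            counts[combo] += 1
    return counts


# Toy, hypothetical products (not data from the study).
products = {
    "p1": ["Water", "Phenoxyethanol", "Methylparaben", "Ethylparaben"],
    "p2": ["2-Phenoxyethanol", "Methyl Paraben", "Ethyl Paraben", "Glycerin"],
    "p3": ["Water", "Vitamin E"],
    "p4": ["Methylparaben"],
}
targets = {"2-phenoxyethanol", "methyl paraben", "ethyl paraben"}

hits = target_hits(products, targets)
at_least_one = sum(1 for found in hits.values() if len(found) >= 1)
more_than_one = sum(1 for found in hits.values() if len(found) > 1)
print(at_least_one, more_than_one)  # 3 2
print(combination_counts(hits, 3).most_common(1))
# [(('2-phenoxyethanol', 'ethyl paraben', 'methyl paraben'), 2)]
```

Without the synonym table, product `p1` would match none of the three targets, which illustrates why the approach must resolve synonymy to avoid missing chemicals of interest.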
Information retrieval systems are moving from document to sub-document retrieval, such as sentences in summarization systems and words or phrases in question-answering systems. Despite this trend, systems continue to model language at the document level using the inverse document frequency (IDF). In this paper, we compare and contrast IDF with the inverse sentence frequency (ISF) and the inverse term frequency (ITF). A direct comparison reveals that all three language models are highly correlated; however, the average ISF and ITF values are 5.5 and 10.4 higher than the average IDF value. All language models appear to follow a power law distribution, with a slope coefficient of 1.6 for documents and 1.7 for sentences and terms. We conclude with an analysis of IDF stability with respect to random, journal, and section partitions of the 100,830 full-text scientific articles in our experimental corpus.
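To make the three event spaces concrete, the sketch below computes each inverse frequency as the log of the event-space size over the number of events that contain the term: documents for IDF, sentences for ISF, and term occurrences for ITF. This log(N/n) form with a natural logarithm is an assumption made for illustration; the paper's exact formulation, log base, and tokenization may differ. The function name `inverse_frequencies` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# A corpus is a list of documents; a document is a list of sentences;
# a sentence is a list of (already tokenized) terms.
Corpus = List[List[List[str]]]


def inverse_frequencies(docs: Corpus) -> Dict[str, Dict[str, float]]:
    """Return IDF, ISF, and ITF for every term, each as log(N / n) in its event space."""
    n_docs = len(docs)
    n_sents = sum(len(doc) for doc in docs)
    doc_freq: Counter = Counter()   # number of documents containing the term
    sent_freq: Counter = Counter()  # number of sentences containing the term
    term_freq: Counter = Counter()  # total occurrences of the term
    for doc in docs:
        doc_freq.update({term for sent in doc for term in sent})
        for sent in doc:
            sent_freq.update(set(sent))
            term_freq.update(sent)
    n_terms = sum(term_freq.values())
    return {
        "idf": {t: math.log(n_docs / n) for t, n in doc_freq.items()},
        "isf": {t: math.log(n_sents / n) for t, n in sent_freq.items()},
        "itf": {t: math.log(n_terms / n) for t, n in term_freq.items()},
    }


# Toy, hypothetical corpus: 2 documents, 4 sentences, 9 term occurrences.
docs = [
    [["gene", "protein", "binds"], ["protein", "expression"]],
    [["gene", "mutation"], ["clinical", "trial"]],
]
scores = inverse_frequencies(docs)
for model in ("idf", "isf", "itf"):
    print(model, round(scores[model]["protein"], 3))
# idf 0.693 / isf 0.693 / itf 1.504
```

Because the three measures differ only in which events they count, the same loop can also report the document, sentence, and term frequency distributions needed to fit power-law slopes.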