Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences that correspond to gene name mentions. A variety of methods were used, and the results varied, with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F1 score of 0.9066 is feasible, and furthermore that the best combined result makes use of even the lowest-scoring submissions.
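The abstract does not say how the submissions were combined, so the following is only a minimal sketch of one possible approach: simple voting over predicted spans, scored with F1 against gold spans. The span representation, the `vote` function, the `min_votes` threshold, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# A mention is represented here (by assumption) as (sentence_id, start, end).

def f1_score(gold: set, predicted: set) -> float:
    """Exact-match F1 between a gold span set and a predicted span set."""
    tp = len(gold & predicted)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def vote(submissions: list, min_votes: int) -> set:
    """Keep every span proposed by at least `min_votes` submissions.

    Hypothetical combination rule; the paper's actual method may differ.
    """
    counts = Counter(span for sub in submissions for span in sub)
    return {span for span, n in counts.items() if n >= min_votes}

# Toy data: three systems, each with one error relative to the gold standard.
gold = {(0, 0, 4), (0, 10, 15), (1, 3, 8)}
submissions = [
    {(0, 0, 4), (0, 10, 15), (1, 2, 8)},
    {(0, 0, 4), (1, 3, 8), (1, 20, 25)},
    {(0, 10, 15), (1, 3, 8), (0, 30, 33)},
]

combined = vote(submissions, min_votes=2)
individual = [f1_score(gold, sub) for sub in submissions]
combined_f1 = f1_score(gold, combined)
```

In this toy setting each system makes a different mistake, so the voted output scores higher than any single system, which is the general effect the abstract describes.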
Detecting negative and speculative information is essential in most biomedical text-mining tasks, where these language forms are used to express impressions, hypotheses, or explanations of experimental results. Our research focuses on developing a system based on machine-learning techniques that identifies negation and speculation signals and their scope in clinical texts. The proposed system works in two consecutive phases: first, a classifier decides whether each token in a sentence is a negation/speculation signal. Then a second classifier determines, at sentence level, which tokens are affected by the signals previously identified. The system was trained and evaluated on the clinical texts of the BioScope corpus, a freely available resource consisting of medical and biological texts: full-length articles, scientific abstracts, and clinical reports. The results obtained by our system were compared with those of two other systems, one based on regular expressions and the other on machine learning, and our system outperformed both. In the signal detection task, the F score was 97.3% in negation and 94.9% in speculation. In the scope-finding task, a token was counted as correctly classified if it had been properly identified as being inside or outside the scope of all the negation signals present in the sentence. Our proposal achieved an F score of 93.2% in negation and 80.9% in speculation. Additionally, the percentage of correct scopes (those with all their tokens correctly classified) was evaluated, yielding F scores of 90.9% in negation and 71.9% in speculation.
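The two scope-level measures described above can be illustrated with a short sketch. It assumes each sentence is encoded as a list of 0/1 flags (1 meaning the token is inside a scope); that encoding, the function name `scope_metrics`, and the toy sentences are hypothetical. The exact-scope figure is reported here as a simple rate over sentences, which is a simplification of the F score the abstract reports.

```python
def scope_metrics(gold: list, pred: list) -> dict:
    """Token-level precision/recall/F1 for in-scope tokens, plus the
    fraction of sentences whose predicted scope matches gold exactly."""
    tp = fp = fn = exact = 0
    for g, p in zip(gold, pred):
        tp += sum(1 for a, b in zip(g, p) if a and b)
        fp += sum(1 for a, b in zip(g, p) if not a and b)
        fn += sum(1 for a, b in zip(g, p) if a and not b)
        exact += int(g == p)  # every token in the sentence agrees
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    exact_rate = exact / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "exact_scope_rate": exact_rate}

# Toy example: the first predicted scope is one token too short.
gold_scopes = [[0, 1, 1, 1, 0], [0, 0, 1, 1]]
pred_scopes = [[0, 1, 1, 0, 0], [0, 0, 1, 1]]
metrics = scope_metrics(gold_scopes, pred_scopes)
```

The example shows why the exact-scope measure is stricter: a single missed token barely lowers the token-level score but makes the whole scope count as wrong.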
Objective: We explored two strategies for query expansion utilizing the Medical Subject Headings (MeSH) ontology to improve the effectiveness of medical image retrieval systems. To make the expansion more effective, the search text was analyzed to identify which terms were most amenable to being expanded. Design: To perform the expansions we utilized the hierarchical structure in which the MeSH descriptors are organized. Two strategies for selecting the terms to be expanded in each query were studied. The first consisted of identifying the medical concepts using the Unified Medical Language System (UMLS) Metathesaurus. In the second strategy, the text of the query was divided into n-grams, yielding sequences that correspond to MeSH descriptors.
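The second (n-gram) strategy might look roughly like the sketch below. The tiny `TOY_HIERARCHY` table stands in for the real MeSH tree and is illustrative only; expanding a matched descriptor with its narrower terms, preferring longer n-grams, and the names `ngrams` and `expand_query` are all assumptions, since the abstract does not give these details.

```python
# Toy stand-in for the MeSH hierarchy: descriptor -> narrower descriptors.
# Illustrative only; not an authoritative excerpt of MeSH.
TOY_HIERARCHY = {
    "heart diseases": ["cardiomyopathies", "heart failure"],
    "fractures": ["rib fractures"],
}

def ngrams(tokens: list, max_n: int):
    """Yield (start, length, text) for all n-grams, longest first."""
    for n in range(max_n, 0, -1):
        for i in range(len(tokens) - n + 1):
            yield i, n, " ".join(tokens[i:i + n])

def expand_query(query: str, hierarchy: dict, max_n: int = 3) -> list:
    """Append narrower descriptors for every n-gram that names a descriptor.

    Longer matches win: tokens already covered by a match are not reused.
    """
    tokens = query.lower().split()
    used = set()
    expansions = []
    for i, n, gram in ngrams(tokens, max_n):
        positions = set(range(i, i + n))
        if gram in hierarchy and not positions & used:
            used |= positions
            expansions.extend(hierarchy[gram])
    return tokens + expansions

expanded = expand_query("chest x-ray showing heart diseases", TOY_HIERARCHY)
```

Here only the bigram "heart diseases" matches a descriptor, so the query gains the two narrower terms and the remaining words are left untouched.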
In this paper we present ongoing work on annotating negation in Spanish clinical documents. A corpus of anamnesis and radiology reports has been annotated by two domain-expert annotators with negation markers and negated events. The Dice coefficient for inter-annotator agreement is higher than 0.94 for negation markers and higher than 0.72 for negated events. The corpus will be publicly released when the annotation process is finished, constituting the first corpus annotated with negation for Spanish clinical reports available to the NLP community.
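For reference, the Dice coefficient between two annotators can be computed as twice the number of shared annotations divided by the total number of annotations from both. The sketch below assumes annotations are compared as exact (start, end) character spans; the abstract does not state the matching criterion, and the example spans are invented.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical negated-event spans marked by each annotator in one report.
annotator_a = {(0, 3), (10, 12), (20, 27)}
annotator_b = {(0, 3), (10, 12), (21, 27)}  # third span starts one character later

agreement = dice(annotator_a, annotator_b)
```

Under exact matching, a boundary disagreement of a single character counts as a full miss, which is one plausible reason agreement on negated events (longer, fuzzier spans) is lower than on negation markers.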