A significant amount of information about drug-related safety issues, such as adverse effects, is published in medical case reports that, because of their unstructured nature, can only be explored by human readers. The work presented here aims to generate a systematically annotated corpus that can support the development and validation of methods for the automatic extraction of drug-related adverse effects from medical case reports. The documents are systematically double-annotated over several rounds to ensure consistent annotations. The annotated documents are then harmonized to generate representative consensus annotations. To demonstrate an example use case, the corpus was used to train and validate models that classify sentences as informative or non-informative. A Maximum Entropy classifier trained with simple features and evaluated by 10-fold cross-validation achieved an F₁ score of 0.70, indicating a potentially useful application of the corpus.
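The evaluation protocol named in the abstract above (10-fold cross-validation scored by F₁) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `f1_score`, `k_fold_splits` and `cross_validated_f1` are hypothetical, the fold assignment is a simple round-robin, and the classifier is passed in as a callable so that any model (for example a Maximum Entropy classifier, which is equivalent to multinomial logistic regression) could be plugged in. Averaging per-fold scores is one common convention; the abstract does not say which one was used.

```python
from typing import Callable, List, Sequence, Tuple


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = informative sentence, 0 = non-informative)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_splits(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each item is tested exactly once."""
    if k < 2 or k > n_items:
        raise ValueError("k must be between 2 and the number of items")
    # Round-robin assignment: item i goes to fold i % k.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validated_f1(
    features: Sequence[object],
    labels: Sequence[int],
    train_and_predict: Callable[[list, list, list], List[int]],
    k: int = 10,
) -> float:
    """Mean of the per-fold F1 scores for a user-supplied classifier."""
    scores = []
    for train, test in k_fold_splits(len(labels), k):
        predictions = train_and_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        scores.append(f1_score([labels[i] for i in test], predictions))
    return sum(scores) / len(scores)
```

Here `train_and_predict` is assumed to take training features, training labels and test features, and to return one predicted label per test item.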
With this study, we provide a comprehensive reference dataset of detailed miRNA expression profiles from seven types of human peripheral blood cells (NK cells, B lymphocytes, cytotoxic T lymphocytes, T helper cells, monocytes, neutrophils and erythrocytes), serum, exosomes and whole blood. The peripheral blood cells from buffy coats were typed and sorted using FACS/MACS. The overall dataset was generated from 450 small RNA libraries using high-throughput sequencing. By employing a comprehensive bioinformatics and statistical analysis, we show that 3′ trimming modifications as well as the composition of 3′-added non-templated nucleotides are distributed in a lineage-specific manner: the closer the hematopoietic progenitors are, the higher their similarities in sequence variation of the 3′ end. Furthermore, we define the blood cell-specific miRNA and isomiR expression patterns and identify novel cell-type-specific miRNA candidates. The study provides the most comprehensive contribution to date towards a complete miRNA catalogue of human peripheral blood, which can be used as a reference for future studies. The dataset has been deposited in GEO and can also be explored interactively at http://134.245.63.235/ikmb-tools/bloodmiRs.
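To make the two kinds of 3′-end variation mentioned above concrete (3′ trimming and 3′ non-templated additions), the following is a deliberately simplified sketch of how a single read might be described relative to a canonical mature miRNA. It is not the pipeline used in the study. The function name `describe_3p_end`, its parameters and the toy sequences are all hypothetical, and the sketch assumes the read starts at the canonical 5′ end and contains no internal mismatches.

```python
from typing import Dict, Union


def describe_3p_end(read: str, mature: str, downstream_flank: str) -> Dict[str, Union[int, str]]:
    """Describe a read's 3' end relative to the canonical mature sequence.

    Simplifying assumptions (not taken from the study):
    - the read begins at the canonical 5' end of the mature miRNA;
    - the first position where the read differs from the template marks
      the start of the non-templated addition.
    """
    # Template = canonical mature sequence plus the precursor bases that follow it.
    template = mature + downstream_flank
    matched = 0
    while (
        matched < min(len(read), len(template))
        and read[matched] == template[matched]
    ):
        matched += 1
    return {
        # Canonical 3' bases missing from the templated part of the read.
        "trimmed_nt": max(0, len(mature) - matched),
        # Bases beyond the canonical 3' end that still match the precursor.
        "templated_extension_nt": max(0, matched - len(mature)),
        # Remaining bases that do not match the precursor.
        "non_templated_addition": read[matched:],
    }


# Toy sequences for illustration only (not real miRNAs).
example = describe_3p_end("ACGUACGUAA", mature="ACGUACGUAC", downstream_flank="GGU")
```

In this toy case the read loses the final canonical `C` and carries a single added `A`. An added base that happens to equal the next precursor base cannot be told apart from a templated extension by this approach, which is one reason real analyses need more care.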
Motivation: Chemical compounds, such as small signal molecules or other biologically active chemical substances, are an important entity class in life science publications and patents. Several representations and nomenclatures for chemicals exist, including SMILES, InChI, IUPAC and trivial names. Only SMILES and InChI names allow a direct structure search, but in biomedical texts trivial names and IUPAC-like names are used more frequently. While trivial names can be found with a dictionary-based approach and thereby mapped to their corresponding structures, it is not possible to enumerate all IUPAC names. In this work, we present a new machine learning approach based on conditional random fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text, together with its evaluation and the conversion rate achieved with available name-to-structure tools. Results: We present an IUPAC name recognizer with an F1 measure of 85.6% on a MEDLINE corpus. The evaluation of different CRF orders and offset conjunction orders demonstrates the importance of these parameters. An evaluation of hand-selected patent sections containing large enumerations and terms with mixed nomenclature shows good performance on these cases (F1 measure 81.5%). Remaining recognition problems concern detecting the correct borders of the typically long terms, especially when they occur in parentheses or enumerations. We demonstrate the scalability of our implementation by providing results from a full MEDLINE run. Availability: We plan to publish the corpora, the annotation guideline and the conditional random field model as a UIMA component. Contact: roman.klinger@scai.fraunhofer.de
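The abstract above highlights offset conjunctions as an important CRF parameter. As a rough illustration of the idea, the sketch below builds per-token features and then copies the features of neighbouring tokens onto each token, tagged with their relative offset. It is an assumption-laden simplification, not the authors' feature set or tooling: the feature names, the chosen offsets and the helper functions `token_features`, `add_offset_features` and `featurize` are all hypothetical.

```python
from typing import List, Sequence, Set


def token_features(token: str) -> Set[str]:
    """Active features for one token (illustrative choices only)."""
    feats = {"lower=" + token.lower()}
    if any(c.isdigit() for c in token):
        feats.add("has_digit")
    if "-" in token:
        feats.add("has_hyphen")
    if any(c in "()[]{}" for c in token):
        feats.add("has_bracket")
    # Short suffixes can hint at chemical name endings such as "-oic" or "-yl".
    for n in (2, 3):
        if len(token) >= n:
            feats.add("suffix%d=%s" % (n, token[-n:].lower()))
    return feats


def add_offset_features(
    per_token: Sequence[Set[str]], offsets: Sequence[int] = (-1, 1)
) -> List[Set[str]]:
    """Add each neighbour's features to a token, tagged with the offset."""
    enriched = []
    for i, own in enumerate(per_token):
        combined = set(own)
        for off in offsets:
            j = i + off
            if 0 <= j < len(per_token):
                combined.update("%s@%+d" % (f, off) for f in per_token[j])
        enriched.append(combined)
    return enriched


def featurize(tokens: Sequence[str], offsets: Sequence[int] = (-1, 1)) -> List[Set[str]]:
    """Feature sets for a token sequence, ready to hand to a sequence labeller."""
    return add_offset_features([token_features(t) for t in tokens], offsets)


features = featurize(["of", "2-acetoxybenzoic", "acid"])
```

With these settings the token `of` gains the feature `has_digit@+1`, so a sequence model can learn that a digit-bearing token to the right is a cue for the start of a chemical name. Widening `offsets` gives each token a larger context window at the cost of more features.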