The automatic extraction of chemical information from text requires, as one of its key steps, the recognition of chemical entity mentions. Developing supervised named entity recognition (NER) systems benefits from a large, manually annotated text corpus, and large corpora also permit the robust evaluation and comparison of different approaches for detecting chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts containing a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative of all major chemical disciplines. Each chemical entity mention was manually labeled with its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic, or trivial. The difficulty and consistency of tagging chemicals in text were measured in an agreement study between annotators, which yielded a percentage agreement of 91%. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts), we provide not only the Gold Standard manual annotations but also the mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has also been generated. We propose a standard for the minimum information required about entity annotations when constructing domain-specific corpora of chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/
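The percentage agreement reported above can be illustrated with a short calculation. This is a minimal sketch, not the CHEMDNER procedure: it assumes each mention is identified by a hypothetical (document ID, start offset, end offset) tuple and that agreement is counted as exact-boundary matches divided by the union of both annotators' mentions. The abstract does not state the exact definition used.

```python
from typing import Set, Tuple

# Hypothetical mention representation: (document_id, start_offset, end_offset).
Mention = Tuple[str, int, int]


def percentage_agreement(annotator_a: Set[Mention], annotator_b: Set[Mention]) -> float:
    """Percent of mentions that both annotators marked with identical boundaries.

    Assumed definition: exact matches / union of all mentions marked by either side.
    """
    union = annotator_a | annotator_b
    if not union:
        return 100.0  # neither annotator marked anything: trivially in agreement
    matches = annotator_a & annotator_b
    return 100.0 * len(matches) / len(union)


# Two annotators agree on two mentions and disagree on the end offset of a third.
a = {("d1", 0, 7), ("d1", 20, 28), ("d2", 5, 12)}
b = {("d1", 0, 7), ("d1", 20, 28), ("d2", 5, 10)}
print(percentage_agreement(a, b))  # 2 matches out of 4 distinct mentions
```

Counting over the union penalizes both missed and spurious mentions; a stricter variant could also require the SACEM class to match by adding it to the tuple.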
Fog computing is a new paradigm with many characteristics that differ from those of cloud computing. Because their resources are limited, fog nodes/MEC hosts are vulnerable to cyberattacks, and a lightweight intrusion detection system (IDS) is a key technique for addressing this problem. Because the extreme learning machine (ELM) trains quickly and generalizes well, we present a new lightweight IDS called the sample selected extreme learning machine (SS-ELM). Sample selection is needed because fog nodes/MEC hosts cannot store extremely large training data sets. Instead, the training data are stored, processed, and sampled on cloud servers, and the selected sample is then sent to the fog nodes/MEC hosts for training. This design can reduce training time and increase detection accuracy. Experimental simulations show that SS-ELM performs well for intrusion detection in terms of accuracy, training time, and receiver operating characteristic (ROC) value.
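The split between cloud-side sampling and fog-side ELM training can be sketched in Python with NumPy. The ELM part follows the standard formulation: random, untrained hidden-layer weights, then a closed-form least-squares solve for the output weights via the Moore-Penrose pseudoinverse. The `select_samples` function is hypothetical: the abstract does not say how the cloud servers choose the sample, so stratified random sampling stands in for it, and the synthetic two-class data are purely illustrative.

```python
import numpy as np


def select_samples(X, y, per_class, rng):
    """Cloud-side step (hypothetical): draw up to `per_class` samples per class.

    Stratified random sampling is a placeholder for SS-ELM's unspecified method.
    """
    idx = []
    for label in np.unique(y):
        candidates = np.flatnonzero(y == label)
        take = min(per_class, candidates.size)
        idx.extend(rng.choice(candidates, size=take, replace=False))
    idx = np.array(idx)
    return X[idx], y[idx]


class ELM:
    """Basic single-hidden-layer extreme learning machine for classification."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activation of a fixed random projection: H = g(XW + b).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # One-hot targets T; output weights beta = pinv(H) @ T (least squares).
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


# Synthetic "normal" (0) vs "attack" (1) traffic features, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 4)), rng.normal(3.0, 1.0, (500, 4))])
y = np.array([0] * 500 + [1] * 500)

X_small, y_small = select_samples(X, y, per_class=100, rng=rng)  # on the cloud
model = ELM(n_hidden=20, rng=rng).fit(X_small, y_small)          # on the fog node
accuracy = np.mean(model.predict(X) == y)
print(f"accuracy on all 1,000 samples: {accuracy:.3f}")
```

Because the only trained parameters come from a single pseudoinverse, training cost on the fog node grows with the size of the selected sample rather than the full data set, which is the motivation for sampling on the cloud.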
In this paper, co-word analysis is used to analyze the evolution of the stem cell field. Articles from stem cell journals are downloaded from PubMed for analysis. Term selection is one of the most important steps in co-word analysis, so useless and overly general subject headings are first removed, and the major and minor subject headings are then weighted separately. Next, an improved information entropy measure, combined with expert consultation, is used to select the subject headings. Hierarchical cluster analysis is used to cluster the subject headings, and a strategic diagram is constructed to analyze evolutionary trends in the stem cell field.
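The co-occurrence, clustering, and strategic-diagram steps can be sketched in plain Python. This is a minimal sketch under stated assumptions: link strength is the equivalence index E_ij = c_ij^2 / (c_i c_j), a common choice in co-word analysis, and the hierarchical clustering uses average linkage with a similarity threshold; the abstract specifies neither. The heading weighting and the improved information entropy selection are omitted because their formulas are not given, and the example headings are made up.

```python
from collections import Counter
from itertools import combinations


def equivalence_index(articles):
    """Link strength E_ij = c_ij**2 / (c_i * c_j) for each co-occurring heading pair.

    `articles` is a list of sets of subject headings, one set per article.
    """
    freq = Counter()
    pair = Counter()
    for headings in articles:
        freq.update(headings)
        pair.update(combinations(sorted(headings), 2))
    return {(a, b): c * c / (freq[a] * freq[b]) for (a, b), c in pair.items()}


def link(e, a, b):
    # Symmetric lookup; pairs that never co-occur have strength 0.
    return e.get((a, b) if a < b else (b, a), 0.0)


def average_linkage(terms, e, threshold):
    """Merge the two clusters with the highest mean pairwise link until it falls below threshold."""
    clusters = [[t] for t in sorted(terms)]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [link(e, a, b) for a in clusters[i] for b in clusters[j]]
            score = sum(sims) / len(sims)
            if score > best:
                best, best_pair = score, (i, j)
        if best < threshold:
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def strategic_position(cluster, terms, e):
    """(centrality, density) of a cluster for the strategic diagram.

    centrality: sum of link strengths from cluster terms to terms outside it.
    density: mean link strength over pairs of terms inside the cluster.
    """
    inside = set(cluster)
    centrality = sum(link(e, a, b) for a in inside for b in terms if b not in inside)
    pairs = list(combinations(sorted(inside), 2))
    density = sum(link(e, a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return centrality, density


# Made-up subject headings for four articles.
articles = [
    {"embryonic stem cells", "cell differentiation"},
    {"embryonic stem cells", "cell differentiation", "mice"},
    {"induced pluripotent stem cells", "cellular reprogramming"},
    {"induced pluripotent stem cells", "cellular reprogramming", "mice"},
]
terms = set().union(*articles)
e = equivalence_index(articles)
clusters = average_linkage(terms, e, threshold=0.5)
for cluster in clusters:
    print(cluster, strategic_position(cluster, terms, e))
```

In a strategic diagram, centrality is conventionally plotted on the horizontal axis and density on the vertical axis, so each cluster becomes one point whose quadrant suggests whether the theme is central and well developed or peripheral and emerging.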