A major obstacle to reusing and integrating existing data is finding
the data that is most relevant in a given context. The primary metadata
resource is the scientific literature describing the experiments that
produced the data. To stimulate the development of natural language
processing methods for extracting this information from articles,
we have manually annotated 100 recent open access publications in
Analytical Chemistry as semantic graphs. We focused on articles mentioning
mass spectrometry in their experimental sections, as this topic is of
particular interest to us and is also covered by several
ontologies and controlled vocabularies. The resulting gold standard
dataset is publicly available and directly applicable to validating
automated methods for retrieving this metadata from the literature.
In the process, we also made a number of observations on how experiments
are structured and described, and on open access publication in this
journal.