We introduce PubMedQA, a novel biomedical question answering (QA) dataset collected from PubMed abstracts. The task of PubMedQA is to answer research questions with yes/no/maybe (e.g., "Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?") using the corresponding abstracts. PubMedQA has 1k expert-annotated, 61.2k unlabeled, and 211.3k artificially generated QA instances. Each PubMedQA instance is composed of (1) a question, which is either an existing research article title or derived from one, (2) a context, which is the corresponding abstract without its conclusion, (3) a long answer, which is the conclusion of the abstract and, presumably, answers the research question, and (4) a yes/no/maybe answer which summarizes the conclusion. PubMedQA is the first QA dataset where reasoning over biomedical research texts, especially their quantitative contents, is required to answer the questions. Our best performing model, multi-phase fine-tuning of BioBERT with long answer bag-of-word statistics as additional supervision, achieves 68.1% accuracy, compared to single human performance of 78.0% accuracy and a majority baseline of 55.2% accuracy, leaving much room for improvement. PubMedQA is publicly available at https://pubmedqa.github.io.
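As a concrete illustration, one instance can be pictured as a small record with the four fields described above, alongside the majority baseline used for comparison. This is a minimal sketch; the field names are illustrative, not the dataset's official JSON schema:

```python
# One PubMedQA-style instance (field names are illustrative, not the
# official schema of the released dataset).
instance = {
    "question": "Do preoperative statins reduce atrial fibrillation after "
                "coronary artery bypass grafting?",
    "context": "...",          # the abstract without its conclusion
    "long_answer": "...",      # the abstract's conclusion
    "final_decision": "yes",   # one of {"yes", "no", "maybe"}
}

def majority_baseline(labels):
    """Predict the most frequent label for every instance."""
    majority = max(set(labels), key=labels.count)
    return [majority] * len(labels)

# Toy label distribution, just to show how the baseline is scored.
labels = ["yes", "yes", "no", "maybe", "yes"]
preds = majority_baseline(labels)
accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
```

On the real annotated split, the same always-predict-the-majority-class procedure yields the 55.2% accuracy figure quoted above.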
Many problems in NLP require aggregating information from multiple mentions of the same entity, which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence not suited to such tasks. We present a recurrent layer which is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets (WikiHop, LAMBADA, and the bAbI AI tasks), with large gains when training data is scarce.
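The core idea can be sketched with a simple tanh recurrence (the paper's actual layer is more elaborate; the function and parameter names here are ours). At a token inside an entity mention, the "previous" hidden state mixes the sequential predecessor with the hidden state at the same cluster's most recent earlier mention, giving the network a shortcut across long distances:

```python
import numpy as np

def coref_rnn(x, clusters, Wx, Wh, mix=0.5):
    """A coreference-biased recurrence (illustrative sketch, not the
    published layer).

    x:        (T, d_in) input vectors
    clusters: length-T list, cluster id per token or None
    Wx:       (d_in, d_h) input weights; Wh: (d_h, d_h) recurrent weights
    """
    T, d_h = x.shape[0], Wh.shape[0]
    h = np.zeros((T, d_h))
    last_in_cluster = {}            # cluster id -> index of its last mention
    prev = np.zeros(d_h)
    for t in range(T):
        h_prev = prev
        c = clusters[t]
        if c is not None and c in last_in_cluster:
            # Skip connection back to the earlier coreferent mention.
            h_prev = mix * prev + (1 - mix) * h[last_in_cluster[c]]
        h[t] = np.tanh(x[t] @ Wx + h_prev @ Wh)
        if c is not None:
            last_in_cluster[c] = t
        prev = h[t]
    return h

# Toy run: tokens 1, 3, and 5 all mention entity cluster 0.
rng = np.random.default_rng(0)
h = coref_rnn(rng.normal(size=(6, 4)),
              [None, 0, None, 0, None, 0],
              rng.normal(size=(4, 3)), rng.normal(size=(3, 3)))
```

In the sequential path the signal from token 1 must survive every intermediate step to reach token 5; the coreferent edge delivers it in one hop, which is why gains are largest when mentions are far apart.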
Contextualized word embeddings derived from pre-trained language models (LMs) show significant improvements on downstream NLP tasks. Pre-training on domain-specific corpora, such as biomedical articles, further improves their performance. In this paper, we conduct probing experiments to determine what additional information is carried intrinsically by the in-domain trained contextualized embeddings. For this, we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models to have no additional sequence-modeling layers. We compare BERT (Devlin et al., 2018), ELMo (Peters et al., 2018a), BioBERT (Lee et al., 2019) and BioELMo, a biomedical version of ELMo trained on 10M PubMed abstracts. Surprisingly, while fine-tuned BioBERT is better than BioELMo in biomedical NER and NLI tasks, as a fixed feature extractor BioELMo outperforms BioBERT in our probing tasks. We use visualization and nearest neighbor analysis to show that better encoding of entity-type and relational information leads to this superiority.
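The probing setup can be sketched as follows: embeddings from the frozen LM feed a bare linear softmax probe with no sequence-modeling layers on top, so any signal the probe recovers must already be present in the embeddings. Here random vectors stand in for the frozen BioELMo/BioBERT features, so the numbers are illustrative only:

```python
import numpy as np

# Stand-ins for frozen contextual embeddings and their probing-task labels.
# In the real setup these would come from the fixed feature extractor.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))        # 200 tokens, 64-dim frozen features
y = (X[:, 0] > 0).astype(int)         # toy entity-type labels

# Linear softmax probe trained by plain gradient descent; only the probe's
# weights W are updated, never the feature extractor.
W = np.zeros((64, 2))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(2)[y]
    W -= 0.1 * X.T @ (p - onehot) / len(y)

acc = ((X @ W).argmax(axis=1) == y).mean()
```

Because the probe is this weak, its accuracy directly reflects how linearly accessible the target information (here, entity type) is in the frozen representation, which is the quantity the probing experiments compare across LMs.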
The addition of chitosan to silicate (Laponite) cross-linked poly(ethylene oxide) (PEO) is used for tuning nanocomposite material properties and tailoring cellular adhesion and bioactivity. By combining the characteristics of chitosan (which promotes cell adhesion and growth and is antimicrobial) with those of PEO (which prevents protein and cell adhesion) and Laponite (which is bioactive), the resulting material properties can be used to tune cellular adhesion and control biomineralization. Here, we present the hydration, dissolution, degradation, and mechanical properties of multiphase bio-nanocomposites and relate these to the cell growth of MC3T3-E1 mouse preosteoblast cells. We find that the structural integrity of these bio-nanocomposites is improved by the addition of chitosan, but the release of entrapped proteins is suppressed. Overall, this study shows how chitosan can be used to tune properties in Laponite cross-linked PEO for creating bioactive scaffolds to be considered for bone repair.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.