We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community shared i2b2 datasets § . The resulting corpus (emrQA) has 1 million question-logical form and 400,000+ questionanswer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question to logical form and question to answer mapping. § maximum representation of these templates comes from the i2b2 heart disease risk dataset
Chromatin accessibility is integral to the process by which transcription factors (TFs) read out cis-regulatory DNA sequences, but it is difficult to differentiate between TFs that drive accessibility and those that do not. Deep learning models that learn complex sequence rules provide an unprecedented opportunity to dissect this problem. Using zygotic genome activation in the Drosophila embryo as a model, we generated high-resolution TF binding and chromatin accessibility data, analyzed the data with interpretable deep learning, and performed genetic experiments for validation. We uncover a clear hierarchical relationship between the pioneer TF Zelda and the TFs involved in axis patterning. Zelda consistently pioneers chromatin accessibility proportional to motif affinity, while patterning TFs augment chromatin accessibility in sequence contexts in which they mediate enhancer activation. We conclude that chromatin accessibility occurs in two phases: one through pioneering, which makes enhancers accessible but not necessarily active, and a second when the correct combination of transcription factors leads to enhancer activation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.