Background This article provides an overview of the first BioASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BioASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. Results The 2013 BioASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PubMed documents with MeSH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performed consistently better than the MTI indexer used by NLM to suggest MeSH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe; participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BioASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. Conclusions A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MeSH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PubMed Central; produce “exact” and paragraph-sized “ideal” answers (summaries). The results of the systems that participated in the 2013 BioASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM’s MTI indexer. In Task 1b the systems received high scores in the manual evaluation of the “ideal” answers; hence, they produced high quality summaries as answers. Overall, BioASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0564-6) contains supplementary material, which is available to authorized users.
The eukaryotic replisome is a crucial determinant of genome stability, but its structure is still poorly understood. We found previously that many regulatory proteins assemble around the MCM2-7 helicase at yeast replication forks to form the replisome progression complex (RPC), which might link MCM2-7 to other replisome components. Here, we show that the RPC associates with DNA polymerase α that primes each Okazaki fragment during lagging strand synthesis. Our data indicate that a complex of the GINS and Ctf4 components of the RPC is crucial to couple MCM2-7 to DNA polymerase α. Others have found recently that the Mrc1 subunit of RPCs binds DNA polymerase epsilon, which synthesises the leading strand at DNA replication forks. We show that cells lacking both Ctf4 and Mrc1 experience chronic activation of the DNA damage checkpoint during chromosome replication and do not complete the cell cycle. These findings indicate that coupling MCM2-7 to replicative polymerases is an important feature of the regulation of chromosome replication in eukaryotes, and highlight a key role for Ctf4 in this process.
Background The UK 100,000 Genomes Project is in the process of investigating the role of genome sequencing of patients with undiagnosed rare disease following usual care, and the alignment of research with healthcare implementation in the UK’s national health service. (Other parts of this Project focus on patients with cancer and infection.) Methods We enrolled participants, collected clinical features with human phenotype ontology terms, undertook genome sequencing and applied automated variant prioritization based on virtual gene panels (PanelApp) and phenotypes (Exomiser), alongside identification of novel pathogenic variants through research analysis. We report results on a pilot study of 4660 participants from 2183 families with 161 disorders covering a broad spectrum of rare disease. Results Diagnostic yields varied by family structure and were highest in trios and larger pedigrees. Likely monogenic disorders had much higher diagnostic yields (35%), with intellectual disability, hearing and vision disorders achieving yields between 40 and 55%. Those with more complex etiologies had an overall 25% yield. Combining research and automated approaches was critical to 14% of diagnoses in which we found etiologic non-coding, structural and mitochondrial genome variants and coding variants poorly covered by exome sequencing. Cohort-wide burden testing across 57,000 genomes enabled discovery of 3 new disease genes and 19 novel associations. Of the genetic diagnoses that we made, 24% had immediate ramifications for the clinical decision-making for the patient or their relatives. Conclusion Our pilot study of genome sequencing in a national health care system demonstrates diagnostic uplift across a range of rare diseases. (Funded by National Institute for Health Research and others)
Comparative genomics has revealed a class of non-protein-coding genomic sequences that display an extraordinary degree of conservation between two or more organisms, regularly exceeding that found within protein-coding exons. These elements, collectively referred to as conserved non-coding elements (CNEs), are non-randomly distributed across chromosomes and tend to cluster in the vicinity of genes with regulatory roles in multicellular development and differentiation. CNEs are organized into functional ensembles called genomic regulatory blocks–dense clusters of elements that collectively coordinate the expression of shared target genes, and whose span in many cases coincides with topologically associated domains. CNEs display sequence properties that set them apart from other sequences under constraint, and have recently been proposed as useful markers for the reconstruction of the evolutionary history of organisms. Disruption of several of these elements is known to contribute to diseases linked with development, and cancer. The emergence, evolutionary dynamics and functions of CNEs still remain poorly understood, and new approaches are required to enable comprehensive CNE identification and characterization. Here, we review current knowledge and identify challenges that need to be tackled to resolve the impasse in understanding extreme non-coding conservation.