Bioactive molecules such as drugs, pesticides and food additives are produced in large numbers by many commercial and academic groups around the world. Enormous quantities of data are generated on the biological properties and quality of these molecules. Access to such data - both on licensed and commercially available compounds, and also on those that fail during development - is crucial for understanding how improved molecules could be developed. For example, computational analysis of aggregated data on molecules that are investigated in drug discovery programmes has led to a greater understanding of the properties of successful drugs. However, the information required to perform these analyses is rarely published, and when it is made available it is often missing crucial data or is in a format that is inappropriate for efficient data-mining. Here, we propose a solution: the definition of reporting guidelines for bioactive entities - the Minimum Information About a Bioactive Entity (MIABE) - which has been developed by representatives of pharmaceutical companies, data resource providers and academic groups.
The life science industries (including pharmaceuticals, agrochemicals and consumer goods) are exploring new business models for research and development that focus on external partnerships. In parallel, there is a desire to make better use of data obtained from sources such as human clinical samples to inform and support early research programmes. Success in both areas depends upon the successful integration of heterogeneous data from multiple providers and scientific domains, something that is already a major challenge within the industry. This issue is exacerbated by the absence of agreed standards that unambiguously identify the entities, processes and observations within experimental results. In this article we highlight the risks to future productivity that are associated with incomplete biological and chemical vocabularies and suggest a new model to address this long-standing issue.
We describe 11 best practices for the successful use of Artificial Intelligence and Machine Learning in pharmaceutical and biotechnology research, at the data, technology, and organizational management levels.
Next-generation sequencing machines produce large quantities of data, which are becoming increasingly difficult to move between collaborating organisations or even to store within a single organisation. Compressing the data is vital to address this, but existing techniques do not perform as well as might be expected. The Pistoia Alliance identified the need for a new compression technique and commissioned an open innovation contest to find one. The dynamic and interactive nature of the contest led to some novel algorithms and a high level of competition between participants.
The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework that provides practical guidance for improving the FAIRness of both existing and future clinical and molecular datasets. We validated the framework in collaboration with several major public-private partnership projects, demonstrating and delivering improvements across all aspects of FAIR and across a variety of datasets and their contexts. We thereby established the reproducibility and far-reaching applicability of our approach to FAIRification tasks.