Computational omics methods packaged as software have become essential to modern biological research. The increasing dependence of scientists on these powerful software tools creates a need for systematic assessment of these methods, known as benchmarking. Adopting a standardized benchmarking practice could help researchers who use omics data to better leverage recent technological innovations. Our review summarizes benchmarking practices from 25 recent studies and discusses the challenges, advantages, and limitations of benchmarking across various domains of biology. We also propose principles that can make computational biology benchmarking studies more sustainable and reproducible, ultimately increasing the transparency of biomedical data and results.
Background: Rapid, preoperative identification of patients with the highest risk for medical complications is necessary to ensure that limited infrastructure and human resources are directed towards those most likely to benefit. Existing risk scores either lack specificity at the patient level or utilise the American Society of Anesthesiologists (ASA) physical status classification, which requires a clinician to review the chart. Methods: We report on the use of machine learning algorithms, specifically random forests, to create a fully automated score that predicts postoperative in-hospital mortality based solely on structured data available at the time of surgery. Electronic health record data from 53 097 surgical patients (2.01% mortality rate) who underwent general anaesthesia between April 1, 2013 and December 10, 2018 in a large US academic medical centre were used to extract 58 preoperative features. Results: Using a random forest classifier we found that automatically obtained preoperative features (area under the curve [AUC] of 0.932, 95% confidence interval [CI] 0.910–0.951) outperform Preoperative Score to Predict Postoperative Mortality (POSPOM) scores (AUC of 0.660, 95% CI 0.598–0.722), Charlson comorbidity scores (AUC of 0.742, 95% CI
Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed “easy to install,” and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.
Worldwide, testing capacity for SARS-CoV-2 is limited and bottlenecks in the scale-up of polymerase chain reaction (PCR)-based testing exist. Our aim was to develop and evaluate a machine learning algorithm to diagnose COVID-19 in the inpatient setting. The algorithm was based on basic demographic and laboratory features to serve as a screening tool at hospitals where testing is scarce or unavailable. We used retrospectively collected data from the UCLA Health System in Los Angeles, California. We included all emergency room or inpatient cases receiving SARS-CoV-2 PCR testing who also had a set of ancillary laboratory features (n = 1,455) between 1 March 2020 and 24 May 2020. We tested seven machine learning models and used a combination of those models for the final diagnostic classification. In the test set (n = 392), our combined model had an area under the receiver operating characteristic curve of 0.91 (95% confidence interval [CI] 0.87-0.96). The model achieved a sensitivity of 0.93 (95% CI 0.85-0.98) and a specificity of 0.64 (95% CI 0.58-0.69). We found that our machine learning algorithm had excellent diagnostic metrics compared to SARS-CoV-2 PCR. This ensemble machine learning algorithm to diagnose COVID-19 has the potential to be used as a screening tool in hospital settings where PCR testing is scarce or unavailable.
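For readers less familiar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity are derived from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not describe how the authors computed their figures.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = true positives / all actual positives
    specificity = true negatives / all actual negatives
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the study).
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=60, fp=40)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A screening tool of the kind described typically favours high sensitivity, accepting more false positives (lower specificity) so that few true cases are missed.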
Bacillus Calmette–Guerin (BCG) is a live attenuated form of Mycobacterium bovis that was developed 100 years ago as a vaccine against tuberculosis (TB) and has been used ever since to vaccinate children globally. It has also been used as the first-line treatment in patients with nonmuscle invasive bladder cancer (NMIBC), through repeated intravesical applications. Numerous studies have shown that BCG induces off-target immune effects in various pathologies. Accumulating data argue for the critical role of the immune system in the course of neurodegenerative diseases such as Alzheimer’s disease (AD) and Parkinson’s disease (PD). In this study, we tested whether repeated exposure to BCG during the treatment of NMIBC is associated with the risk of developing AD and PD. We presented a multi-center retrospective cohort study with patient data collected between 2000 and 2019 that included 12,185 bladder cancer (BC) patients, of whom 2301 BCG-treated patients met all inclusion criteria, with a follow-up of 3.5 to 7 years. We considered the diagnosis date of AD and nonvascular dementia cases for BC patients. The BC patients were partitioned into those who underwent a transurethral resection of the bladder tumor followed by BCG therapy, and a disjoint group that had not received such treatment. By applying Cox proportional hazards (PH) regression and competing risk analyses, we found that BCG treatment was associated with a significantly reduced risk of developing AD, especially in the population aged 75 years or older. The older population (≥75 years, 1578 BCG treated, and 5147 controls) showed a hazard ratio (HR) of 0.726 (95% CI: 0.529–0.996; p-value = 0.0473). In a hospital-based cohort, BCG treatment resulted in an HR of 0.416 (95% CI: 0.203–0.853; p-value = 0.017), indicating a 58% lower risk of developing AD.
The risk of developing PD showed the same trend with a 28% reduction in BCG-treated patients, while no beneficial effect of BCG was observed for other age-related events such as Type 2 diabetes (T2D) and stroke. We attributed BCG’s beneficial effect on neurodegenerative diseases to a possible activation of long-term nonspecific immune effects. We proposed a prospective study in elderly people for testing intradermal BCG inoculation as a potential protective agent against AD and PD.
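The percentage reductions quoted in this abstract follow from the hazard ratios by the conventional reading, reduction = (1 − HR) × 100. The sketch below reproduces that arithmetic for the two AD hazard ratios reported above; the function name is hypothetical. Strictly, a hazard ratio compares instantaneous event rates rather than cumulative risk, so the result is an approximation of "lower risk".

```python
def risk_reduction_percent(hazard_ratio):
    """Convert a hazard ratio into the conventional percent reduction."""
    return (1.0 - hazard_ratio) * 100.0


# Hazard ratios for AD as reported in the abstract above.
for label, hr in [("aged 75 or older", 0.726), ("hospital-based cohort", 0.416)]:
    print(f"{label}: HR {hr} -> {risk_reduction_percent(hr):.1f}% reduction")
```

An HR of 0.416 gives roughly 58%, matching the figure stated for the hospital-based cohort.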