Background
Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models that predict drug response, with the aim of advancing cancer treatment. As drug sensitivity studies continue to generate drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data.

Methods
We use empirical learning curves to evaluate and compare the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are closely fitted by a power law model, which provides a framework for assessing how the performance of these models scales with training set size.

Results
The curves show that no single model dominates in prediction performance across all datasets and training sizes, suggesting that the shape of each curve depends on the specific pairing of ML model and dataset. The multi-input NN (mNN), in which gene expression profiles of cancer cells and molecular drug descriptors are fed into separate subnetworks, outperforms a single-input NN (sNN), in which the cell and drug features are concatenated at the input layer. In contrast, a GBDT with hyperparameter tuning outperforms both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size should further improve the prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, as they provide a broader perspective on the overall data scaling characteristics.
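The difference between the single-input and multi-input architectures can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' models: the layer counts, the hidden width of 64, the ReLU activations, the random initialization, and the names `SingleInputNN`, `MultiInputNN`, `init_layer`, and `mlp` are all hypothetical, and no training is performed. The only point it shows is where the cell and drug features are merged.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    """Random weights and zero bias for one fully connected layer (hypothetical init)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)


def mlp(x, layers):
    """Apply fully connected layers, with ReLU between them and a linear last layer."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


class SingleInputNN:
    """sNN sketch: cell and drug features are concatenated before the first layer."""

    def __init__(self, n_cell, n_drug, hidden=64):
        self.layers = [init_layer(n_cell + n_drug, hidden),
                       init_layer(hidden, hidden),
                       init_layer(hidden, 1)]

    def __call__(self, cell, drug):
        return mlp(np.concatenate([cell, drug], axis=1), self.layers)


class MultiInputNN:
    """mNN sketch: separate subnetworks encode cell and drug features, then merge."""

    def __init__(self, n_cell, n_drug, hidden=64):
        self.cell_net = [init_layer(n_cell, hidden)]
        self.drug_net = [init_layer(n_drug, hidden)]
        self.head = [init_layer(2 * hidden, hidden), init_layer(hidden, 1)]

    def __call__(self, cell, drug):
        # Each single-layer subnetwork is linear in mlp(), so apply ReLU here.
        h_cell = np.maximum(mlp(cell, self.cell_net), 0.0)
        h_drug = np.maximum(mlp(drug, self.drug_net), 0.0)
        return mlp(np.concatenate([h_cell, h_drug], axis=1), self.head)


if __name__ == "__main__":
    batch = 8
    cell = rng.normal(size=(batch, 1000))  # hypothetical gene expression features
    drug = rng.normal(size=(batch, 300))   # hypothetical drug descriptors
    print(SingleInputNN(1000, 300)(cell, drug).shape)  # (8, 1)
    print(MultiInputNN(1000, 300)(cell, drug).shape)   # (8, 1)
```

Both models map a (cell, drug) pair to one predicted response value; the mNN simply learns a separate representation of each input before combining them.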
Conclusions
A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in designing future studies.
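Fitting a power law to a learning curve and extrapolating it to a larger training size can be sketched as follows. This is a minimal sketch, not the authors' fitting procedure: it assumes the common three-parameter form score(m) = a * m**b + c, where m is the training set size and the score is an error metric (for example, mean absolute error) that decreases toward an asymptote c. The function names `power_law` and `fit_power_law`, the grid search over c, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def power_law(m, a, b, c):
    """Assumed three-parameter power law: score(m) = a * m**b + c."""
    return a * np.power(m, b) + c


def fit_power_law(sizes, scores, n_grid=1000):
    """Fit a * m**b + c to (training size, error) pairs by grid search over c.

    Assumes the score is an error metric decreasing toward an asymptote c
    with 0 <= c < min(scores). For each candidate c, log(score - c) is a
    straight line in log(m) with slope b and intercept log(a), so a and b
    come from a least-squares line fit; the c whose fitted curve has the
    lowest squared error in the original (non-log) space is kept.
    """
    m = np.asarray(sizes, dtype=float)
    y = np.asarray(scores, dtype=float)
    log_m = np.log(m)
    best = None
    for c in np.linspace(0.0, y.min(), n_grid, endpoint=False):
        b, log_a = np.polyfit(log_m, np.log(y - c), 1)
        a = np.exp(log_a)
        sse = np.sum((power_law(m, a, b, c) - y) ** 2)
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


if __name__ == "__main__":
    # Synthetic learning curve (not real data): error = 2 * m**-0.5 + 0.1
    sizes = np.array([1000, 2000, 4000, 8000, 16000, 32000], dtype=float)
    errors = power_law(sizes, 2.0, -0.5, 0.1)
    a, b, c = fit_power_law(sizes, errors)
    print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
    # Forward-looking use: predict the error one doubling beyond the data
    print(f"predicted error at m=64000: {power_law(64000.0, a, b, c):.4f}")
```

The grid search keeps the sketch dependent on NumPy alone; a general nonlinear least-squares routine could fit all three parameters jointly instead.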
Several groups of bacteria have complex life cycles involving cellular differentiation and multicellular structures. For example, actinobacteria of the genus Streptomyces form multicellular vegetative hyphae, aerial hyphae, and spores. However, similar life cycles have not yet been described for archaea. Here, we show that several haloarchaea of the family Halobacteriaceae display a life cycle resembling that of Streptomyces bacteria. Strain YIM 93972 (isolated from a salt marsh) undergoes cellular differentiation into mycelia and spores. Other closely related strains are also able to form mycelia, and comparative genomic analyses point to gene signatures (apparent gain or loss of certain genes) that are shared by members of this clade within the Halobacteriaceae. Genomic, transcriptomic, and proteomic analyses of non-differentiating mutants suggest that a Cdc48-family ATPase might be involved in cellular differentiation in strain YIM 93972. Additionally, a gene encoding a putative oligopeptide transporter from YIM 93972 can restore the ability to form hyphae in a Streptomyces coelicolor mutant that carries a deletion in a homologous gene cluster (bldKA-bldKE), suggesting functional equivalence. We propose that strain YIM 93972 represents a new species of a new genus within the family Halobacteriaceae, for which the name Actinoarchaeum halophilum gen. nov., sp. nov. is proposed. Our demonstration of a complex life cycle in a group of haloarchaea adds a new dimension to our understanding of the biological diversity and environmental adaptation of archaea.