IFNα-based antiviral therapy of hepatitis delta was independently associated with a lower likelihood of clinical disease progression. Durable undetectability of HDV RNA is a valid surrogate endpoint in the treatment of hepatitis delta. (Hepatology 2017;65:414-425).
Existing non-invasive fibrosis scores are either impracticable or perform poorly in patients with chronic hepatitis delta. The new Delta Fibrosis Score, in contrast, is the first non-invasive fibrosis score developed specifically for chronic hepatitis delta and requires only standard parameters.
Model selection and performance assessment for prediction models are important tasks in machine learning, e.g. for the development of medical diagnosis or prognosis rules based on complex data. A common approach is to select the best model via cross-validation and to evaluate this final model on an independent dataset. In this work, we propose to instead evaluate several models simultaneously; these may result from varied hyperparameters or from entirely different learning algorithms. Our main goal is to increase the probability of correctly identifying a model that performs sufficiently well. In this case, adjusting for multiplicity is necessary in the evaluation stage to avoid inflation of the family-wise error rate. We apply the so-called maxT approach, which is based on the joint distribution of the test statistics and is suitable for (approximately) controlling the family-wise error rate for a wide variety of performance measures. We conclude that evaluating only a single final model is suboptimal. Instead, several promising models should be evaluated simultaneously, e.g. all models within one standard error of the best validation model. In extensive simulation studies, this strategy increased both the probability of correctly identifying a good model and the performance of the finally selected model.
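To make the described strategy concrete, the following is a minimal sketch, not the authors' implementation. It assumes accuracy as the performance measure, a hypothetical benchmark value theta0 = 0.75, and scikit-learn classifiers; the maxT adjustment is approximated via the multivariate normal distribution of the correlated accuracy estimates, in the spirit of the approach described above.

```python
# Sketch: select all models within one SE of the best validation model,
# then evaluate them simultaneously with a maxT-type adjustment.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf_small": RandomForestClassifier(n_estimators=50, random_state=0),
    "rf_large": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Validation stage: 5-fold cross-validation on the development data.
cv_scores = {name: cross_val_score(m, X_dev, y_dev, cv=5) for name, m in models.items()}
means = {name: s.mean() for name, s in cv_scores.items()}
ses = {name: s.std(ddof=1) / np.sqrt(len(s)) for name, s in cv_scores.items()}
best = max(means, key=means.get)

# Select all models within one standard error of the best validation model.
selected = [n for n in models if means[n] >= means[best] - ses[best]]

# Evaluation stage: test H0_m: accuracy_m <= theta0 for all selected models.
theta0 = 0.75  # hypothetical minimal acceptable accuracy (assumption)
n = len(y_test)
correct = np.column_stack([
    models[name].fit(X_dev, y_dev).predict(X_test) == y_test
    for name in selected
]).astype(float)
acc = correct.mean(axis=0)
cov = np.atleast_2d(np.cov(correct, rowvar=False, ddof=1)) / n  # cov of accuracy estimates
sd = np.sqrt(np.diag(cov))
T = (acc - theta0) / sd                  # one test statistic per selected model
R = cov / np.outer(sd, sd)               # correlation of the test statistics

# maxT: approximate the joint null distribution of max_m T_m by simulating
# from a multivariate normal with the estimated correlation structure.
Z = rng.multivariate_normal(np.zeros(len(selected)), R, size=100_000)
c_alpha = np.quantile(Z.max(axis=1), 0.95)  # critical value, FWER level 0.05

for name, a, t in zip(selected, acc, T):
    print(f"{name}: test accuracy = {a:.3f}, T = {t:.2f}, "
          f"claim 'accuracy > {theta0}': {t > c_alpha}")
```

Because all selected models reject against the same simultaneous critical value, the family-wise error rate is controlled at approximately 5% while the correlation between models evaluated on the same test data is exploited, which is what makes this less conservative than a Bonferroni correction.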
Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations on compiling test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help pathologists and regulatory agencies verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.
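As a purely illustrative take on the "how many images are needed?" question (our own back-of-the-envelope sketch, not a recommendation from the committee), the following estimates the test-set size required so that a two-sided 95% confidence interval for sensitivity has a given half-width, using the standard normal approximation and accounting for the prevalence of the positive finding.

```python
# Illustrative sample-size calculation (assumption, not from the paper):
# positives are the limiting subset, so size the test set such that
# n_pos >= z^2 * p * (1 - p) / d^2 positive images are expected.
import math

def required_test_images(expected_sensitivity: float,
                         half_width: float,
                         prevalence: float,
                         z: float = 1.96) -> int:
    """Total test images needed for the desired CI half-width on sensitivity."""
    p = expected_sensitivity
    n_pos = math.ceil(z**2 * p * (1 - p) / half_width**2)
    return math.ceil(n_pos / prevalence)

# Example: expected sensitivity 0.90, CI half-width +/-0.05,
# and a low-prevalence finding present in 5% of images.
print(required_test_images(0.90, 0.05, 0.05))  # -> 2780 images
```

The example also hints at why low-prevalence subsets are problematic: the rarer the finding, the more total images must be collected to obtain enough positive cases for a reliable performance estimate.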