Objective: To determine how machine learning has been applied to prediction applications in population health contexts. Specifically, to describe which outcomes have been studied, the data sources most widely used and whether reporting of machine learning predictive models aligns with established reporting guidelines. Design: A scoping review. Data sources: MEDLINE, EMBASE, CINAHL, ProQuest, Scopus, Web of Science, Cochrane Library, INSPEC and ACM Digital Library were searched on 18 July 2018. Eligibility criteria: We included English articles published between 1980 and 2018 that used machine learning to predict population-health-related outcomes. We excluded studies that only used logistic regression or were restricted to a clinical context. Data extraction and synthesis: We summarised findings extracted from published reports, which included general study characteristics, aspects of model development, reporting of results and model discussion items. Results: Of 22 618 articles found by our search, 231 were included in the review. The USA (n=71, 30.74%) and China (n=40, 17.32%) produced the most studies. Cardiovascular disease (n=22, 9.52%) was the most studied outcome. The median number of observations was 5414 (IQR=16 543.5) and the median number of features was 17 (IQR=31). Health records (n=126, 54.5%) and investigator-generated data (n=86, 37.2%) were the most common data sources. Many studies did not incorporate recommended guidelines on machine learning and predictive modelling. Predictive discrimination was commonly assessed using area under the receiver operating characteristic curve (n=98, 42.42%) and calibration was rarely assessed (n=22, 9.52%). Conclusions: Machine learning applications in population health have concentrated on regions and diseases well represented in traditional data sources, infrequently using big data. Important aspects of model development were under-reported. Greater use of big data and reporting guidelines for predictive modelling could improve machine learning applications in population health. Registration number: Registered on the Open Science Framework on 17 July 2018 (available at https://osf.io/rnqe6/).
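As a rough illustration of the two performance properties named in the abstract above, the following Python sketch computes discrimination (area under the ROC curve) and one simple calibration summary (calibration-in-the-large) for a set of predicted risks. The function names, the toy labels and the predicted probabilities are hypothetical and are not taken from the review. Calibration-in-the-large is only one of several possible calibration measures and is used here for brevity.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean predicted risk minus observed event rate.

    A value near 0 means predictions are right on average; a positive value
    means risk is over-predicted, a negative value under-predicted.
    """
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(y_score) / len(y_score) - sum(y_true) / len(y_true)


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
labels = [0, 0, 1, 1, 0, 1]
risks = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

print("AUC:", round(auc(labels, risks), 3))
print("Calibration-in-the-large:", round(calibration_in_the_large(labels, risks), 3))
```

A model can rank cases well (high AUC) while still over- or under-predicting risk on average, which is why the two properties are typically reported separately.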
Background: Our objective was to determine the impacts of artificial intelligence (AI) on public health practice. Methods: We used a fundamental qualitative descriptive study design, enrolling 15 experts in public health and AI from June 2018 until July 2019 who worked in North America and Asia. We conducted in-depth semi-structured interviews, iteratively coded the resulting transcripts, and analyzed the results thematically. Results: We developed 137 codes, from which nine themes emerged. The themes included opportunities such as leveraging big data and improving interventions; barriers to adoption such as confusion regarding AI’s applicability, limited capacity, and poor data quality; and risks such as propagation of bias, exacerbation of inequity, hype, and poor regulation. Conclusions: Experts are cautiously optimistic about AI’s impacts on public health practice, particularly for improving disease surveillance. However, they perceived substantial barriers, such as a lack of available expertise, and risks, including inadequate regulation. Therefore, investment and research into AI for public health practice would likely be beneficial. However, increased access to high-quality data, research and education regarding the limitations of AI, and development of rigorous regulation are necessary to realize these benefits.
The field of nutritional epidemiology faces challenges posed by measurement error, diet as a complex exposure, and residual confounding. The objective of this perspective article is to highlight how developments in big data and machine learning can help address these challenges. New methods of collecting 24-h dietary recalls and recording diet could enable larger samples and more repeated measures to increase statistical power and measurement precision. In addition, use of machine learning to automatically classify pictures of food could become a useful complementary method to help improve precision and validity of dietary measurements. Diet is complex due to thousands of different foods that are consumed in varying proportions, fluctuating quantities over time, and differing combinations. Current dietary pattern methods may not integrate sufficient dietary variation, and most traditional modeling approaches have limited incorporation of interactions and nonlinearity. Machine learning could help better model diet as a complex exposure with nonadditive and nonlinear associations. Last, novel big data sources could help avoid unmeasured confounding by offering more covariates, including both omics and features derived from unstructured data with machine learning methods. These opportunities notwithstanding, application of big data and machine learning must be approached cautiously to ensure quality of dietary measurements, avoid overfitting, and confirm accurate interpretations. Greater use of machine learning and big data would also require substantial investments in training, collaborations, and computing infrastructure. Overall, we propose that judicious application of big data and machine learning in nutrition science could offer new means of dietary measurement, more tools to model the complexity of diet and its relations with diseases, and additional potential ways of addressing confounding.