Background: Between 2013 and 2015, the UK Biobank collected accelerometer traces from 103,712 volunteers aged between 40 and 69 years, each of whom wore a wrist-worn triaxial accelerometer for 1 week. This data set has been used in the past to verify that individuals with chronic diseases exhibit reduced activity levels compared with healthy populations. However, the data set is likely to be noisy, as the devices were allocated to participants without a set of inclusion criteria, and the traces reflect free-living conditions.
Objective: This study aims to determine the extent to which accelerometer traces can be used to distinguish individuals with type 2 diabetes (T2D) from normoglycemic controls and to quantify their limitations.
Methods: Machine learning classifiers were trained using different feature sets to separate individuals with T2D from normoglycemic individuals. Multiple criteria, based on a combination of self-assessment UK Biobank variables and primary care health records linked to UK Biobank participants, were used to identify 3103 individuals with T2D in this population. The remaining 19,852 nondiabetic participants were further scored on the severity of their physical activity impairment, based on other conditions found in their primary care data, and those deemed likely to have been physically impaired at the time were excluded. Physical activity features were first extracted from the raw accelerometer traces data set for each participant using an algorithm that extends the previously developed Biobank Accelerometry Analysis toolkit from Oxford University. These features were complemented by a selected collection of sociodemographic and lifestyle features available from UK Biobank.
Results: We tested 3 types of classifiers. All 3 achieved an area under the receiver operating characteristic curve (AUC) close to 0.86 (95% CI 0.85-0.87), with F1 scores in the range of 0.80-0.82 for T2D-positive individuals and 0.73-0.74 for T2D-negative controls. Results obtained using controls who were not physically impaired were compared with those obtained using highly physically impaired controls to test the hypothesis that nondiabetic conditions reduce classifier performance. Models built using a training set that included highly impaired controls with other conditions had worse performance (AUC 0.75-0.77; 95% CI 0.74-0.78; F1 scores in the range of 0.76-0.77 for T2D positives and 0.63-0.65 for controls).
Conclusions: Granular measures of free-living physical activity can be used to successfully train machine learning models that are able to discriminate between individuals with T2D and normoglycemic controls, although with limitations because of the intrinsic noise in the data sets. From a broader clinical perspective, these findings motivate further research into the use of physical activity traces as a means of screening individuals at risk of diabetes and for early detection, in conjunction with routinely used risk scores, provided that appropriate quality control is enforced on the data collection protocol.
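The abstract above reports AUC together with separate F1 scores for the T2D-positive class and the control class. As a minimal sketch of how those two kinds of metric are computed, the following uses plain Python on made-up toy labels and scores. The function names `roc_auc` and `f1_for_class`, the 0.5 decision threshold, and the data are assumptions for illustration only; they are not the study's code or results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_for_class(labels: Sequence[int], preds: Sequence[int], target: int) -> float:
    """F1 score treating `target` as the class of interest."""
    tp = sum(1 for y, p in zip(labels, preds) if y == target and p == target)
    fp = sum(1 for y, p in zip(labels, preds) if y != target and p == target)
    fn = sum(1 for y, p in zip(labels, preds) if y == target and p != target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 1 = T2D positive, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
f1_pos = f1_for_class(labels, preds, target=1)
f1_neg = f1_for_class(labels, preds, target=0)
print(f"AUC={auc:.3f}  F1(positives)={f1_pos:.2f}  F1(controls)={f1_neg:.2f}")
```

AUC depends only on how the scores rank positives against negatives, whereas the per-class F1 scores depend on the chosen threshold, which is one reason the two are reported side by side.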
A major challenge in the model‐based engineering of Cyber‐Physical Systems (CPSs) is that of providing methods and tools that support decision‐making in design. In the CPS design space, system models may be composed of diverse notations and tools, and the range of alternatives to be evaluated is immense. In spite of this, the benefits of trade‐off analysis at an early design stage are significant, making the automation of Design Space Exploration (DSE) an appealing prospect. In bringing DSE to industry scale, there is a clear need for both guidance to engineers in designing DSE, and a theory of cost‐effective DSE algorithms. To address this, we propose and demonstrate a SysML profile that encourages the explicit description of DSE over multi‐disciplinary co‐models, and demonstrate the ability to tune a genetic DSE algorithm across a range of design spaces.
BACKGROUND: Between 2013 and 2015, the UK Biobank (UKBB) collected accelerometer traces (AXT) using wrist-worn triaxial accelerometers for 103,712 volunteers aged between 40 and 69, for one week each. This dataset has been used in the past to verify that individuals with chronic diseases exhibit reduced activity levels compared to healthy populations [1]. Yet, the dataset is likely to be noisy, as the devices were allocated to participants without a specific set of inclusion criteria, and the traces reflect uncontrolled free-living conditions.
OBJECTIVE: To determine the extent to which AXT can distinguish individuals with Type-2 Diabetes (T2D) from normoglycaemic controls, and to quantify their limitations.
METHODS: Physical activity features were first extracted from the raw AXT dataset for each participant, using an algorithm that extends the previously developed Biobank Accelerometry Analysis toolkit from Oxford University [1]. These features were complemented by a selected collection of socio-demographic and lifestyle (SDL) features available from UKBB. Clustering was used to determine whether activity features would naturally partition participants, and the SDL features were projected onto the resulting clusters for a more meaningful interpretation. Supervised machine learning classifiers were then trained using the different sets of features, to separate T2D-positive individuals from normoglycaemic controls. Multiple criteria, based on a combination of self-assessment Biobank variables and primary care health records linked to the participants in Biobank, were used to identify 3,103 individuals in this population who have T2D. The remaining non-diabetic participants were further scored on their physical activity impairment severity levels based on other conditions found in their primary care data, and those likely to have been physically impaired at the time were excluded.
RESULTS: Three types of classifiers were tested, with AUROC close to 0.86 for all three, and F1 scores in the range [0.80, 0.82] for T2D positives and [0.73, 0.74] for controls. Results obtained using non-physically impaired controls were compared to those obtained using highly physically impaired controls, to test the hypothesis that non-diabetic conditions reduce classifier performance. Models built using a training set that includes controls with other conditions had worse performance: AUROC in the range [0.75, 0.77] and F1 in the range [0.76, 0.77] (positives) and [0.63, 0.65] (controls). Clusters generated using k-means and hierarchical methods showed limited quality (Silhouette scores: 0.105 and 0.207, respectively); however, a 2-dimensional visual rendering obtained using t-SNE reveals well-defined clusters. Importantly, one of the 3 hierarchical clusters contains almost exclusively (close to 100%) T2D participants.
CONCLUSIONS: The study demonstrates the potential, and limitations, of AXT in the UKBB when these are used to discriminate between T2D and normoglycaemic controls. The use of primary care EHRs is essential both to correctly identify positives, and also to identify controls that should be excluded to reduce noise in the training set.
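The cluster quality figures quoted above are Silhouette scores, which range from -1 to 1; values near 0 indicate overlapping clusters. As a rough sketch of how such a score is obtained, the following plain-Python function computes the mean silhouette over all points. Euclidean distance, the helper names, and the four toy points are assumptions for illustration; they are not the study's features or implementation.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points: Sequence[Sequence[float]], cluster_ids: Sequence[int]) -> float:
    """Mean silhouette over all points, using Euclidean distance (an assumption)."""
    clusters: Dict[int, List[int]] = {}
    for idx, c in enumerate(cluster_ids):
        clusters.setdefault(c, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, ci in enumerate(cluster_ids):
        own = clusters[ci]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for c, members in clusters.items()
            if c != ci
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical 1-D toy points forming two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
separated = silhouette_score(points, [0, 0, 1, 1])  # assignment matches the groups
mixed = silhouette_score(points, [0, 1, 0, 1])      # assignment cuts across the groups
print(f"separated={separated:.3f}  mixed={mixed:.3f}")
```

On this scale, scores such as 0.105 and 0.207 sit much closer to the overlapping regime than to the well-separated one, which is consistent with the abstract's description of limited cluster quality.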