Accuracy of Fitbit Charge 4, Garmin Vivosmart 4, and WHOOP Versus Polysomnography: Systematic Review

Background: Consumer wearable technologies have become ubiquitous, with clinical and non-clinical populations leveraging a variety of devices to quantify various aspects of health and wellness. However, the accuracy with which these devices measure biometric outcomes such as heart rate, sleep and physical activity remains unclear.

Objective: To conduct a ‘living’ (i.e. ongoing) evaluation of the accuracy of consumer wearable technologies in measuring various physiological outcomes.

Methods: A systematic search of the literature was conducted in the following scientific databases: MEDLINE via PubMed, Embase, CINAHL and SPORTDiscus via EBSCO. The inclusion criteria required systematic reviews or meta-analyses that evaluated the validation of consumer wearable devices against accepted reference standards. In addition to publication details, review protocol, device specifics and a summary of the authors’ results, we extracted data on mean absolute percentage error (MAPE), pooled absolute bias, intraclass correlation coefficients (ICCs) and mean absolute differences.

Results: Of 904 studies identified through the initial search, 24 systematic reviews met our inclusion criteria; these systematic reviews included 249 non-duplicate validation studies of consumer wearable devices involving 430,465 participants (43% female). Of the commercially available wearable devices released to date, approximately 11% have been validated for at least one biometric outcome. However, because a typical device can measure a multitude of biometric outcomes, the number of validation studies conducted represents just 3.5% of the total needed for a comprehensive evaluation of these devices. For heart rate, wearables showed a mean bias of ± 3%. In arrhythmia detection, wearables exhibited a pooled sensitivity and specificity of 100% and 95%, respectively. For aerobic capacity, wearables significantly overestimated VO2max by ± 15.24% during resting tests and ± 9.83% during exercise tests. Physical activity intensity measurements had a mean absolute error ranging from 29 to 80%, depending on the intensity of the activity being undertaken. Wearables mostly underestimated step counts (mean absolute percentage errors ranging from − 9 to 12%) and energy expenditure (mean bias = − 3 kcal per minute, or − 3%, with error ranging from − 21.27 to 14.76%). For blood oxygen saturation, wearables showed a mean absolute difference of up to 2.0%. Sleep measurement showed a tendency to overestimate total sleep time (mean absolute percentage error typically > 10%).

Conclusions: While consumer wearables show promise in health monitoring, a conclusive assessment of their accuracy is impeded by pervasive heterogeneity in research outcomes and methodologies. There is a need for standardised validation protocols and collaborative industry partnerships to enhance the reliability and practical applicability of wearable technology assessments.

PROSPERO ID: CRD42023402703.
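Two of the error metrics named in this abstract, mean bias and mean absolute percentage error (MAPE), can be illustrated with a minimal sketch. The function names and the step counts below are hypothetical and made up for illustration; they are not data or code from the review, and individual validation studies may define these metrics slightly differently.

```python
def mean_bias(device, reference):
    """Mean signed difference (device minus reference).

    Negative values mean the device underestimates on average.
    """
    return sum(d - r for d, r in zip(device, reference)) / len(reference)


def mape(device, reference):
    """Mean absolute percentage error relative to the reference, in percent.

    Assumes no reference value is zero.
    """
    total = sum(abs(d - r) / abs(r) for d, r in zip(device, reference))
    return 100.0 * total / len(reference)


# Made-up step counts, for illustration only
reference_steps = [1000, 2000, 4000, 5000]
device_steps = [900, 2100, 3800, 5000]

bias = mean_bias(device_steps, reference_steps)
error_pct = mape(device_steps, reference_steps)
print(f"mean bias: {bias:.1f} steps, MAPE: {error_pct:.1f}%")
```

Because signed errors can cancel, a small mean bias does not by itself imply a small MAPE, which may be one reason both are reported.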


Performance Evaluation of the Verily Numetric Watch sleep suite for digital sleep assessment against in-lab polysomnography

Nelson, Saeb, Barman et al. 2024

Preprint


The goal was to evaluate the performance of a multi-sensor wrist-worn wearable device for generating 12 sleep measures in a diverse cohort. Our study technology was the sleep suite of the Verily Numetric Watch (VNW), using polysomnography (PSG) as reference during 1-night simultaneous recording in a sample of N=41 (18 male, age range: 18-78 years). We performed epoch-by-epoch comparisons for all measures. Key specific analyses were: core accuracy metrics for sleep vs wake classification; bias for continuous measures (Bland-Altman); Cohen’s kappa and accuracy for sleep stage classifications; and mean count difference and linearly weighted Cohen’s kappa for the count metric. In addition, we performed subgroup analyses by sex, age, skin tone, body mass index, and arm hair density. Sensitivity and specificity (95% CI) of sleep versus wake classification were 0.97 (0.96, 0.98) and 0.66 (0.61, 0.71), respectively. Mean total sleep time bias was 14.55 minutes (1.61, 27.16); wake after sleep onset, −11.77 minutes (−23.89, 1.09); sleep efficiency, 3.15% (0.68, 5.57); sleep onset latency, −3.24 minutes (−9.38, 3.57); light-sleep duration, 3.78 minutes (−7.04, 15.06); deep-sleep duration, 3.91 minutes (−4.59, 12.60); rapid eye movement-sleep duration, 6.94 minutes (0.57, 13.04). The median difference for number of awakenings was 0.00 (0.00, 1.00), and overall accuracy of sleep stage classification was 0.78 (0.51, 0.88). Most measures showed statistically significant proportional biases and/or heteroscedasticity. Subgroup results appeared largely consistent with the overall group, although small samples preclude strong conclusions. These results support the use of the VNW for classifying sleep versus wake and sleep stages, and for related overnight sleep measures.
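As a rough illustration of the epoch-by-epoch sleep versus wake comparison described in this abstract, the sketch below computes sensitivity, specificity, and unweighted Cohen's kappa from two aligned label sequences. The function name, the 1 = sleep / 0 = wake encoding, and the ten example epochs are assumptions made for illustration; this is not the authors' analysis code, and it does not cover the Bland-Altman or linearly weighted kappa analyses.

```python
def sleep_wake_agreement(device, reference):
    """Compare two equal-length epoch sequences (1 = sleep, 0 = wake).

    Sleep is treated as the positive class, so sensitivity is the share of
    reference sleep epochs the device also scores as sleep, and specificity
    is the share of reference wake epochs the device also scores as wake.
    Assumes the reference contains at least one epoch of each class.
    """
    pairs = list(zip(device, reference))
    n = len(pairs)
    tp = sum(1 for d, r in pairs if d == 1 and r == 1)
    tn = sum(1 for d, r in pairs if d == 0 and r == 0)
    fp = sum(1 for d, r in pairs if d == 1 and r == 0)
    fn = sum(1 for d, r in pairs if d == 0 and r == 1)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Cohen's kappa: observed agreement corrected for chance agreement
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Made-up epochs, for illustration only
reference_epochs = [0, 0, 1, 1, 1, 1, 1, 1, 0, 1]
device_epochs = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]

sens, spec, kappa = sleep_wake_agreement(device_epochs, reference_epochs)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} kappa={kappa:.2f}")
```

In this toy example the device scores wake epochs as sleep, which keeps sensitivity high while lowering specificity, the same qualitative pattern as the 0.97 and 0.66 reported in the abstract.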


Accuracy of Fitbit Charge 4, Garmin Vivosmart 4, and WHOOP Versus Polysomnography: Systematic Review

Cited by 8 publications

References 46 publications

REPAIR Platform: Robot-AidEd PersonAlIzed Rehabilitation


Keeping Pace with Wearables: A Living Umbrella Review of Systematic Reviews Evaluating the Accuracy of Consumer Wearable Technologies in Health Measurement

Performance Evaluation of the Verily Numetric Watch sleep suite for digital sleep assessment against in-lab polysomnography
