Goal and Aims
To evaluate the performance of six wearable devices across four device classes (research-grade EEG-based headband, research-grade actigraphy, high-end consumer tracker, low-cost consumer tracker) in three age groups (young: 18-30 y; middle-aged: 31-50 y; older adults: 51-70 y).

Focus Technology
Dreem 3 headband, Actigraph GT9X, Oura Ring Gen3 running the latest sleep staging algorithm (OSSA 2.0), Fitbit Sense, Xiaomi Mi Band 7 and Axtro Fit3.

Reference Technology
In-lab polysomnography (PSG) with consensus sleep scoring.

Sample
60 participants (26 males) across three age groups (young: N = 21; middle-aged: N = 23; older adults: N = 16).

Design
Participants slept overnight in a sleep laboratory from their habitual sleep time to their wake time while wearing five devices concurrently.

Core Analytics
Discrepancy and epoch-by-epoch analyses of each device against PSG for sleep/wake (2-stage) and sleep-stage (4-stage: wake/light/deep/REM) classification. Mixed-model ANOVAs compared biases across devices (within-subjects factor) and by age group and sex (between-subjects factors).

Core Outcomes
The EEG-based Dreem headband outperformed the other wearables in both 2-stage (kappa = .76) and 4-stage (kappa = .76-.86) classification, but at least 25% of participants did not tolerate it. The high-end, validated consumer trackers came next: Oura (2-stage kappa = .64; 4-stage kappa = .55-.70) and Fitbit (2-stage kappa = .58; 4-stage kappa = .45-.60). They were followed by the accelerometry-based, research-grade Actigraph, which provided only 2-stage classification (kappa = .47). Last were the low-cost consumer trackers, which had very low kappa values overall (2-stage kappa < .31; 4-stage kappa < .33).

Important Additional Outcomes
Proportional biases were driven by nights with poorer sleep (i.e., longer sleep onset latency [SOL] and wake after sleep onset [WASO]). For nights with sleep efficiency ≥85%, the large majority of sleep measure estimates from Dreem, Oura, Fitbit and Actigraph fell within the clinically acceptable limit of 30 min. Biases for total sleep time [TST] and WASO were also largest in older participants, who tended to have poorer sleep.

Core Conclusion
The Dreem headband is recommended for the most accurate sleep tracking, but it involves trade-offs in price, comfort and ease of use. The high-end consumer sleep trackers (Oura, Fitbit) balance classification accuracy against cost, comfort and ease of use. They are recommended for large-scale population studies in which sleep is mostly normal. Despite poor wake detection, the low-cost trackers may have some utility for logging time in bed.
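The two device-versus-PSG analyses named under Core Analytics are an epoch-by-epoch agreement measure (Cohen's kappa) and discrepancies in summary sleep measures. The sketch below shows both, together with the 30-min acceptability check. It is illustrative only and rests on several assumptions, none of which comes from the study:

- Each hypnogram is a list of time-aligned 30-s epoch labels from {"wake", "light", "deep", "rem"}, covering the whole recording.
- WASO counts all wake after the first sleep epoch, including any final awakening.
- The reported 4-stage kappa ranges are read here as per-stage, one-vs-rest kappas. This is one plausible interpretation.

All function names are hypothetical, and the sketch is not the authors' analysis code.

```python
from collections import Counter

# Assumption: hypnograms are lists of 30-s epoch labels drawn from
# {"wake", "light", "deep", "rem"}, already time-aligned between device and PSG.
EPOCH_MIN = 0.5  # minutes per epoch (assumed 30-s scoring epochs)
STAGES = ("wake", "light", "deep", "rem")


def cohen_kappa(ref, dev):
    """Cohen's kappa for two equal-length, epoch-aligned label sequences."""
    if not ref or len(ref) != len(dev):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(ref)
    observed = sum(r == d for r, d in zip(ref, dev)) / n
    ref_counts, dev_counts = Counter(ref), Counter(dev)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(ref_counts[k] * dev_counts[k] for k in ref_counts) / n**2
    if expected == 1:
        return float("nan")  # both sequences use one identical label: kappa undefined
    return (observed - expected) / (1 - expected)


def to_two_stage(hypnogram):
    """Collapse 4-stage labels to sleep/wake."""
    return ["wake" if s == "wake" else "sleep" for s in hypnogram]


def per_stage_kappa(ref, dev):
    """One-vs-rest kappa per stage (one possible source of a 4-stage kappa range)."""
    return {s: cohen_kappa([r == s for r in ref], [d == s for d in dev]) for s in STAGES}


def sleep_measures(hypnogram):
    """TIB, SOL, TST, WASO (min) and SE (%) from a whole-recording hypnogram."""
    tib = len(hypnogram) * EPOCH_MIN
    asleep = [s != "wake" for s in hypnogram]
    if not any(asleep):
        return {"TIB": tib, "SOL": tib, "TST": 0.0, "WASO": 0.0, "SE": 0.0}
    onset = asleep.index(True)  # first sleep epoch = sleep onset
    tst = sum(asleep) * EPOCH_MIN
    # WASO here counts all wake after onset, including any final awakening.
    waso = sum(not a for a in asleep[onset:]) * EPOCH_MIN
    return {"TIB": tib, "SOL": onset * EPOCH_MIN, "TST": tst,
            "WASO": waso, "SE": 100 * tst / tib}


def discrepancies(ref, dev, limit_min=30.0):
    """Device-minus-PSG bias per measure and whether it lies within +/- limit_min."""
    r, d = sleep_measures(ref), sleep_measures(dev)
    return {k: (d[k] - r[k], abs(d[k] - r[k]) <= limit_min) for k in ("SOL", "TST", "WASO")}
```

Kappa corrects raw agreement for chance, which is why it penalizes poor wake detection. Consider a 20-epoch PSG hypnogram with 6 wake epochs. A device that labels every epoch as sleep agrees with PSG on 70% of epochs, yet its 2-stage kappa is 0, because chance alone would produce that much agreement given the two raters' marginal label frequencies.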