Yale Cognitive Science

Senior Thesis

Do consumer sleep wearables actually capture how rested adolescents feel? A predictive analysis of 179 ABCD Study participants.

The Question

Consumer sleep wearables now sit on millions of adolescent wrists, marketed on a premise of objective, accurate sleep measurement. The empirical question is narrower and harder: do the metrics these devices report actually capture the dimension of sleep that matters most to people, the felt sense of being rested? My senior thesis tests this question against the judgments of the most reliable everyday observers of adolescent sleep: the caregivers who live with these adolescents.

What I Looked At

179 adolescents in the analytic sample

12–13 years old (early adolescence)

18 research sites across the United States

Data were drawn from the Adolescent Brain Cognitive Development (ABCD) Study, the largest longitudinal investigation of adolescent neurodevelopment in the United States. Each participant wore a Fitbit Charge 2 across the assessment window; for each participant, a caregiver completed a structured battery of sleep questions covering adequacy, restfulness, and disruption.

Finding 01

Wearables could not predict caregiver-rated sleep adequacy

Five regression families were trained on 33 objective sleep parameters under 5-fold cross-validation, with predictive performance reported as R².

Linear Regression: R² = −0.991

Ridge: R² = −0.053

Lasso: R² = −0.118

Elastic Net: R² = −0.120

Random Forest: R² = +0.020

Negative R² values (bars left of zero) indicate performance worse than predicting the sample mean.

The best-performing model (Random Forest, R² = +0.020) explained roughly 2% of the variance in caregiver-reported sleep adequacy, which is practically indistinguishable from predicting the sample mean for every participant.
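For readers unfamiliar with negative R², the following is a minimal sketch of how the statistic behaves on held-out data. The function name and the toy ratings are illustrative assumptions, not the thesis pipeline or ABCD data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out caregiver ratings (made-up values, not ABCD data).
ratings = [2.0, 3.0, 4.0, 3.0, 3.0]
mean_rating = sum(ratings) / len(ratings)

# Baseline: predict the sample mean for everyone, which gives R² = 0.
baseline = r_squared(ratings, [mean_rating] * len(ratings))

# Predictions that miss by more than the mean does push R² below zero.
noisy = r_squared(ratings, [4.0, 2.0, 3.0, 4.0, 2.0])
```

Here `baseline` is 0.0 and `noisy` is −3.0. A model scores below zero whenever its squared errors on unseen participants exceed those of the mean-only baseline.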

Finding 02

The wearable and the caregiver routinely disagreed

A discordance score was computed for each participant by subtracting the normalized caregiver adequacy rating from a standardized composite of objective sleep quality. Positive values indicate the wearable composite exceeded the caregiver's assessment.

Wearable composite > caregiver rating: 79.9%

Caregiver rating > wearable composite: 20.1%

In 143 of 179 participants (mean discordance = 0.185, SD = 0.274), the device's composite landed above the caregiver's rating — the wearable was systematically more optimistic about sleep than the person observing it.
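One way such a discordance score could be computed is sketched below. It assumes both measures are rescaled to a common [0, 1] range before subtraction. The source says only "normalized" and "standardized", so the exact scaling is an assumption, as are the function names and the toy values.

```python
def min_max(values):
    """Rescale values to the [0, 1] range (assumed scaling, see lead-in)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def discordance(wearable_composite, caregiver_rating):
    """Wearable minus caregiver, each rescaled to [0, 1] first."""
    w = min_max(wearable_composite)
    c = min_max(caregiver_rating)
    return [wi - ci for wi, ci in zip(w, c)]

# Made-up values for four hypothetical participants.
scores = discordance([0.2, 0.9, 0.5, 0.7], [1, 3, 1, 5])
share_wearable_higher = sum(s > 0 for s in scores) / len(scores)

# Arithmetic check on the reported split: 143 of 179 participants.
reported_share = round(100 * 143 / 179, 1)
```

A positive score means the wearable composite sits higher on the shared scale than the caregiver rating. The last line confirms that 143 of 179 corresponds to the reported 79.9%.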

Finding 03

Eight pre-specified pairings, none survived correction

For each Fitbit metric, I chose in advance the single caregiver question it should correspond to: sleep efficiency with caregiver-rated efficiency, REM percentage with feeling refreshed, and so on across eight pairs. After Benjamini-Hochberg (BH) correction for running eight tests at once, which otherwise inflates false positives, none of the correlations remained significant. The one pair that reached the conventional p < .05 threshold before correction pointed in the unexpected direction: adolescents who slept longer were rated as getting less adequate sleep, possibly because extra sleep compensates for poor sleep quality rather than reflecting good quality.

Sleep efficiency × Caregiver-rated efficiency: r = +0.04

Wake episodes × Difficulty staying asleep: r = +0.00

Sleep onset latency × Difficulty falling asleep: r = −0.02

REM % × Feeling refreshed: r = −0.01

Deep sleep % × Feeling calm: r = +0.11

Total sleep time × Sleep adequate: r = −0.18*

Wake episodes × Daytime interference: r = −0.11

Social jetlag × Sleep timing quality: r = +0.05

* p < .05 before correction. 0 of 8 pairings were significant after BH correction.
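The correction step can be sketched as follows. This is a hedged illustration rather than the thesis code: it assumes every pair used all 179 participants, and it approximates each p-value with the Fisher z transformation instead of whatever test was actually run. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_p(r, n):
    """Two-sided p-value for a Pearson r via the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def benjamini_hochberg(p_values, q=0.05):
    """Return a reject/keep flag per test under the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below (k / m) * q.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# The eight reported correlations, in the order listed above.
# n = 179 for every pair is an assumption.
rs = [0.04, 0.00, -0.02, -0.01, 0.11, -0.18, -0.11, 0.05]
ps = [fisher_p(r, 179) for r in rs]
significant = benjamini_hochberg(ps)
```

Under these assumptions, only total sleep time × sleep adequacy falls below .05 uncorrected. Its p-value still exceeds the BH threshold for the smallest of eight p-values (0.05 / 8 = 0.00625), so no pairing is flagged.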

Finding 04

Social jetlag predicted the discordance

One predictor consistently explained when wearable and caregiver assessments diverged: social jetlag — the absolute difference between school-night and free-night sleep midpoints.
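The definition above can be sketched in a few lines. The helper names and the example schedule are hypothetical. In practice the midpoints would presumably be averaged over several school and free nights per participant, a step not shown here.

```python
def sleep_midpoint(onset_hour, wake_hour):
    """Clock-time midpoint of a sleep bout, in hours after midnight.

    Hours are on a 24-hour clock; an onset later than the wake time is
    treated as falling on the previous evening.
    """
    duration = (wake_hour - onset_hour) % 24
    return (onset_hour + duration / 2) % 24

def social_jetlag(school_midpoint, free_midpoint):
    """Absolute midpoint difference in hours, the shorter way round the clock."""
    diff = abs(free_midpoint - school_midpoint) % 24
    return min(diff, 24 - diff)

# Hypothetical schedule: 22:30-06:30 on school nights, 00:30-10:00 on free nights.
school = sleep_midpoint(22.5, 6.5)   # 02:30
free = sleep_midpoint(0.5, 10.0)     # 05:15
jetlag_hours = social_jetlag(school, free)
```

For this made-up schedule the midpoints are 02:30 and 05:15, giving a social jetlag of 2.75 hours.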

Correlation strength: r = −0.277

N = 179 · p = 0.0002 · adjusted for chronotype, demographics, and objective sleep quality

Adolescents with greater circadian misalignment received lower caregiver adequacy ratings even when their objective sleep architecture appeared intact — suggesting caregivers are sensitive to the rhythm and regularity of sleep, dimensions the wearable composite does not encode.
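To illustrate what "adjusted for" means in a correlation, the sketch below computes a partial correlation with a single covariate. This is a simplification: the reported estimate adjusts for several covariates at once, which would require regressing each variable on all of them. The function names and toy data are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data: x and y both rise with the covariate z, but move in
# opposite directions within each level of z.
x = [0, 1, 2, 3]
y = [1, 0, 3, 2]
z = [0, 0, 1, 1]
raw = pearson(x, y)
adjusted = partial_corr(x, y, z)
```

In this toy case the raw correlation is 0.6 while the adjusted one is −1.0, which shows why an adjusted estimate can differ sharply from an unadjusted one.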

What It Means

Subjective sleep adequacy, in this sample, is not reducible to physiological sleep architecture. It is jointly determined by sleep biology and the temporal context surrounding it — when a person sleeps, how regularly, how the week is shaped. Wearable devices instrument the first dimension well. They do not instrument the second.

The methodological implication is that consumer sleep scores answer a substantially narrower question than the one users are typically asking when they consult them. The practical implication is that interventions built on consumer sleep technology, if they aim to improve felt sleep quality, will likely need to address circadian regularity and sleep timing alongside sleep architecture.
