This notebook contains the code and results of the data-analysis for the blog-post that I published here: https://hermandevries.nl/2020/09/23/relationships-between-hrv-sleep-and-physical-activity-in-personal-data/
I performed these analyses in RStudio 4.0.2 that uses R 1.3.1073.
library(tidyverse) # Includes the packages dplyr and ggplot2 that I'll be using a lot.
library(here) # Useful for convenient file-management.
library(TTR) # Used to calculate the rolling averages with the SMA() function.
# Import the .csv file that I got from the Oura cloud dashboard:
df_oura <- read_csv(here::here("Data/OURA/oura_2019-05-11_2020-09-22_trends.csv"))
Duplicated column names deduplicated: 'Sleep Timing' => 'Sleep Timing_1' [20]Parsed with column specification:
cols(
.default = col_double(),
date = col_date(format = ""),
`Sleep Timing_1` = col_logical(),
`Bedtime Start` = col_datetime(format = ""),
`Bedtime End` = col_datetime(format = "")
)
See spec(...) for full column specifications.
# In my exported file, I had 2 days with missing nocturnal data. This were two nights with a very short sleep duration, which the Oura algorithm apparently filters out for some reason because it doesn't trust the observation - but I know these were actually correct. Luckily, the app itself still showed the data visually, so I added these two observations manually.
# My missing value on March 13, 2020:
df_oura$`Total Sleep Time`[308] = 121*60
df_oura$`Total Bedtime`[308] = 248*60
df_oura$`Awake Time`[308] = 127*60
df_oura$`REM Sleep Time`[308] = 3*60
df_oura$`Light Sleep Time`[308] = 52*60
df_oura$`Deep Sleep Time`[308] = 65*60
df_oura$`Sleep Latency`[308] = 43*60
df_oura$`Sleep Score`[308] = 38
df_oura$`Sleep Efficiency`[308] = 49
df_oura$`Bedtime Start`[308] = as.POSIXct("2020-03-12 22:03:00",
tz = "Europe/Amsterdam",
format = "%Y-%m-%d %H:%M:%OS")
df_oura$`Bedtime End`[308] = as.POSIXct("2020-03-13 02:11:00",
tz = "Europe/Amsterdam",
format = "%Y-%m-%d %H:%M:%OS")
df_oura$`Average Resting Heart Rate`[308] = 56
df_oura$`Lowest Resting Heart Rate`[308] = 52
df_oura$`Average HRV`[308] = 62
df_oura$`Temperature Deviation (°C)`[308] = 0.00
df_oura$`Respiratory Rate`[308] = 12.9
# Missing values for July 28th, 2020:
df_oura$`Total Sleep Time`[445] = 171*60
df_oura$`Total Bedtime`[445] = 191*60
df_oura$`Awake Time`[445] = 19*60
df_oura$`REM Sleep Time`[445] = 0*60
df_oura$`Light Sleep Time`[445] = 76*60
df_oura$`Deep Sleep Time`[445] = 95*60
df_oura$`Sleep Latency`[445] = 5*60
df_oura$`Sleep Score`[445] = 57
df_oura$`Sleep Efficiency`[445] = 90
df_oura$`Bedtime Start`[445] = as.POSIXct("2020-07-28 22:33:00",
tz = "Europe/Amsterdam",
format = "%Y-%m-%d %H:%M:%OS")
df_oura$`Bedtime End`[445] = as.POSIXct("2020-07-28 01:34:00",
tz = "Europe/Amsterdam",
format = "%Y-%m-%d %H:%M:%OS")
df_oura$`Average Resting Heart Rate`[445] = 43
df_oura$`Lowest Resting Heart Rate`[445] = 42
df_oura$`Average HRV`[445] = 92
df_oura$`Temperature Deviation (°C)`[445] = -0.10
df_oura$`Respiratory Rate`[445] = 12.4
# I like to rename some variables for more convenient coding, and will calculate some additional variables here that I'll use:
df_oura <- df_oura %>%
rename(
score_sleep = `Sleep Score`,
score_sleep_total = `Total Sleep Score`,
score_sleep_rem = `REM Sleep Score`,
score_sleep_deep = `Deep Sleep Score`,
score_sleep_efficiency = `Sleep Efficiency Score`,
score_sleep_restfullness = `Restfulness Score`,
score_sleep_latency = `Sleep Latency Score`,
score_sleep_timing = `Sleep Timing Score`,
tbt_min = `Total Bedtime`,
tst_min = `Total Sleep Time`,
awake_min = `Awake Time`,
rem_min = `REM Sleep Time`,
light_min = `Light Sleep Time`,
deep_min = `Deep Sleep Time`,
restless = `Restless Sleep`,
se = `Sleep Efficiency`,
sol_min = `Sleep Latency`,
timing = `Sleep Timing`,
start = `Bedtime Start`,
end = `Bedtime End`,
rhr_avg = `Average Resting Heart Rate`,
rhr_low = `Lowest Resting Heart Rate`,
hrv = `Average HRV`,
temp = `Temperature Deviation (°C)`,
rr = `Respiratory Rate`,
score_activity = `Activity Score`,
score_stay_active = `Stay Active Score`,
score_move_every_hour = `Move Every Hour Score`,
score_meet_daily_targets = `Meet Daily Targets Score`,
score_training_frequency = `Training Frequency Score`,
score_training_volume = `Training Volume Score`,
score_recovery_time = `Recovery Time Score`,
cal_act = `Activity Burn`,
cal_tot = `Total Burn`,
cal_tar = `Target Calories`,
steps = Steps,
active = `Daily Movement`,
sed = `Inactive Time`,
rest = `Rest Time`,
lig = `Low Activity Time`,
mod = `Medium Activity Time`,
vig = `High Activity Time`,
nwt = `Non-wear Time`,
met = `Average MET`,
inactive_long = `Long Periods of Inactivity`,
score_readiness = `Readiness Score`,
score_prev_night = `Previous Night Score`,
score_sleep_balance = `Sleep Balance Score`,
score_prev_activity = `Previous Day Activity Score`,
score_activity_balance = `Activity Balance Score`,
score_temp = `Temperature Score`,
score_rhr = `Resting Heart Rate Score`,
score_hrv_balance = `HRV Balance Score`,
score_recovery = `Recovery Index Score`
) %>%
mutate(
tbt_min = tbt_min / 60,
tbt_hr = tbt_min / 60,
tst_min = tst_min / 60,
tst_hr = tst_min / 60,
awake_min = awake_min / 60,
awake_hr = awake_min / 60,
rem_min = rem_min / 60,
rem_hr = rem_min / 60,
light_min = light_min / 60,
light_hr = light_min / 60,
deep_min = deep_min / 60,
deep_hr = deep_min / 60,
mvpa = mod+vig,
tst_hr_7dra = SMA(tst_hr, 7),
hrv_7dra = SMA(hrv, 7)
) %>%
dplyr::select(-`Sleep Timing_1`)
# Dump all the algorithmically determined scores, since I'm personally not interested in those:
df_oura <- df_oura %>% dplyr::select(-starts_with("score_"))
# Exploring the distribution of the HRV data:
hist_hrv <- ggplot(df_oura, aes(hrv)) +
geom_histogram() +
labs(
title = "Histogram of nocturnal HRV",
subtitle = "Measured with OURA ring (v2)",
x = "RMSSD (ms)",
y = "Count"
) +
theme_light()
hist_hrv
# That distribution actually looks relatively symmetric already. Let's logarithmically transform it just in case (this is a common practice in HRV research) and see if the distribution improves:
df_oura <- df_oura %>%
mutate(lnhrv = log(hrv))
# Create the actual histogram to see that distribution:
hist_lnhrv <- ggplot(df_oura, aes(lnhrv)) +
geom_histogram() +
labs(
title = "Histogram of natural log of nocturnal HRV",
subtitle = "Measured with OURA ring (v2)",
x = "lnRMSSD (ms)",
y = "Count"
) +
theme_light()
hist_lnhrv
# That arguably looks slightly worse. Let's do some normality tests to see explore which of the two is the most normal (gaussian):
shapiro.test(df_oura$hrv)
Shapiro-Wilk normality test
data: df_oura$hrv
W = 0.99402, p-value = 0.04591
shapiro.test(df_oura$lnhrv)
Shapiro-Wilk normality test
data: df_oura$lnhrv
W = 0.95706, p-value = 6.638e-11
# Confirmed. The first is almost significant, the second very significant. I'll work with regular HRV in my analysis.
# Let's also check out the TST distribution:
hist_tst <- ggplot(df_oura, aes(tst_hr)) +
geom_histogram() +
labs(
title = "Histogram of TST",
subtitle = "Measured with OURA ring (v2)",
x = "TST (hr)",
y = "Count"
) +
theme_light()
hist_tst
shapiro.test(df_oura$tst_hr)
Shapiro-Wilk normality test
data: df_oura$tst_hr
W = 0.93266, p-value = 3.053e-14
# Doesn't look normal (gaussian), but at least it looks reasonably symmetric so I'm not concerned with using it in the upcoming analyses.
library(performance) # A package with useful functions to easily assess the model diagnostics, e.g. with the check_model() function
Warning messages:
1: Unknown or uninitialised column: `tst_hr_std`.
2: Unknown or uninitialised column: `tst_hr_7dra_std`.
3: Unknown or uninitialised column: `hrv_std`.
4: Unknown or uninitialised column: `hrv_7dra_std`.
# First: See if time sedentary, light, moderate and vigorous physical activity predict the nocturnal HRV:
lm_hrv_pa <- lm(lead(hrv) ~ vig + mod + lig + sed, df_oura)
summary(lm_hrv_pa)
Call:
lm(formula = lead(hrv) ~ vig + mod + lig + sed, data = df_oura)
Residuals:
Min 1Q Median 3Q Max
-44.295 -10.895 -0.433 11.107 40.240
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 100.901963 8.723832 11.566 < 2e-16 ***
vig -0.099211 0.018831 -5.268 2.06e-07 ***
mod -0.198167 0.038302 -5.174 3.34e-07 ***
lig -0.017947 0.012047 -1.490 0.137
sed -0.049845 0.009714 -5.132 4.14e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.92 on 495 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.1016, Adjusted R-squared: 0.09432
F-statistic: 13.99 on 4 and 495 DF, p-value: 7.998e-11
check_model(lm_hrv_pa)
# Yes they do, all of them except for light PA. Model diagnostics look great. 9.4% explained variance.
# Second: See if my nocturnal HRV predicts my subsequent daytime moderate-to-vigorous physical activity (MVPA).
lm_mvpa_hrv <- lm(mvpa ~ hrv, df_oura)
summary(lm_mvpa_hrv) # It does. 8.8% explained variance.
Call:
lm(formula = mvpa ~ hrv, data = df_oura)
Residuals:
Min 1Q Median 3Q Max
-64.48 -25.78 -11.29 10.03 328.55
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.5537 7.6864 -2.544 0.0113 *
hrv 0.8631 0.1221 7.071 5.21e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 45.6 on 499 degrees of freedom
Multiple R-squared: 0.09108, Adjusted R-squared: 0.08926
F-statistic: 50 on 1 and 499 DF, p-value: 5.212e-12
check_model(lm_mvpa_hrv)
Not enough model terms in the conditional part of the model to check for multicollinearity.
# Model diagnostics look iffy. Redisuals are clearly not normally distributed, and the homoscedasticity graph shows a clear floor effect of the HRV, whereas the Q-Q plot shows that particularly for very high HRV values, the residuals move away from the line. Perhaps this same model but with the logarithmically transformed HRV is better?
lm_mvpa_lnhrv <- lm(mvpa ~ lnhrv, df_oura)
summary(lm_mvpa_lnhrv)
Call:
lm(formula = mvpa ~ lnhrv, data = df_oura)
Residuals:
Min 1Q Median 3Q Max
-54.40 -27.71 -12.69 10.49 328.84
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -152.266 27.574 -5.522 5.40e-08 ***
lnhrv 45.549 6.766 6.732 4.61e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 45.8 on 499 degrees of freedom
Multiple R-squared: 0.08326, Adjusted R-squared: 0.08143
F-statistic: 45.32 on 1 and 499 DF, p-value: 4.605e-11
check_model(lm_mvpa_lnhrv)
Not enough model terms in the conditional part of the model to check for multicollinearity.