Guide to Extracting Data w/ APIs from the Whoop Strap 4.0§

As Kanye once said,
Scoop-diddy-whoop
Whoop-diddy-scoop
While we certainly won’t be rapping in this notebook, we will be using the WHOOP 4.0, a small wearable that fits onto your wrist as shown above! Featuring a light, discreet wristband that is also waterproof, the WHOOP 4.0 offers comfortable physical activity and sleep tracking with wireless charging and an app with useful calculated metrics such as strain and recovery.
We’ve used the WHOOP 4.0 for a while now, and we’ll show you how to extract its data, visualize steep stages, and compute correlations and other statistical measures based on your data! While you will need a WHOOP 4.0 set up to actually collect the data, the data extraction requires only an internet connection and your username and password.
This is a comprehensive, clear guide to extract your data from the WHOOP 4.0 using an unofficial WHOOP Application Programming Interface (API).
If you want to know more about WHOOP 4.0, see the README for a detailed analysis of performances, sensors, data privacy, and extraction pipelines.
We will be able to extract the following parameters:
Parameter Name |
Sampling Frequency |
|---|---|
Average heart rate |
Daily |
Maximum heart rate |
Daily |
Kilojoules (energy burned) |
Daily |
Strain score |
Daily |
Number of naps |
Daily |
Baseline sleep need |
Daily |
Sleep debt |
Daily |
Sleep need strain (extra sleep required due to strain) |
Daily |
Total sleep need (calculated with other values) |
Daily |
Sleep quality duration |
Daily |
Respiratory rate |
Daily (during core sleep) |
Blood oxygen |
Daily (during core sleep) |
Resting heart rate |
Daily (during core sleep) |
Heart rate variability |
Daily (during core sleep) |
Skin temperature |
Daily (during core sleep) |
Heart rate |
Every 7 seconds |
Note that some of the parameters are for an entire 24-hour period, while others are measured and reported only for the time the user is in a core sleep (contrasted with a nap, the core sleep is the “main” sleep, which usually occurs at night).
In this guide, we sequentially cover the following five topics to extract from the WHOOP 4.0 API: 1. Setup 2. Authentication/Authorization - Requires only username and password, no OAuth(2). 3. Data extraction - You can get data from the API in a couple lines of code. 4. Data visualization - 4.1: We reproduce a day-by-day week-long plot of hours slept and sleep needed, which is displayed in the mobile app originally. We do this with matplotlib. - 4.2: We reproduce a time series plot of heart rate over the course of a day, sampled every 6 seconds, which is shown in the WHOOP app (both the webapp and the mobile app, if you turn the phone sideways). In this plot, we deal with missing data (when the user takes off the device for a long time) and also combine sleep information to color in intervals of the time series where the user is sleeping, allowing you to quickly see how the heart rate responds to sleep. 5. Data analysis - 5.1: We try to find a correlation between the length of a sleep period and the median heart rate for that sleep period. We find that the correlation is not statistically significant, unless excluding an outlier. - 5.2: We try to check whether the median heart rate is correlated to whether the sleep is a nap or a night sleep. We find that for our data this is statistically significant.
Disclaimer: this notebook is purely for educational purposes. All of the data currently stored in this notebook is purely synthetic, meaning randomly generated according to rules we created. Despite this, the end-to-end data extraction pipeline has been tested on our own data, meaning that if you enter your own email and password on your own Colab instance, you can visualize your own real data. That being said, we were unable to thoroughly test the timezone functionality, though, since we only have one account, so beware.
1. Setup§
1.1 Study participant setup and usage§
Setting the watch up for data collection was fairly straightforward in our experience: just download the WHOOP app and follow the instructions there. Since the watch does not connect to internet, all you need to do is connect your phone to the watch via bluetooth, and all data transfer will happen (1) between the watch and the phone via Bluetooth and (2) between the phone and Withings’ servers via Wi-Fi. The watch can store up to 3 days’ worth of data, so it does not need to be connected to the phone all of the time.
2. Authentication/Authorization§
To obtain access to data, authorization is required. All you’ll need to do here is just put in your email and password for your WHOOP 4.0 device. We’ll use this username and password to extract the data in the sections below.
In this notebook, we use a blank email and password, which indicates that we would like to use synthetic data rather than real biometric data. Though, if you put in your real email and password, everything should work as intended and pull your data from the servers in an identical data format. All demo plots should run identically (assuming data is available for the given dates, etc.).
[ ]:
#@title Enter your username and password
# download external file we've written to condense the API pinging
# and parsing process
!wget https://pastebin.com/raw/n4KG2wky -q -O whoop_user.py
import requests
import json
from datetime import datetime
import pytz
import csv
import os
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.ndimage import gaussian_filter
from math import ceil
from tqdm import tqdm
tqdm.pandas() # activate for pandas
from whoop_user import WhoopUser
email = '' #@param {type:"string"}
password = '' #@param {type:"string"}
print('Username:', email)
print('Password:', password)
Username:
Password:
3. Data Extraction§
Data can be extracted via an unofficial API, which is also documented here and here. This documented unofficial API works by replicating the requests your phone or browser makes when viewing your WHOOP data interactively either in the WHOOP app or the WHOOP web app.
We go into a bit of detail as to how this works below. This is not strictly necessary for extracting your data once, but it can be necessary if you want to extend this notebook for your own usage.
Below, the first figure shows how data is uploaded to WHOOP servers. The watch stores very little (only up to 3 days worth of data if you do not connect your phone) and sends all of its data immediately to the phone, whereas the phone is responsible for uploading the data to the server at a relatively low rate (i.e. once every few minutes). Therefore, the phone is the mediator for data transfer.
To actually view data, the watch is not involved at all. Instead, the phone makes requests to WHOOP servers. The unofficial API aims to replicate these requests from your own computer.
Unfortunately, the API is fairly locked down, so it only allows you to make requests for information that is directly displayed onto the app or web app. This means that, as far as we know, we cannot extract continuous skin temperature measurements, as an example. That being said, we can still extract quite a bit:
Continuous all-day heart rate measurements (once every six seconds)
since many values can be derived from this (e.g. time slept, calories, heart rate variability, respiratory rate), heart rate is a pretty fundamental measurement that should be useful for many analyses
Coarse statistics for blood oxygen level and skin temperature
the WHOOP 4.0 does in fact contain sensors that measure both of these, but unfortunately only summary statistics (mean, variation) about these measurements over the course of a given sleep cycle
Now let’s extract!
Let’s get some coarse statistics about every single sleep cycle you’ve had! You can also set custom start and end dates to get information only within that time range.
[ ]:
user = WhoopUser(email, password)
user.SEED = 2
user.SEED = 17
#user = WhoopUser('', '')
#@title Enter start and end dates (in the format yyyy-mm-dd)
#set start and end dates - this will give you all the data from 2000-01-01 (January 1st, 2000) to 2100-02-03 (February 3rd, 2100), for example
startStr='2000-01-01' #@param {type:"string"}
endStr='2100-02-03' #@param {type:"string"}
# get cycle data from start to end
start_end ={
'start': startStr+'T00:00:00.000Z',
'end': endStr+'T00:00:00.000Z'
}
#show information for sleep cycles of interest
cycles_df = user.get_cycles_df(params=start_end)
#gives summary statistics for various metrics
metrics_df = user.get_health_metrics_df(params=start_end)
This data provides a bird’s eye view of your entire usage of the device. With cycles_df can see what your resting heart rate was for each day, your max heart rate for each day, and how much you slept each day (in milliseconds). With metrics_df you see other metrics (which are displayed under the “Health Monitor” tab in the phone app).
4. Data Visualization§
Let’s plot how much sleep you have had, replicating the plot from the app…

Above is a plot from the app
4.1 Plot Sleep§
Here we use matplotlib to reproduce the plot above. There’s a lot of code here, but we added comments to help guide your reading.
[ ]:
#@title Plot sleep throughout the week
start_date = "2022-04-28" #@param {type:"date"}
start_idx = np.where(cycles_df.day == start_date)[0][0]
from scipy import interpolate
from datetime import datetime
# only plotting a week
PLOT_LENGTH = 7
# custom font
# https://stackoverflow.com/questions/35668219/how-to-set-up-a-custom-font-with-custom-path-to-matplotlib-global-font
# download the font and unzip (quiet so it does not print)
!wget -q 'https://dl.dafont.com/dl/?f=gymkhana'
!unzip -qo "index.html?f=gymkhana"
# move to directory where fonts should be kept
!mv gymkhana-bk.ttf /usr/share/fonts/truetype/
# build cache, redirect to /dev/null to suppress stdout output
!fc-cache -f -v > /dev/null
import matplotlib as mpl
import matplotlib.font_manager as fm
# try and except, just in case something fails we fallback onto the
# default font
try:
fe = fm.FontEntry(
#font name
fname='/usr/share/fonts/truetype/gymkhana-bk.ttf',
name='gymkhana')
fm.fontManager.ttflist.insert(0, fe) # or append is fine
mpl.rcParams['font.family'] = fe.name # = 'your custom ttf font name'
except:
pass
weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
def ms_to_hm(ms):
hours = int(ms / (60 * 60 * 1000))
minutes = round((ms / (60 * 1000)) % 60)
return hours, minutes
def ms_to_text(ms):
hours, minutes = ms_to_hm(ms)
return f'{hours:02}:{minutes:02}'
def determine_offset(i, values):
"""
Takes in array-like values and an index for that array, then
computes what offset the text annotation should have. Intuition
is that text annotation should not collide with the lines.
NOTE: We attempted to use https://github.com/Phlya/adjustText, but
could not get it to work.
"""
if i == 0:
# put above if going down, otherwise below
offset = (values[0] > values[1]) * 2 - 1
elif i == sleep_dur.shape[0] - 1:
# put above if going up, otherwise below
offset = (values[i] > values[i - 1]) * 2 - 1
elif values[i] < values[i+1] and values[i] < values[i-1]:
# valley
offset = -1
elif values[i] > values[i+1] and values[i] > values[i-1]:
# peak
offset = 1
elif values[i] > values[i+1] and values[i] < values[i-1]:
# maintain same sign of slope (downwards)
if values[i] - values[i+1] > values[i-1] - values[i]:
offset = 1
else:
offset = -1
elif values[i] < values[i+1] and values[i] > values[i-1]:
# maintain same sign of slope (upwards)
if values[i+1] - values[i] > values[i] - values[i-1]:
offset = -1
else:
offset = 1
return offset
def plot_line_fancy(X, Y, label, color=None):
plt.plot(X, Y / (60 * 60 * 1000), marker='o', markerfacecolor='black',
markeredgewidth=1, markersize=10, linewidth=3, label=label,
color=color)
# add text annotations
for i, dur in enumerate(Y):
offset = determine_offset(i, np.array(Y))
plt.text(i, 0.6 * offset + dur / (60 * 60 * 1000), ms_to_text(dur).lstrip('0'),
ha='center', fontsize=15, color=color)
# set to maximum y-value (in this case it is 12 hours)
max_sleep_hours = 13
plt.ylim(0, max_sleep_hours)
# this function turns a string like '2022-04-28' into 'Thu\n28'
def day_label_to_fig_label(day):
weekday = weekdays[datetime.strptime(day, '%Y-%m-%d').weekday()]
day_num = str(int(day.split('-')[-1]))
return weekday + '\n' + day_num
# set the labels on the left and bottom to match the app's plot
plt.xticks(ticks=np.arange(PLOT_LENGTH),
labels=[day_label_to_fig_label(day) for day in X],
fontsize=15)
plt.yticks(ticks = list(range(max_sleep_hours + 1))[::2],
labels = [f'{i}:00' for i in range(max_sleep_hours + 1)][::2],
rotation = 'horizontal',fontweight='bold', fontsize=15)
# get rid of the little tickmarks on the bottom and side
plt.tick_params(
axis='x', # changes apply to the x-axis
which='both', # both major and minor ticks are affected
bottom=False, # ticks along the bottom edge are off
top=False, # ticks along the top edge are off
labelbottom=True)
plt.tick_params(
axis='y', # changes apply to the x-axis
which='both', # both major and minor ticks are affected
left=False, # ticks along the left edge are off
labelbottom=True)
with plt.style.context('dark_background'):
sleep_dur = cycles_df.sleep_quality_duration.iloc[start_idx:start_idx+PLOT_LENGTH]
sleep_need = cycles_df.sleep_need_total.iloc[start_idx:start_idx+PLOT_LENGTH]
day = cycles_df.day.iloc[start_idx:start_idx+PLOT_LENGTH]
# we'll use the same dark blue shade for the background color
# throughout this plot
background_color = '#13191C'
# create the figure
plt.figure(figsize=(12,8), facecolor=background_color)
# plot both values over time
plot_line_fancy(day, sleep_dur, 'Hours of Sleep', color='#6c97b2')
plot_line_fancy(day, sleep_need, 'Sleep Need', color='#00fb9b')
# insert legend
plt.legend(facecolor=background_color, # ensure same background color
frameon=False, # turn off boundaries
ncol=2, # arrange horizontally
prop={'size': 15}) # set font size
# get rid of the axis lines
for d in ["left", "top", "bottom", "right"]:
plt.gca().spines[d].set_visible(False)
# set background color *inside the figure*
plt.gca().set_facecolor(color=background_color)
# add horizontal grid
plt.gca().grid(False)
plt.gca().grid(axis='y', color='black')
Above is a plot we created ourselves!
We can also dig into other statistics, such as skin temperature, taking advantage of that extra sensor. In fact, here we’ll actually plot something you simply cannot see in the phone app nor web app! On the app, you can only see last night’s skin temperature, but not previous night’s skin temperature. Here, we’ll plot the skin temperature the same week’s sleep cycles.
[ ]:
#@title Skin Temperature Plot
start_date = "2022-04-28" #@param {type:"date"}
start_idx = np.where(metrics_df.day == start_date)[0][0]
with plt.style.context('dark_background'):
#start_idx = 2
temps = metrics_df['SKIN_TEMPERATURE_FAHRENHEIT.current_value'].iloc[start_idx:start_idx+7]
day = metrics_df.day.iloc[start_idx:start_idx+7]
plt.figure(figsize=(12,8))
plt.plot(day, temps, marker='o', markerfacecolor='black',
markeredgewidth=1, markersize=10, linewidth=3)
plt.title('Skin Temperature', fontsize=30)
plt.ylabel('Temperature', fontsize=20)
plt.xlabel('Day', fontsize=20)
plt.yticks(fontsize=15)
plt.xticks(ticks=day, labels=['-'.join(d.split('-')[1:]) for d in day], fontsize=15)
4.2 Plot Heart Rate§
Now let’s dig a little deeper and look at our raw heart rate data! Let’s get all of our heart rate data in the week 04-28 to 05-05. Feel free to change this to some other week of your choosing!
Also, specify the timezone for which you were using this device (see this list for exact strings, under the column TZ database name).
[ ]:
#@title Set date range and timezone
start = "2022-04-28" #@param {type:"date"}
end = "2022-05-05" #@param {type:"date"}
timezone = "US/Pacific" #@param {type:"string"}
params = {
'start': f'{start}T00:00:00.000Z',
'end': f'{end}T00:00:00.000Z'
}
We’ll use our WhoopUser class that we already initialized to extract the heart rate data first.
[ ]:
hr_df = user.get_heart_rate_df(params=params, timezone=timezone)
/content/whoop_user.py:223: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead
hr_df_syn = pd.DataFrame(pd.np.empty((N, 2)) * pd.np.nan, columns=['heart_rate', 'timestamp'])
100%|██████████| 275000/275000 [00:17<00:00, 16070.07it/s]
43it [00:00, 294.60it/s]
100%|██████████| 275000/275000 [00:13<00:00, 20977.43it/s]
Let’s see what hr_df contains in its raw form.
[ ]:
hr_df
| heart_rate | timestamp | |
|---|---|---|
| 24686 | 71 | 2022-04-27 17:00:02-07:00 |
| 24687 | 62 | 2022-04-27 17:00:09-07:00 |
| 24688 | 98 | 2022-04-27 17:00:16-07:00 |
| 24689 | 72 | 2022-04-27 17:00:23-07:00 |
| 24690 | 59 | 2022-04-27 17:00:30-07:00 |
| ... | ... | ... |
| 111081 | 78 | 2022-05-04 16:59:27-07:00 |
| 111082 | 89 | 2022-05-04 16:59:34-07:00 |
| 111083 | 78 | 2022-05-04 16:59:41-07:00 |
| 111084 | 73 | 2022-05-04 16:59:48-07:00 |
| 111085 | 86 | 2022-05-04 16:59:55-07:00 |
86400 rows × 2 columns
Now let’s try to reproduce the heart rate line graph from the app, as shown below.

Above is a plot from the app
First, to satiate our curiosity, let’s just see what the heart rate graph looks like for the entire week…
[ ]:
with plt.style.context('seaborn-darkgrid'):
plt.figure(figsize=(18,8))
plt.title('Heart Rate', fontsize=20)
plt.plot(hr_df.timestamp, hr_df.heart_rate, linewidth=0.5, color='dimgrey')
plt.fill_between(hr_df.timestamp, hr_df.heart_rate, color='grey', alpha=0.3)
plt.ylim(0, 180)
plt.ylabel('Heart Rate', fontsize=15)
plt.xlabel('Time', fontsize=15)
Great! We can see some periodic patterns arise! Let’s do some simple Gaussian smoothing to try to see what these patterns really are. We use the `gaussian_filter method <https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter.html>`__ from SciPy for this.
[ ]:
stdev_minutes = 30
# multiply by 60 to get stdev_seconds, then divide by 6 to get
# this in units of measurements (once every six seconds)
stdev = stdev_minutes * 60 / 6
with plt.style.context('fivethirtyeight'):
plt.figure(figsize=(18,8))
plt.title('Heart Rate', fontsize=20)
plt.plot(hr_df.timestamp, hr_df.heart_rate, linewidth=0.5, color='dimgrey', label='Raw')
plt.fill_between(hr_df.timestamp, hr_df.heart_rate, color='grey', alpha=0.3)
plt.plot(hr_df.timestamp, gaussian_filter(np.array(hr_df.heart_rate), sigma=stdev), linewidth=3, color='black', label='Smoothed')
plt.ylabel('Heart Rate', fontsize=15)
plt.xlabel('Time', fontsize=15)
plt.legend(prop={'size': 20})
Now let’s get back to trying to reproduce the specific plot we see in the app. While the plotting code is long (which is necessary to reproduce as close as possible), it has lots of comments to guide you through what each component does, if you want to adapt it for your own usage. Regardless, here are some high-level pointers:
We define gaps (when the user takes off the device) as any space between measurements that exceeds 12 seconds. To find the gaps, we compute the consecutive pairwise differences between measurement timestamps, and find where the difference exceeds 12 seconds.
plt.fill_betweenandplt.plotare ordered in such a way that subsequent ones are overlayed ontop of the previous ones, allowing us to worry less about iterating through in the right way from left to right and instead just plotting waking hours, sleep hours, and gaps completely separately.For some weird reason, when requesting cycles or sleep data, the API does not care about time, only date. Since both sleeps are on the same day (in the UTC timezone), we just have to request one day of sleep information, and the time is arbitrary for both parameters fed into
user.get_sleeps_df().
[ ]:
#@title Reproduce 24 hour heart rate plot
from matplotlib.ticker import FormatStrFormatter
from dateutil import tz
import datetime
# measurements are taken every six seconds
HEART_RATE_RECORDING_LENGTH = 6
day_num = 1
params_day = {
'start': f'2022-05-{str(day_num+1).zfill(2)} T00:00:00.000Z',
'end': f'2022-05-{str(day_num+1).zfill(2)} T23:00:00.000Z' # arbitrary time choice
}
sleeps = user.get_sleeps_df(params=params_day, timezone='US/Pacific')
from datetime import datetime
stdev_minutes = .5
# multiply by 60 to get stdev_seconds, then divide by 6 to get
# this in units of measurements (once every six seconds)
stdev = stdev_minutes * 60 / 6
with plt.style.context('seaborn-darkgrid'):
# get the start and end times as timestamps by using the datetime library
start_ts = datetime.strptime(f'2022-05-{str(day_num).zfill(2)} 21:50:00-07:00', '%Y-%m-%d %H:%M:%S%z').timestamp()
end_ts = datetime.strptime(f'2022-05-{str(day_num+1).zfill(2)} 21:50:00-07:00', '%Y-%m-%d %H:%M:%S%z').timestamp()
# now find the indices in hr_df.timestamp that match as closely as possible
start_idx = np.argmin(np.abs(hr_df.timestamp.apply(lambda x: x.timestamp()) - start_ts))
end_idx = np.argmin(np.abs(hr_df.timestamp.apply(lambda x: x.timestamp()) - end_ts))
x = np.array(hr_df.timestamp.iloc[start_idx:end_idx])
y = hr_df.heart_rate.iloc[start_idx:end_idx]
# make it not as bumpy with gaussian filter
y = gaussian_filter(y, sigma=stdev)
plt.figure(figsize=(22,8), facecolor='white')
x_timestamp = np.array([x_.timestamp() for x_ in x])
# get the gaps. we include [6] as well because when you do np.diff,
# it actually leaves out exactly one element
differences = np.concatenate((np.diff(x_timestamp), [6]))
# interpret a gap (i.e. when a user takes off the device for some prolonged
# period of time) as any two measurements that are taken more than
# 6 * 2 = 12 seconds apart, to account for minor variations around 6s
gap_idxes = np.where(differences > HEART_RATE_RECORDING_LENGTH * 2)[0]
# get the sleeps
sleep_idxes = []
for lower, upper in zip(sleeps.time_lower_bound, sleeps.time_upper_bound):
# get the location in the timestamp array that is closest to `lower`
lower_idx = np.argmin(np.abs((x_timestamp - lower.timestamp()) - 0))
# get the location in the timestamp array that is closest to `upper`
upper_idx = np.argmin(np.abs((x_timestamp - upper.timestamp()) - 0))
sleep_idxes.append((lower_idx, upper_idx))
# fill in the area under the curve with different colors depending
# on whether it is a gap or a sleep
# first, we just plot the entire thing
plt.plot(x, y, linewidth=0.75, color='#d1d1d1')
plt.fill_between(x, y, color='#e3e3e3', alpha=0.3)
# now we overlay sleeps
for sleep_start, sleep_end in sleep_idxes:
plt.plot(x[sleep_start:sleep_end], y[sleep_start:sleep_end], linewidth=0.75, color='#3f8fc5')
if sleep_start >= sleep_end:
continue
plt.fill_between(x[sleep_start:sleep_end], y[sleep_start:sleep_end], color='#eaf1f6', alpha=0.3)
# now we overlay gaps by overlaying with white
for gap_idx in gap_idxes:
plt.plot(x[gap_idx-2:gap_idx+2], y[gap_idx-2:gap_idx+2], linewidth=1, color='white')
plt.fill_between(x[gap_idx-2:gap_idx+2], y[gap_idx-100:gap_idx+100].max(), color='white')
plt.ylim(0, 225)
plt.xlim(x[0] - pd.Timedelta(minutes=30), x[-1])
# set the x ticks to be every 4 hours, just like the app's plot
datetimes = [
datetime.strptime(f'2022-05-{str(day_num).zfill(2)} 22:00:00-0700', '%Y-%m-%d %H:%M:%S%z'),
datetime.strptime(f'2022-05-{str(day_num+1).zfill(2)} 02:00:00-0700', '%Y-%m-%d %H:%M:%S%z'),
datetime.strptime(f'2022-05-{str(day_num+1).zfill(2)} 06:00:00-0700', '%Y-%m-%d %H:%M:%S%z'),
datetime.strptime(f'2022-05-{str(day_num+1).zfill(2)} 10:00:00-0700', '%Y-%m-%d %H:%M:%S%z'),
datetime.strptime(f'2022-05-{str(day_num+1).zfill(2)} 14:00:00-0700', '%Y-%m-%d %H:%M:%S%z'),
datetime.strptime(f'2022-05-{str(day_num+1).zfill(2)} 18:00:00-0700', '%Y-%m-%d %H:%M:%S%z')
]
plt.xticks(ticks=datetimes,
labels=['10:00pm', '2:00am', '6:00am', '10:00am', '2:00pm', '6:00pm'],
fontsize=12,
color='grey')
# ensure that the bottom labels are padded
plt.tick_params(
axis='x', # changes apply to the x-axis
which='both', # both major and minor ticks are affected
bottom=False, # ticks along the bottom edge are off
pad=15)
# set y ticks so that they are major and minor, so that we can have gridlines
# on a larger interval
plt.gca().set_yticks(np.arange(0, 250, 25)[::2], minor=False)
plt.gca().set_yticks(np.arange(0, 250, 25)[1::2], minor=True)
plt.gca().yaxis.set_minor_formatter(FormatStrFormatter("%d"))
# get rid of the ticks for the y-axis
plt.tick_params(
axis='y', # changes apply to the x-axis
which='minor', # both major and minor ticks are affected
bottom=False, # ticks along the bottom edge are off
labelcolor='grey')
# set background color *inside the figure*
plt.gca().set_facecolor(color='white')
# add horizontal grid
plt.gca().grid(axis='x', color='#e5e5e5')
plt.gca().grid(axis='y', color='#e5e5e5', which='major')
# ensure the bottom axis is slightly darker and extends all the way
# to the left
plt.gca().spines['bottom'].set_edgecolor('grey')
plt.gca().spines['bottom'].set_linewidth(1)
plt.gca().spines['bottom'].set_visible(True)
Above is a plot we created ourselves!
As you can see, we are able to reproduce not only just the heart rate over time, but also when the heart rate is not recorded, and also when the user goes to sleep, either for a night sleep or for a nap.
5. Data Analysis§
Data isn’t much without some analysis, so we’re going to do some in this section.
DISCLAIMER: the analyses below may not be 100% biologically or scientifically grounded; the code is here to assist in your process, if you are interested in asking these kinds of questions.
5.1 Heart rate vs. sleep period length§
Maybe the heart rate is correlated with how long a particular sleep period was. Let’s see if this hypothesis is true. First, we get all sleeps. We just give a time period from 2020 until May 29nd, 2022, because that’ll catch everything for this user up until the time this notebook and the associated analyses below were run (so that our analysis results stay the same).
[ ]:
#@title Set date range and timezone
start = "2020-01-01" #@param {type:"date"}
end = "2022-05-22" #@param {type:"date"}
timezone = "US/Pacific" #@param {type:"string"}
params_all = {
'start': f'{start}T00:00:00.000Z',
'end': f'{end}T00:00:00.000Z'
}
[ ]:
sleeps = user.get_sleeps_df(params=params_all, timezone=timezone)
sleeps
| cycle_id | sleep_id | cycles_count | disturbance_count | time_upper_bound | time_lower_bound | is_nap | in_bed_duration | light_sleep_duration | latency_duration | no_data_duration | rem_sleep_duration | respiratory_rate | sleep_score | sleep_efficiency | sleep_consistency | sws_duration | wake_duration | quality_duration | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 786 | 831 | 1 | 14 | 2022-04-26 09:17:57-07:00 | 2022-04-26 01:00:00-07:00 | False | 8.299187 | 4.149593 | 0 | 0 | 0.829919 | 16.034276 | 50 | 0.9 | 0.374207 | 2.489756 | 0.829919 | 7.469268 |
| 1 | 499 | 877 | 0 | 14 | 2022-04-26 15:28:26-07:00 | 2022-04-26 15:00:00-07:00 | True | 0.474137 | 0.237069 | 0 | 0 | 0.047414 | 15.175261 | 69 | 0.9 | 0.373756 | 0.142241 | 0.047414 | 0.426724 |
| 2 | 930 | 331 | 0 | 7 | 2022-04-27 09:27:38-07:00 | 2022-04-27 01:00:00-07:00 | False | 8.460614 | 4.230307 | 0 | 0 | 0.846061 | 15.849114 | 43 | 0.9 | 0.352968 | 2.538184 | 0.846061 | 7.614553 |
| 3 | 217 | 598 | 1 | 20 | 2022-04-27 15:53:37-07:00 | 2022-04-27 15:00:00-07:00 | True | 0.893689 | 0.446844 | 0 | 0 | 0.089369 | 14.577023 | 42 | 0.9 | 0.544819 | 0.268107 | 0.089369 | 0.804320 |
| 4 | 803 | 117 | 2 | 7 | 2022-04-28 08:50:54-07:00 | 2022-04-28 01:00:00-07:00 | False | 7.848470 | 3.924235 | 0 | 0 | 0.784847 | 14.036402 | 61 | 0.9 | 0.383946 | 2.354541 | 0.784847 | 7.063623 |
| 5 | 69 | 365 | 0 | 23 | 2022-04-28 15:47:20-07:00 | 2022-04-28 15:00:00-07:00 | True | 0.789084 | 0.394542 | 0 | 0 | 0.078908 | 15.380196 | 44 | 0.9 | 0.425368 | 0.236725 | 0.078908 | 0.710176 |
| 6 | 810 | 354 | 3 | 9 | 2022-04-29 09:53:54-07:00 | 2022-04-29 01:00:00-07:00 | False | 8.898347 | 4.449174 | 0 | 0 | 0.889835 | 14.627758 | 89 | 0.9 | 0.609418 | 2.669504 | 0.889835 | 8.008512 |
| 7 | 98 | 51 | 2 | 12 | 2022-04-29 15:59:27-07:00 | 2022-04-29 15:00:00-07:00 | True | 0.990975 | 0.495487 | 0 | 0 | 0.099097 | 15.436805 | 35 | 0.9 | 0.809198 | 0.297292 | 0.099097 | 0.891877 |
| 8 | 983 | 848 | 2 | 11 | 2022-04-30 10:07:15-07:00 | 2022-04-30 01:00:00-07:00 | False | 9.120842 | 4.560421 | 0 | 0 | 0.912084 | 16.163113 | 98 | 0.9 | 0.355837 | 2.736253 | 0.912084 | 8.208758 |
| 9 | 78 | 6 | 2 | 13 | 2022-04-30 16:36:27-07:00 | 2022-04-30 15:00:00-07:00 | True | 1.607619 | 0.803809 | 0 | 0 | 0.160762 | 15.308161 | 68 | 0.9 | 0.252681 | 0.482286 | 0.160762 | 1.446857 |
| 10 | 775 | 454 | 3 | 14 | 2022-05-01 10:04:16-07:00 | 2022-05-01 01:00:00-07:00 | False | 9.071268 | 4.535634 | 0 | 0 | 0.907127 | 14.652452 | 65 | 0.9 | 0.862854 | 2.721380 | 0.907127 | 8.164141 |
| 11 | 358 | 563 | 0 | 12 | 2022-05-01 15:19:30-07:00 | 2022-05-01 15:00:00-07:00 | True | 0.325271 | 0.162635 | 0 | 0 | 0.032527 | 14.424556 | 34 | 0.9 | 0.474888 | 0.097581 | 0.032527 | 0.292744 |
| 12 | 620 | 729 | 1 | 18 | 2022-05-02 09:22:25-07:00 | 2022-05-02 01:00:00-07:00 | False | 8.373860 | 4.186930 | 0 | 0 | 0.837386 | 15.970099 | 74 | 0.9 | 0.497271 | 2.512158 | 0.837386 | 7.536474 |
| 13 | 924 | 941 | 1 | 14 | 2022-05-02 15:47:45-07:00 | 2022-05-02 15:00:00-07:00 | True | 0.796051 | 0.398026 | 0 | 0 | 0.079605 | 15.322483 | 97 | 0.9 | 0.516666 | 0.238815 | 0.079605 | 0.716446 |
| 14 | 869 | 839 | 1 | 7 | 2022-05-03 07:59:17-07:00 | 2022-05-03 01:00:00-07:00 | False | 6.988255 | 3.494128 | 0 | 0 | 0.698826 | 14.770084 | 32 | 0.9 | 0.595472 | 2.096477 | 0.698826 | 6.289430 |
| 15 | 168 | 368 | 0 | 14 | 2022-05-03 15:41:16-07:00 | 2022-05-03 15:00:00-07:00 | True | 0.687900 | 0.343950 | 0 | 0 | 0.068790 | 14.952369 | 30 | 0.9 | 0.676832 | 0.206370 | 0.068790 | 0.619110 |
| 16 | 293 | 996 | 1 | 11 | 2022-05-04 08:50:26-07:00 | 2022-05-04 01:00:00-07:00 | False | 7.840609 | 3.920304 | 0 | 0 | 0.784061 | 16.805819 | 54 | 0.9 | 0.885807 | 2.352183 | 0.784061 | 7.056548 |
| 17 | 446 | 652 | 1 | 9 | 2022-05-04 16:05:48-07:00 | 2022-05-04 15:00:00-07:00 | True | 1.096686 | 0.548343 | 0 | 0 | 0.109669 | 15.304242 | 76 | 0.9 | 0.394750 | 0.329006 | 0.109669 | 0.987017 |
| 18 | 575 | 376 | 0 | 16 | 2022-05-05 08:25:51-07:00 | 2022-05-05 01:00:00-07:00 | False | 7.430997 | 3.715498 | 0 | 0 | 0.743100 | 15.386410 | 4 | 0.9 | 0.674037 | 2.229299 | 0.743100 | 6.687897 |
| 19 | 444 | 566 | 1 | 14 | 2022-05-05 17:21:47-07:00 | 2022-05-05 15:00:00-07:00 | True | 2.363327 | 1.181663 | 0 | 0 | 0.236333 | 14.310409 | 31 | 0.9 | 0.218315 | 0.708998 | 0.236333 | 2.126994 |
| 20 | 607 | 193 | 0 | 14 | 2022-05-06 08:15:01-07:00 | 2022-05-06 01:00:00-07:00 | False | 7.250415 | 3.625207 | 0 | 0 | 0.725041 | 15.779541 | 0 | 0.9 | 0.901105 | 2.175124 | 0.725041 | 6.525373 |
| 21 | 685 | 647 | 1 | 11 | 2022-05-06 15:28:59-07:00 | 2022-05-06 15:00:00-07:00 | True | 0.483293 | 0.241646 | 0 | 0 | 0.048329 | 13.843893 | 7 | 0.9 | 0.641838 | 0.144988 | 0.048329 | 0.434964 |
| 22 | 513 | 113 | 2 | 13 | 2022-05-07 09:09:54-07:00 | 2022-05-07 01:00:00-07:00 | False | 8.165274 | 4.082637 | 0 | 0 | 0.816527 | 15.501172 | 90 | 0.9 | 0.462592 | 2.449582 | 0.816527 | 7.348746 |
| 23 | 797 | 919 | 1 | 9 | 2022-05-07 15:40:09-07:00 | 2022-05-07 15:00:00-07:00 | True | 0.669218 | 0.334609 | 0 | 0 | 0.066922 | 13.270048 | 88 | 0.9 | 0.827408 | 0.200765 | 0.066922 | 0.602296 |
| 24 | 926 | 679 | 0 | 12 | 2022-05-08 09:27:00-07:00 | 2022-05-08 01:00:00-07:00 | False | 8.450036 | 4.225018 | 0 | 0 | 0.845004 | 12.825569 | 33 | 0.9 | 0.236947 | 2.535011 | 0.845004 | 7.605033 |
| 25 | 465 | 290 | 1 | 15 | 2022-05-08 15:46:39-07:00 | 2022-05-08 15:00:00-07:00 | True | 0.777598 | 0.388799 | 0 | 0 | 0.077760 | 12.898602 | 0 | 0.9 | 0.968547 | 0.233279 | 0.077760 | 0.699838 |
| 26 | 427 | 977 | 0 | 11 | 2022-05-09 07:14:30-07:00 | 2022-05-09 01:00:00-07:00 | False | 6.241848 | 3.120924 | 0 | 0 | 0.624185 | 14.795094 | 78 | 0.9 | 0.716821 | 1.872554 | 0.624185 | 5.617663 |
| 27 | 151 | 324 | 1 | 15 | 2022-05-09 16:35:34-07:00 | 2022-05-09 15:00:00-07:00 | True | 1.592799 | 0.796399 | 0 | 0 | 0.159280 | 15.960457 | 7 | 0.9 | 0.136361 | 0.477840 | 0.159280 | 1.433519 |
| 28 | 236 | 107 | 0 | 11 | 2022-05-10 08:14:47-07:00 | 2022-05-10 01:00:00-07:00 | False | 7.246474 | 3.623237 | 0 | 0 | 0.724647 | 14.955512 | 14 | 0.9 | 0.938953 | 2.173942 | 0.724647 | 6.521827 |
| 29 | 327 | 554 | 1 | 10 | 2022-05-10 15:44:41-07:00 | 2022-05-10 15:00:00-07:00 | True | 0.744898 | 0.372449 | 0 | 0 | 0.074490 | 15.364963 | 47 | 0.9 | 0.604165 | 0.223469 | 0.074490 | 0.670408 |
| 30 | 721 | 850 | 0 | 11 | 2022-05-11 07:49:22-07:00 | 2022-05-11 01:00:00-07:00 | False | 6.822928 | 3.411464 | 0 | 0 | 0.682293 | 14.783668 | 98 | 0.9 | 0.719355 | 2.046879 | 0.682293 | 6.140636 |
| 31 | 566 | 369 | 2 | 18 | 2022-05-11 15:12:00-07:00 | 2022-05-11 15:00:00-07:00 | True | 0.200000 | 0.100000 | 0 | 0 | 0.020000 | 15.595256 | 69 | 0.9 | 0.218327 | 0.060000 | 0.020000 | 0.180000 |
| 32 | 548 | 944 | 0 | 16 | 2022-05-12 07:06:42-07:00 | 2022-05-12 01:00:00-07:00 | False | 6.111775 | 3.055888 | 0 | 0 | 0.611178 | 14.139506 | 79 | 0.9 | 0.926083 | 1.833533 | 0.611178 | 5.500598 |
| 33 | 669 | 534 | 0 | 5 | 2022-05-12 16:00:59-07:00 | 2022-05-12 15:00:00-07:00 | True | 1.016515 | 0.508258 | 0 | 0 | 0.101652 | 15.620259 | 7 | 0.9 | 0.967963 | 0.304955 | 0.101652 | 0.914864 |
| 34 | 709 | 979 | 2 | 9 | 2022-05-13 09:02:03-07:00 | 2022-05-13 01:00:00-07:00 | False | 8.034359 | 4.017179 | 0 | 0 | 0.803436 | 15.069961 | 93 | 0.9 | 0.786687 | 2.410308 | 0.803436 | 7.230923 |
| 35 | 142 | 538 | 0 | 13 | 2022-05-13 15:37:05-07:00 | 2022-05-13 15:00:00-07:00 | True | 0.618167 | 0.309084 | 0 | 0 | 0.061817 | 14.863122 | 57 | 0.9 | 0.163287 | 0.185450 | 0.061817 | 0.556351 |
| 36 | 171 | 572 | 2 | 8 | 2022-05-14 08:52:12-07:00 | 2022-05-14 01:00:00-07:00 | False | 7.870137 | 3.935069 | 0 | 0 | 0.787014 | 14.333240 | 38 | 0.9 | 0.741364 | 2.361041 | 0.787014 | 7.083123 |
| 37 | 396 | 796 | 1 | 20 | 2022-05-14 16:04:40-07:00 | 2022-05-14 15:00:00-07:00 | True | 1.077970 | 0.538985 | 0 | 0 | 0.107797 | 15.347927 | 55 | 0.9 | 0.614615 | 0.323391 | 0.107797 | 0.970173 |
| 38 | 855 | 235 | 0 | 13 | 2022-05-15 10:03:46-07:00 | 2022-05-15 01:00:00-07:00 | False | 9.062858 | 4.531429 | 0 | 0 | 0.906286 | 14.381761 | 33 | 0.9 | 0.865710 | 2.718857 | 0.906286 | 8.156572 |
| 39 | 203 | 23 | 0 | 15 | 2022-05-15 15:22:28-07:00 | 2022-05-15 15:00:00-07:00 | True | 0.374689 | 0.187344 | 0 | 0 | 0.037469 | 14.527462 | 41 | 0.9 | 0.456201 | 0.112407 | 0.037469 | 0.337220 |
| 40 | 392 | 938 | 1 | 12 | 2022-05-16 09:11:34-07:00 | 2022-05-16 01:00:00-07:00 | False | 8.192975 | 4.096488 | 0 | 0 | 0.819298 | 16.119079 | 19 | 0.9 | 0.959654 | 2.457893 | 0.819298 | 7.373678 |
| 41 | 139 | 373 | 1 | 10 | 2022-05-16 16:17:13-07:00 | 2022-05-16 15:00:00-07:00 | True | 1.287147 | 0.643573 | 0 | 0 | 0.128715 | 15.077836 | 19 | 0.9 | 0.998704 | 0.386144 | 0.128715 | 1.158432 |
| 42 | 582 | 687 | 0 | 18 | 2022-05-17 08:21:22-07:00 | 2022-05-17 01:00:00-07:00 | False | 7.356141 | 3.678071 | 0 | 0 | 0.735614 | 13.780315 | 54 | 0.9 | 0.401078 | 2.206842 | 0.735614 | 6.620527 |
Then, let’s get the heart rate over time for each sleep. Note that requesting all of the heart rate data will take just a bit of time, depending on how many weeks we have to request data for. We’ll use the verbose parameter to ensure that we maintain our sanity as we wait for it to complete.
[ ]:
hr_df = user.get_heart_rate_df(params_all, timezone='US/Pacific', verbose=True)
100%|██████████| 275000/275000 [00:13<00:00, 20516.82it/s]
Now let’s take this raw heart rate data over the course of several weeks and analyze it against sleep.
[ ]:
heart_rates = []
lengths = []
for lower, upper in zip(sleeps.time_lower_bound, sleeps.time_upper_bound):
heart_rate = np.array(hr_df[np.logical_and(lower < hr_df.timestamp, hr_df.timestamp < upper)].heart_rate)
length = (upper - lower).seconds / 3600 # hours
heart_rates.append(heart_rate)
lengths.append(length)
sleeps['length_in_hours'] = np.array(lengths)
sleeps['median_heart_rate'] = np.array([np.median(hr) for hr in heart_rates]) # median heart rates
Let’s make a quick plot to get some intuition. Here we just use seaborn, as it’s very quick to get beautiful plots out with minimal effort.
[ ]:
p = sns.jointplot(x='length_in_hours', y='median_heart_rate', data=sleeps, kind='reg')
As we can see from the scatterplot above, it looks like there might be a correlation there. Let’s compute \(R^2\) just to see exactly how correlated.
We’ll follow this documentation and perform a linear regression to obtain the coefficient of determination.
[ ]:
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(sleeps['length_in_hours'], sleeps['median_heart_rate'])
print(f'Slope: {slope:.3g}')
print(f'Coefficient of determination: {r_value**2:.3g}')
print(f'p-value: {p_value:.3g}')
Slope: -2.75
Coefficient of determination: 0.957
p-value: 1.4e-29
We also see that the p-value, which is determined by scipy to be the two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero, is significant ($<$0.05).
So given this evidence from this particular data, maybe length of a sleep period is correlated with your heart rate.
However, let’s suppose there was an anomaly in the heart rate (such as due to a measurement error or device failure) that skewed our results, such as below:
[ ]:
sleeps.loc[1, ['median_heart_rate']] = 800
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(sleeps['length_in_hours'], sleeps['median_heart_rate'])
print(f'Slope: {slope:.3g}')
print(f'Coefficient of determination: {r_value**2:.3g}')
print(f'p-value: {p_value:.3g}')
Slope: -8.17
Coefficient of determination: 0.0653
p-value: 0.098
We see that if we perform the same p-value analysis as done before, we get that the p-value is not significant.
Let’s implement a system that assumes the data was sampled according to a multivariate normal distribution to automatically detect and remove outliers like this, based on the method described here.
Warning: anomaly/outlier detection is done just to show an example of data processing that can be done and may hide genuine values that have clinical significance. Use at your own risk.
[ ]:
from scipy.stats import multivariate_normal
#calculate the covariance matrix
data = np.stack((sleeps['length_in_hours'],sleeps['median_heart_rate']),axis=0)
covariance_matrix = np.cov(data)
#calculating the mean
mean_values = [np.mean(sleeps['length_in_hours']),np.mean(sleeps['median_heart_rate'])]
#multivariate normal distribution
model = multivariate_normal(cov=covariance_matrix,mean=mean_values)
data = np.stack((sleeps['length_in_hours'],sleeps['median_heart_rate']),axis=1)
#finding the outliers
#any point with a probability lower than the threshold value is considered an outlier and removed
threshold = 1.0e-07
outlier = model.pdf(data).reshape(-1) < threshold
newData=data
outlierValues=[]
for boolean,i in enumerate(outlier):
if i == True:
print(data[boolean]," is an Outlier")
print(np.where(data==(data[boolean])[0])[0].item(0))
#delete outliers
newData=np.delete(newData,np.where(newData==(data[boolean])[0])[0].item(0),axis=0)
#plot new graph with outliers removed
#newData[:,0] correspond to the new lengths, and newData[:,1] correspond to the new values
p = sns.jointplot(x=newData[:,0], y=newData[:,1], kind='reg')
[4.73888889e-01 8.00000000e+02] is an Outlier
1
Now let’s perform a linear regression again to see how this affects our results.
[ ]:
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(newData[:,0], [np.median(hr) for hr in newData[:,1]])
print(f'Slope: {slope:.3g}')
print(f'Coefficient of determination: {r_value**2:.3g}')
print(f'p-value: {p_value:.3g}')
Slope: -2.73
Coefficient of determination: 0.956
p-value: 1.03e-28
With the outlier removed, we can see that our results are return to statistical significance!
5.2: Heart rate vs. nap or sleep§
Now we’ll look at whether the median heart rate changes depending on whether you are napping or sleeping.
We’ve already extracted the heart rate time series data for each sleep, so now all we need to do is just get is_nap column.
[ ]:
plt.figure(figsize=(14,9))
sns.set_style('darkgrid')
# just return the sleeps dataframe back to normal
sleeps.loc[1, ['median_heart_rate']] = 62
sns.stripplot(x='is_nap',y='median_heart_rate', data=sleeps)
<matplotlib.axes._subplots.AxesSubplot at 0x7f2424ef8a10>
In the plot above, we do see that the heart rate does seem to vary a bit depending on whether the user is napping or not. Here, we see that when the the sleep is marked as a nap (whether by the user or automatic detection, depending on when it occurs during the day), we tend to get a median heart rate higher than otherwise.
Note: WHOOP calculates hours of sleep without including naps, while sleep need is calculated with naps taken into account.
Let’s do a T-test to see if the difference in heart rate is significant.
[ ]:
result = stats.ttest_ind(sleeps.median_heart_rate[sleeps.is_nap == True],
sleeps.median_heart_rate[sleeps.is_nap == False])
print(f'P-value is {result.pvalue:.3g}')
P-value is 2.94e-43
Looks significant (<0.05)!