Guide to Extracting Data w/ APIs from the Dreem Headband§

0173ebc4d8f44768900d52894b8deb88

Why dream when you can dreem? Introducing the Dreem 2, a headband that is capable of providing not only raw, high frequency EEG measurements, but also sleep stage predictions throughout the night. In this notebook, we’ll be extracting data from a Dreem headband, visualizing its raw EEG signals and its sleep stage predictions, and then analyzing this data.

Note that this notebook assumes that you are conducting a clinical trial, and have registered this through Dreem.

If you want to know more about the Dreem 2 headband, see the README for a detailed analysis of performances, sensors, data privacy, and extraction pipelines.

TODO: FIX GITHUB README LINK

While Dreem does not support an official API, as stated on the site (“do not have a public or private API for the moment”), we reverse-engineer the API implicitly used when visiting https://dreem-viewer.rythm.co/ and viewing clinical trial health records, allowing you to programmatically extract all of your clinical trial data at once (instead of having to click through a web interface).

Overall, the Dreem API makes the following values available:

Parameter Name

Sampling Frequency

Sleep stage classification (S0/1/2/3, MT, REM)

every 30 seconds

7 separate raw EEG signals

250Hz

7 separate filtered EEG signals

250Hz

Raw accelerometer (x, y, z, norm)

50Hz

Filtered accelerometer (x, y, z)

50Hz

Positiongram

10Hz

Note that when we say “filtered” the EEG signal is filtered via a Butterworth bandpass filter of order 2 (0.4-35Hz) and the accelerometer signal is filtered via a Butterworth bandpass filter of order 2 (0.1-0.5Hz), see this official article and this scientific article for more.

In this guide, we sequentially cover the following five topics to extract from the unofficial Dreem API: 1. Set up - We include some details about what you’d be expected to do before running this notebook. 2. Authentication/Authorization - This is seamless, only requiring the clinical trial admin username and password. 3. Data extraction - Extraction is simple, requiring only clinical trial admin email and password. We provide some code to interface with the backend API. 4. Data visualization - 4.1: We visualize the hypnogram for the given sleep session. - 4.2: We also visualize an EEG signal within a time interval of your choice. - 4.3: We also visualize the variance of an EEG signal within a short time interval (1 second) over different sleep stages. 5. Data analysis - 5.1: We check whether the change in the variance of a representative EEG signal within a short time interval (1 second) is statistically significant between a wake stage vs. a sleep stage. - 5.2: We check whether the change in the norm of the acceleration within a short time interval (1 second) is statistically significant between a wake stage vs. a sleep stage.

1. Set Up§

TODO: explain this. We aren’t really sure because we didn’t have to do this part. Maybe we can reach out to Dalia soon to get an overview?

2. Authentication/Authorization§

Due to confidentiality requirements, we cannot share our own login credentials (though you can use your own, of course).

Note that the website you would normally log into is https://dreem-viewer.rythm.co, so to get a more user-friendly table of records you can inspect in your browser, visit that link.

[ ]:
#@title Enter login credentials
email = "" #@param {type:"string"}
password = "" #@param {type:"string"}

import base64
import json
import requests

authorization_str = 'Basic ' + base64.b64encode(bytes(email + ':' + password, 'utf-8')).decode()

auth_dict = json.loads(requests.post('https://login.rythm.co/token/', headers={'Authorization': authorization_str}).text)

3. Data Extraction§

Now we extract data from the API.

First, we get all available users. This gives us a mapping between usernames and user id’s.

[ ]:
#@title Get available users

import pprint

def get_aux_records(auth_dict):
    url = 'https://api.rythm.co/v1/dreem/algorythm/restricted_list/9f38863b-8180-4dd4-81cf-9395800ae0cc/record/?limit=100&offset=0'

    headers = {'Authorization': 'Bearer ' + auth_dict['token']}

    out = requests.get(url, headers=headers)

    out_dict = json.loads(out.text)

    return out_dict

def get_user2id_mapping(auth_dict, user_ids):
    url = 'https://api.rythm.co/v1/dreem/dreemer/dreemer/details/'
    headers = {'Authorization': 'Bearer ' + auth_dict['token'],
            'Content-Type': 'application/json'}

    payload = {
        'id': [user_id for user_id in user_ids]
    }

    out = requests.post(url, headers=headers, data='{"id":["ce73192e-874c-4576-a2e3-27b0dc0ebcee","1222a474-bc02-44ab-b4c5-d66dc34620b7","e1ec95b0-b71b-4499-b73c-cf9b1e3576c2"]}').text

    return {x['pseudo']: x['dreemer'] for x in json.loads(out)}

# we care about aux_records because:
# (1) it contains the user_ids
# (2) it contains a mapping between references and record id's
aux_records = get_aux_records(auth_dict)

# (1) here
user_ids = set([result['user'] for result in aux_records['results']])

map = get_user2id_mapping(auth_dict, user_ids)

print('Available users:')
pprint.pprint(map)
Available users:
{'61843_test01@dreem.com': 'e1ec95b0-b71b-4499-b73c-cf9b1e3576c2',
 '61843_test02@dreem.com': '1222a474-bc02-44ab-b4c5-d66dc34620b7',
 '61843_test03@dreem.com': 'ce73192e-874c-4576-a2e3-27b0dc0ebcee'}

Next, we filter to only download and display records from a particular user. At the end, we have a Pandas dataframe called records_df that contains the raw data behind the table you normally see through the admin portal.

813b233ae63845909f4f8fe4af16419c

[ ]:
#@title Filter to display only records from a specific user
username = "61843_test03@dreem.com" #@param {type:"string"}

uname_filter = username

uid = [uid for uname, uid in map.items() if uname == uname_filter][0]

# filter to only include this username
filtered_aux_records = [record for record in aux_records['results'] if record['user'] == uid]

record_ids = [record['id'] for record in filtered_aux_records]

import pandas as pd

def get_table_records():
    # this is distinct from aux_records

    # we get the *actual* info that is displayed on the table
    # in the web portal
    url = 'https://api.rythm.co/v1/dreem/record/report/details/'

    headers = {
        'Authorization': 'Bearer ' + auth_dict['token'],
        'Content-Type': 'application/json'
    }

    payload = {
        'id': record_ids
    }

    out_dict = json.loads(requests.post(url, headers=headers, data=json.dumps(payload)).text)

    return out_dict

def records_to_df(table_records, aux_records):
    columns = {
        'record_id': 'Record ID',
        'record_start_iso': 'Start (ISO)',
        'record_stop_iso': 'Stop (ISO)',
        'timezone': 'Timezone',
        'record_duration': 'Duration (seconds)',
        'proportion_good_quality_eeg1': 'Channel quality 1',
        'proportion_good_quality_eeg2': 'Channel quality 2',
        'proportion_good_quality_eeg4': 'Channel quality 4',
        'proportion_off_head': 'Proportion off head',
        'proportion_scorable': 'Proportion scorable'
    }

    list_of_dicts = []

    for record in table_records:


        d = {columns[key]: endpoint for key, endpoint in record['endpoints'].items() if key in columns.keys()}

        d['Record ID'] = record['record']

        list_of_dicts.append(d)

    records_df = pd.DataFrame(list_of_dicts, columns=columns.values())

    # (2) here
    id2ref_map = {aux_record['id']: aux_record['reference'] for aux_record in aux_records}

    records_df.insert(0, 'Ref', records_df['Record ID'].apply(lambda id: id2ref_map[id]))

    return records_df

table_records = get_table_records()

#filtered_table_records = [record for record in table_records if record['user'] == uid]

records_df = records_to_df(table_records, filtered_aux_records)

records_df = records_df.sort_values('Ref', ascending=False,
                                    ignore_index=True)

records_df
Ref Record ID Start (ISO) Stop (ISO) Timezone Duration (seconds) Channel quality 1 Channel quality 2 Channel quality 4 Proportion off head Proportion scorable
0 2751857 6c2faef3-e9a6-46a0-b34d-9afdb8b24ec8 2022-06-14T19:20:27-05:00 2022-06-14T20:23:12-05:00 America/Chicago 3765.0 0.226766 0.008497 0.353160 1.000000 0.296000
1 2751856 5096a84f-b971-446e-9d49-0202fdf7c9a8 2022-06-14T19:14:25-05:00 2022-06-14T19:18:14-05:00 America/Chicago 229.0 0.009009 0.000000 0.000000 1.000000 0.000000
2 2751854 44fa039b-8b16-4a2f-b8ef-d7e5a69157b8 2022-06-15T00:55:16-05:00 2022-06-15T02:44:37-05:00 America/Chicago 6561.0 0.050213 0.138466 0.426354 0.561644 0.853881
3 2751380 d3c1b33e-c591-48a9-95bc-38282e69840e 2022-06-14T17:19:32-05:00 2022-06-14T17:19:55-05:00 America/Chicago 23.0 0.000000 0.090909 0.090909 NaN NaN
4 2751349 423eafa3-2a71-40b3-ac39-b6dcb6f3d43a 2022-06-14T03:38:12-05:00 2022-06-14T14:58:41-05:00 America/Chicago 40829.0 0.710756 0.783470 0.734684 0.023495 0.969897
5 2751348 5039f0c2-239e-41e8-9caa-abcfe071615c 2022-05-24T04:15:58-07:00 2022-05-24T10:49:51-07:00 America/Los_Angeles 23633.0 0.557961 0.656210 0.614526 0.052030 0.779188
6 2751347 2ca402e4-c2fb-4c72-866d-c73e49d5ce65 2022-05-27T05:13:46-07:00 2022-05-27T06:16:06-07:00 America/Los_Angeles 3740.0 0.305556 0.419872 0.565705 1.000000 0.604839

Now, we wish to extract a particular record from this table. Since we see that record 2751349 is the most complete (11 hours with good enough data to be scored by a human 97% of the time, we download all of its associated data (464MB of raw accelerometer + EEG data, and its hypnogram). The raw data is stored as a file 2751349.h5 on disk and the hypnogram is parsed and stored in a Pandas dataframe hypnogram_df.

Since the raw data is stored in HDF5 format, we get all of the cool advantages of HDF5: an elegant hierarchical format with fast array-like indexing that loads into memory only when absolutely necessary.

[ ]:
#@title Record ref number
record_ref = 2751349 #@param {type:"integer"}

from tqdm import tqdm

def get_download_info(record_id, record_ref):
    # get the record id
    id = records_df[records_df['Ref'] == record_ref]['Record ID'].iloc[0]

    url = f'https://api.rythm.co/v1/dreem/algorythm/record/{id}/h5/?filename={record_ref}'

    headers = {
        'Authorization': 'Bearer ' + auth_dict['token']
    }

    out_dict = json.loads(requests.get(url, headers=headers).text)

    return out_dict

def get_hypnogram(record_id):

    url = f'https://api.rythm.co/v1/dreem/record/record/{record_id}/latest_hypnogram_as_txt/'

    headers = {
        'Authorization': 'Bearer ' + auth_dict['token']
    }

    hypnogram_text = requests.get(url, headers=headers).text

    from io import StringIO

    # get index of header
    idx_start = [i for i,l in enumerate(hypnogram_text.split('\n')) if 'Sleep Stage' in l][0]

    hypnogram = pd.read_csv(StringIO('\n'.join(hypnogram_text.split('\n')[idx_start:])), sep='\t')

    return hypnogram

record_id = records_df[records_df['Ref'] == record_ref]['Record ID'].iloc[0]

print('Getting download URL...')
download_info = get_download_info(record_id, record_ref)

assert download_info['status_display'] == 'Available'

download_url = download_info['data_url']

def download_file(dowload_url, download_path):

    # Streaming, so we can iterate over the response.
    response = requests.get(download_url, stream=True)
    total_size_in_bytes= int(response.headers.get('content-length', 0))
    block_size = 1024 #1 Kibibyte
    progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
    with open(download_path, 'wb') as file:
        for data in response.iter_content(block_size):
            progress_bar.update(len(data))
            file.write(data)

    progress_bar.close()

    if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
        print("ERROR, something went wrong")

print('Now downloading text hypnogram...')

hypnogram_df = get_hypnogram(record_id)

print('Now downloading raw EEG/accelerometer data...\n')

download_file(download_url, str(record_ref) + '.h5')
Getting download URL...
Now downloading text hypnogram...
Now downloading raw EEG/accelerometer data...

100%|██████████| 464M/464M [00:19<00:00, 23.8MiB/s]

4. Data Visualization§

Now that we’ve downloaded all of the data, let’s visualize some of it.

4.1. Hypnogram Plot§

First, let’s plot the hypnogram for this sleep session. This plot is very useful as it is a standard plot to visualize the stages of sleep over time.

[ ]:
#@title Plot hypnogram

from datetime import datetime, timezone
import numpy as np
import h5py
import matplotlib.pyplot as plt
import pytz

# get user's timezone
timezone_user = records_df[records_df['Ref'] == record_ref]['Timezone'].iloc[0]

def utc_to_local(utc_dt):
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(tz=pytz.timezone(timezone_user))

# we do this stage_to_int thing to ensure that the
# stages appear in order on the y-axis
stage_to_int = [
    'SLEEP-MT',
    'SLEEP-S3',
    'SLEEP-S2',
    'SLEEP-S1',
    'SLEEP-REM',
    'SLEEP-S0'
]

stages_to_real_names = {
    'SLEEP-MT': None,
    'SLEEP-S3': 'DEEP',
    'SLEEP-S2': 'N2',
    'SLEEP-S1': 'N1',
    'SLEEP-REM': 'REM',
    'SLEEP-S0': 'WAKE'
}

stage_ints = np.array(hypnogram_df['Sleep Stage'].apply(lambda x: stage_to_int.index(x)))

num_stages = len(stage_to_int)

# load in hdf5 file
f = h5py.File('2751349.h5', 'r')

# turn into seconds so we can plot
hypnogram_df['seconds'] = hypnogram_df['Time [hh:mm:ss]'].apply(lambda x: 3600 * int(x.split(':')[0]) + 60 * int(x.split(':')[1]) + int(x.split(':')[-1]))

# get the date this was taken
date = datetime.fromtimestamp(f.attrs['start_time'])

timezone_str = utc_to_local(date).strftime('%Z')
date = utc_to_local(date).strftime('%B %d, %Y')

# get the contiguous regions in stage_ints to plot
# these separately
empty_idxes = np.where(stage_ints != 0)[0]

regions = []

cur_region_start = -1

cur_region_end = -1

for i, (idx1, idx2) in enumerate(zip(empty_idxes[:-1], empty_idxes[1:])):
    if cur_region_start == -1:
        cur_region_start = idx1
        cur_region_end = idx2

    # either diff > 1 or we're at the end
    if idx2 - idx1 > 1 or i == empty_idxes[:-1].shape[0] - 1:
        cur_region_end = idx1
        # add 1 to account for extra dummy entry in heart_rate_day
        # at the beginning
        regions.append((cur_region_start+1, cur_region_end+1))

        cur_region_start = -1
        cur_region_end = -1

with plt.style.context('ggplot'):
    plt.figure(figsize=(14, 2))

    plt.title(f'Hypnogram for record {record_ref} taken on {date} ({timezone_str})',
              fontsize=20)

    for region_start, region_end in regions:
        plt.plot(hypnogram_df['seconds'].iloc[region_start:region_end] / 3600,
                stage_ints[region_start:region_end], color='red')

    plt.yticks(ticks=list(range(1, num_stages)),
            labels=[stages_to_real_names[stage] for stage in stage_to_int[1:]])

    plt.xticks(ticks=[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
               labels=['4AM', '5AM', '6AM', '7AM', '8AM', '9AM', '10AM', '11AM', '12PM', '1PM', '2PM'])

    # just set the xlim so that it extends all the way left
    # and right
    plt.xlim(*(hypnogram_df['seconds'] / 3600).iloc[[0, -1]])

    plt.tight_layout()

    plt.show()
../_images/notebooks_dreem_headband_3_10_0.png

4.2. Hypnogram + EEG plot§

Now let’s plot some EEG signals (filtered). This will allow us to obtain a very granular view at the EEG signal throughout the entire sleep session. In particular, we will reproduce the below plot that you can see yourself on the admin portal. Feel free to tweak the inputs to the plot and see how it reacts!

571f2dec3d784717be7552752fc35ff1

Above is a plot taken directly from the online admin web interface!

[ ]:
#@title Plot EEG signals
date = "2022-06-14" #@param {type:"date"}
start_hour =  6 #@param {type:"integer"}
start_minute =   40#@param {type:"integer"}
end_hour =  6 #@param {type:"integer"}
end_minute =   50#@param {type:"integer"}
channel_1 = 1 #@param {type:"integer"}
channel_2 = 2 #@param {type:"integer"}
channel_3 = 4 #@param {type:"integer"}
channel_4 = 5 #@param {type:"integer"}
channel_5 = 6 #@param {type:"integer"}

eeg_channel_nums = [channel_1, channel_2, channel_3,
                    channel_4, channel_5]

#eeg_channel_nums = [1, 2, 4, 5, 6]

date_provided = date

fig, ax = plt.subplots(6, 1, figsize=(16, 8))

# get user's timezone
timezone_user = records_df[records_df['Ref'] == record_ref]['Timezone'].iloc[0]

def utc_to_local(utc_dt):
    return utc_dt.replace(tzinfo=timezone.utc).astimezone(tz=pytz.timezone(timezone_user))

# we do this stage_to_int thing to ensure that the
# stages appear in order on the y-axis
stage_to_int = [
    'SLEEP-MT',
    'SLEEP-S3',
    'SLEEP-S2',
    'SLEEP-S1',
    'SLEEP-REM',
    'SLEEP-S0'
]

stages_to_real_names = {
    'SLEEP-MT': None,
    'SLEEP-S3': 'DEEP',
    'SLEEP-S2': 'N2',
    'SLEEP-S1': 'N1',
    'SLEEP-REM': 'REM',
    'SLEEP-S0': 'WAKE'
}

stage_ints = np.array(hypnogram_df['Sleep Stage'].apply(lambda x: stage_to_int.index(x)))

num_stages = len(stage_to_int)

# load in hdf5 file
f = h5py.File('2751349.h5', 'r')

# turn into seconds so we can plot
hypnogram_df['seconds'] = hypnogram_df['Time [hh:mm:ss]'].apply(lambda x: 3600 * int(x.split(':')[0]) + 60 * int(x.split(':')[1]) + int(x.split(':')[-1]))

# get the date this was taken
date = datetime.fromtimestamp(f.attrs['start_time'])

timezone_str = utc_to_local(date).strftime('%Z')
date = utc_to_local(date).strftime('%B %d, %Y')

# get the contiguous regions in stage_ints to plot
# these separately
empty_idxes = np.where(stage_ints != 0)[0]

regions = []

cur_region_start = -1

cur_region_end = -1

for i, (idx1, idx2) in enumerate(zip(empty_idxes[:-1], empty_idxes[1:])):
    if cur_region_start == -1:
        cur_region_start = idx1
        cur_region_end = idx2

    # either diff > 1 or we're at the end
    if idx2 - idx1 > 1 or i == empty_idxes[:-1].shape[0] - 1:
        cur_region_end = idx1
        # add 1 to account for extra dummy entry in heart_rate_day
        # at the beginning
        regions.append((cur_region_start+1, cur_region_end+1))

        cur_region_start = -1
        cur_region_end = -1


date_tz = datetime.strptime(date_provided, '%Y-%m-%d').replace(tzinfo=pytz.timezone(timezone_user))

start_timestamp = date_tz.timestamp() + start_hour * 3600 + start_minute * 60
end_timestamp = date_tz.timestamp() + end_hour * 3600 + end_minute * 60

start_time_offset = start_timestamp - f.attrs['start_time']
end_time_offset = end_timestamp - f.attrs['start_time']

start_time_idx = int(start_time_offset * 250)
end_time_idx = int(end_time_offset * 250)

eeg_signals = [f[f'eeg{eeg_channel_num}/filtered'][start_time_idx:end_time_idx] for eeg_channel_num in eeg_channel_nums]


with plt.style.context('ggplot'):


    for region_start, region_end in regions:
        ax[0].plot(hypnogram_df['seconds'].iloc[region_start:region_end] / 3600,
                stage_ints[region_start:region_end], color='black')

    # now fill in an interval
    ax[0].axvspan(start_hour + start_minute / 60, end_hour + end_minute / 60,
                  color='#ff95d4')

    ax[0].set_yticks(ticks=list(range(1, num_stages)))
    ax[0].set_yticklabels(labels=[stages_to_real_names[stage] for stage in stage_to_int[1:]])

    ticks = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    ax[0].set_xticks(ticks=ticks)
    ax[0].set_xticklabels(labels=['0' + str(tick) + ':00' for tick in ticks])#['04:00', '06:00', '', '9AM', '10AM', '11AM', '12PM', '1PM', '2PM'])
    ax[0].tick_params(colors='silver', which='ticks')

    # just set the xlim so that it extends all the way left
    # and right
    ax[0].set_xlim(*(hypnogram_df['seconds'] / 3600).iloc[[0, -1]])

    ax[0].spines['top'].set_visible(False)
    ax[0].spines['right'].set_visible(False)
    ax[0].spines['bottom'].set_visible(True)
    ax[0].spines['left'].set_visible(True)

    plt.setp(ax[0].get_xticklines(), color='silver')
    plt.setp(ax[0].get_yticklines(), color='silver')

    ax[0].spines['bottom'].set_edgecolor('silver')
    ax[0].spines['left'].set_edgecolor('silver')


for i in range(1, 6):
    ax[i].plot(eeg_signals[i-1], color='black', linewidth=.7)

    ax[i].set_ylim(-150, 150)

    ax[i].set_xlim(0, eeg_signals[i-1].shape[0])

    # leave only the left edge of the frame in
    ax[i].spines['top'].set_visible(False)
    ax[i].spines['right'].set_visible(False)
    ax[i].spines['bottom'].set_visible(False)
    ax[i].spines['left'].set_visible(True)
    ax[i].get_xaxis().set_ticks([])


fig.tight_layout()
plt.tight_layout()
../_images/notebooks_dreem_headband_3_12_0.png

Above is a plot we created ourselves!

Now let’s combine both of the above plots together. What happens to the EEG signal across different sleep stages? Does it become more chaotic during some vs. others? Intuitively, here we’d expect that the EEG signal becomes very active and chaotic when the user is awake, and the EEG signal becomes quiet in deep sleep. We measure this notion “chaos” with the variance of the signal, taken during a 1 second time interval. Below we aggregate all 1 second time intervals, pair them to their apppropriate sleep stages, and plot the variance vs. sleep stage.

[ ]:
#@title Visualize EEG signal vs. sleep stage
import seaborn as sns

def strided_window_reshape(x, window_size):
    # reshape 1D array according to strided window size (windows do not overlap)
    num_windows = x.shape[0] // window_size

    return x[:num_windows * window_size].reshape((-1, window_size))

# get variance for each 1-second interval in EEG signal
windows = strided_window_reshape(f['eeg1/filtered'][:], 250)

variances = np.var(windows, axis=1)

# get median for each 30-second interval in variance signal
variances_windowed = strided_window_reshape(variances, 30)

median_variances = np.median(variances_windowed, axis=1)

sleep_stages = np.unique(hypnogram_df['Sleep Stage'])

sleep_stage_variances = {}

df_dict = []

for sleep_stage in sleep_stages:
    median_variance = median_variances[np.where(hypnogram_df['Sleep Stage'] == sleep_stage)[0]]

    for v in median_variance:
        df_dict.append({'Sleep Stage': stages_to_real_names[sleep_stage],
                        'variance': v})


eeg_var_df = pd.DataFrame(df_dict).dropna()

with plt.style.context('ggplot'):
    plt.figure(figsize=(16, 8))

    plt.title('EEG signal variance vs. sleep stage (1 second time intervals)',
              fontsize=20)

    sns.barplot(x='Sleep Stage', y='variance', data=eeg_var_df,
                capsize=.1)
../_images/notebooks_dreem_headband_3_14_0.png

Indeed, we see that there is a large amount of variability in this variance across different sleep stages. In particular, we see that the WAKE stage is particularly chaotic. However, at least visually, we see very little difference between deep sleep and REM sleep in terms of the variance of the EEG signal.

5. Data analysis§

DISCLAIMER: the analyses below may not be 100% biologically or scientifically grounded; the code is here to assist in your process, if you are interested in asking these kinds of questions.

5.1. Variance of EEG signal vs. stage§

Here, we’ll just use what we already computed earlier in section 4.3 (eeg_var_df, a DataFrame hidden behind the code fold) and compute a statistical test on that, for due diligence.

[ ]:
from scipy.stats import ttest_ind

wake_var = eeg_var_df[eeg_var_df['Sleep Stage'] == 'WAKE']['variance']

sleep_var = eeg_var_df[eeg_var_df['Sleep Stage'] != 'WAKE']['variance']

res = ttest_ind(wake_var, sleep_var)

print(f'P-value: {res.pvalue:.3g}')
P-value: 2.42e-05

Looks significant!

5.2. Norm of acceleration vs. stage§

For our next analysis, we’ll do a similar analysis (with 1 second time intervals as well), but with the norm of the acceleration instead. First, we’ll do the same plot, just to get an idea of what’s going on.

[ ]:
#@title Plot accelerometer norm vs. sleep stage
acc_norm = np.sum([f[k]['filtered'][:] ** 2 for k in ['accelerometer_x', 'accelerometer_y', 'accelerometer_z']], axis=0)

# get variance for each 1-second interval in acc norm
windows = strided_window_reshape(acc_norm, 50)

variances = np.median(windows, axis=1)

# get median for each 30-second interval in variance signal
variances_windowed = strided_window_reshape(variances, 30)

median_variances = np.median(variances_windowed, axis=1)

sleep_stages = np.unique(hypnogram_df['Sleep Stage'])

sleep_stage_variances = {}

df_dict = []

for sleep_stage in sleep_stages:
    median_variance = median_variances[np.where(hypnogram_df['Sleep Stage'] == sleep_stage)[0]]

    for v in median_variance:
        df_dict.append({'Sleep Stage': stages_to_real_names[sleep_stage],
                        'norm': v})


acc_norm_df = pd.DataFrame(df_dict).dropna()

with plt.style.context('ggplot'):
    plt.figure(figsize=(16, 8))

    plt.title('Accelerometer norm vs. sleep stage (1 second time intervals)',
              fontsize=20)

    sns.barplot(x='Sleep Stage', y='norm', data=acc_norm_df,
                capsize=.1)
../_images/notebooks_dreem_headband_3_18_0.png

Now let’s again compute a two-tailed T-test, again for due diligence.

[ ]:
from scipy.stats import ttest_ind

wake_var = acc_norm_df[acc_norm_df['Sleep Stage'] == 'WAKE']['norm']

sleep_var = acc_norm_df[acc_norm_df['Sleep Stage'] != 'WAKE']['norm']

res = ttest_ind(wake_var, sleep_var)

print(f'P-value: {res.pvalue:.3g}')
P-value: 1.46e-69

Significant, as expected.