{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"\n",
"\n",
"_A picture of the Oura Ring 3 that was used for this notebook_"
],
"metadata": {
"id": "K8Il_ziA9mCj"
}
},
{
"cell_type": "markdown",
"source": [
"At **\\$299** + **\\$5.99** per month and a weight of just 4-6 grams, the [Oura Ring 3](https://support.ouraring.com/hc/en-us/articles/4409072131091-Meet-Gen3) is a unique Sleep and physical activity tracker that, beyond having a price-to-weight ratio higher than gold, has achieved much attention for its ease of use in clinical studies and comfortable wear.\n",
"\n",
"We have used the Oura ring for a month, and we will show you how to extract its data, visualize sleep stages for time and compute correlations and statistical significance from sleep scores to step counts.\n",
"\n",
"This is a comprehensive, clear guide to extract our data from the Oura Ring 3 using the Oura API. Links to external resources and official Oura documentation are provided sporadically throughout the guide for further reference.\n",
"\n",
"If you want to learn more about the Oura ring, see the [README](https://github.com/alrojo/wearipedia/tree/main/wearables/oura_ring_3) for a detailed analysis of performance, sensors, data privacy, and extraction pipelines.\n",
"
\n",
"\n",
"A list of the most important accessible data categories is provided below, For the full list, access the meta_api_data in section 3.2. There are two versions of the API: versions 1 and 2, so (V1) and (V2) in the parameter list refer to the API version. Version 2 of the API is the more recent version that's still being updated and will replace version 1, and that's why both are included here.\n",
"**The items in bold are the ones we use in the notebook.**\n",
"\n",
"Category Name (API version)| Parameter Name (subcategory)| Frequency of Sampling \n",
":-------------------:|:----------------------:|:----------------------:\n",
"**Heart rate (V2)** | **BPM** | **Every 5 mins**\n",
"**Daily activity (V2)** | **resting time**|**Daily**\n",
"Daily activity (V2) | active calories|Daily \n",
"**Daily activity (V2)** | **equivalent walking distance**|**Daily** \n",
"Daily activity (V2) | steps |Daily\n",
"Daily activity (V2) | medium activity time |Daily\n",
"Daily activity (V2) | total calories |Daily\n",
"Daily activity (V2) | low activity time |Daily\n",
"Daily activity (V2) | non-wear time |Daily\n",
"Daily activity (V2) | overall score |Daily\n",
"Sleep (V1) | awake time |Daily\n",
"Sleep (V1) | average breath rate |Daily\n",
"**Sleep (V1)** | **REM sleep time** |**Daily**\n",
"**Sleep (V1)** | **deep sleep time** |**Daily**\n",
"**Sleep (V1)** | **light sleep time** |**Daily**\n",
"**Sleep (V1)** | **temperature deviation** |**Daily**\n",
"Sleep (V1) | bedtime start |Daily\n",
"Sleep (V1)| restless time |Daily\n",
"Sleep (V1)| duration |Daily\n",
"Activity (V1) | score of recovery time |Daily\n",
"Activity (V1) | overall Score|Daily\n",
"Activity (V1) | high activity time|Daily\n",
"Activity (V1) | low activity time|Daily\n",
"Activity (V1) | total activity time|Daily\n",
"Activity (V1) | inactive time|Daily\n",
"Activity (V1) | target miles|Daily\n",
"Activity (V1) | target Calories|Daily\n",
"Activity (V1) | training frequency score|Daily\n",
"Readiness (V1) | activity balance score |Daily\n",
"**Readiness (V1)** | **previous day score**|**Daily**\n",
"**Readiness (V1)** | **previous night score** |**Daily**\n",
"Readiness (V1) | sleep balance score|Daily\n",
"Readiness (V1) | temperature score|Daily \n",
"Ideal bedtimes (V1) | bedtime window|Daily\n",
"Sleep (V1) | bedtime end|Daily\n",
"**Sleep (V1)** | **Overall score**|**Daily**\n",
"Sleep (V1) | efficiency|Daily\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"In this guide, we sequentially cover the following **five** topics to extract data from the Oura Ring API:\n",
"\n",
"1. **Setup**\n",
" - 1.1: Study participant setup and usage\n",
" - 1.2: Library imports needed to follow through\n",
"2. **Authentication/Authorization**\n",
" - 2.1: OAuth(2), allowing access for multiple entities\n",
" - 2.2: Personal access token, intended for personal use \n",
"3. **Data extraction**\n",
" - 3.1: Setting up the parameters to extract the data.\n",
" - 3.2: Extracting data from version 2 endpoints\n",
" - 3.3: Extracting data from version 1 endpoints\n",
" - 3.4: Aggregating the data to be able to easily access it\n",
" - 3.5: Plotting multiple parameters simultaneously\n",
"4. **Data visualization**\n",
" - 4.1: Visualizing the different sleep stages \n",
" - 4.2: Visualizing Resting Heart Rate\n",
" - 4.2: Visualizing Body temperature\n",
"5. **Data analysis**\n",
" - 5.1: Finding Outliers (Anomaly Detection). We provide two ways to find outliers in any set of output data.\n",
" - 5.2: Dispersion analysis between day and night readiness scores.\n",
" - 5.3: We try to find a correlation between equivalent walking distance and the amount of deep sleep. We then find that the correlation is not statistically significant. \n",
" - 5.4: We try to find a correlation between resting time and sleep score. We then find that this correlation is statically significant.\n",
"\n",
"*Note: Full documentation of APIs by Oura can be found [here](https://cloud.ouraring.com/v2/docs) (version 2) and [here](https://cloud.ouraring.com/docs/) (version 1)."
],
"metadata": {
"id": "ggVLILf7cD8x"
}
},
{
"cell_type": "markdown",
"source": [
"# 1. Setup"
],
"metadata": {
"id": "nFpTwru7iPb9"
}
},
{
"cell_type": "markdown",
"source": [
"## 1.1 Study participant setup and usage\n"
],
"metadata": {
"id": "pM0NPHjGa-dL"
}
},
{
"cell_type": "markdown",
"source": [
"## Participant Setup\n",
"\n",
"Dear Participant,\n",
"\n",
"First, download the Oura app from the app store and charge the ring. When the LED blinks blue, the ring is ready to pair. Pair the phone to the ring with Bluetooth and follow the instructions in the app to create your Oura account. You have to spend one night wearing the ring for sleep and readiness score to appear, but the rest of the data will be available immediately. Check [this](https://support.ouraring.com/hc/en-us/articles/4411128662291-Set-Up-an-Oura-Ring) out if you need more info about the setup.\n",
"\n",
"Best,\n",
"\n",
"Wearipedia\n",
"\n",
"## Data Receiver Setup\n",
"\n",
"Please follow the below steps:\n",
"\n",
"1. Create an email address for the participant, for example `foo@email.com`.\n",
"2. Create an Oura account with the email `foo@email.com` and some random password.\n",
"3. Keep `foo@email.com` and password stored somewhere safe.\n",
"4. Distribute the device to the participant and instruct them to follow the participant setup letter above.\n",
"\n",
"\n",
"To make data available for extraction, you will just need the username and password to generate the API key as illustrated in section 2. Once an API key is generated, these would be no need for the log-in information on the researcher's side. "
],
"metadata": {
"id": "5MkNKxcScNkj"
}
},
{
"cell_type": "markdown",
"source": [
"## 1.2 Library imports"
],
"metadata": {
"id": "m-CFyV7Hbuk4"
}
},
{
"cell_type": "code",
"source": [
"# Uncomment and Run the following command to install the nessecary libraries locally\n",
"# pip install requests matplotlib numpy scipy seaborn pandas\n",
"\n",
"# Relevant libraries are imported below.\n",
"import requests\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import figure\n",
"import numpy\n",
"from scipy import stats\n",
"import seaborn as sns\n",
"import pandas as pd\n",
"import numpy as np\n",
"from scipy.ndimage import gaussian_filter\n",
"from matplotlib.patches import FancyBboxPatch"
],
"metadata": {
"id": "uuVAYBKuiSNj"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# 2. Authentication/Authorization"
],
"metadata": {
"id": "xaFsRnKrps4l"
}
},
{
"cell_type": "markdown",
"source": [
"To obtain access to data, authentication is required. There are two ways to authenticate, namely:\n",
"* [OAuth 2.0](https://cloud.ouraring.com/oauth/applications)\n",
"\n",
"\n",
"* [Personal Access Token](https://cloud.ouraring.com/personal-access-tokens)\n",
"\n",
"OAuth 2.0 is considered the gold standard. While certainly not [foolproof](https://datatracker.ietf.org/doc/html/draft-ietf-oauth-security-topics), it offers one of the most comprehensive and easily accessible security measures. More information about the OAuth 2.0 protocol is available [here](https://datatracker.ietf.org/doc/html/rfc6749).\n",
"\n",
"Alternatively, Oura also allows for data extraction through Personal Access Tokens (PAT). As a much less secure method, Oura recommends use of PATs for personal use only.\n",
"\n",
"**Clinical use == OAuth 2.0**\n",
"\n",
"*Note: You have to login fist to be able to authenticate"
],
"metadata": {
"id": "A2pjC9JvV_3d"
}
},
{
"cell_type": "markdown",
"source": [
"## 2.1 OAuth 2.0"
],
"metadata": {
"id": "34kRFVw3FGR-"
}
},
{
"cell_type": "markdown",
"source": [
"OAuth 2.0 can be created by visiting this [link](https://cloud.ouraring.com/oauth/applications) and registering a new application. A couple things to take note of:\n",
"* `client_id`: A unique ID required to obtain a code or access token\n",
"* `client_secret`: A secret code for added security\n",
"* Redirect URIs: `https://127.0.0.1:8080` is also known as the localhost and is where a user can receive a code or token\n",
"* Allow server-side authentication/Allow client-side authentication: Allows the user to obtain a code or token\n",
"\n",
"Server-side authorization is more secure but only returns a code, which should then be traded again for a token. Client-side authorization is less secure, but allows users to skip the step of trading the code for a token. For beginners, we recommend utilizing client-side authentication for simplicity.\n",
"\n",
"This is shown below\n",
"\n",
"\n",
"\n"
],
"metadata": {
"id": "Tkz8f7DLWCaX"
}
},
{
"cell_type": "code",
"source": [
"# We will start storing pertinent parameters into a Python dictionary (`variables`)\n",
"variables = dict()\n",
"\n",
"# CHANGE THESE PARAMETERS WITH YOUR OWN ID AND SECRET\n",
"variables[\"client_id\"] = \"WOTA24GUEPZ63APX\"\n",
"variables[\"client_secret\"] = \"CREYFR574EDYZSREMBR34K5WQH7XZVKP\"\n",
"\n",
"# DO NOT CHANGE THESE PARAMETERS\n",
"variables[\"state\"] = \"XXXX\"\n",
"variables[\"redirect_uri\"] = \"https%3A%2F%2F127.0.0.1%3A8080\"\n",
"variables[\"response_type\"] = \"token\""
],
"metadata": {
"id": "sVWH9hSKSov8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Afterwards, generate a link with the following code block. Alternatively, a link is provided underneath the APP DETAILS page.\n",
"\n",
"This is shown below\n",
"\n",
""
],
"metadata": {
"id": "Ii9foglK7BBn"
}
},
{
"cell_type": "code",
"source": [
"url = \"https://cloud.ouraring.com/oauth/authorize\"\n",
"for key, value in variables.items():\n",
" if url == \"https://cloud.ouraring.com/oauth/authorize\":\n",
" url += \"?\" + key + \"=\" + value\n",
" elif key == \"client_secret\":\n",
" # exclude client_secret in url for token\n",
" continue\n",
" else:\n",
" url += \"&\" + key + \"=\" + value\n",
"\n",
"print(url)"
],
"metadata": {
"id": "CmV5pL6q2bSs",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a6f93627-8059-40f5-a873-f5999ec28aea"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"https://cloud.ouraring.com/oauth/authorize?client_id=WOTA24GUEPZ63APX&state=XXXX&redirect_uri=https%3A%2F%2F127.0.0.1%3A8080&response_type=token\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Click on the resulting link above, and click accept. Afterwards, make sure to copy the URL as it provides the access token required to extract data.\n",
"\n",
"This is shown below\n",
"\n",
""
],
"metadata": {
"id": "uKcQTcUF9oYu"
}
},
{
"cell_type": "markdown",
"source": [
"[link text](https://127.0.0.1:8080/#access_token=ZEGOXLJXB6STEBBMQLRZBKYFGM2HPTOO&token_type=bearer&expires_in=2592000&scope=email%20personal%20daily%20heartrate%20workout%20tag%20session&state=XXX) From this URL, take the combination of letters and numbers after `access_token=` until `&token_type=`. This constitutes the `access_token` required to extract data for this user.\n",
"\n",
"> https://127.0.0.1:8080/#access_token=NZA3DRAHG37X2GDV34YV4Q7XETOKSXKE&token_type=bearer&expires_in=2592000&scope=email%20personal%20daily%20heartrate%20workout%20tag%20session&state=XXXX\n",
"\n",
"In this case, the `access_token` is **NZA3DRAHG37X2GDV34YV4Q7XETOKSXKE**. This is added to the `variables` Python dictionary in the code below (hereafter known as `oauth_access_token`).\n",
"\n",
"*Note: This access token will expire in 2592000 seconds, which is one month. If the `oauth_access_token` no longer works due to expiry, please obtain a new `oauth_access_token`."
],
"metadata": {
"id": "XVtKgcfF92gI"
}
},
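{
"cell_type": "markdown",
"source": [
"Rather than copying the token by hand, the redirect URL can also be parsed in code. The cell below is a minimal sketch that uses only Python's standard `urllib.parse` module. `extract_access_token` and `sample_redirect_url` are hypothetical names introduced here for illustration, and the sample URL is a shortened copy of the example above."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"from urllib.parse import urlsplit, parse_qs\n",
"\n",
"def extract_access_token(redirect_url):\n",
"    # hypothetical helper, not part of the Oura API: the token arrives in the\n",
"    # URL fragment, i.e. the part after '#'\n",
"    fragment = urlsplit(redirect_url).fragment\n",
"    # parse_qs maps every key to a list of values\n",
"    return parse_qs(fragment)[\"access_token\"][0]\n",
"\n",
"sample_redirect_url = (\n",
"    \"https://127.0.0.1:8080/#access_token=NZA3DRAHG37X2GDV34YV4Q7XETOKSXKE\"\n",
"    \"&token_type=bearer&expires_in=2592000&state=XXXX\"\n",
")\n",
"print(extract_access_token(sample_redirect_url))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},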
{
"cell_type": "code",
"source": [
"variables[\"oauth_access_token\"] = \"NZA3DRAHG37X2GDV34YV4Q7XETOKSXKE\""
],
"metadata": {
"id": "bfeCUcBQ_Ete"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 2.2 Personal Access Token\n"
],
"metadata": {
"id": "TUYKNkjkFD_f"
}
},
{
"cell_type": "markdown",
"source": [
"A personal access token (PAT) is an easy and simple alternative to OAuth 2.0. PATs can be created [here](https://cloud.ouraring.com/personal-access-tokens) and revoked at anytime if tokens are compromised. PATs are visible only at creation and are otherwise hidden from view, save for the first five characters. For privacy reasons, PATs are typically only meant for personal use, and experiments where one entity requires access to multiple entities' data should utilize OAuth 2.0.\n",
"\n",
"This is shown below\n",
"\n",
""
],
"metadata": {
"id": "Xl4V2oPU_AUi"
}
},
{
"cell_type": "markdown",
"source": [
"The `personal_access_token` similarly requires `\"Bearer \"` in front of the token."
],
"metadata": {
"id": "BZDwfRSvA3Ah"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3kJdWgMtb7F1"
},
"outputs": [],
"source": [
"variables[\"personal_access_token\"] = \"SZTX3OON6SAX7GOOOWQX5SK2GGZGBDVG\""
]
},
{
"cell_type": "markdown",
"source": [
"# 3. Data extraction\n",
"***Note: Documentation, data availability, and function arguments (most likely endpoints) are subject to change.**"
],
"metadata": {
"id": "ATcqnTjidMOm"
}
},
{
"cell_type": "markdown",
"source": [
"Data can be extracted using the Oura Ring API. There are currently two versions (VERSION 2 and VERSION 1) of which VERSION 1 is most likely soon to be deprecated. VERSION 2 is the newer version and has more detailed documentation. Oura allows users to access data from the categories in section 3.2.\n",
"\n",
"Some data categories are still available through VERSION 1 endpoints for at least one year after the resealse of V2 in Jan-2022 [according to Oura](https://cloud.ouraring.com/v2/docs#tag/Daily-Sleep).\n",
"\n",
"In the following sections, VERSION 2 will precede VERSION 1 as it is more up-to-date.\n",
"\n"
],
"metadata": {
"id": "3CkBVoV_poR0"
}
},
{
"cell_type": "markdown",
"source": [
"## 3.1 Set Up The Parameters\n",
"\n"
],
"metadata": {
"id": "gEDEZajqO_vb"
}
},
{
"cell_type": "markdown",
"source": [
"First, we select the start and end dates that we want to extract data from. We can use either `personal_access_token` or `oauth_access_token` depending on which one used in the athentication."
],
"metadata": {
"id": "xNK2GUfa2d7O"
}
},
{
"cell_type": "code",
"source": [
"#@title You can change these parameters\n",
"START_DATE = \"2022-03-20\" #@param {type:\"string\"}\n",
"END_DATE = \"2022-04-01\" #@param {type:\"string\"}\n",
"\n",
"# change key to \"personal_access_token\" if using PAT\n",
"TOKEN_TYPE = \"personal_access_token\" #@param {type:\"string\"}\n",
"ACCESS_TOKEN = variables[TOKEN_TYPE] "
],
"metadata": {
"id": "V3WrA_yydLfz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 3.2 VERSION 2 Endpoints"
],
"metadata": {
"id": "-3b69nsr1dA5"
}
},
{
"cell_type": "markdown",
"source": [
"These data categories are available through VERSION 2 endpoints as of 21 April 2022.\n",
"\n",
"* [Personal Info](https://cloud.ouraring.com/docs/personal-info)\n",
"* [Heart Rate](https://cloud.ouraring.com/v2/docs#tag/Heart-Rate)\n",
"* [Sessions](https://cloud.ouraring.com/v2/docs#tag/Sessions)\n",
"* [Tags](https://cloud.ouraring.com/v2/docs#tag/Tags)\n",
"* [Workouts](https://cloud.ouraring.com/v2/docs#tag/Workouts)\n",
"\n",
"We write a function which takes the following parameters:\n",
"* `start_date`: Starting date\n",
"* `end_date`: Ending date\n",
"* `access_token`: Access token\n",
"The following parameters may vary depending on the type of data we access. They are specified below and should be left alone.\n",
"\n",
"* STATIC `endpoint`: URL to extract data from\n",
"* STATIC `start_date_col`: Column name for start date\n",
"* STATIC `end_date_col`: Column name for end date\n",
"* STATIC `call`: Type of [CRUD](https://www.codecademy.com/article/what-is-crud) call in [HTTP](https://www.w3schools.com/whatis/whatis_http.asp), defaults to \"GET\""
],
"metadata": {
"id": "nkDVGyyCVmuk"
}
},
{
"cell_type": "code",
"source": [
"# function that executes a GET request on the API (for version 2 endpoints)\n",
"def call_API_version_2(\n",
" url: str,\n",
" start_date = START_DATE, \n",
" end_date = END_DATE,\n",
" access_token = ACCESS_TOKEN,\n",
" start_date_col: str = \"start_date\",\n",
" end_date_col: str = \"end_date\",\n",
" call: str = \"GET\"\n",
"):\n",
" headers = { \"Authorization\": \"Bearer \" + access_token }\n",
" params = { start_date_col: start_date, end_date_col: end_date }\n",
" return requests.request(\n",
" call, url=url, headers=headers, params=params).json()"
],
"metadata": {
"id": "fE8Pxbh8mPx7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Next, we call all the data from each corresponding URL.\n",
"\n",
"\n",
"# heart_rate\n",
"heart_rate = call_API_version_2(\n",
" url=\"https://api.ouraring.com/v2/usercollection/heartrate\",\n",
" start_date_col=\"start_datetime\",\n",
" end_date_col=\"end_datetime\",\n",
" start_date=START_DATE + \"T00:00:00-23:59\",\n",
" end_date=END_DATE + \"T00:00:00-23:59\"\n",
")\n",
"\n",
"# personal_info\n",
"personal_info = call_API_version_2(url=\"https://api.ouraring.com/v2/usercollection/personal_info\")\n",
"\n",
"# sessions\n",
"sessions = call_API_version_2(url=\"https://api.ouraring.com/v2/usercollection/sessions\")\n",
"\n",
"# tag\n",
"tag = call_API_version_2(url=\"https://api.ouraring.com/v2/usercollection/tag\")\n",
"\n",
"# workout\n",
"workout = call_API_version_2(url=\"https://api.ouraring.com/v2/usercollection/workout\")\n",
"\n",
"# daily_activity\n",
"daily_activity = call_API_version_2(url=\"https://api.ouraring.com/v2/usercollection/daily_activity\")\n"
],
"metadata": {
"id": "goPZXhfI2KrD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 3.3 VERSION 1 Endpoints"
],
"metadata": {
"id": "-tOS092tPNyW"
}
},
{
"cell_type": "markdown",
"source": [
"These data categories are available through VERSION 1 endpoints as of 25 April 2022.\n",
"\n",
"* [Daily Sleep](https://cloud.ouraring.com/docs/sleep)\n",
"* [Daily Activity](https://cloud.ouraring.com/docs/activity)\n",
"* [Daily Readiness](https://cloud.ouraring.com/docs/readiness)\n",
"* [Ideal Bedtime](https://cloud.ouraring.com/docs/bedtime)\n",
"\n",
"The VERSION 1 function similarly takes the following parameters:\n",
"* `start_date`: Starting date\n",
"* `end_date`: Ending date\n",
"* `access_token`: Access token (either `personal_access_token` or `oauth_access_token`)\n",
"\n",
"The remaining parameters are similarly specified and should be left alone."
],
"metadata": {
"id": "zy1yRa9UVcMi"
}
},
{
"cell_type": "code",
"source": [
"# function that executes a GET request on the API (for version 1 endpoints)\n",
"def call_API_version_1(\n",
" url: str,\n",
" start_date = START_DATE,\n",
" end_date = END_DATE,\n",
" access_token = ACCESS_TOKEN,\n",
" start_date_col: str = \"start\",\n",
" end_date_col: str = \"end\",\n",
" call: str = \"GET\"\n",
"):\n",
" params = {\n",
" \"access_token\": access_token,\n",
" start_date_col: start_date,\n",
" end_date_col: end_date\n",
" }\n",
" return requests.request(\n",
" call, url=url, params=params).json()"
],
"metadata": {
"id": "hOCiih36PQBA"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# sleep\n",
"sleep = call_API_version_1(url=\"https://api.ouraring.com/v1/sleep\")\n",
"\n",
"# activity\n",
"activity = call_API_version_1(url=\"https://api.ouraring.com/v1/activity\")\n",
"\n",
"# readiness\n",
"readiness = call_API_version_1(url=\"https://api.ouraring.com/v1/readiness\")\n",
"\n",
"# ideal_bedtimes\n",
"ideal_bedtimes = call_API_version_1(url=\"https://api.ouraring.com/v1/bedtime\")"
],
"metadata": {
"id": "r5mMv3bJbfpp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 3.4 Data Aggregation"
],
"metadata": {
"id": "hVd43d2U192f"
}
},
{
"cell_type": "markdown",
"source": [
"After this point, there are two variables to take note of.\n",
"* `api_data` - a [dictionary](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict) type which contain all extracted data as a [list](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) of dictionaries\n",
"* `meta_api_data` - a dictionary type which contains the keys of all extracted data as dictionaries\n",
"\n",
"To access information from these dictionaries, call the keys or values functions on either variable. Simply printing may suffice as well (but may be way too long to visualize properly). Examples and some code is listed below to better visualize the data.\n",
" \n",
"---\n",
" \n",
"api_data\n",
"* `api_data.keys()`\n",
"* `api_data.values()`\n",
"* `print(api_data)`\n",
"\n",
"meta_api_data\n",
"* `meta_api_data.keys()`\n",
"* `meta_api_data.values()`\n",
"* `print(meta_api_data)`"
],
"metadata": {
"id": "DKocUiSgV3vJ"
}
},
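{
"cell_type": "markdown",
"source": [
"To make the two structures concrete, the cell below builds a toy version of each. This is only a sketch: the records are made-up placeholders rather than real Oura output, and the `toy_` names are introduced here for illustration so that they do not overwrite the real variables."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# toy stand-in for api_data: category name -> list of records (made-up values)\n",
"toy_api_data = {\n",
"    \"sleep\": [{\"summary_date\": \"2022-03-20\", \"deep\": 3600, \"rem\": 5400}],\n",
"    \"readiness\": [{\"summary_date\": \"2022-03-20\", \"score\": 80}],\n",
"}\n",
"\n",
"# toy stand-in for meta_api_data: category name -> set of keys of its first record\n",
"toy_meta_api_data = {\n",
"    name: set(records[0].keys()) for name, records in toy_api_data.items()\n",
"}\n",
"\n",
"for name, keys in toy_meta_api_data.items():\n",
"    print(name, sorted(keys))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},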
{
"cell_type": "code",
"source": [
"# aggregate data for version 2 endpoints\n",
"api_data = dict()\n",
"api_data[\"personal_info\"] = [personal_info]\n",
"# api_data[\"heart_rate\"] = heart_rate[\"data\"]\n",
"print(heart_rate)\n",
"api_data[\"sessions\"] = sessions[\"detail\"] if sessions[\"detail\"] != \"Not Found\" else [{}]\n",
"api_data[\"tag\"] = tag[\"data\"] if tag[\"data\"] else [{}]\n",
"api_data[\"workout\"] = workout[\"data\"] if workout[\"data\"] else [{}]\n",
"api_data[\"daily_activity\"] = daily_activity[\"data\"] if daily_activity[\"data\"] else [{}]"
],
"metadata": {
"id": "YvX35jDeV36C",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 258
},
"outputId": "244b64cf-7eb1-4bf4-e6c0-35cfd7a5035c"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{'message': 'Subscription not valid'}\n"
]
},
{
"output_type": "error",
"ename": "KeyError",
"evalue": "ignored",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# api_data[\"heart_rate\"] = heart_rate[\"data\"]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mheart_rate\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mapi_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"sessions\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msessions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"detail\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0msessions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"detail\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0;34m\"Not Found\"\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0mapi_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"tag\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtag\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"data\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mtag\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"data\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32melse\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0mapi_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"workout\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mworkout\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"data\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mworkout\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"data\"\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32melse\u001b[0m 
\u001b[0;34m[\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyError\u001b[0m: 'detail'"
]
}
]
},
{
"cell_type": "code",
"source": [
"# aggregate data for version 1 (in addition to VERSION 2)\n",
"api_data[\"sleep\"] = sleep[\"sleep\"]\n",
"api_data[\"activity\"] = activity[\"activity\"]\n",
"api_data[\"readiness\"] = readiness[\"readiness\"]\n",
"api_data[\"ideal_bedtimes\"] = ideal_bedtimes[\"ideal_bedtimes\"]"
],
"metadata": {
"id": "M29293BmaKiR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# initalize metadata with information for all api_data value keys\n",
"meta_api_data = {i:{j for j in api_data[i][0].keys()} for i in api_data.keys()}"
],
"metadata": {
"id": "7cyYF9T17hsH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# print api_data\n",
"for key, value in api_data.items():\n",
" print(key, \":\", value)"
],
"metadata": {
"id": "gFIRFw5yeE9d"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# print meta_api_data\n",
"for key, value in meta_api_data.items():\n",
" print(key, value)"
],
"metadata": {
"id": "7PwP8AlWeGdp"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# 4. Data visualization"
],
"metadata": {
"id": "lxnVVZvBy1pl"
}
},
{
"cell_type": "markdown",
"source": [
"## 4.1 Sleep Stages Visualization"
],
"metadata": {
"id": "IGnxwDt1Fp61"
}
},
{
"cell_type": "markdown",
"source": [
"When using the Oura app, the user has access to their sleep history as a plot under the sleep tab. The stages of sleep are delineated and the amount of time under each sleep stage can be visualized on a daily basis (shown below).\n",
"
\n",
"\n",
"
\n",
"In the above plot, we've recreated a plot using `matplotlib` and extracting sleep parameters from the Oura device.\n",
"\n",
"Oura stacks different sleep stages on top of one-another. From the bottom, the plot shows **Deep Sleep**, followed by **Light Sleep**, **REM Sleep**, and **Time Awake**. When stacked on top of one another, it gives a visual overview of the sleep quality and time spent slept (Deep + Light + REM).\n",
"\n",
"To create the plot, first, we extract our data of interest, the amount of deep sleep, light sleep, rem sleep, awake time, and the dates from api_data dictionary into separate arrays. Then we will use `matplotlib`'s `pyplots` library on [stackplot](https://matplotlib.org/stable/gallery/lines_bars_and_markers/stackplot_demo.html#sphx-glr-gallery-lines-bars-and-markers-stackplot-demo-py). The stackplot allows us to represent the different sleep stages, including the awake state, stacked on top of each other. After this, we adjust the aesthetic components of the plot to make it look like the original."
],
"metadata": {
"id": "vpTF0Iq3-UpG"
}
},
{
"cell_type": "code",
"source": [
"\"\"\"\n",
"1. Extract the parameters of interest from our api call\n",
" - api_data is a list of daily logs\n",
" - looping over api_data will store daily sleep parameters in separate lists\n",
"\"\"\"\n",
"deep, light, rem, awake, dates = [], [], [], [], []\n",
"for elem in api_data['sleep']:\n",
" deep.append(elem['deep'] / 60 / 60) # sec -> min -> hour\n",
" light.append(elem['light'] / 60 / 60)\n",
" rem.append(elem['rem'] / 60 / 60)\n",
" awake.append(elem['awake'] / 60 / 60)\n",
" dates.append(elem['summary_date'])\n",
"api_data['sleep'][0]['summary_date']\n",
"\n",
"# put data into a dictionary for the stackplot function below\n",
"sleepData ={\n",
" 'Deep': deep,\n",
" 'Light': light,\n",
" 'REM': rem,\n",
" 'Awake': awake\n",
"}\n",
"start_date = dates[0]\n",
"end_date = dates[-1]\n",
"\n",
"\"\"\"\n",
"2. Plot stackplot with matplotlib\n",
"\"\"\"\n",
"# set style to dark\n",
"plt.style.use('dark_background')\n",
"\n",
"# define colors for Deep, Light, REM, Awake\n",
"color_map = [\"#2c4677\", \"#568bbd\", \"#85c9fa\", \"#f2f3f5\"]\n",
"\n",
"# setup and plot\n",
"fig, ax = plt.subplots()\n",
"fig.set_size_inches(8, 4)\n",
"plt.rc('font', size=22)\n",
"ax.stackplot(dates, sleepData.values(),\n",
" labels=sleepData.keys(), colors=color_map, alpha=0.8)\n",
"\n",
"# removing the borders from four sides\n",
"plt.gca().spines['left'].set_visible(False)\n",
"plt.gca().spines['right'].set_visible(False)\n",
"plt.gca().spines['top'].set_visible(False)\n",
"plt.gca().spines['bottom'].set_visible(False)\n",
"\n",
"\n",
"#remove x and y tics\n",
"plt.xticks([])\n",
"plt.yticks([])\n",
"\n",
"#set a new order for handles and lables\n",
"handles, labels = plt.gca().get_legend_handles_labels()\n",
"order = [3,2,1,0]\n",
"\n",
"# set legend below plot\n",
"plt.legend([handles[idx] for idx in order],[labels[idx] for idx in order], loc='upper center', bbox_to_anchor=(0.5, 0.26),\n",
" fancybox=True, shadow=True, ncol=5, fontsize= 18, frameon=False, handlelength=0.9)\n",
"ax.set_title('Thurs, March 24', color = '#85c9fa')\n",
"\n",
"#getting total sleep and time in bed for march 24th\n",
"for metric in api_data[\"sleep\"]:\n",
" if metric['summary_date'] == '2022-03-24':\n",
" total_in_bed = metric['total']\n",
" restless = metric['restless']\n",
" total_sleep = total_in_bed - restless\n",
"\n",
"#get hours and minutes separated\n",
"totalinbed_h = total_in_bed // 60 // 60 #convert to hours\n",
"totalinbed_m = (total_in_bed // 60) - (totalinbed_h * 60) #see how many mins left\n",
"\n",
"totalsleep_h = total_sleep // 60 // 60\n",
"totalsleep_m = (total_sleep // 60) - (totalsleep_h * 60)\n",
"\n",
"# updated strings\n",
"inbed_h_m = str(totalinbed_h) + \"h \"+ str(totalinbed_m) + 'm'\n",
"sleep_h_m = str(totalsleep_h) + \"h \"+ str(totalsleep_m) + 'm'\n",
"\n",
"#adding the total sleep and time in bed\n",
"plt.figtext(0.3,0,'Total sleep', fontsize=18, ha='center', color ='#959caa', fontweight = 'bold')\n",
"plt.figtext(0.72,0,'Time in bed', fontsize=18, ha='center', color ='#959caa', fontweight = 'bold')\n",
"plt.figtext(0.3,-0.15,sleep_h_m, fontsize=24, ha='center', color ='w', fontweight = 'bold')\n",
"plt.figtext(0.72,-0.15,inbed_h_m, fontsize=24, ha='center', color ='w', fontweight = 'bold')\n",
"\n",
"#create the horizontal lines for the legend\n",
"line = plt.Line2D((-1, 9), (-0.1, -0.1), lw=1.3)\n",
"line2 = plt.Line2D((-1, 9), (-3, -3), lw=1.3)\n",
"plt.gca().add_line(line)\n",
"plt.gca().add_line(line2)\n",
"\n",
"#creating the vertical line using a different and definetly not a neat\n",
"#way since using the same way alters the whole plot\n",
"i = 0.13\n",
"while i > -0.27:\n",
" plt.figtext(0.5,i,'|', fontsize=10, ha='center', color ='w')\n",
" i -= 0.01\n",
"\n",
"plt.show()"
],
"metadata": {
"id": "c9LapMAvv87p"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 4.2 Visualizing Resting Heart Rate "
],
"metadata": {
"id": "GmM7OduLHkug"
}
},
{
"cell_type": "markdown",
"source": [
"Next, we will try can recreate the resting heart rate plot found in the Oura Ring app. The heart rate measured during sleep is visualized and represented in the plot. First, we gather the wake and sleep times and format them. Then, we collect the heart rates and the times at which they were recorded from the api_data dictionary. After this, we plot the graph and edit its aesthetic aspects. Finally, we place labels at the top of the sleep time and the waking up time."
],
"metadata": {
"id": "KMEclahHlXAl"
}
},
{
"cell_type": "markdown",
"source": [
"Here is the visual we are trying to replicate from the app.\n",
"\n",
""
],
"metadata": {
"id": "4gWHQSqVKMsZ"
}
},
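{
"cell_type": "markdown",
"source": [
"The first step described above, turning the bedtime and wake-up timestamps into hours since midnight, can also be done with the standard library instead of string slicing. The block below is a minimal sketch, assuming the timestamps are ISO 8601 strings with a numeric UTC offset; the helper name `hours_since_midnight` and the sample timestamp are hypothetical and not part of the Oura API.\n",
"\n",
"```python\n",
"from datetime import datetime\n",
"\n",
"def hours_since_midnight(iso_timestamp):\n",
"    # parse the timestamp and read the clock time in its own UTC offset\n",
"    moment = datetime.fromisoformat(iso_timestamp)\n",
"    return moment.hour + moment.minute / 60 + moment.second / 3600\n",
"\n",
"# hypothetical value shaped like metric['bedtime_start']\n",
"print(hours_since_midnight('2022-03-23T23:30:00+02:00'))  # 23.5\n",
"```\n",
"\n",
"Parsing the timestamp keeps the date and the UTC offset available, which helps when a night spans two calendar dates."
],
"metadata": {}
},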
{
"cell_type": "code",
"source": [
"the_date = \"2022-03-23\" #@param {type:\"date\"}\n",
"\n",
"#protecting againist dates with no data\n",
"try:\n",
" \n",
" # first we get the bed time and waking up time from the sleep data\n",
" for metric in api_data[\"sleep\"]:\n",
" if metric['summary_date'] == the_date:\n",
" start_time = metric['bedtime_start'][11:19].replace(\":\",\"\")\n",
" start_in_hours = float((start_time)[:2]) + float((start_time)[2:4]) / 60 + float((start_time)[4:6]) / 3600\n",
"\n",
" end_time = metric['bedtime_end'][11:19].replace(\":\",\"\")\n",
" end_in_hours = float((end_time)[:2]) + float((end_time)[2:4]) / 60 + float((end_time)[4:6]) / 3600\n",
" break\n",
"\n",
"\n",
" with plt.style.context('dark_background'):\n",
"\n",
" # getting the bpm and the time stamp of the given date \n",
" bpm, timestamp = [], []\n",
" for elem in api_data['heart_rate']:\n",
" # Here we get the date from the timestamp\n",
" date_temp = elem['timestamp'][0:10] \n",
" \n",
" # checking for entries with the desired date\n",
" if date_temp == the_date:\n",
" # getting times from timestamp\n",
" time_temp = elem['timestamp'][11:19]\n",
" time_string = time_temp.replace(\":\",\"\")\n",
" # converting the times to hours\n",
" time_hours = float((time_string)[:2]) + float((time_string)[2:4]) / 60 + float((time_string)[4:6]) / 3600\n",
" timestamp.append(time_hours)\n",
" bpm.append(elem['bpm'])\n",
" # smoothing the data for visulaization\n",
" filtered_data = gaussian_filter(timestamp, sigma=3)\n",
"\n",
" # creating the plot and setting the background color\n",
" fig = plt.figure(figsize=(6.5,4))\n",
" fig.patch.set_facecolor('#171b1e')\n",
"\n",
" #getting the average bpm\n",
" av = sum(bpm)/len(bpm)\n",
" #creating an average line \n",
" plt.axhline(y=av, dashes = (2,5), linewidth = 1)\n",
"\n",
" plt.plot(filtered_data, bpm, linewidth=3, color='#07c8d8')\n",
"\n",
"\n",
" # setting the x labels\n",
" times = [22,2,6,10,14,18,22]\n",
" plt.xticks(ticks=times, labels=['10 PM', '2 AM', '6 AM', '10 AM', '2 PM', '6 PM','10 PM'])\n",
" \n",
" #setting the minor tics as the sleep time and the wake up time\n",
" start_and_end = [start_in_hours, end_in_hours]\n",
" plt.gca().set_xticks(ticks = start_and_end, minor=True)\n",
"\n",
" # adjusting the label size\n",
" plt.tick_params(axis='x', labelsize=12)\n",
" plt.tick_params(axis='y', labelsize=12)\n",
"\n",
" # removing the borders from four sides\n",
" plt.gca().spines['left'].set_visible(False)\n",
" plt.gca().spines['right'].set_visible(False)\n",
" plt.gca().spines['top'].set_visible(False)\n",
" plt.gca().spines['bottom'].set_visible(False)\n",
"\n",
" # change the y-labels to the right side \n",
" plt.gca().yaxis.tick_right()\n",
"\n",
" #remove the tic dashes\n",
" plt.gca().yaxis.set_tick_params(length=0,labelbottom=False)\n",
"\n",
" # set the colors of the grid\n",
" plt.gca().grid(axis='y', color='#383838', dashes = (8,5))\n",
" plt.gca().grid(axis='x', which = \"major\", visible = False)\n",
" plt.gca().grid(axis='x', which = \"minor\", color = '#383838')\n",
"\n",
" # place a title on the graph\n",
" plt.figtext(0.52,1,'Resting heart rate', fontsize=18, ha='center', color ='w', fontweight=0)\n",
" \n",
" # adjust the facecolor\n",
" plt.gca().set_facecolor('#171b1e')\n",
"\n",
" #increase the limits of the yaxis\n",
" plt.ylim(top=200)\n",
" plt.ylim(bottom=40)\n",
"\n",
" #keeping on only some of the y values by adjusting the labels\n",
" bpms = [60, 100, 140, 180]\n",
" plt.yticks(ticks=bpms, labels=['60', '100', '140','180'])\n",
" \n",
" #placing the labels on the vertical bars\n",
" #first convert to the hours and minutes format\n",
" start_hours = int(start_in_hours) \n",
" start_mins = int((start_in_hours - start_hours) * 60) \n",
" end_hours = int(end_in_hours)\n",
" end_mins = int((end_in_hours - end_hours) * 60) \n",
"\n",
" #formating the minutes\n",
" if start_mins < 10:\n",
" start_mins = '0'+ str(start_mins) \n",
" else:\n",
" start_mins = str(start_mins)\n",
" \n",
" if end_mins < 10:\n",
" end_mins = '0'+ str(end_mins) \n",
" else:\n",
" end_mins = str(end_mins)\n",
"\n",
" #convert to 12_hour system and prepare the labels\n",
" if start_hours == 0:\n",
" startlabel = \"12\" + \":\" + (start_mins) + \" am\"\n",
" elif start_hours < 12:\n",
" startlabel = str(start_hours) + \":\" + (start_mins) + \" am\"\n",
" elif start_hours == 12:\n",
" startlabel = \"12\" + \":\" + (start_mins) + \" pm\"\n",
" else:\n",
" startlabel = str(start_hours-12) + \":\" + (start_mins) + \" pm\"\n",
"\n",
" if end_hours == 0:\n",
" endlabel = \"12\" + \":\" + (end_mins) + \" am\"\n",
" elif end_hours < 12:\n",
" endlabel = str(end_hours) + \":\" + (end_mins) + \" am\"\n",
" elif end_hours == 12:\n",
" endlabel = \"12\" + \":\" + (end_mins) + \" pm\"\n",
" else:\n",
" endlabel = str(end_hours-12) + \":\" + (end_mins) + \" pm\"\n",
"\n",
"\n",
"\n",
" #notice that the graph spans from 0.15 to 0.9 which makes\n",
" #the range equal to 0.75 so we place the label in its\n",
" #right location relative to this range\n",
" plt.figtext(0.15+((start_in_hours/24)*0.75), 0.9,startlabel, fontsize=11, ha='center', color ='w', fontweight = 'bold')\n",
" plt.figtext(0.15+((end_in_hours/24)*0.75), 0.9,endlabel, fontsize=11, ha='center', color ='w', fontweight = 'bold')\n",
"\n",
" plt.show(block=True)\n",
"except:\n",
" print('No data at this date')"
],
"metadata": {
"id": "9zaUqTwPKKFu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"It looks like we were able to replicate the main elements of the plot with matplotlib!"
],
"metadata": {
"id": "zz6GKIyGm0mH"
}
},
{
"cell_type": "markdown",
"source": [
"## 4.3 Visualizing Body temperature "
],
"metadata": {
"id": "iFG2XEbcmsXL"
}
},
{
"cell_type": "markdown",
"source": [
"Here is another visualization from the Oura app that we will try to replicate.\n",
"To create this plot, first we extract the temperature_delta data and the dates from the the api_data dictionary and save them in separate arrays. After this we create a bar graph, adjust the aesthetic elements to make it look like the original. We finally make the bars have rounded corners, and voila!"
],
"metadata": {
"id": "NkL9_hbwmn6R"
}
},
{
"cell_type": "markdown",
"source": [
"\n",
""
],
"metadata": {
"id": "hfNyxGIWnTtt"
}
},
{
"cell_type": "code",
"source": [
"with plt.style.context('dark_background'):\n",
" \n",
" temperature_delta, summary_date = [], []\n",
"\n",
" for elem in api_data['sleep']:\n",
" temperature_delta.append(elem[\"temperature_delta\"])\n",
" summary_date.append(elem['summary_date'])\n",
" \n",
"\n",
" fig,ax = plt.subplots()\n",
" \n",
" fig.set_size_inches(6, 6)\n",
"\n",
"\n",
" # bar graph\n",
" plt.bar(summary_date, temperature_delta, color = '#96c146', width = 0.3)\n",
"\n",
" plt.tick_params(axis='x', labelsize=12)\n",
" plt.tick_params(axis='y', labelsize=12)\n",
"\n",
" #removing the borders\n",
" plt.gca().spines['left'].set_visible(False)\n",
" plt.gca().spines['right'].set_visible(False)\n",
" plt.gca().spines['top'].set_visible(False)\n",
" plt.gca().spines['bottom'].set_visible(False)\n",
"\n",
" #make the y tics on the right\n",
" plt.gca().yaxis.tick_right()\n",
" \n",
" #change the colors of the grid\n",
" plt.gca().grid(axis='y', color='#1d2021', dashes = (8,5))\n",
" plt.gca().grid(axis='x', color='#1d2021')\n",
" \n",
"\n",
" #remove the tic dashes\n",
" ax.yaxis.set_tick_params(length=0,labelbottom=False)\n",
"\n",
" #put a title \n",
" plt.title(\"Body Temperature\", fontsize=15,fontweight = 'bold')\n",
" \n",
" # making the graph roundish like. from https://stackoverflow.com/questions/58425392/bar-chart-with-rounded-corners-in-matplotlib\n",
" new_patches = []\n",
" for patch in reversed(ax.patches):\n",
" bb = patch.get_bbox()\n",
" color=patch.get_facecolor()\n",
" p_bbox = FancyBboxPatch((bb.xmin/4, bb.ymin),\n",
" abs(bb.width)/2, abs(bb.height),\n",
" boxstyle=\"round,pad=-0.0040,rounding_size=0.015\",\n",
" ec=\"none\", fc=color)\n",
" patch.remove()\n",
" new_patches.append(p_bbox)\n",
" for patch in new_patches:\n",
" ax.add_patch(patch)\n",
" \n",
" #remove x tics\n",
" plt.xticks([])\n",
"\n",
" #increase the limits of the yaxis\n",
" plt.ylim(top=0.6)\n",
" plt.ylim(bottom=-0.6)\n",
"\n",
" plt.show(block=True)\n",
" "
],
"metadata": {
"id": "TY9lZ92uyjKc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"##4.4 Visualizing Non-Wear Time"
],
"metadata": {
"id": "jHvtw0mAjpJ2"
}
},
{
"cell_type": "markdown",
"source": [
"Although this can't be shown in the Oura app, we are going to see how much time was the Oura Ring wasn't worn here. First, we gather the non-wear time and dates and place them in separate arrays. Then we create a simple bar plot and make it visually appealing using seaborn library. "
],
"metadata": {
"id": "XDzkrJjXk0ib"
}
},
{
"cell_type": "code",
"source": [
"arr1 = []\n",
"arr2 = []\n",
"for metric in api_data[\"activity\"]:\n",
" arr1.append(metric['summary_date'])\n",
"for metric in api_data[\"activity\"]:\n",
" arr2.append(metric['non_wear'])\n",
"\n",
"d = {'Non Wear Time (mins)': arr2, 'Date': arr1}\n",
"df = pd.DataFrame(data=d)\n",
"\n",
"sns.set_theme(style=\"whitegrid\")\n",
"ax = sns.barplot(x=\"Non Wear Time (mins)\", y=\"Date\", data=df)"
],
"metadata": {
"id": "pvD0jpJIlSLS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 4.5 Visualizing Four Parameters In One Plot"
],
"metadata": {
"id": "8zr9nhGaaF3D"
}
},
{
"cell_type": "markdown",
"source": [
"\n",
"Let's overlay medium activity time, REM sleep time, resting time, and non-wear time, all in one plot and investigate the trends among them. It is possible because they are all represented the amount of time spended in each activity.\n",
"We represent each parameter with a different graph color that is illustrated in the graph's legend."
],
"metadata": {
"id": "O7QC1AJaba8B"
}
},
{
"cell_type": "code",
"source": [
"\n",
"\n",
"# first we get those arrays for the wanted parameters\n",
"arr_medium = []\n",
"arr_rem = []\n",
"arr_nonwear = []\n",
"arr_resting = []\n",
"arr_date = []\n",
"\n",
"# Make sure the values plotted correspond to the same days, since sometimes the ring \n",
"# skips certain data of some days, maybe because the user disabled it\n",
"\n",
"hmap = {}\n",
"for metric in api_data[\"sleep\"]:\n",
" hmap[metric[\"summary_date\"]] = metric['rem']\n",
"\n",
"for metric in api_data['activity']:\n",
" if metric['summary_date'] in hmap:\n",
" arr_medium.append(metric['medium']//60)\n",
" arr_nonwear.append(metric['non_wear']//60)\n",
" arr_resting.append(metric['rest']//60)\n",
" arr_rem.append(hmap[metric['summary_date']]//60)\n",
" arr_date.append(metric['summary_date'][5:10])\n",
"\n",
"#creating the plots with the legends\n",
"plt.xlabel('Day')\n",
"plt.ylabel('Time (mins)')\n",
"sns.lineplot(x = arr_date, y = arr_medium, color = 'g', legend = 'auto', label = 'Medium Activity Time')\n",
"sns.lineplot(x = arr_date, y = arr_rem, color = 'r', legend = 'auto', label = 'Rem Sleep Time')\n",
"sns.lineplot(x = arr_date, y = arr_resting, color = 'b', legend = 'auto', label = 'Resting Time')\n",
"sns.lineplot(x = arr_date, y = arr_nonwear, color = 'y', legend = 'auto', label = 'Non-wear Time')\n",
"\n",
"#using seaborn library to make the graph look better\n",
"sns.set(context='notebook', style='whitegrid', font='sans-serif', font_scale=1, color_codes=True)\n",
"\n",
"#resize the figure\n",
"#@title You can change the size of the figure if needed\n",
"width = 11 #@param {type:\"integer\"}\n",
"height = 8 #@param {type:\"integer\"}\n",
"\n",
"plt.rcParams[\"figure.figsize\"] = (width,height) #Set (width, height)\n",
"\n",
"#plotting outliers with a different color\n",
"plt.show(block=True)\n"
],
"metadata": {
"id": "tMZRLOyNaav5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# 5. Data analysis\n",
"\n",
"**(Note that the analyses below may not be 100% biologically or scientifically grounded; the code is here to assist in your process, if you are interested in asking these kinds of questions.)**"
],
"metadata": {
"id": "kitR-nsOpTAC"
}
},
{
"cell_type": "markdown",
"source": [
"## 5.1 Finding Outliers (Anomaly Detection)\n"
],
"metadata": {
"id": "sXM_NjhEzQVV"
}
},
{
"cell_type": "markdown",
"source": [
"We find outliers and remove them in order to get better analysis accuracy by removing the possibility of measurement errors, but at the same time it can affect the result's accuracy since some outliers are true outliers: outliers that is important in the data itelf. Check [this](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7485938) out to learn more about the effects of removing outliers.\n",
"\n",
"There are multiple ways of annotating outliers. According to the [National Institute of Standards and Technology's handbook](https://https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm#:~:text=An%20outlier%20is%20an%20observation,what%20will%20be%20considered%20abnormal.). We assume that a \"mild outlier\" is a datapoint that is higher than or lower than the first quartile or higher than the third quartile by a distance of the interquartile range multiplied by 1.5 points, while \"extreme outliers\" are ones that distance multipled by 3 points instead.\n",
"\n"
],
"metadata": {
"id": "n9nIU-Xw5Lfw"
}
},
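{
"cell_type": "markdown",
"source": [
"To make the two thresholds concrete, here is a minimal sketch that sorts values into mild and extreme outliers using the 1.5 and 3 times interquartile range rule described above. The function name `classify_outliers` and the sample REM durations are hypothetical, chosen only for illustration.\n",
"\n",
"```python\n",
"import numpy\n",
"\n",
"def classify_outliers(values):\n",
"    q1, q3 = numpy.quantile(values, [0.25, 0.75])\n",
"    iqr = q3 - q1\n",
"    mild, extreme = [], []\n",
"    for value in values:\n",
"        # distance outside the quartile box (negative when the value is inside it)\n",
"        distance = max(q1 - value, value - q3)\n",
"        if distance > 3 * iqr:\n",
"            extreme.append(value)\n",
"        elif distance > 1.5 * iqr:\n",
"            mild.append(value)\n",
"    return mild, extreme\n",
"\n",
"# hypothetical REM sleep durations in seconds\n",
"rem_seconds = [4800, 5100, 5400, 5700, 6000, 6300, 6600, 8600, 12000]\n",
"mild, extreme = classify_outliers(rem_seconds)\n",
"print(mild, extreme)  # [8600] [12000]\n",
"```\n",
"\n",
"Here the quartiles are 5400 and 6600, so the interquartile range is 1200: 8600 lies 2000 above the third quartile (mild), while 12000 lies 5400 above it (extreme)."
],
"metadata": {}
},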
{
"cell_type": "code",
"source": [
"def find_Outliers(category_name, sub_Category_Name):\n",
" # create an array with the data\n",
" arr1 = []\n",
" for metric in api_data[category_name]:\n",
" arr1.append(metric[sub_Category_Name])\n",
" \n",
" # add an outlier data for testing (relevant to sleep data)\n",
" arr1.append(9000)\n",
" \n",
" #find interquartile range\n",
" \n",
" quartiles = numpy.quantile(arr1, [0.25,0.75])\n",
" q1, q3 = quartiles[0],quartiles[1]\n",
" interquartile_Range = q3 - q1\n",
" \n",
" #append outliers\n",
" outliers = []\n",
" for item in arr1:\n",
" if item >= q3+(1.5* interquartile_Range) or item <= q1 - (1.5*interquartile_Range):\n",
" outliers.append(item) \n",
" return outliers"
],
"metadata": {
"id": "Wi9_95kM5Jak"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# testing with the sleep data\n",
"list_Of_Outliers = find_Outliers(\"sleep\", \"rem\")\n",
"print(list_Of_Outliers)"
],
"metadata": {
"id": "KcJOCrLmHCko"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"An alternative way is to use the Z-scores of the data points. Since z-scores represent how many standard deviations away a given observation is from the mean. For example, a Z-score of 2.5 means that the data point is 2.5 standard deviations from the mean. \n",
"\n",
"An arbitrary threshold is usually used in order to detect outliers. In this example, I used 2 and -2 as the thresholds, but you can change that. Usually 3,-3 are the standard cut-off. However, these are somewhat arbitrary can be changed. Check chapters 2.2 and 2.3 [here](http://etheses.dur.ac.uk/2432/1/2432_443.pdf) to learn more."
],
"metadata": {
"id": "OX9I9HaxHTy-"
}
},
{
"cell_type": "code",
"source": [
"def find_Outliers(category_name, sub_Category_Name):\n",
" arr1 = []\n",
" for metric in api_data[category_name]:\n",
" arr1.append(metric[sub_Category_Name])\n",
" \n",
" # add an outlier data for testing (relevant to sleep data)\n",
" arr1.append(9000)\n",
" \n",
" #find z-scores\n",
" z_scores_array = stats.zscore(arr1)\n",
" \n",
" #add outliers\n",
" outliers = []\n",
" i = 0\n",
"\n",
" #@title You can change the size of the figure if needed\n",
" Upper_Threshold = 2 #@param {type:\"integer\"}\n",
" Lower_Threshold = -2 #@param {type:\"integer\"}\n",
"\n",
" for i in range(len(z_scores_array)):\n",
" if z_scores_array[i] >= Upper_Threshold or z_scores_array[i] <= Lower_Threshold: # you can change the threshold here\n",
" outliers.append(arr1[i])\n",
" return outliers"
],
"metadata": {
"id": "HQagXtaFOMWM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# testing with the sleep data\n",
"list_Of_Outliers = find_Outliers(\"sleep\", \"rem\")\n",
"print(list_Of_Outliers)"
],
"metadata": {
"id": "T9BsfZqZOQYR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Now lets plot the REM data with outlier(s) highlighted in a different color."
],
"metadata": {
"id": "aFqojfBwGqDI"
}
},
{
"cell_type": "code",
"source": [
"arr1 = []\n",
"for metric in api_data[\"sleep\"]:\n",
" arr1.append(metric[\"rem\"])\n",
"\n",
"# add an outlier data for testing (relevant to sleep data)\n",
"arr1.append(9000)\n",
"\n",
"#creating an array for day numbers.\n",
"arr2 = []\n",
"for i in range(len(arr1)):\n",
" arr2.append(i)\n",
"\n",
"#Creating a dataframe with the arrays\n",
"d = {'Rem sleep time': arr1, 'Day #': arr2}\n",
"df = pd.DataFrame(data=d)\n",
"\n",
"#Detecting the outliers\n",
"list_Of_Outliers = find_Outliers(\"sleep\", \"rem\")\n",
"\n",
"#creting a list of days that correspond to the outlier values in rem sleep\n",
"outlier_days = []\n",
"for i in range(len(arr2)):\n",
" if arr1[i] in list_Of_Outliers:\n",
" outlier_days.append(arr2[i])\n",
"\n",
"#creating the plot\n",
"plt.xlabel('Day #')\n",
"plt.ylabel('Rem sleep time')\n",
"plt.scatter(x = df['Day #'], y = df['Rem sleep time'], color = 'b')\n",
"plt.rcParams[\"figure.figsize\"] = (5,5)\n",
"plt.show(block=True)\n",
"\n",
"#rcreating the second plot without highlighting the outlier\n",
"plt.xlabel('Day #')\n",
"plt.ylabel('Rem sleep time')\n",
"plt.scatter(x = df['Day #'], y = df['Rem sleep time'], color = 'b')\n",
"plt.rcParams[\"figure.figsize\"] = (5,5)\n",
"\n",
"\n",
"#plotting outliers with a different color\n",
"plt.scatter(x = outlier_days, y = list_Of_Outliers, color='r')\n",
"sns.set(context='notebook', style='whitegrid', font='sans-serif', font_scale=1, color_codes=True)\n",
"\n",
"plt.show(block=True)\n"
],
"metadata": {
"id": "eeLFLmEWG9l8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Here appears the injected datapoint in red since it was detected as an outlier"
],
"metadata": {
"id": "OpZ8zxH1T6pW"
}
},
{
"cell_type": "markdown",
"source": [
"## 5.2 Dispersion analysis between day and night readiness scores"
],
"metadata": {
"id": "JlkJfYWfCwMW"
}
},
{
"cell_type": "markdown",
"source": [
"Statistical dispersion is basically a measure of how spread out a set of data points is. It can show how much the data varies accross entries, and it becomes useful in the cases of large datasets where we need to know if the data entry varies more in case A or case B. In this case we are checking whether readiness score has higher dispersion (varies more) during day or at night.\n",
"\n",
"Dispersion has multiple ways of representation. One of them, according to [The National Institute of Statistics and Economic Studies](https://https://www.insee.fr/en/metadonnees/definition/c1366#:~:text=The%20coefficient%20of%20variation%20(CV,generally%20expressed%20as%20a%20percentage.) is the coefficient of variance, also called the relative standard deviation, which is the standard deviation divided by the mean of the set of values."
],
"metadata": {
"id": "Pu7WKViMDLAV"
}
},
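{
"cell_type": "markdown",
"source": [
"As a small worked example of the definition above (standard deviation divided by the mean), the sketch below compares two made-up lists of scores. The function name `coefficient_of_variation` and both lists are hypothetical; note that `numpy.std` computes the population standard deviation by default.\n",
"\n",
"```python\n",
"import numpy\n",
"\n",
"def coefficient_of_variation(values):\n",
"    # population standard deviation (ddof=0) divided by the mean\n",
"    return numpy.std(values) / numpy.mean(values)\n",
"\n",
"# hypothetical scores: the second list is clearly more spread out\n",
"steady_scores = [90, 91, 92, 93, 94]\n",
"varied_scores = [70, 80, 90, 95, 100]\n",
"\n",
"print(coefficient_of_variation(steady_scores))\n",
"print(coefficient_of_variation(varied_scores))\n",
"```\n",
"\n",
"Because the result is a ratio, it allows two sets of values with different means to be compared on the same scale."
],
"metadata": {}
},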
{
"cell_type": "code",
"source": [
"# first we prepare the data that we are trying to get dispersion of \n",
"arr_Day = []\n",
"for metric in api_data[\"readiness\"]:\n",
" arr_Day.append(metric[\"score_previous_day\"])\n",
" \n",
"arr_Night = []\n",
"for metric in api_data[\"readiness\"]:\n",
" arr_Night.append(metric[\"score_previous_night\"])\n",
"\n",
"# calculate standard deviation and mean\n",
"standard_deviation_day = numpy.std(arr_Day)\n",
"standard_deviation_night = numpy.std(arr_Night)\n",
"mean_day = numpy.mean(arr_Day)\n",
"mean_night = numpy.mean(arr_Night)\n",
"\n",
"# calculate variance coefficient\n",
"variability_coefficient_day = standard_deviation_day / mean_day\n",
"variability_coefficient_Night = standard_deviation_night / mean_night\n",
"\n",
"if variability_coefficient_day > variability_coefficient_Night:\n",
" print('Day data variability is higher')\n",
"else:\n",
" print('Night data variability is higher')"
],
"metadata": {
"id": "iGL6YPXtUVct"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"From the output, it's evident that the coefficient of variability of the night readiness score data is higher than that of the day which means that the values are have higher statistical dispersion. Let's visualize this"
],
"metadata": {
"id": "x3Eispk_bGRK"
}
},
{
"cell_type": "code",
"source": [
"#preparing a dataframe to visualize the data\n",
"d = {'day score': arr_Day, 'night score': arr_Night}\n",
"df = pd.DataFrame(data=d)\n",
"\n",
"# creating a relplot with seaborn library, using the data frame and specifying the x, and y values\n",
"graph = sns.relplot(data=df, y=\"day score\", x=\"night score\")\n",
"\n",
"# mapping the graph and showing it\n",
"graph.map(plt.scatter, \"night score\",\"day score\", edgecolor =\"w\").add_legend()\n",
"sns.set(context='notebook', style='whitegrid', font='sans-serif', font_scale=1, color_codes=True)\n",
"\n",
"plt.show(block=True)"
],
"metadata": {
"id": "XtTTJOJObiRS"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"This is very obvious visually at this small sample; most of the day scores are concentrated at the area between 90 and 95 while night scores vary more, but we are performing this analysis as a case for larger samples where dispersion can't be compared visually."
],
"metadata": {
"id": "FhbmLsbhbq2T"
}
},
{
"cell_type": "markdown",
"source": [
"## 5.3 Correlation between equivalent walking distance and the amount of deep sleep\n",
"\n"
],
"metadata": {
"id": "15F0bTsnPuI8"
}
},
{
"cell_type": "markdown",
"source": [
"Let's perform some more analysis on the data trying to find correlations between different data categories that may be statistically significant relationship. The first hypothesis that comes to mind is that exercise throughout the day may somehow have a correlation with the amount of deep sleep the person gets at night. Let's see if this is true."
],
"metadata": {
"id": "r8g7-nd2h3bx"
}
},
{
"cell_type": "code",
"source": [
"# First identify the outliers to remove them\n",
"sleep_Outliers = find_Outliers(\"sleep\", \"deep\")\n",
"walking_distance_outliers = find_Outliers(\"daily_activity\", \"equivalent_walking_distance\")\n",
"\n",
"# Make sure the values plotted correspond to the same days, since sometimes the ring \n",
"# skips certain data of some days, maybe because the user disabled it\n",
"hmap = {}\n",
"for metric in api_data[\"daily_activity\"]:\n",
" hmap[metric[\"day\"]] = metric[\"equivalent_walking_distance\"]\n",
"\n",
"arr1 = []\n",
"arr2 = []\n",
"for metric in api_data[\"sleep\"]:\n",
" if metric[\"summary_date\"] in hmap:\n",
" # remove the outliers\n",
" if metric[\"deep\"] not in sleep_Outliers and hmap[metric[\"summary_date\"]] not in walking_distance_outliers:\n",
" arr1.append(metric[\"deep\"]/3600) #converting to hours\n",
" arr2.append(hmap[metric[\"summary_date\"]])\n",
"\n",
"# plot the data \n",
"d = {'deep sleep (hours)': arr1, 'walking distance (steps)': arr2}\n",
"df = pd.DataFrame(data=d)\n",
"\n",
"graph = sns.lmplot(data=df, y=\"deep sleep (hours)\", x=\"walking distance (steps)\")\n",
"\n",
"graph.map(plt.scatter, \"walking distance (steps)\",\"deep sleep (hours)\", edgecolor =\"w\").add_legend()\n",
"sns.set(context='notebook', style='whitegrid', font='sans-serif', font_scale=1, color_codes=True)\n",
"\n",
"plt.show(block=True)"
],
"metadata": {
"id": "tXqFzFZeie3I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We then calculate the p_value, which is the probability a null hypothesis is true, to see if there is a correlation.\n",
"\n"
],
"metadata": {
"id": "63LNetHlwKfF"
}
},
{
"cell_type": "code",
"source": [
"slope, intercept, r_value, p_value, std_err = stats.linregress(arr1,arr2)\n",
"print(p_value)"
],
"metadata": {
"id": "Pb2zUFDGvh2I"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"But it appears that it is statistically insignificant since it is >0.05, \n",
"since there is a 75.5% chance this data may have occured by chance. So the original hypothesis is false, but the same methodology can be applied to almost every other data category to check whether there is a statistically significant correlation or not.\n"
],
"metadata": {
"id": "9E2ahm_Bw4HI"
}
},
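{
"cell_type": "markdown",
"source": [
"Since the same methodology applies to other pairs of data categories, it can be wrapped in a small reusable function. The following is a minimal sketch, assuming each series has already been collected into a dictionary that maps a date string to a number; the function name `correlate_by_day` and the sample values are hypothetical. It uses `scipy.stats.pearsonr`, which reports the correlation coefficient together with its p-value.\n",
"\n",
"```python\n",
"from scipy import stats\n",
"\n",
"def correlate_by_day(series_a, series_b):\n",
"    # keep only the days that are present in both series\n",
"    shared_days = sorted(set(series_a) & set(series_b))\n",
"    xs = [series_a[day] for day in shared_days]\n",
"    ys = [series_b[day] for day in shared_days]\n",
"    r_value, p_value = stats.pearsonr(xs, ys)\n",
"    return r_value, p_value, len(shared_days)\n",
"\n",
"# hypothetical values, keyed by date\n",
"resting_hours = {'2022-03-20': 7.5, '2022-03-21': 8.0, '2022-03-22': 6.5, '2022-03-23': 9.0}\n",
"sleep_score = {'2022-03-21': 80, '2022-03-22': 71, '2022-03-23': 88, '2022-03-24': 75}\n",
"\n",
"r_value, p_value, n_days = correlate_by_day(resting_hours, sleep_score)\n",
"print(f'r = {r_value:.3f}, p = {p_value:.3f}, days compared = {n_days}')\n",
"```\n",
"\n",
"Returning the number of shared days alongside the p-value is a reminder that a result based on only a handful of days should be read with caution."
],
"metadata": {}
},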
{
"cell_type": "markdown",
"source": [
"## 5.4 Correlation between resting time and sleep score "
],
"metadata": {
"id": "hP5j1fVJ24pV"
}
},
{
"cell_type": "markdown",
"source": [
"Here is a different hyptotheisis that the amount of resting time during the day has a correlation between the sleep overall score which represents its quality.\n",
"With a few changes to the code, we can test for this hypothesis."
],
"metadata": {
"id": "byfJ4oty3H98"
}
},
{
"cell_type": "code",
"source": [
"# First identify the outliers to remove them\n",
"sleep_Outliers = find_Outliers(\"sleep\", \"score\")\n",
"resting_time_outliers = find_Outliers(\"daily_activity\", \"resting_time\")\n",
"\n",
"# Make sure the values plotted correspond to the same days, since sometimes the ring \n",
"# skips certain data of some days, maybe because the user disabled it\n",
"hmap = {}\n",
"for metric in api_data[\"daily_activity\"]:\n",
" hmap[metric[\"day\"]] = metric[\"resting_time\"]\n",
"\n",
"arr1 = []\n",
"arr2 = []\n",
"for metric in api_data[\"sleep\"]:\n",
" if metric[\"summary_date\"] in hmap:\n",
" # remove the outliers\n",
" if metric[\"score\"] not in sleep_Outliers and hmap[metric[\"summary_date\"]] not in resting_time_outliers:\n",
" arr1.append(metric[\"score\"]) \n",
" arr2.append(hmap[metric[\"summary_date\"]]/3600) #converting to hours\n",
"\n",
"# plot the data\n",
"d = {'sleep score': arr1, 'resting time(hours)': arr2}\n",
"df = pd.DataFrame(data=d)\n",
"graph = sns.lmplot(data=df, y=\"sleep score\", x=\"resting time(hours)\")\n",
"graph.map(plt.scatter, \"resting time(hours)\",\"sleep score\", edgecolor =\"w\").add_legend()\n",
"sns.set(context='notebook', style='whitegrid', font='sans-serif', font_scale=1, color_codes=True)\n",
"\n",
"plt.show(block=True)"
],
"metadata": {
"id": "YiIaoJvB3W8P"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"slope, intercept, r_value, p_value, std_err = stats.linregress(arr1,arr2)\n",
"print(p_value)"
],
"metadata": {
"id": "_PI-np_53qle"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The P-value is ~0.009 < 0.05 meaning there is only a ~0.9% probability that this data was arbitrary, which implies statistical significance - the amount of resting time during the day actually affects the sleep score (quality) and there is actual statistical significance here!"
],
"metadata": {
"id": "PP72J6Qz3ucG"
}
}
]
}