Guide to Extracting Data w/ APIs from the Fitbit Sense§

fitbit sense.png

A picture of the Fitbit Sense

At a price of around $250, the Fitbit Sense is a sleep and physical activity tracker that records stress, exercise, ECG, SpO2, and more.

This is a comprehensive guide to extracting data from the Fitbit Sense using the Fitbit Web API. Links to external resources and official Fitbit documentation are provided throughout for further reference.

If you want to know more about the Fitbit Sense, see the README for a detailed analysis of performance, sensors, data privacy, and extraction pipelines.

A list of the most important accessible data categories is provided below. For the full list, see the api_data output in section 3.

| Category Name (API version) | Parameter Name (subcategory) | Frequency of Sampling |
| --- | --- | --- |
| sleep | date | during the night |
| sleep | duration | during the night |
| sleep | efficiency | during the night |
| sleep | end time | during the night |
| sleep | sleep levels | during the night |
| steps | date and time | daily |
| steps | value (number of steps) | daily |
| minutesVeryActive | date and time | daily |
| minutesVeryActive | value | daily |
| minutesFairlyActive | date and time | daily |
| minutesFairlyActive | value | daily |
| minutesLightlyActive | date and time | daily |
| minutesLightlyActive | value | daily |
| distance moved | date and time | daily |
| distance moved | value | daily |
| minutesSedentary | date and time | daily |
| minutesSedentary | value | daily |
| heart rate | resting heart rate | daily (per minute) |
| heart rate | heart rate zones | daily (per minute) |
| heart rate | heart rate variability | during sleep (per minute) |
| temperature | skin temperature | daily |
| temperature | core temperature | daily |
| SpO2 | date and time | during sleep |
| SpO2 | value | during sleep |

In this guide, we sequentially cover the following five topics to extract data from the Fitbit API:

  1. Setup

    • 1.1: Study participant setup and usage

    • 1.2: Library imports

  2. Authentication/Authorization

    • 2.1: Registering an application

    • 2.2: Authorizing the app

    • 2.3: Retrieving The Authorization Code

    • 2.4: Calling the API

  3. Data extraction

    • Select the dates

  4. Data visualization

    • 4.1: Visualizing Non-Wear days and filtering the data from them

    • 4.2: Visualizing Heart Rate

    • 4.3: Visualizing distance moved

  5. Data analysis

    • 5.1: Finding Outliers (Anomaly Detection). We provide two ways to find outliers in any set of output data.

    • 5.2: Checking for correlation between distance walked and heart rate

*Note: Full documentation of APIs by Fitbit can be found here.

Setup§

1.1 Study participant setup and usage§

After creating your Fitbit account, charging the watch, and connecting it to your account, download the Fitbit app from the App Store or Google Play and start using your Fitbit. You will see the data being collected and visualized in the app. Once you have some data, it’s easy to access it through the Fitbit Web API by following this notebook.

1.2 Library imports§

[ ]:
import base64
import hashlib
import html
import json
import os
import re
import urllib.parse
import requests
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
from sklearn.covariance import EllipticEnvelope
from scipy import stats

Authentication/Authorization§

To access the data through the Web API, authentication and authorization are required. Fitbit supports the OAuth 2.0 protocol with three different workflows (read more about it here).

Briefly, Fitbit offers three workflows for its Web API, listed from the lowest level of security to the highest:

  • Implicit Grant Flow

  • Authorization Code Grant Flow

  • Authorization Code Grant Flow with PKCE (Proof Key for Code Exchange)

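The code in this guide sets response_type to token, which corresponds to the Implicit Grant Flow: the access token is returned directly in the redirect URL. The PKCE values generated in 2.1 are needed only if you switch to the Authorization Code Grant Flow with PKCE.

The full documentation for all workflows is provided here.

*Note: Tokens have a specified TTL (time to live), set by the expires_in parameter of the authorization request. Once that time has passed, a new token must be issued.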

2.1 Registering An Application§

First, register an application here while logged in. Set the OAuth 2.0 Application Type to Client or Personal. The Callback URL is the address at which you receive your token (https://127.0.0.1:8080/ is used as an example, where 127.0.0.1 is the localhost address and 8080 is the port, but any URL accessible locally should suffice). The other fields can be filled in without particular requirements (e.g. https://google.com for all website links). An image with the important sections highlighted is provided below for clarity.

1. Register Application.PNG

The client_id and client_secret can be accessed under Manage My Apps as shown below.

p2.png

Afterwards, we initialize a dictionary to hold all variables relevant to authenticating, authorizing, and calling the API later.

[ ]:
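# PKCE code verifier: 43-128 characters from the URL-safe alphabet, without "=" padding.
# An existing verifier is reused when the cell is re-run so the challenge stays stable.
if "code_verifier" not in locals():
    code_verifier = base64.urlsafe_b64encode(os.urandom(43)).decode("utf-8").rstrip("=")

# PKCE code challenge: base64url-encoded SHA-256 hash of the verifier, without padding
code_challenge = base64.urlsafe_b64encode(
    hashlib.sha256(code_verifier.encode("utf-8")).digest()
).decode("utf-8").rstrip("=")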

variables = dict()

# user specified (replace with the values of your own application)
variables["client_id"] = "238RJ5"
variables["client_secret"] = "f56c7bc6d3cce8cc78edf723ce37f444"
variables["expires_in"] = "31536000"  # expiry of token in seconds

# constants or one-time generated
variables["code_verifier"] = code_verifier
variables["code_challenge"] = code_challenge
variables["code_challenge_method"] = "S256"
variables["response_type"] = "token"  # "token" = Implicit Grant Flow; "code" = Authorization Code Grant Flows
variables["scope"] = (
    "weight%20location%20settings%20profile%20nutrition%20" +
    "activity%20sleep%20heartrate%20social"
)
variables["prompt"] = "none"
variables["redirect_uri"] = "https%3A%2F%2F127.0.0.1%3A8080%2F"
variables["grant_type"] = "authorization_code"
# HTTP Basic credentials use standard Base64 (not the URL-safe alphabet)
variables["authorization"] = base64.b64encode(
    bytes(variables["client_id"] + ":" + variables["client_secret"], "utf-8")
).decode("utf-8")

2.2 Authorize the app§

Next, we open Fitbit’s authorization page by entering a specific URL in a web browser. The URL also carries a code challenge derived from a code verifier, which the Authorization Code Grant Flow with PKCE requires. The concept is comprehensively outlined here.

The URL should consist of the following required parameters (split into “variable” and “non-variable” parameters).

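  • Variable parameters

    • client_id: Fitbit API application ID (under Manage My Apps as specified in 2.1, link here, requires login)

    • code_challenge: base64url-encoded SHA-256 hash of the code verifier, can be obtained here

    • code_challenge_method: S256

  • Non-variable parameters

    • scope: space-delimited list of data collections requested by the application (endpoints such as ECG, SpO2, and temperature require their own scopes, namely electrocardiogram, oxygen_saturation, and temperature, which are not part of the scope string defined in 2.1)

    • response_type: token for the Implicit Grant Flow used in this guide, or code for the Authorization Code Grant Flows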

The resulting URL is demonstrated below.

[ ]:
# combine all parameters into the URL query string
# (the values in `variables` are already URL-safe, so they are joined as-is)
url = "https://www.fitbit.com/oauth2/authorize"  # authorization endpoint
keys = ["client_id", "redirect_uri", "code_challenge", "code_challenge_method",
        "scope", "response_type", "expires_in"]
url += "?" + "&".join(key + "=" + variables[key] for key in keys)

print(url)
https://www.fitbit.com/oauth2/authorize?client_id=238RJ5&redirect_uri=https%3A%2F%2F127.0.0.1%3A8080%2F&code_challenge=GlrjTiiXE-cZe3MGpBSjMQeYMiJcGEs6dGk26hujl-E&code_challenge_method=S256&scope=weight%20location%20settings%20profile%20nutrition%20activity%20sleep%20heartrate%20social&response_type=token&expires_in=31536000

Click the URL above to access the authorization page. Check Allow All and click the red Allow button, as shown below.

p3.png
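
The authorization URL in this section is joined from values that were percent-encoded by hand. As an alternative sketch, the same kind of query string can be produced from unencoded values with urllib.parse.urlencode. The client_id below is a placeholder and the shortened scope list is only an example.

```python
from urllib.parse import quote, urlencode

# Unencoded parameter values; the client_id is a placeholder, not a real application ID.
params = {
    "client_id": "YOUR_CLIENT_ID",
    "redirect_uri": "https://127.0.0.1:8080/",
    "scope": "activity heartrate sleep",
    "response_type": "token",
    "expires_in": "31536000",
}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, would use "+").
authorize_url = "https://www.fitbit.com/oauth2/authorize?" + urlencode(params, quote_via=quote)
print(authorize_url)
```

Letting the library do the encoding avoids mistakes such as a forgotten %3A or a stray space in the redirect URI.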

2.3 Retrieving The Authorization Code§

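After you click Allow, the browser is redirected to the callback URL. The page itself may fail to load because nothing is listening at that address; this is expected, and the URL in the address bar is all that is needed. With response_type set to token, the access_token is included in that URL, after the # character.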

https://127.0.0.1:8080/#access_token=eyJhbGciOiJIUzI1NiJ9.eyJhdWQiOiIyMzhSSjUiLCJzdWIiOiI5RlJHUFQiLCJpc3MiOiJGaXRiaXQiLCJ0eXAiOiJhY2Nlc3NfdG9rZW4iLCJzY29wZXMiOiJyc29jIHJzZXQgcmFjdCBybG9jIHJ3ZWkgcmhyIHJwcm8gcm51dCByc2xlIiwiZXhwIjoxNjkyODMzNDcwLCJpYXQiOjE2NjEyOTc0NzB9.9pcX9pYrk-si6OsvAFz68MvTVm21S1OZUna921g46NA&user_id=9FRGPT&scope=sleep+profile+weight+heartrate+location+settings+nutrition+activity+social&token_type=Bearer&expires_in=31536000

In the example above, the access_token is eyJhbGciOiJIUzI1NiJ9.eyJhdWQiOiIyMzhSSjUiLCJzdWIiOiI5RlJHUFQiLCJpc3MiOiJGaXRiaXQiLCJ0eXAiOiJhY2Nlc3NfdG9rZW4iLCJzY29wZXMiOiJyc29jIHJzZXQgcmFjdCBybG9jIHJ3ZWkgcmhyIHJwcm8gcm51dCByc2xlIiwiZXhwIjoxNjkyODMzNDcwLCJpYXQiOjE2NjEyOTc0NzB9.9pcX9pYrk-si6OsvAFz68MvTVm21S1OZUna921g46NA.

Store the access_token inside the variables dictionary:

[ ]:
variables["access_token"] = 'eyJhbGciOiJIUzI1NiJ9.eyJhdWQiOiIyMzhSSjUiLCJzdWIiOiI5RlJHUFQiLCJpc3MiOiJGaXRiaXQiLCJ0eXAiOiJhY2Nlc3NfdG9rZW4iLCJzY29wZXMiOiJyc29jIHJzZXQgcmFjdCBybG9jIHJ3ZWkgcmhyIHJwcm8gcm51dCByc2xlIiwiZXhwIjoxNjkyODMzNDcwLCJpYXQiOjE2NjEyOTc0NzB9.9pcX9pYrk-si6OsvAFz68MvTVm21S1OZUna921g46NA'
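
Instead of copying the token out of the address bar by hand, the redirect URL can be parsed with urllib.parse. The sketch below assumes the URL has the same layout as the example above, with the parameters placed after the # character; parse_redirect_fragment is a hypothetical helper name and the URL in the example is made up.

```python
from urllib.parse import parse_qs, urlsplit


def parse_redirect_fragment(redirect_url: str) -> dict:
    """Return the key/value pairs found after the '#' in an OAuth redirect URL."""
    fragment = urlsplit(redirect_url).fragment
    # parse_qs returns a list per key; each key appears once here, so keep the first value.
    return {key: values[0] for key, values in parse_qs(fragment).items()}


# Shortened, made-up redirect URL with the same layout as the one shown above.
example_redirect = (
    "https://127.0.0.1:8080/#access_token=aaa.bbb.ccc&user_id=ABC123"
    "&scope=sleep+heartrate+activity&token_type=Bearer&expires_in=31536000"
)
parsed = parse_redirect_fragment(example_redirect)
print(parsed["access_token"], parsed["expires_in"])
```

The resulting dictionary also exposes expires_in, which is useful for deciding when a new token has to be requested.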

2.4 Calling the API§

[ ]:
# executes a GET request on the API
def call_API(
    access_token: str,
    url: str,
    call: str = "GET"
):
    headers = {
        "Authorization": "Bearer " + access_token
    }
    return requests.request(
        call, url=url, headers=headers).json()
[ ]:
# calls user profile
call_API(
    access_token=variables["access_token"],
    url="https://api.fitbit.com/1/user/-/profile.json")
{'user': {'age': 31,
  'ambassador': False,
  'autoStrideEnabled': True,
  'avatar': 'https://static0.fitbit.com/images/profile/defaultProfile_100.png',
  'avatar150': 'https://static0.fitbit.com/images/profile/defaultProfile_150.png',
  'avatar640': 'https://static0.fitbit.com/images/profile/defaultProfile_640.png',
  'averageDailySteps': 14055,
  'challengesBeta': True,
  'clockTimeDisplayFormat': '12hour',
  'corporate': False,
  'corporateAdmin': False,
  'dateOfBirth': '1991-06-29',
  'displayName': 'Alexander J.',
  'displayNameSetting': 'name',
  'distanceUnit': 'en_US',
  'encodedId': '9FRGPT',
  'features': {'exerciseGoal': True},
  'firstName': 'Alexander',
  'foodsLocale': 'en_US',
  'fullName': 'Alexander Johansen',
  'gender': 'MALE',
  'glucoseUnit': 'en_US',
  'height': 182.8,
  'heightUnit': 'en_US',
  'isBugReportEnabled': False,
  'isChild': False,
  'isCoach': False,
  'languageLocale': 'en_US',
  'lastName': 'Johansen',
  'legalTermsAcceptRequired': False,
  'locale': 'en_US',
  'memberSince': '2021-06-07',
  'mfaEnabled': False,
  'offsetFromUTCMillis': -21600000,
  'sdkDeveloper': False,
  'sleepTracking': 'Normal',
  'startDayOfWeek': 'SUNDAY',
  'strideLengthRunning': 114.80000000000001,
  'strideLengthRunningType': 'auto',
  'strideLengthWalking': 75.9,
  'strideLengthWalkingType': 'auto',
  'swimUnit': 'en_US',
  'temperatureUnit': 'en_US',
  'timezone': 'America/Chicago',
  'topBadges': [{'badgeGradientEndColor': 'B0DF2A',
    'badgeGradientStartColor': '00A550',
    'badgeType': 'DAILY_STEPS',
    'category': 'Minions Badges',
    'cheers': [],
    'dateTime': '2022-12-12',
    'description': '22,222 steps in a day',
    'earnedMessage': 'Congrats on earning your first Minions: Kevin badge!',
    'encodedId': '22B656',
    'image100px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/100px/badge_daily_steps22222.png',
    'image125px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/125px/badge_daily_steps22222.png',
    'image300px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/300px/badge_daily_steps22222.png',
    'image50px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/badge_daily_steps22222.png',
    'image75px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/75px/badge_daily_steps22222.png',
    'marketingDescription': 'Cheers! Way to make it a mission to move. Next up: Take 32,100 steps in a day to get the Bob Badge.',
    'mobileDescription': 'Cheers! Way to make it a mission to move.',
    'name': 'Minions: Kevin (22,222 steps in a day)',
    'shareImage640px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/386px/shareLocalized/en_US/badge_daily_steps22222.png',
    'shareText': 'I took 22,222 steps in a day',
    'shortDescription': '22,222 steps in a day',
    'shortName': 'Minions: Kevin',
    'timesAchieved': 2,
    'value': 22222},
   {'badgeGradientEndColor': '38D7FF',
    'badgeGradientStartColor': '2DB4D7',
    'badgeType': 'LIFETIME_DISTANCE',
    'category': 'Lifetime Distance',
    'cheers': [],
    'dateTime': '2022-07-23',
    'description': '70 lifetime miles',
    'earnedMessage': "Whoa! You've earned the Penguin March badge!",
    'encodedId': '22B8M7',
    'image100px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/100px/badge_lifetime_miles70.png',
    'image125px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/125px/badge_lifetime_miles70.png',
    'image300px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/300px/badge_lifetime_miles70.png',
    'image50px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/badge_lifetime_miles70.png',
    'image75px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/75px/badge_lifetime_miles70.png',
    'marketingDescription': "By reaching 70 lifetime miles, you've earned the Penguin March badge!",
    'mobileDescription': 'You matched the distance of the March of the Penguins—the annual trip emperor penguins make to their breeding grounds.',
    'name': 'Penguin March (70 lifetime miles)',
    'shareImage640px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/386px/shareLocalized/en_US/badge_lifetime_miles70.png',
    'shareText': 'I covered 70 miles with my #Fitbit and earned the Penguin March badge.',
    'shortDescription': '70 miles',
    'shortName': 'Penguin March',
    'timesAchieved': 1,
    'unit': 'MILES',
    'value': 70},
   {'badgeGradientEndColor': '38D7FF',
    'badgeGradientStartColor': '2DB4D7',
    'badgeType': 'DAILY_FLOORS',
    'category': 'Daily Climb',
    'cheers': [],
    'dateTime': '2022-08-25',
    'description': '50 floors in a day',
    'earnedMessage': 'Congrats on earning your first Lighthouse badge!',
    'encodedId': '228TT7',
    'image100px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/100px/badge_daily_floors50.png',
    'image125px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/125px/badge_daily_floors50.png',
    'image300px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/300px/badge_daily_floors50.png',
    'image50px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/badge_daily_floors50.png',
    'image75px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/75px/badge_daily_floors50.png',
    'marketingDescription': "You've climbed 50 floors to earn the Lighthouse badge!",
    'mobileDescription': "With a floor count this high, you're a beacon of inspiration to us all!",
    'name': 'Lighthouse (50 floors in a day)',
    'shareImage640px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/386px/shareLocalized/en_US/badge_daily_floors50.png',
    'shareText': 'I climbed 50 flights of stairs and earned the Lighthouse badge! #Fitbit',
    'shortDescription': '50 floors',
    'shortName': 'Lighthouse',
    'timesAchieved': 4,
    'value': 50},
   {'badgeGradientEndColor': 'FFDB01',
    'badgeGradientStartColor': 'D99123',
    'badgeType': 'LIFETIME_FLOORS',
    'category': 'Lifetime Climb',
    'cheers': [],
    'dateTime': '2022-09-06',
    'description': '500 lifetime floors',
    'earnedMessage': "Yipee! You've earned the Helicopter badge!",
    'encodedId': '228TB8',
    'image100px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/100px/badge_lifetime_floors500.png',
    'image125px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/125px/badge_lifetime_floors500.png',
    'image300px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/300px/badge_lifetime_floors500.png',
    'image50px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/badge_lifetime_floors500.png',
    'image75px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/75px/badge_lifetime_floors500.png',
    'marketingDescription': "By climbing 500 lifetime floors, you've earned the Helicopter badge!",
    'mobileDescription': "You've reached the altitude of a helicopter. Upward and onward as you soar to new heights!",
    'name': 'Helicopter (500 lifetime floors)',
    'shareImage640px': 'https://www.gstatic.com/fitbit/badge/images/badges_new/386px/shareLocalized/en_US/badge_lifetime_floors500.png',
    'shareText': 'I climbed 500 floors with my #Fitbit and earned the Helicopter badge.',
    'shortDescription': '500 floors',
    'shortName': 'Helicopter',
    'timesAchieved': 1,
    'value': 500}],
  'waterUnit': 'en_US',
  'waterUnitName': 'fl oz',
  'weight': 82.5,
  'weightUnit': 'en_US'}}

Data Extraction§

Now we can extract data by calling the API with the call_API() function. A full list of data types and endpoints is available here.

In brief, the categories are:

  • Activity

  • Activity Intraday Time Series

  • Activity Time Series

  • Body and Weight

  • Body and Weight Time Series

  • Devices

  • Food and Water

  • Food and Water Time Series

  • Friends

  • Heart Rate Intraday Time Series

  • Heart Rate Time Series

  • Sleep

  • Subscriptions

  • User
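
The activity time series endpoints used in the next cell all share one URL pattern, so they can be generated by a small helper rather than typed out one by one. The following is a minimal sketch; activity_time_series_url is a hypothetical name, and the path layout is copied from the URLs in this guide rather than from an exhaustive reading of the API reference.

```python
from datetime import date

BASE_URL = "https://api.fitbit.com"


def activity_time_series_url(resource: str, start: str, end: str) -> str:
    """Build an activity time series URL for a date range (dates as YYYY-MM-DD)."""
    # fromisoformat raises ValueError on malformed dates, so typos fail early.
    start_day, end_day = date.fromisoformat(start), date.fromisoformat(end)
    if end_day < start_day:
        raise ValueError("end date must not be before start date")
    # strip() guards against stray spaces, which would produce an invalid path.
    return f"{BASE_URL}/1/user/-/activities/{resource.strip()}/date/{start_day}/{end_day}.json"


print(activity_time_series_url("steps", "2022-07-01", "2022-08-30"))
```

Building the URLs in one place makes it harder for a typo in a single entry to go unnoticed.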

[ ]:
#@title Set up the start and end dates (YYYY-MM-DD) or a single date for day's data extraction
start_date = "2022-07-01" #@param {type:"string"}
end_date = "2022-08-30" #@param {type:"string"}
single_date = "2022-12-11" #@param {type:"string"}

# store arguments for some of the categories
categories = {
    "sleep": {
        "url": "https://api.fitbit.com/1.2/user/-/sleep/date/" + start_date + "/" + end_date + ".json"},
    "steps": {
        "url": "https://api.fitbit.com/1/user/-/activities/steps/date/" + start_date + "/" + end_date + ".json"},
    "minutesVeryActive": {
        "url": "https://api.fitbit.com/1/user/-/activities/minutesVeryActive/date/" + start_date + "/" + end_date + ".json"},
    "minutesFairlyActive": {
        "url": "https://api.fitbit.com/1/user/-/activities/minutesFairlyActive/date/" + start_date + "/" + end_date + ".json"},
    "minutesLightlyActive": {
        "url": "https://api.fitbit.com/1/user/-/activities/minutesLightlyActive/date/" + start_date + "/" + end_date + ".json"},
    "distance": {
        "url": "https://api.fitbit.com/1/user/-/activities/distance/date/" + start_date + "/" + end_date + ".json"},
    "minutesSedentary": {
        "url": "https://api.fitbit.com/1/user/-/activities/minutesSedentary/date/" + start_date + "/" + end_date + ".json"},
    'heart_rate_day':{
        'url': 'https://api.fitbit.com/1/user/-/activities/heart/date/'+ single_date +'/1d.json'},
    'hrv':{
        'url': 'https://api.fitbit.com/1/user/-/hrv/date/'+ single_date +".json"},
    'distance_day':{
        'url': "https://api.fitbit.com/1/user/-/activities/distance/date/" + single_date + "/1d.json"},
    'ECG':{
        'url': "https://api.fitbit.com/1/user/-/ecg/list.json?afterDate="+ single_date +"&sort=asc&limit=1&offset=0"},
    }


# initialize empty dictionary to aggregate values
api_data = dict()

# loop api calls for all categories
for category, values in categories.items():
    response = call_API(
        url=values["url"],
        access_token=variables["access_token"]
    )

    api_data[category] = [response]

# initialize metadata with the set of top-level keys returned for each category
meta_api_data = {category: set(data[0].keys()) for category, data in api_data.items()}
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-8-0ede11f3d0ca> in <module>
     36 # loop api calls for all categories
     37 for category, values in categories.items():
---> 38     response = call_API(
     39         url=values["url"],
     40         access_token=variables["access_token"]

<ipython-input-5-54922d78684a> in call_API(access_token, url, call)
      8         "Authorization": "Bearer " + access_token
      9     }
---> 10     return requests.request(
     11         call, url=url, headers=headers).json()

/usr/local/lib/python3.8/dist-packages/requests/models.py in json(self, **kwargs)
    896                     # used.
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899
    900     @property

/usr/lib/python3.8/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    355             parse_int is None and parse_float is None and
    356             parse_constant is None and object_pairs_hook is None and not kw):
--> 357         return _default_decoder.decode(s)
    358     if cls is None:
    359         cls = JSONDecoder

/usr/lib/python3.8/json/decoder.py in decode(self, s, _w)
    335
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

/usr/lib/python3.8/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
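
The traceback above is raised when response.json() receives a body that is not valid JSON, for example an HTML error page or an empty body returned for a malformed URL. A defensive variant could catch that error and report the status code instead of stopping the whole loop. The sketch below is one way to do this; safe_json is a hypothetical helper, and FakeResponse is only a stand-in so that the example runs without network access.

```python
import json


class FakeResponse:
    """Stand-in for a requests.Response, used only to demonstrate the helper offline."""

    def __init__(self, status_code: int, text: str):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)


def safe_json(response) -> dict:
    """Parse a response body as JSON, or describe the failure instead of raising."""
    try:
        return response.json()
    except ValueError:  # JSON decoding errors are subclasses of ValueError
        return {
            "error": "response body is not JSON",
            "status_code": response.status_code,
            "body_preview": response.text[:200],
        }


ok = safe_json(FakeResponse(200, '{"activities-steps": []}'))
bad = safe_json(FakeResponse(400, "<html>Bad Request</html>"))
print(ok)
print(bad["status_code"], bad["error"])
```

Applied to the object returned by requests.request, the same helper would let the extraction loop continue and make the failing category visible in api_data.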
[ ]:
meta_api_data
[ ]:
api_data

Data Visualization§

4.1 Visualizing Non-Wear Days and Filtering The Data§

Here we visualize the days on which no data was collected. Such days show up as zero values, and because zeros would be treated as real data points and skew the results, we filter them out before the analysis.

We will create a function that removes these empty entries, but first let’s take a look at our sample of data.

[ ]:
# First we are going to aggregate the data in arrays. We are taking steps as an example here
dates = []
steps = []

for datapoint in api_data['steps'][0]['activities-steps']:
    dates.append(datapoint['dateTime'])
    steps.append(float(datapoint['value']))

#Using a pandas dataframe to aggregate the data
d = {'Date': dates, 'steps': steps}
df = pd.DataFrame(data=d)

with plt.style.context('fivethirtyeight'):
    #Creating the plot
    #Resizing the plot
    fig,ax = plt.subplots()
    fig.set_size_inches(12,10)

    sns.set_theme(style="dark")
    ax = sns.barplot(x="Date", y="steps", data=df, palette = 'Dark2_r' )

    # adjust tick sizes
    plt.tick_params(axis='x', labelsize=8)
    plt.tick_params(axis='y', labelsize=8)

    # rotates and right-aligns the x labels so they don't crowd each other.
    for label in ax.get_xticklabels(which='major'):
        label.set(rotation=90, horizontalalignment='right')
    plt.show()

As shown above, a lot of days have no recorded data. We don’t want to include those in the analysis, so let’s filter them out.

[ ]:
# The data consists of several parallel lists (for example dates and steps),
# so the function takes the list of lists plus the index of the reference list
# whose zero values mark the non-wear entries.

def remove_non_wear(lst, reference_index):
  newlst = [[] for _ in lst]

  for index, value in enumerate(lst[reference_index]):
    if value != 0:
      # keep this position in every list so they stay aligned
      for position, array in enumerate(lst):
        newlst[position].append(array[index])
  return newlst
[ ]:
# Now let's test this
new_arrays = remove_non_wear([dates, steps], 1)

# unpack the filtered data
dates = new_arrays[0]
steps = new_arrays[1]

# create a new plot with the filtered data
d = {'Date': dates, 'steps': steps}
df = pd.DataFrame(data=d)

with plt.style.context('fivethirtyeight'):
    #Resizing the plot
    fig,ax = plt.subplots()
    fig.set_size_inches(5,6)

    sns.set_theme(style="dark")
    ax = sns.barplot(x="Date", y="steps", data=df, palette = 'Dark2_r' )

    # adjust tick sizes
    plt.tick_params(axis='x', labelsize=8)
    plt.tick_params(axis='y', labelsize=8)

    # rotates and right-aligns the x labels so they don't crowd each other.
    for label in ax.get_xticklabels(which='major'):
        label.set(rotation=90, horizontalalignment='right')
    plt.show()

At first glance, it may look as though the data for 2022-07-25 wasn’t filtered out, so let’s check whether it has a non-zero value.

[ ]:
print(steps)

The number of steps is only 11, which indicates that the Fitbit was taken off for most of the day. For this reason it is considered an outlier, and we will filter it out in section 5.1.
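
Days with only a handful of steps could also be dropped at this stage by raising the cutoff above zero. The sketch below assumes a cutoff of 500 steps, which is an arbitrary value chosen for illustration and not a Fitbit recommendation; remove_low_wear_days is a hypothetical helper and the sample values are made up.

```python
def remove_low_wear_days(dates, steps, min_steps=500):
    """Keep only the days whose step count reaches min_steps (an assumed cutoff)."""
    kept = [(day, count) for day, count in zip(dates, steps) if count >= min_steps]
    kept_dates = [day for day, _ in kept]
    kept_steps = [count for _, count in kept]
    return kept_dates, kept_steps


# Made-up sample in the same shape as the dates and steps lists used above.
sample_dates = ["2022-07-23", "2022-07-24", "2022-07-25", "2022-07-26"]
sample_steps = [8421.0, 0.0, 11.0, 10230.0]
print(remove_low_wear_days(sample_dates, sample_steps))
```

A fixed cutoff is simple but crude, so the statistical approaches in section 5.1 remain the more general option.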

Now that we know this filtering works, let’s recreate some visuals from the app.

4.2 Visualizing Heart Rate§

Here, we are going to replicate the following visual from the Fitbit app, which presents the heart rate over a single day.

p1.jpeg

First we are going to collect the data in arrays then create the plot and make it look like the original plot. At the end, we place the labels and adjust the font sizes.

[ ]:
# First we aggregate the intraday heart rate data in arrays (time as minutes since midnight)
time = []
heart_rate = []

for datapoint in api_data['heart_rate_day'][0]['activities-heart-intraday']['dataset']:
    time.append(int(datapoint['time'][0:2]) * 60 + int(datapoint['time'][3:5]))
    heart_rate.append(float(datapoint['value']))

# Thin out the data by keeping one reading out of every nine, paired with the mean
# time of its window; this approximates the smoothing the Fitbit app appears to apply
new_heart_rate = []
new_time = []
temp_time = []
counter = 0
for i in range(len(time)):
    temp_time.append(time[i])
    if counter % 9 == 0 or i == len(time)-1:
       new_heart_rate.append(heart_rate[i])
       new_time.append(sum(temp_time) / len(temp_time))
       temp_time = []
    counter+=1



with plt.style.context('dark_background'):

      # creating the plot and setting the background color
      fig = plt.figure(figsize=(9, 5))
      fig.patch.set_facecolor('#0a7192')
      # adjust the facecolor
      plt.gca().set_facecolor('#0a7192')

      plt.plot(new_time, new_heart_rate, linewidth=3, color='#4fbecc')


      # setting the width of the X axis
      times = [0, 24*60]
      plt.xticks(ticks=times, labels=['', ''])

      #keeping on only some of the y values by adjusting the labels
      bpms = [54, 74, 94]
      plt.yticks(ticks=bpms, labels=['54', '74', '94'])

      # removing the borders from four sides
      plt.gca().spines['left'].set_visible(False)
      plt.gca().spines['right'].set_visible(False)
      plt.gca().spines['top'].set_visible(False)
      plt.gca().spines['bottom'].set_visible(False)

      # adjusting the label size
      plt.tick_params(axis='y', labelsize=16)

      # set the colors of the grid
      plt.gca().grid(axis='y', color='#1f8ba0')


      #adding labels
      plt.figtext(0.5,1.0, 'Monday', fontsize=22, ha='center', color ='w', fontweight = 'bold')
      plt.figtext(0.5,0.90, 'Beats Per Minute', fontsize=17, ha='center', color ='w')
      plt.figtext(0.5,0.03, 'NOON', fontsize=18, ha='center', color ='w', fontweight = 'bold')
      plt.figtext(0.15,0.02, '12:00 AM', fontsize=18, ha='center', color ='w', fontweight = 'bold')
      plt.figtext(0.85,0.02, '11:59 PM', fontsize=16, ha='center', color ='w', fontweight = 'bold')




Now it looks very similar to the original plot!
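
The loop in the cell above keeps one reading out of every nine rather than averaging them. If a true average per window is preferred, one option is the block average sketched below. Whether this is closer to what the Fitbit app does is not known; block_average is a hypothetical helper and the sample readings are made up.

```python
def block_average(times, values, window=9):
    """Average times and values over consecutive, non-overlapping windows."""
    avg_times, avg_values = [], []
    for start in range(0, len(values), window):
        chunk_t = times[start:start + window]
        chunk_v = values[start:start + window]
        avg_times.append(sum(chunk_t) / len(chunk_t))
        avg_values.append(sum(chunk_v) / len(chunk_v))
    return avg_times, avg_values


# Made-up minute-by-minute readings (minutes since midnight, beats per minute).
sample_time = list(range(18))
sample_hr = [60.0] * 9 + [90.0] * 9
print(block_average(sample_time, sample_hr))
```

Averaging reduces the influence of single noisy readings, at the cost of flattening short peaks.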

4.3 Visualizing Distance§

Now let’s replicate the next visual, which plots the distance walked on a given day.

WhatsApp Image 2022-08-23 at 8.07.11 PM.jpeg

To build this plot, we collect the data in arrays, plot them with matplotlib, and adjust the aesthetics to resemble the original. One interesting thing that Fitbit does is divide the day into sections, sum the data within each section, and visualize the sums. We do the same here, even though our sections may not match Fitbit’s and may therefore produce different aggregate numbers.

[ ]:
# First we aggregate the data into 96 bins of 15 minutes each.
# With 1440 one-minute datapoints per day, each bin is the sum of 15 of them.

time = [15 * i for i in range(96)]  # start of each bin, in minutes since midnight
distances = [0] * 96

for datapoint in api_data['distance_day'][0]['activities-distance-intraday']['dataset']:
    minute = int(datapoint['time'][0:2]) * 60 + int(datapoint['time'][3:5])
    # integer division maps minutes 0-14 to bin 0, 15-29 to bin 1, and so on
    distances[minute // 15] += float(datapoint['value'])



with plt.style.context('dark_background'):

      # creating the plot and setting the background color
      fig = plt.figure(figsize=(9, 5))
      fig.patch.set_facecolor('#02575c')
      # adjust the facecolor
      plt.gca().set_facecolor('#02575c')

      plt.bar(time, distances, color = 'w', edgecolor = '#80a9ab', width = 15)

      # removing the borders from four sides
      plt.gca().spines['left'].set_visible(False)
      plt.gca().spines['right'].set_visible(False)
      plt.gca().spines['top'].set_visible(False)
      plt.gca().spines['bottom'].set_visible(False)

      # keep only some of the y values by adjusting the labels
      distance_ticks = [0, 0.9]
      plt.yticks(ticks=distance_ticks, labels=['0', '0.9'])

      # Creating a horizontal lines at 0 and 0.9
      plt.axhline(y=0.005, linewidth = 0.5) # can't make one at zero so using a small number
      plt.axhline(y=0.9, linewidth = 0.5)

      # removing x labels
      times = [0, 24*60]
      plt.xticks(ticks=times, labels=['', ''])

      # adjusting the label size
      plt.tick_params(axis='y', labelsize=16)

      #adding labels
      plt.figtext(0.5,0.90, 'Distance', fontsize=19, ha='center', color ='w', fontweight = 'bold')
      plt.figtext(0.5,0.02, 'NOON', fontsize=18, ha='center', color ='w', fontweight = 'bold')
      plt.figtext(0.15,0.02, '12:00 AM', fontsize=18, ha='center', color ='w', fontweight = 'bold')
      plt.figtext(0.85,0.02, '11:59 PM', fontsize=16, ha='center', color ='w', fontweight = 'bold')

Although the numbers are not the same, because we don’t know how many sections the original graph uses, the pattern looks close enough.

Data Analysis§

5.1 Detecting Anomalies (Outliers)§

Detecting outliers is a useful step before an analysis, as is evident from the data point in section 4.1, where the device was worn for very little time. Data points like this can distort the results of any analysis that includes them. There are multiple ways of identifying outliers, and here we present two of them.

First Way to Detect Outliers: Interquartile Range§

According to the National Institute of Standards and Technology’s handbook, a “mild outlier” is a data point that lies below the first quartile or above the third quartile by more than 1.5 times the interquartile range, while an “extreme outlier” lies beyond 3 times the interquartile range. Let’s create a function that flags the points beyond the 1.5 threshold in a given array.

[ ]:
def find_Outliers(array):
    # find the interquartile range
    q1, q3 = np.quantile(array, [0.25, 0.75])
    interquartile_Range = q3 - q1

    # inner fences: points beyond them are at least mild outliers
    lower_fence = q1 - 1.5 * interquartile_Range
    upper_fence = q3 + 1.5 * interquartile_Range

    # collect the outliers (strict comparison, so constant data yields none)
    outliers = []
    for item in array:
        if item > upper_fence or item < lower_fence:
            outliers.append(item)
    return outliers

Let’s test this function.

[ ]:
# we already have new_heart_rate from the previous plot
# let's inject an outlier value
new_heart_rate.append(12)

#run the function
outliers = find_Outliers(new_heart_rate)
print(outliers)

Looks like that works!
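
The function above does not distinguish mild outliers from extreme ones. A sketch that separates the two, using the 1.5 and 3 multipliers described earlier, could look as follows. It relies on statistics.quantiles from the standard library with linear interpolation between data points, in the same spirit as np.quantile’s default; classify_outliers is a hypothetical helper and the sample readings are made up.

```python
import statistics


def classify_outliers(values):
    """Split values into mild and extreme outliers using the IQR fences."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

    mild, extreme = [], []
    for value in values:
        if value < outer_low or value > outer_high:
            extreme.append(value)
        elif value < inner_low or value > inner_high:
            mild.append(value)
    return mild, extreme


# Made-up heart rate readings with two suspicious values.
sample = [60, 62, 64, 66, 68, 70, 72, 74, 90, 120]
print(classify_outliers(sample))
```

For this sample the quartiles are 64.5 and 73.5, so the inner fences sit at 51 and 87 and the outer fences at 37.5 and 100.5; 90 falls between the upper fences and 120 lies beyond both.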

Second Way To Detect Outliers: Elliptic Envelope§

The Elliptic Envelope algorithm is a machine learning algorithm that fits a hypothetical ellipse around the data; points outside this envelope are considered outliers. Check this to learn more about the algorithm.

[ ]:
def find_outliers2(arr):
  # EllipticEnvelope expects a 2-D array of shape (n_samples, n_features)
  values = np.asarray(arr, dtype=float).reshape(-1, 1)

  # fit_predict returns 1 for inliers and -1 for outliers;
  # contamination=0.02 assumes that roughly 2% of the points are outliers.
  # learn more about the implementation here:
  # https://www.datatechnotes.com/2020/04/anomaly-detection-with-elliptical-envelope-in-python.html
  pred = EllipticEnvelope(assume_centered=False, contamination=0.02, random_state=None,
                 store_precision=True, support_fraction=None).fit_predict(values)

  # return the original values at the positions flagged as outliers
  return [arr[i] for i in range(len(pred)) if pred[i] == -1]

Let’s test it.

[ ]:
#run the function
outliers = find_outliers2(new_heart_rate)
print(outliers)

That works as well! There are numerous other ways to detect outliers. You can check this out for more! Now let’s create a plot to highlight the outliers.

[ ]:
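# the injected heart rate of 12 needs a matching time (minute 123 is an arbitrary choice)
new_time.append(123)

# pair every detected outlier value with its time so both lists have equal length
outlier_values = set(outliers)
outlier_hr = [hr for hr in new_heart_rate if hr in outlier_values]
outlier_time = [t for hr, t in zip(new_heart_rate, new_time) if hr in outlier_values]

with plt.style.context('fivethirtyeight'):

    # creating the plot without highlighting outliers
    plt.figure(figsize=(5, 5))
    plt.xlabel('heart rate (bpm)')
    plt.ylabel('time (minutes since midnight)')
    plt.scatter(x = new_heart_rate, y = new_time, color = 'b')
    plt.show(block=True)

    # recreating the plot with the outliers highlighted in red
    plt.figure(figsize=(5, 5))
    plt.xlabel('heart rate (bpm)')
    plt.ylabel('time (minutes since midnight)')
    plt.scatter(x = new_heart_rate, y = new_time, color = 'b')
    plt.scatter(x = outlier_hr, y = outlier_time, color='r')
    plt.show(block=True)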

5.2 Checking for correlation between distance walked and heart rate§

Here we check whether there is a correlation between the distance walked and the heart rate. The intuitive hypothesis is that covering more distance in a given minute goes along with a higher heart rate during that minute, and we test whether the data supports it.

To test this hypothesis, we collect the data, match the two series by time, and check for a correlation by calculating its p-value.

[ ]:
# first, let's collect the data

time_distances = []
distances = []

for datapoint in api_data['distance_day'][0]['activities-distance-intraday']['dataset']:
    distances.append(float(datapoint['value']))
    time_distances.append(int(datapoint['time'][0:2]) * 60 + int(datapoint['time'][3:5]))

time_heart = []
heart_rate = []

for datapoint in api_data['heart_rate_day'][0]['activities-heart-intraday']['dataset']:
    time_heart.append(int(datapoint['time'][0:2]) * 60 + int(datapoint['time'][3:5]))
    heart_rate.append(float(datapoint['value']))


# distance has an entry for every minute but heart rate does not, so we keep
# only the minutes present in both series to get arrays of matching size
new_dist = []
new_bpm = []

for i in range(len(distances)):
  if distances[i] != 0:
    if time_distances[i] in time_heart:
      new_dist.append(distances[i])
      new_bpm.append(heart_rate[time_heart.index(time_distances[i])])


# remove every minute whose distance or heart rate is flagged as an outlier,
# filtering both lists together so the pairs stay aligned
outlier_dist = set(find_outliers2(new_dist))
outlier_bpm = set(find_outliers2(new_bpm))
kept = [(d, b) for d, b in zip(new_dist, new_bpm)
        if d not in outlier_dist and b not in outlier_bpm]
new_dist = [d for d, b in kept]
new_bpm = [b for d, b in kept]

# create a dataframe
d = {'distance (miles)': new_dist, 'bpm': new_bpm}
df = pd.DataFrame(data=d)

#plot the data
with plt.style.context('fivethirtyeight'):
    graph = sns.lmplot(data=df, y="distance (miles)", x="bpm")
    plt.show(block=True)
[ ]:
slope, intercept, r_value, p_value, std_err = stats.linregress(new_dist,new_bpm)
print(p_value)

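The p-value here is about 0.00000057, which is well below 0.05, the conventional cutoff for statistical significance. In other words, if there were no linear relationship between distance and heart rate, a correlation at least this strong would be expected in only about 0.000057% of samples, so the correlation is statistically significant. Note that the p-value does not measure how strong the relationship is; the r_value returned by the same call does.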
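
To build intuition for what a p-value measures, the sketch below estimates one with a permutation test: the heart rate values are shuffled many times, and we count how often the shuffled data correlates at least as strongly as the original pairing. This is a different technique from the analytic p-value that stats.linregress reports, shown here only as an illustration. The data is made up, and pearson_r and permutation_p_value are hypothetical helper names.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # the add-one correction keeps the estimate away from an impossible exact zero
    return (hits + 1) / (n_permutations + 1)


# Made-up paired data: distance per minute and a heart rate that rises with it.
sample_dist = [0.01 * i for i in range(1, 31)]
sample_bpm = [70 + 100 * d + (i % 3) for i, d in enumerate(sample_dist)]
r = pearson_r(sample_dist, sample_bpm)
p = permutation_p_value(sample_dist, sample_bpm)
print(round(r, 3), p)
```

A small p-value here means that random pairings almost never reproduce the observed correlation, which is the same idea behind the significance statement above.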