Thursday, January 12, 2023

Case Study 2: How Can a Wellness Technology Company Play It Smart?

This is the second of two case studies I completed for the Google Data Analytics Certificate. For this case study, I am a junior data analyst working for Bellabeat, a high-tech manufacturer of health-focused products for women. To successfully complete this study, I will need to answer key business questions by following the steps of the data analysis process.

Urška Sršen, the co-founder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data can help the company unlock new growth opportunities. I have been tasked to investigate some FitBit Fitness Tracker Data, a public domain dataset available through Mobius. This data set consists of personal fitness tracker data from thirty Fitbit users. There is minute-by-minute data on their tracked activity, including physical activity, heart rate, sleep monitoring, and daily steps. Ms. Sršen hopes that my analysis of non-Bellabeat smart device data will provide insight into how consumers use these devices.

Specifically, Ms. Sršen wants the following 3 questions to guide my analysis:

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat's marketing strategy?

The business task

The ultimate goal of my analysis is to help guide Bellabeat's marketing strategy as the company looks to grow its market share. My specific task is to analyze FitBit tracker data to see how these consumers use the product, then relate any insights from that analysis to Bellabeat's marketing strategy. My immediate supervisor, Urška Sršen, and the Bellabeat executive team are the stakeholders.

My data

The FitBit Fitness Tracker Data is an open-source dataset available for download from Kaggle. It is a public-domain dataset. The online location is https://www.kaggle.com/arashnic/fitbit.

While the dataset provides some incredibly detailed data from 33 Fitbit users over 31 days, there are some clear and apparent limitations of the data:

  1. 33 users is a small sample size when identifying general trends for smart device usage.
  2. Bellabeat markets its products to women, but we have no way of knowing how many participants in the Fitbit study were women. Data from male users would be less valuable to us.
  3. The study is dated, taking place in April and May of 2016. Six years is a long time in an industry constantly growing and innovating.
  4. We can assume the data was unbiased, in that it was collected directly from a Fitbit tracker. Still, we do not know how the 33 participants were selected, nor whether they provided a representative cross-section of the public.

As I know that Urška Sršen is aware of the limitations of the data set she assigned me for my analysis, I am comfortable that she understands that my observations would only provide a starting point for any new marketing strategy.

After examining the data, I was interested in the detailed activity levels provided. I wanted to explore how the type of activity the study's participants engaged in affected two key areas people use these devices to track: calories burned (weight loss) and sleep tracking. I could get the information I needed from 2 of the available CSV files: dailyActivity_merged.csv for the activity information and sleepDay_merged.csv for the sleep data.

I began with the dailyActivity_merged.csv. I knew the data was from 33 participants tracked over 31 days. I wanted an overview of how many days each participant recorded at least some data. I began by converting the CSV into an Excel spreadsheet.

I pulled data from the following columns:
  • Id: a unique identifier for each of the 33 participants
  • ActivityDate
  • VeryActiveMinutes
  • FairlyActiveMinutes
  • LightlyActiveMinutes
  • SedentaryMinutes
  • Calories: tracked calories burned per user/day

On examination, I found the data consistently formatted and reasonably clean. Entries in the ActivityDate column were correctly formatted as a date. The VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, and Calories were all formatted as general cells. I changed them all to numbers.

For my purposes, I created a new column that I named TotalActiveMinutes. This number was arrived at by adding VeryActiveMinutes, FairlyActiveMinutes, and LightlyActiveMinutes together. My final version of the dailyActivity_merged.xlsx is shown below:



Only minor cleaning remained. I found 4 entries with 0 TotalActiveMinutes and 0 Calories burned, and I removed them as days when the user evidently didn't wear the tracker. I also removed 79 days, spread across 15 users, with 0 TotalActiveMinutes but some calories burned.
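The preparation and cleaning described above were done in a spreadsheet. For readers who prefer code, here is a minimal Python sketch of the same logic. The column names come from dailyActivity_merged.csv, but the function names and the three sample rows are made up for illustration and are not values from the dataset.

```python
import csv
import io

ACTIVE_COLUMNS = ("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes")
NUMERIC_COLUMNS = ACTIVE_COLUMNS + ("SedentaryMinutes", "Calories")


def add_total_active_minutes(rows):
    """Return copies of the rows with numeric fields and a TotalActiveMinutes column."""
    prepared = []
    for row in rows:
        new_row = dict(row)
        for col in NUMERIC_COLUMNS:
            new_row[col] = int(row[col])
        new_row["TotalActiveMinutes"] = sum(new_row[col] for col in ACTIVE_COLUMNS)
        prepared.append(new_row)
    return prepared


def drop_untracked_days(rows):
    """Keep only the days on which at least one active minute was recorded."""
    return [row for row in rows if row["TotalActiveMinutes"] > 0]


# Illustrative rows only (hypothetical values, not taken from the dataset).
SAMPLE_CSV = """Id,ActivityDate,VeryActiveMinutes,FairlyActiveMinutes,LightlyActiveMinutes,SedentaryMinutes,Calories
1,4/12/2016,25,13,328,728,1985
1,4/13/2016,0,0,0,1440,0
2,4/12/2016,0,0,0,1440,1347
"""

rows = add_total_active_minutes(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = drop_untracked_days(rows)
print(len(rows), "rows before cleaning,", len(cleaned), "after")
```

Filtering on TotalActiveMinutes alone covers both cases in the text: days with no activity and no calories, and days with no activity but some calories.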

The analysis

After I removed the days, as outlined above, where the participant didn't track any activity, this left the number of days where the participants used their tracker and registered some data as follows:
  • 19 users - 31 days
  • 5 users  - 30 days
  • 6 users  - 20-29 days
  • 2 users  - 18 days
  • 1 user   -  4 days

This was very interesting, as everyone in this study had volunteered to have their activity tracked for 31 days. This is a short-term commitment, yet only 19 of the 33 participants showed activity on all 31 days, and only 24 showed activity on at least 30 of them. People buy fitness trackers with the good intentions of working these devices into their day-to-day life, but my first insight gleaned from this study is that these good intentions do not always translate into reality.
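The day counts above can be reproduced with a simple tally. This is a sketch only, assuming the cleaned rows are available as dictionaries with Id and ActivityDate keys; the function names and the sample rows are hypothetical.

```python
from collections import Counter


def tracked_days_per_user(rows):
    """Count the distinct dates on which each Id has a row."""
    dates_by_user = {}
    for row in rows:
        dates_by_user.setdefault(row["Id"], set()).add(row["ActivityDate"])
    return {user: len(dates) for user, dates in dates_by_user.items()}


def users_per_day_count(days_by_user):
    """Map each number of tracked days to how many users reached it."""
    return Counter(days_by_user.values())


# Illustrative rows only: users "A" and "B" tracked 3 days each, "C" tracked 1.
sample_rows = [
    {"Id": "A", "ActivityDate": "4/12/2016"},
    {"Id": "A", "ActivityDate": "4/13/2016"},
    {"Id": "A", "ActivityDate": "4/14/2016"},
    {"Id": "B", "ActivityDate": "4/12/2016"},
    {"Id": "B", "ActivityDate": "4/13/2016"},
    {"Id": "B", "ActivityDate": "4/15/2016"},
    {"Id": "C", "ActivityDate": "4/12/2016"},
]

days = tracked_days_per_user(sample_rows)
distribution = users_per_day_count(days)
for day_count in sorted(distribution, reverse=True):
    print(distribution[day_count], "user(s) tracked", day_count, "day(s)")
```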

Next, I wanted to investigate the amount of correlation between the activity that users averaged per day and the number of calories burned. One of the main reasons that people choose to use a fitness tracker is to lose weight. Of course, diet matters, but burning more calories significantly affects weight loss and aids in maintaining a healthy weight.

I first wanted to take all the active minutes together and see the correlation between that activity and the total calories burned. Remember, when I was preparing the data, I added what the study characterized as LightlyActiveMinutes, FairlyActiveMinutes, and VeryActiveMinutes together, creating a new column that I titled TotalActiveMinutes. For each participant, I compared their average daily TotalActiveMinutes with their average daily calories burned. The chart below shows a positive correlation:


The correlation displayed in the above chart is 0.197. As a reminder, the correlation coefficient is a statistical measure of the strength of a linear relationship between two variables. It can range from -1, a perfect negative correlation, through 0, which is no linear correlation at all, to 1, a perfect positive correlation. So, while the correlation above is positive, it is a weak one.
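For anyone who wants to check a coefficient like this outside a spreadsheet, below is a minimal sketch of the Pearson correlation coefficient written with the Python standard library. The function name and the tiny example lists are illustrative only; they are not the study's data.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 values")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / (spread_x * spread_y)


# Toy inputs chosen to hit the two extremes and the middle of the range.
perfect_positive = pearson([1, 2, 3], [2, 4, 6])  # y rises exactly with x
perfect_negative = pearson([1, 2, 3], [6, 4, 2])  # y falls exactly as x rises
no_linear_trend = pearson([1, 2, 3], [1, 2, 1])   # no overall linear trend
print(perfect_positive, perfect_negative, no_linear_trend)
```

In this analysis, xs would be each participant's average daily TotalActiveMinutes and ys their average daily calories burned, one pair per participant.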

I wished to explore the data further. I was interested in how the different activity levels correlate with average calories burned. I began by exploring the correlation between the study's lowest activity level, LightlyActiveMinutes, and total calories burned.


I expected that the people who averaged the most LightlyActiveMinutes in a day would average lower total calories burned. However, I was still surprised that the correlation was not even positive. The correlation number was -0.0094, which is so close to zero that it indicates essentially no linear relationship. The significance of this finding is that many people use a fitness tracker to count their steps. While steps are helpful for fitness and sedentary people should be encouraged to move more, in this sample, amassing even a good number of lightly active minutes did not go along with burning more calories.

Of course, I understand that I am working with a small sample size, but this finding is worth exploring further for my company. 

Next up was a look at what happened to the participants who averaged the highest FairlyActiveMinutes, the next highest activity level in the study:


We're back to a positive correlation for the participants who averaged the highest number of FairlyActiveMinutes, with a correlation of 0.173. This is still below the correlation coefficient for those who averaged the highest amount of TotalActiveMinutes, but it is undoubtedly a step in the right direction. The trend spotted here is that getting users of fitness trackers to increase their activity levels is essential in helping them to burn more calories.

It's obvious what we'll see when we look at the participants who average the highest amount of VeryActiveMinutes.

As expected, the highest correlation between activity minutes and calories burned was for the participants who averaged the highest number of VeryActiveMinutes, with a correlation coefficient of 0.624. This is a dramatic increase even over those who averaged the most FairlyActiveMinutes.

Exploring sleep tracking

Another popular use for fitness tracker technology is tracking sleep. According to the Centers for Disease Control and Prevention, more than one-third of American adults do not get sufficient sleep, negatively affecting their health (https://www.cdc.gov/media/releases/2016/p0215-enough-sleep.html). An ideal use of sleep monitoring would be to help a user develop better habits with the goal of getting more sleep.

Because I found such a positive correlation between the intensity of activity and calories burned, I decided to look at whether there was a correlation between the participants' activity and the amount of sleep they were able to get. The specific question I wanted to answer was, does being more active help a person get more sleep?

I took the sleepDay_merged.csv and converted it into an Excel sheet. It was a fairly simple sheet consisting of 5 columns:

  • Id (Same unique user IDs as dailyActivity_merged)
  • SleepDay (Date of sleep record)
  • TotalSleepRecords (# of records for each user per day)
  • TotalMinutesAsleep
  • TotalTimeInBed

To prepare it, I ensured the date column was correctly formatted, and all the other columns were formatted as numbers. For more analysis, I created a new column titled InBedAwake, the total time spent by participants in bed but not sleeping. The data was derived by subtracting TotalMinutesAsleep from TotalTimeInBed.

As I mentioned, the Fitbit study was limited by only including data for 33 participants over 31 days. The information available for analysis in the sleepDay_merged.csv was even more limited: there was only sleep information from 24 study participants, and the number of sleeps tracked averaged 19.3 per participant. That is not enough data to draw any solid conclusions, but it was still worth analyzing to satisfy my boss's directions to see if we could spot trends worth further exploring.
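The InBedAwake column and the per-participant record count can be sketched in a few lines of Python. This assumes the sleep rows have already been read into dictionaries with numeric TotalMinutesAsleep and TotalTimeInBed values; the function names and sample rows are hypothetical.

```python
from statistics import mean


def add_in_bed_awake(rows):
    """Return copies of the rows with InBedAwake = TotalTimeInBed - TotalMinutesAsleep."""
    return [
        dict(row, InBedAwake=row["TotalTimeInBed"] - row["TotalMinutesAsleep"])
        for row in rows
    ]


def records_per_participant(rows):
    """Count how many sleep records each Id contributed."""
    counts = {}
    for row in rows:
        counts[row["Id"]] = counts.get(row["Id"], 0) + 1
    return counts


# Illustrative rows only (hypothetical values, not taken from the dataset).
sample_sleep = [
    {"Id": "A", "SleepDay": "4/12/2016", "TotalMinutesAsleep": 327, "TotalTimeInBed": 346},
    {"Id": "A", "SleepDay": "4/13/2016", "TotalMinutesAsleep": 384, "TotalTimeInBed": 407},
    {"Id": "B", "SleepDay": "4/12/2016", "TotalMinutesAsleep": 412, "TotalTimeInBed": 442},
]

sleep_rows = add_in_bed_awake(sample_sleep)
counts = records_per_participant(sleep_rows)
average_records = mean(counts.values())
print(len(counts), "participants,", average_records, "records each on average")
```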

My first exploration of this new data was to investigate if there was a correlation between the activity levels and the amount of sleep that the participants logged, similar to what I found with activity levels and calories burned.

I created a new spreadsheet titled Sleep_and_Activity.xlsx. I had a pivot table from the dailyActivity_merged spreadsheet from which to pull the activity and calorie data, shown below:

I added a second pivot table from the sleepDay_merged spreadsheet for the sleep data:


I created some new tables that combined the average minutes of sleep per night with the three activity levels. Since I only had sleep data from 24 participants, I only used their combined information for this analysis.
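The combined tables were built from pivot tables, but the same pairing can be sketched in Python: average each measure per participant, then keep only the participants who appear in both files. The function names and sample rows below are hypothetical.

```python
from statistics import mean


def average_by_user(rows, column):
    """Average one numeric column per Id."""
    values = {}
    for row in rows:
        values.setdefault(row["Id"], []).append(row[column])
    return {user: mean(vals) for user, vals in values.items()}


def paired_averages(activity_avg, sleep_avg):
    """Keep only users present in both mappings and return aligned lists."""
    shared = sorted(set(activity_avg) & set(sleep_avg))
    return (
        shared,
        [activity_avg[user] for user in shared],
        [sleep_avg[user] for user in shared],
    )


# Illustrative rows only: user "C" has activity data but no sleep data.
activity_rows = [
    {"Id": "A", "TotalActiveMinutes": 300},
    {"Id": "A", "TotalActiveMinutes": 320},
    {"Id": "B", "TotalActiveMinutes": 200},
    {"Id": "C", "TotalActiveMinutes": 150},
]
sleep_records = [
    {"Id": "A", "TotalMinutesAsleep": 400},
    {"Id": "A", "TotalMinutesAsleep": 420},
    {"Id": "B", "TotalMinutesAsleep": 450},
]

users, activity, sleep = paired_averages(
    average_by_user(activity_rows, "TotalActiveMinutes"),
    average_by_user(sleep_records, "TotalMinutesAsleep"),
)
print(users, activity, sleep)
```

The two aligned lists are what a correlation calculation would take as input, one pair of values per participant.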


I was ready to begin the analysis. My first step was to take the combined activity to see if there was a correlation between being more active and getting more sleep.


The result was a negative correlation of -0.230. This was not promising for finding any useful results, but I checked the individual activity levels to be certain. There was a negative correlation of -0.203 between light activity and minutes of sleep, -0.145 for fairly active minutes and total minutes of sleep, and -0.064 for very active minutes and total minutes of sleep.




All levels of activity show a negative correlation with average minutes of sleep. Interestingly, the negative correlation weakens as the intensity of activity increases, but all of these correlations are weak, and nothing here recommends further investigation into these results.

I wanted to go back to my boss Urška Sršen with something useful to explore further in sleep tracking, specifically what may help a user get more sleep. In the sleepDay_merged.xlsx, I wanted to explore whether spending time in bed while not asleep led to more or less total sleep. The columns I used were TotalMinutesAsleep, TotalTimeInBed, and the column I added for time spent in bed but not asleep, InBedAwake.

As described earlier, InBedAwake is TotalTimeInBed minus TotalMinutesAsleep, which gives the total minutes a participant spent in bed but not asleep. With it, I could determine whether those spending more time in bed while awake averaged more or less sleep. A clear relationship in either direction would warrant further exploration with more data.

The results showed a moderate positive correlation of 0.412. Those who spent more time in bed while awake also averaged more minutes of sleep per night. At least for this small sample, more time in bed, even if not asleep, went along with more sleep, although a correlation alone cannot show that one causes the other.

Key Findings and Recommendations

My manager Urška Sršen's directive was to take the limited data available in this Fitbit study and find some trends worth exploring further to help guide Bellabeat's marketing strategy. This study took place over a relatively short time, 31 days. The participants were all volunteers, yet only 19 of the 33 participants tracked data on all 31 days, as shown in the chart below:

This points to the difficulty of getting people who purchase a fitness tracker to use it regularly. But the people who will be most satisfied with the product long term are those who utilize it regularly to achieve their goals. So, the first thing worth exploring is what makes owners of these devices use them regularly?

To that end, our company should do everything possible to make the devices more fun. One possibility is to make it as easy as possible, through the phone app, for users to share positive results with friends on social media and to challenge those friends to best their achievements.

Another possibility is to find a way to encourage users to set goals and reward them in some manner for achieving these goals, therefore encouraging the setting of future goals and the continued use of the product.

The second helpful finding was the correlation between higher levels of activity and burning more calories. As a reminder, there was actually a very slight negative correlation between average calories burned per day and average LightlyActiveMinutes per day:


The correlation becomes positive between average calories burned and average FairlyActiveMinutes per day:


The strongest correlation is between average calories burned and average VeryActiveMinutes per day:


So, while counting steps is an effective way to encourage people to begin tracking their activity, helping these folks to be more successful with fitness tracking and their results over time would involve rewarding and incentivizing these users to increase the intensity of their activity. Trying to nag them into doing this is not an effective strategy for the app to pursue. Finding fun and innovative ways to motivate users to increase their activity levels would be a long-term winning strategy.

The final insight worth further investigation was the link I found with the limited study data between spending more time in bed while awake and amassing more total sleep per night:


If this discovery is borne out with more extensive data, people who use our app in conjunction with a sleep-tracking device should be encouraged to get in bed earlier, even if they are not immediately sleepy. Then, we want to communicate to them over time how this decision has translated into more average time asleep per night.

Whether tracking fitness activity, sleep, or a combination of the two is the use case for a particular consumer, their ultimate happiness with the product will be based on achieving results. In my opinion, the best marketing strategy for Bellabeat would be to create happy, satisfied users of its products. We can showcase these successful, happy users alongside our innovative technologies that helped them to achieve their goals.
