This is a case study performed for the Google Data Analytics Professional Certificate. For the purposes of this case study, I am a junior data analyst working for Cylistic, a bike share company located in Chicago. I am a member of the Cyclistic marketing analytics team. I joined the team 6 months ago and have spent that time learning about the company, its mission, and its long-term goals. My job is to help my company achieve its goals by helping to focus its marketing strategies.
Therefore, [my marketing analytics] team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, [my] team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve [my] recommendations, so they must be backed up with compelling data insights and professional data visualizations.
About the Company
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
Our analytics team has been given 3 questions whose answers will guide the future marketing program:
- How do annual members and casual riders use Cyclistic bikes differently?
- Why would casual riders buy Cyclistic annual memberships?
- How can Cyclistic use digital media to influence casual riders to become members?
The business task
Lily Moreno has assigned the first question to me. The business task that I am looking to solve is to determine how members and casual riders use the company’s bikes differently, with the ultimate goal of developing strategies to convince casual riders to convert to annual members. I will present my findings to my boss Lily Moreno, the company's director of marketing, and the Cyclistic executive team.
Accessing the data
I have been asked to download the 12 most recent monthly files. As I am performing this analysis in November 2022, the files I downloaded were from 11/2021 – 10/2022. I made a copy of each file as an Excel spreadsheet, the format and tool I will use for my analysis. I kept the original CSV files in a separate folder as a backup.
Since, for the purposes of this study, the data was downloaded from Cyclistic's server and was their own internal data, I considered it reliable. As it contained raw trip data, I wasn't worried about bias affecting the data. Also, the files contained no personal data on individual users, so there were no privacy concerns. Finally, as there were thousands of records of individual bike rides, I felt that I had more than enough data available to me to perform meaningful analysis.
Organizing and cleaning the data
The ride_id column contained unique alpha-numeric ID numbers for each ride. I checked for duplicates or blanks in this column in each sheet. There were none. I checked the rest of the data to make sure that the numbers were properly formatted throughout the sheet.
One area where I found the data was lacking was with the start_station and end_station columns. They were blank for a significant amount of rides. I was interested in this data in that identifying the most active stations for casual users might be useful for targeting those stations for promotions to convert casual users to annual members. For casual_riders, start_station_name information was missing from 15.5% of the results. That number jumped to 18.2% for the end_station_name for casual riders. This was clearly due to the limitations of the GPS tracking, as the lines that lacked this information also had truncated GPS results.
The started_at and ended_at columns contained a date and time for the start and end of each ride. I created a new column for ride_length, which was arrived at by subtracting started_at from ended_at and formatting the result as a time. This was not only useful for my future analysis, but also for further cleaning the data. All of the spreadsheets had some rows with 0:00 results. I removed those rows from the sheets. Some of the spreadsheets had negative ride lengths, where the start time was later than the end time. I removed those rows as well. The total amount of rows removed per month was as follows:
- 11/2021 53 records with negative ride lengths, 33 with 0:00. 86 total
- 12/2021 21 records with 0:00 trip duration
- 01/2022 5 records with 0:00 trip duration
- 02/2022 5 records with 0:00 trip duration
- 03/2022 2 records with negative ride lengths, 3,107 with 0:00. 3,109 total
- 04/2022 31 records with 0:00 trip duration
- 05/2022 47 records with 0:00 trip duration
- 06/2022 12 records with negative ride lengths, 54 with 0:00. 66 total
- 07/2022 16 records with negative ride lengths, 56 with 0:00, 72 total
- 08/2022 15 records with negative ride lengths, 62 with 0:00, 77 total
- 09/2022 9 records with negative ride lengths, 63 with 0:00, 72 total
- 10/2022 4 records with negative ride lengths, 61 with 0:00, 65 total
Once I was done with the cleaning and my additions, the spreadsheets I was working with looked like the example from 11/2021 shown below, with the columns I used in my analysis highlighted in green:
![]() |
Click on any image in this case study to see a larger version of that image |
Working with the data
Now that my files were cleaned, and prepared I created pivot tables in each monthly spreadsheet. Below is my pivot table for the month of 11/2021.
Since the specific business task assigned to me was to determine how members and casual riders use the bikes differently, I wanted to see at a glance the total number of rides for members and casual riders for the month. This was further broken down by the day of the week and the type of bike used.
As the ultimate goal was to convert casual riders into yearly members, I thought it might be useful to have the information for start_station and end_station ranked in reverse order by the number of rides. Each month would have a list for both start_station and end_station. The example below is my start_station ranking for 11/2021:
Now I had the information I needed to answer the question my supervisor had assigned to me: How do annual members and casual riders use Cyclistic bikes differently?
In looking at my tables, one glaringly obvious difference was in seasonal usage. The busiest months for Cyclistic were June, July, and August. The split in usage between annual members and casual riders was fairly close, with casual riders taking 48% of the rides in June, 49% in July, and 46% in August. However, in the 3 slowest months, December, January, and February, casual riders made up only 28%, 18%, and 19% of total riders. This disparity is clearly demonstrated in the chart below:
Another difference in usage between annual members and casual riders was by day of the week. The busiest days for annual members were weekdays, while casual rider usage peaked on weekend days. Combined with the seasonal differences outlined earlier, My theory would be that broadly speaking, annual members depended on Cyclistic more for practical tasks, such as getting back and forth to work, while casual members were more focused on recreational uses. A survey of both annual members and casual riders would be recommended to verify this theory.
One other difference between annual members and casual riders that came up in the data was the choice of bike type. There were 3 types of bikes: classic bikes, electric bikes, and docked bikes. Note that docked bikes are a self-service option only used by casual riders.
As demonstrated in the chart above, members slightly favor classic bikes over electric bikes. 51.2% of their rides were on classic bikes, while 48.8 % were on electric bikes. Casual riders, on the other hand, demonstrated a clear preference for electric bikes. They chose them 54.1% of the time. Classic bikes were used for 38.1 % of the rides, while docked bikes made up the remaining 7.7 % of rides.
Key Findings and Recommendations