Cyclistic, a bike-share company launched in 2016, has grown to a fleet of 5,824 bicycles and 692 stations across Chicago. Users fall into two groups: “casual riders” (who purchase single-ride or full-day passes) and “annual members”.
The company’s finance analysts concluded that annual members generate more profit. The director of marketing, Lily Moreno, believes that an ad campaign to convert casual riders into annual members would be the best strategy, as casual riders are already aware of Cyclistic and have chosen it for their mobility needs.
I have been tasked with answering the question “How do annual members and casual riders use Cyclistic bikes differently?”
How do casual riders and annual members use Cyclistic bikes differently?
To answer this question, I collected the trip data for January to December 2023 from here. The data has been made available by Motivate International Inc. under this license. Data-privacy rules prevent me from accessing any of the users’ personal information. This historical data from the previous year is reliable, original, comprehensive, current and cited, making it suitable for this analysis.
I chose RStudio to work with the data, as it handles data sets of this size well. Each of the 12 monthly data sets contains 13 columns and between about 160,000 and 780,000 rows. Some of the columns had datetime values stored as characters, so I saved all the data sets as Excel workbooks before loading them into RStudio and merging them into a single data frame, trip_data:
library(tidyverse)
library(readxl)
jan_2023 <- read_excel("jan_2023.xlsx")
feb_2023 <- read_excel("feb_2023.xlsx")
mar_2023 <- read_excel("mar_2023.xlsx")
apr_2023 <- read_excel("apr_2023.xlsx")
may_2023 <- read_excel("may_2023.xlsx")
jun_2023 <- read_excel("jun_2023.xlsx")
jul_2023 <- read_excel("jul_2023.xlsx")
aug_2023 <- read_excel("aug_2023.xlsx")
sep_2023 <- read_excel("sep_2023.xlsx")
oct_2023 <- read_excel("oct_2023.xlsx")
nov_2023 <- read_excel("nov_2023.xlsx")
dec_2023 <- read_excel("dec_2023.xlsx")
# Stack the 12 monthly data frames into a single data frame
trip_data <- bind_rows(
  jan_2023, feb_2023, mar_2023, apr_2023, may_2023, jun_2023,
  jul_2023, aug_2023, sep_2023, oct_2023, nov_2023, dec_2023
)
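The twelve read_excel() calls can also be generated from the file names. This is a minimal sketch rather than the code used for this analysis; it assumes the workbooks sit in the working directory under the names shown above, and `monthly_files`, `available_files` and `trip_data_alt` are illustrative names.

```r
library(tidyverse)
library(readxl)

# Build "jan_2023.xlsx" ... "dec_2023.xlsx" from R's built-in month abbreviations
monthly_files <- paste0(tolower(month.abb), "_2023.xlsx")

# Keep only the workbooks that are actually present
# (assumption: they live in the current working directory)
available_files <- monthly_files[file.exists(monthly_files)]

# Read each workbook, then stack the results into one data frame
trip_data_alt <- available_files %>%
  map(read_excel) %>%
  bind_rows()
```

Building the file list first makes it easy to spot a missing month before anything is merged.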
The merged data has the same number of columns and 5,693,156 rows. It contains information such as ride IDs, start and end station names, and the longitude and latitude of the start and end stations.
Below is a summary of all the steps taken in the data cleaning and manipulation processes. The code can be viewed in my GitHub repository.
The cleaned data frame now contains 3,931,001 rows and 12 columns.
In the analysis phase of my work, I calculated summary statistics that would help answer the business question. These included:
For my data visualisation, I exported the results of my analysis as CSV files and used Tableau Public Desktop to create a dashboard to display my findings.
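Datetime values stored as text, as mentioned earlier, can also be converted inside R. The sketch below is only an illustration on made-up rows; the column names `started_at` and `ended_at` and the derived `ride_minutes` are assumptions, not names confirmed in this write-up.

```r
library(dplyr)
library(lubridate)

# Toy rows standing in for the real trip data (values are made up)
sample_trips <- tibble(
  started_at = c("2023-01-21 20:05:42", "2023-07-04 09:15:00"),
  ended_at   = c("2023-01-21 20:16:33", "2023-07-04 09:45:30")
)

# Convert both character columns to date-time values
parsed_trips <- sample_trips %>%
  mutate(across(c(started_at, ended_at), ymd_hms))

# With proper date-times, trip duration in minutes is a simple difference
parsed_trips <- parsed_trips %>%
  mutate(ride_minutes = as.numeric(difftime(ended_at, started_at, units = "mins")))
```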
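The exact cleaning steps are in the repository. As a rough illustration of the kind of filtering that reduces a row count like this, here is a generic sketch on made-up rows. It is not the code used for this analysis, and the column names and rules (duplicate ride IDs, missing values, non-positive durations) are assumptions.

```r
library(dplyr)
library(tidyr)

# Made-up rows with typical problems: a duplicate, a missing value, a negative duration
raw_trips <- tibble(
  ride_id            = c("A1", "A1", "B2", "C3", "D4"),
  start_station_name = c("Station X", "Station X", NA, "Station Y", "Station Z"),
  ride_minutes       = c(12.5, 12.5, 8.0, -3.0, 25.0)
)

clean_trips <- raw_trips %>%
  distinct(ride_id, .keep_all = TRUE) %>%  # keep one row per ride ID
  drop_na() %>%                            # drop rows with any missing value
  filter(ride_minutes > 0)                 # drop zero or negative durations

# Compare row counts before and after cleaning
nrow(raw_trips)
nrow(clean_trips)
```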
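Summaries of this kind are usually grouped aggregations. The following is a minimal sketch on made-up rows, assuming a user-type column named `member_casual`, a start time named `started_at` and a duration column named `ride_minutes`; none of these names are confirmed in this write-up.

```r
library(dplyr)
library(lubridate)

# Made-up trips for illustration only
sample_trips <- tibble(
  member_casual = c("casual", "casual", "member", "member", "member"),
  started_at = ymd_hms(c(
    "2023-07-01 14:00:00",  # Saturday
    "2023-07-08 16:30:00",  # Saturday
    "2023-07-03 08:10:00",  # Monday
    "2023-07-03 17:45:00",  # Monday
    "2023-07-05 08:05:00"   # Wednesday
  )),
  ride_minutes = c(30, 40, 10, 12, 14)
)

# Number of trips and average duration per user type and day of the week
daily_summary <- sample_trips %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    trips       = n(),
    avg_minutes = mean(ride_minutes),
    .groups     = "drop"
  )
```

Swapping `wday()` for `hour()` or `month()` gives the per-hour and per-month versions of the same summary.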
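Exporting a summary table for Tableau comes down to one call to write_csv(). In this sketch the table contents, the `daily_summary` name and the output path are placeholders, and the file goes to a temporary directory so nothing real is overwritten.

```r
library(readr)
library(tibble)

# Placeholder summary table (values are made up)
daily_summary <- tibble(
  member_casual = c("casual", "member"),
  avg_minutes   = c(35, 12)
)

# Write the table to a CSV file that Tableau can open as a text data source
out_path <- file.path(tempdir(), "daily_summary.csv")
write_csv(daily_summary, out_path)
```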
Tableau Dashboard
Insights
Users’ Bike Preferences
Daily Average Trip Duration
Trips per Time of Day (24-hour clock)
Monthly Usage
Top 5 Start Stations (Casual riders)
Recommendations
From the findings above, here are some recommendations that could prove useful in converting casual riders to annual members:
Thank you for reading! Stay tuned for the next data journey.
Eyitayo, O. 2023. Google Data Analytics Capstone Project: Cyclistic Bike-Share Case Study