Facebook
TwitterWelcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path.
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
How do annual members and casual riders use Cyclistic bikes differently?
What is the problem you are trying to solve?
How do annual members and casual riders use Cyclistic bikes differently?
How can your insights drive business decisions?
The insight will help the marketing team to make a strategy for casual riders
Where is your data located?
Data located in Cyclistic organization data.
How is data organized?
Dataset are in csv format for each month wise from Financial year 22.
Are there issues with bias or credibility in this data? Does your data ROCCC?
It is good it is ROCCC because data collected in from Cyclistic organization.
How are you addressing licensing, privacy, security, and accessibility?
The company has their own license over the dataset. Dataset does not have any personal information about the riders.
How did you verify the data’s integrity?
All the files have consistent columns and each column has the correct type of data.
How does it help you answer your questions?
Insights always hidden in the data. We have the interpret with data to find the insights.
Are there any problems with the data?
Yes, starting station names, ending station names have null values.
What tools are you choosing and why?
I used R studio for the cleaning and transforming the data for analysis phase because of large dataset and to gather experience in the language.
Have you ensured the data’s integrity?
Yes, the data is consistent throughout the columns.
What steps have you taken to ensure that your data is clean?
First duplicates, null values are removed then added new columns for analysis.
How can you verify that your data is clean and ready to analyze?
Make sure the column names are consistent thorough out all data sets by using the “bind row” function.
Make sure column data types are consistent throughout all the dataset by using the “compare_df_col” from the “janitor” package.
Combine the all dataset into single data frame to make consistent throught the analysis.
Removed the column start_lat, start_lng, end_lat, end_lng from the dataframe because those columns not required for analysis.
Create new columns day, date, month, year, from the started_at column this will provide additional opportunities to aggregate the data
Create the “ride_length” column from the started_at and ended_at column to find the average duration of the ride by the riders.
Removed the null rows from the dataset by using the “na.omit function”
Have you documented your cleaning process so you can review and share those results?
Yes, the cleaning process is documented clearly.
How should you organize your data to perform analysis on it? The data has been organized in one single dataframe by using the read csv function in R Has your data been properly formatted? Yes, all the columns have their correct data type.
What surprises did you discover in the data?
Casual member ride duration is higher than the annual members
Causal member widely uses docked bike than the annual members
What trends or relationships did you find in the data?
Annual members are used mainly for commute purpose
Casual member are preferred the docked bikes
Annual members are preferred the electric or classic bikes
How will these insights help answer your business questions?
This insights helps to build a profile for members
Were you able to answer the question of how ...
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List seedData.csv (MD5: 1046a13d7dc78e8de7ff15ed1569c743) es.csv (MD5: 83f92d3db02c919782c024c23f02476c) analyses.r (MD5: c2975e34a5540cf99e285a98d83bf817)
Description
This supplement provides the data and the R code to perform the analyses described in the methods section. The data is contained in seedData.csv and es.csv. The file seedData.csv contains data on seed production, es.csv contains data on emergence and survival. analyses.r is the code to perform analyses. For clarity, we divided the code in five sections: (1) models of seed production in unplanted quadrats; (2) create complete demographic data sets; (3) lambda models; (4) vital rates models; (6) LTRE analyses. In the first section, we use data from seedData.csv to create treatment specific means of seed production in unplanted quadrats. In the second section, we subtract these means to the seed production of planted quadrats. We use the resulting data set in sections three, four, and five to fit the mixed-effect models and to perform the LTRE analyses.
Here we present the number, name and description of each column in the two data frames, seedData.csv and es.csv.
Metadata seedData.csv
1.Plot
Definition: Number identifying each experimental plots.
2. RemovalDefinition: A letter indicating whether the plot was subjected to vegetation removal ("R") or not ("N").
3. WarmingDefinition: A letter indicating whether the plot was subjected to warming ("W") or not ("C").
4. EcotypeDefinition: A word indicating whether the quadrat was planted with cheatgrass seeds collected at low ("low"), mid ("mid") or ("high") elevations. Moreover, every plot also contains a quadrat that has not been planted ("unplanted").
5. SiteDefinition: A word indicating whether the plots are located at the low ("low"), mid ("mid") or ("high") elevation site.
6. YearDefinition: A number indicating whether the data refers to seed collected in June 2010 or 2011.
7. SeedsDefinition: A number reporting the estimated number of seeds produced in each quadrat.
Metadata es.csv
Numbers refer to the column number.
1.PlotDefinition: Number identifying each experimental plots.
2. RemovalDefinition: A letter indicating whether the plot was subjected to vegetation removal ("R") or not ("N").
3. WarmingDefinition: A letter indicating whether the plot was subjected to warming ("W") or not ("C").
4. EcotypeDefinition: A word indicating whether the quadrat was planted with cheatgrass seeds collected at low ("low"), mid ("mid") or ("high") elevations. Moreover, every plot also has a quadrat that has not been planted ("unplanted").
5. SiteDefinition: A word indicating whether the plots are located at the low ("low"), mid ("mid") or ("high") elevation site.
6. YearDefinition: A number indicating whether the data refers to the growing season 2009-2010 ("2010") or 2010-2011 ("2011").
7. eDefinition: A number reporting the proportion of planted cheatgrass that emerged during the growing season.
8. sDefinition: A number reporting the proportion of emerged cheatgrass that survived to seed set.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterWelcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act. Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path.
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
How do annual members and casual riders use Cyclistic bikes differently?
What is the problem you are trying to solve?
How do annual members and casual riders use Cyclistic bikes differently?
How can your insights drive business decisions?
The insight will help the marketing team to make a strategy for casual riders
Where is your data located?
Data located in Cyclistic organization data.
How is data organized?
Dataset are in csv format for each month wise from Financial year 22.
Are there issues with bias or credibility in this data? Does your data ROCCC?
It is good it is ROCCC because data collected in from Cyclistic organization.
How are you addressing licensing, privacy, security, and accessibility?
The company has their own license over the dataset. Dataset does not have any personal information about the riders.
How did you verify the data’s integrity?
All the files have consistent columns and each column has the correct type of data.
How does it help you answer your questions?
Insights always hidden in the data. We have the interpret with data to find the insights.
Are there any problems with the data?
Yes, starting station names, ending station names have null values.
What tools are you choosing and why?
I used R studio for the cleaning and transforming the data for analysis phase because of large dataset and to gather experience in the language.
Have you ensured the data’s integrity?
Yes, the data is consistent throughout the columns.
What steps have you taken to ensure that your data is clean?
First duplicates, null values are removed then added new columns for analysis.
How can you verify that your data is clean and ready to analyze?
Make sure the column names are consistent thorough out all data sets by using the “bind row” function.
Make sure column data types are consistent throughout all the dataset by using the “compare_df_col” from the “janitor” package.
Combine the all dataset into single data frame to make consistent throught the analysis.
Removed the column start_lat, start_lng, end_lat, end_lng from the dataframe because those columns not required for analysis.
Create new columns day, date, month, year, from the started_at column this will provide additional opportunities to aggregate the data
Create the “ride_length” column from the started_at and ended_at column to find the average duration of the ride by the riders.
Removed the null rows from the dataset by using the “na.omit function”
Have you documented your cleaning process so you can review and share those results?
Yes, the cleaning process is documented clearly.
How should you organize your data to perform analysis on it? The data has been organized in one single dataframe by using the read csv function in R Has your data been properly formatted? Yes, all the columns have their correct data type.
What surprises did you discover in the data?
Casual member ride duration is higher than the annual members
Causal member widely uses docked bike than the annual members
What trends or relationships did you find in the data?
Annual members are used mainly for commute purpose
Casual member are preferred the docked bikes
Annual members are preferred the electric or classic bikes
How will these insights help answer your business questions?
This insights helps to build a profile for members
Were you able to answer the question of how ...