This dataset was created by R. Naga Amrutha
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data sets and r studio code used for capstone project
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a marginal difference in how assessment categories were weighted by students, with reflections highlighting appreciation for student agency. In course content, students reported the greatest learning gains in describing variables, while collaborative activities (e.g., interacting with peers and instructor) were the most effective support. The most effective tasks to facilitate these learning gains included coding exercises and team-led assignments. Autocoding of student reflections identified 16 themes, and positive sentiments were written nearly 4x more often than negative sentiments. Students positively reflected on their growth in statistical analyses, and negative sentiments focused on how limited prior experience with statistics and coding made them feel nervous. As a group, we encountered several challenges and opportunities in using open science materials. I present key recommendations, based on student experiences, for scientists to consider when publishing open data to provide additional educational benefits to the open science community. Methods Article and dataset fairness To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criteria, with a total possible score of ten. Open grading policies Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined if assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of variance (ANOVA) and examined pairwise differences with Tukey HSD. Assessment of perceived learning gains I used a student assessment of learning gains (SALG) survey to measure students’ perceptions of learning gains related to course objectives (Seymour et al. 2000). This Likert-scale survey provided five response categories ranging from ‘no gains’ to ‘great gains’ in learning and the option of open responses in each category. A summary report that converted Likert responses to numbers and calculated descriptive statistics was produced from the SALG instrument website. Student reflections In student reflections, I examined the frequency of the 100 most frequent words, with stop words excluded and a minimum length of four (letters), both “with synonyms” and “with generalizations”. Due to this paper's explorative nature, I used autocoding to identify students' broad themes and sentiments in their reflections. Autocoding examines the sentiment of each word and scores it as positive, neutral, mixed, or negative. In this process, I compared how students felt about each theme, focusing on positive (i.e., satisfaction) and negative (i.e., dissatisfaction) sentiments. The relationship of how sentiment was coded to themes was visualized in a treemap, where the size of a block is relative to the number of references for that code. All reflection processing and analyses were performed in NVivo 14 (Windows). All data were collected with institutional IRB approval (IRB-24–0314). All statistical analyses were performed in R (ver. 4.3.1; R Core Team 2023).
Cyclistic: Google Data Analytics Capstone Project
Cyclistic - Google Data Analytics Certification Capstone Project Moirangthem Arup Singh How Does a Bike-Share Navigate Speedy Success? Background: This project is for the Google Data Analytics Certification capstone project. I am wearing the hat of a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore,my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve the recommendations, so they must be backed up with compelling data insights and professional data visualizations. This project will be completed by using the 6 Data Analytics stages: Ask: Identify the business task and determine the key stakeholders. Prepare: Collect the data, identify how it’s organized, determine the credibility of the data. Process: Select the tool for data cleaning, check for errors and document the cleaning process. Analyze: Organize and format the data, aggregate the data so that it’s useful, perform calculations and identify trends and relationships. Share: Use design thinking principles and data-driven storytelling approach, present the findings with effective visualization. Ensure the analysis has answered the business task. Act: Share the final conclusion and the recommendations. Ask: Business Task: Recommend marketing strategies aimed at converting casual riders into annual members by better understanding how annual members and casual riders use Cyclistic bikes differently. Stakeholders: Lily Moreno: The director of marketing and my manager. Cyclistic executive team: A detail-oriented executive team who will decide whether to approve the recommended marketing program. Cyclistic marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Cyclistic’s marketing strategy. Prepare: For this project, I will use the public data of Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under the license. I downloaded the ZIP files containing the csv files from the above link but while uploading the files in kaggle (as I am using kaggle notebook), it gave me a warning that the dataset is already available in kaggle. So I will be using the dataset cyclictic-bike-share dataset from kaggle. The dataset has 13 csv files from April 2020 to April 2021. For the purpose of my analysis I will use the csv files from April 2020 to March 2021. The source csv files are in Kaggle so I can rely on it's integrity. I am using Microsoft Excel to get a glimpse of the data. There is one csv file for each month and has information about the bike ride which contain details of the ride id, rideable type, start and end time, start and end station, latitude and longitude of the start and end stations. Process: I will use R as language in kaggle to import the dataset to check how it’s organized, whether all the columns have appropriate data type, find outliers and if any of these data have sampling bias. I will be using below R libraries
library(tidyverse) library(lubridate) library(ggplot2) library(plotrix) ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5 ✔ purrr 0.3.4 ✔ tibble 3.1.4 ✔ dplyr 1.0.7 ✔ tidyr 1.1.3 ✔ stringr 1.4.0 ✔ readr 2.0.1 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag()
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
setwd("/kaggle/input/cyclistic-bike-share")
r_202004 <- read.csv("202004-divvy-tripdata.csv") r_202005 <- read.csv("20...
Not seeing a result you expected?
Learn how you can add new datasets to our index.
This dataset was created by R. Naga Amrutha