Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The New York City bikeshare, Citi Bike, has a real-time, public API. This API conforms to the General Bikeshare Feed Specification (GBFS). As such, it contains information about the number of bikes and docks available at every station in NYC.
Since 2016, I have been pinging the public API every 2 minutes and storing the results. This dataset contains all of these results, from 8/15/2016 through 12/8/2021. The data unfortunately comes in the form of a bunch of CSVs. I recognize that this is not the best format for reading large datasets like this, but a CSV is still a pretty universal format! My suggestion would be to convert these CSVs to Parquet or something similar if you plan to do lots of analysis on lots of files.
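If you do convert, a minimal sketch of the conversion might look like the following. The `data/` directory and `*.csv` pattern are placeholders for wherever you unpack the files, and pyarrow (or fastparquet) is assumed to be installed:

```python
# Minimal conversion sketch. The data/ directory and *.csv pattern are
# placeholders; pyarrow (or fastparquet) must be installed for to_parquet.
from pathlib import Path

import pandas as pd

for csv_path in sorted(Path("data").glob("*.csv")):
    df = pd.read_csv(csv_path, na_values=r"\N", low_memory=False)
    df.to_parquet(csv_path.with_suffix(".parquet"), index=False)
```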
Originally, I set up an EC2 instance and pinged a legacy API (code). In 2019, I switched to pinging this API via a Lambda function (code).
As part of this 2019 switch, I also started pinging the station information API once per week in order to collect information about each station, such as the name, latitude and longitude. While this dataset contains columns for all of the station information, these columns are missing data between 2016 and 8/2019. It would probably be reasonable to backfill that data with the earliest info available for each station, although be warned that this is not guaranteed to be accurate.
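As an illustration, assuming a file has been loaded with the `read_csv` helper shown further down, a pandas sketch of that backfill might be:

```python
# Backfill sketch: after sorting by time, fill each station's missing
# station-information columns from the earliest later record. As noted
# above, the backfilled values are not guaranteed to be accurate.
info_cols = ["station_name", "lat", "lon", "region_id", "capacity", "has_kiosk"]
df = df.sort_values("station_status_last_reported")
df[info_cols] = df.groupby("station_id")[info_cols].bfill()
```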
In order to reduce the individual file size, the full dataset has been bucketed by `station_id` into 50 separate files. All historical data for a given `station_id` are in the same file, and the stations are randomly distributed across the 50 files.
As previously mentioned, station information is missing for all data earlier than 8/2019. I have included a column, `missing_station_information`, to indicate when this information is missing. You may wonder why I don't just create a separate station information file which can be joined to the file containing the time series. The reason is that the station information can technically change over time. When station information is provided in a given row, that information is accurate as of some time within the 7 days prior. This is because I pinged the station information weekly and then had to join it to the time series.
The CSV files are the result of a `CREATE TABLE AS` AWS Athena query using the `TEXTFILE` format. Consequently, null values are demarcated as `\N`. The two timestamp columns, `station_status_last_reported` and `station_information_last_updated`, are in units of POSIX/UNIX time (i.e. seconds since 1970-01-01 00:00:00 UTC). The following code may be helpful to get you started loading the data as a pandas DataFrame.
```python
import pandas as pd


def read_csv(filename: str) -> pd.DataFrame:
    """
    Read a DataFrame from the CSV file ``filename`` and convert it to a
    preferred schema.
    """
    df = pd.read_csv(
        filename,
        sep=",",
        # Nulls are demarcated as \N; use a raw string so Python does not
        # treat the backslash as the start of an escape sequence.
        na_values=r"\N",
        dtype={
            "station_id": str,
            # Use pandas Int16 dtype to allow for nullable integers
            "num_bikes_available": "Int16",
            "num_ebikes_available": "Int16",
            "num_bikes_disabled": "Int16",
            "num_docks_available": "Int16",
            "num_docks_disabled": "Int16",
            "is_installed": "Int16",
            "is_renting": "Int16",
            "is_returning": "Int16",
            "station_status_last_reported": "Int64",
            "station_name": str,
            "lat": float,
            "lon": float,
            "region_id": str,
            "capacity": "Int16",
            # Use pandas boolean dtype to allow for nullable booleans
            "has_kiosk": "boolean",
            "station_information_last_updated": "Int64",
            "missing_station_information": "boolean",
        },
    )
    # Read in timestamps as UNIX/POSIX epochs and then convert to the local
    # bike share timezone.
    df["station_status_last_reported"] = pd.to_datetime(
        df["station_status_last_reported"], unit="s", origin="unix", utc=True
    ).dt.tz_convert("US/Eastern")
    df["station_information_last_updated"] = pd.to_datetime(
        df["station_information_last_updated"], unit="s", origin="unix", utc=True
    ).dt.tz_convert("US/Eastern")
    return df
```
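As a usage sketch, where the file name is hypothetical (substitute one of the 50 bucketed CSVs):

```python
df = read_csv("bucket_00.csv")  # hypothetical file name
print(df.dtypes)
print(df["station_status_last_reported"].agg(["min", "max"]))
```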
The column names come almost directly from the `station_status` and `station_information` APIs. See the [GBFS schema](https://github.com/MobilityData/gbfs...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BCycle, LLC is a proud supporter of the new General Bikeshare Feed Specification (GBFS). This new standard supports publicly available station bike/dock availability information and does not require the use of an API token.
Nextbike Rental Bikes Positions Real Time - Bicycle Rental System
This dataset falls under the category Public Transport On Demand PT.
It contains the following data: The API provides a geo-referenced list of the current locations of parked bicycles of the bicycle rental system (Stadtwerke Bonn and nextbike) in real time for the Bonn city area. Note: Services that retrieve the data provided at intervals of less than 10 minutes are blocked. In addition to the bicycle positions, the rental locations are also available as an API. Further information: https://www.nextbike.de/de/bonn/information/#
This dataset was scouted on 2022-02-17 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://api.nextbike.net/maps/nextbike-live.json?city=547
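As an illustration, a minimal Python sketch for polling this endpoint follows. The response layout (countries -> cities -> places) is an assumption based on the nextbike live feed and should be verified against real output:

```python
# Minimal polling sketch for the Bonn feed (city=547). Per the note above,
# poll no more often than every 10 minutes.
import requests

resp = requests.get(
    "https://api.nextbike.net/maps/nextbike-live.json",
    params={"city": 547},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
# Assumed layout: countries -> cities -> places; verify against real output.
places = data["countries"][0]["cities"][0]["places"]
print(f"{len(places)} rental locations reported")
```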
This dataset shows the number of bikes passing through 3 intersections in Jersey City.

| Intersection Name | Data Available From |
| --- | --- |
| Bergen & Academy | 4/16/2020 |
| Jersey Ave & Grand St | 5/6/2019 |
| Washington & Thomas Gangemi | 4/29/2020 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dublin Bikes is a docked bike-share scheme in Dublin City. This page includes an API developed according to the General Bikeshare Feed Specification (GBFS), providing information about vehicles, stations, pricing, etc. The current location of the vehicles is updated every minute. In addition, this page includes historical files of bike location data. Disclaimer: please note that some of the historical files are empty due to historical data issues.
A list of the stations where one can pick up and return bicycles from the Divvy bicycle sharing system (http://divvybikes.com/). This dataset contains all stations. For a list of only those stations currently in service, see https://data.cityofchicago.org/d/67g3-8ig8. For real-time status of stations in machine-readable format, see https://gbfs.divvybikes.com/gbfs/gbfs.json.
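For example, a short sketch of GBFS auto-discovery against that real-time endpoint, assuming the feed follows the common GBFS v1/v2 layout of a language-keyed `feeds` list:

```python
# GBFS auto-discovery sketch: look up the station_status feed URL from
# gbfs.json, then fetch real-time status. Assumes an "en" language key.
import requests

gbfs = requests.get("https://gbfs.divvybikes.com/gbfs/gbfs.json", timeout=30).json()
feeds = {f["name"]: f["url"] for f in gbfs["data"]["en"]["feeds"]}
status = requests.get(feeds["station_status"], timeout=30).json()
print(len(status["data"]["stations"]), "stations reporting")
```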
This dataset presents the different bike lanes across Jersey City.
Last Updated: November 2020
Bike Facilities Full Color Map
Jersey City Bike Facilities_2023
The dataset contains the measurement results of the bicycle counts from the permanent counting stations in the city of Konstanz (Constance). The data shows exactly when and how many cyclists passed these locations; measurements are recorded every 15 minutes. The counting points serve as an important data basis for urban cycling policy.

The city of Konstanz currently operates five permanent counting stations. The oldest bicycle counting station is at the end of the bicycle bridge at Herosépark (further information and pictures here). Since 2023 there are four more: one at the old Rhine bridge, one at Fürstenberg station, one at Petershausen station, and one on Friedrichstraße. In addition, the city runs mobile traffic censuses from time to time at different locations (Mobile traffic censuses: cycling and foot traffic, Open Data Konstanz, offenedaten-konstanz.de). There is also a dashboard: Bicycle counting stations Konstanz (arcgis.com).

The statistics are provided on this portal annually in CSV format. More up-to-date data can be requested if required, and access to the live API data can be granted on a case-by-case basis. If you want to retrieve the data of the Konstanz bicycle counting stations for a specific day, you can do so on an interactive map from the manufacturer Eco-Counter (http://eco-public.com/ParcPublic/?id=4586).

The Bicycle Data Initiative offers an interactive tool with the following features (note: it currently only works for data up to 08.11.2020):

- Download raw data for the intervals you choose: https://www.bicycle-data.de/bicycles-data
- Create automated, standardized analyses of the data (e.g. averages by day, month, or weather conditions): https://www.bicycle-data.de/city-analysis
- Compare Konstanz with other cities: https://www.bicycle-data.de/city-comparison

For this dataset we also published a blog post. (Source: City of Konstanz, Office for Urban Planning and Environment)
Presentation Of The City Bikes Program
This dataset falls under the category Individual Transport Other.
It contains the following data: Presentation of the bicycle city project
This dataset was scouted on 2022-02-14 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: http://www.planmob.salvador.ba.gov.br/index.php/13-estudos-projetos-e-programas?ml=1
See URL for data access and license information.
This layer shows bike paths within the City of Markham.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is provided by the City of Sydney and contains bicycle parking locations. There are more than 3000 public bike parking spaces in the City of Sydney's area. There are 32 free bicycle parking spaces at Kings Cross car park level 5. Goulburn Street car park has 9 individual bike cages for casual use, as well as a free secure cage with 24 spaces that can be accessed with a pass. The API provides data in GeoJSON format; for more information, visit City of Sydney.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The Fremont Bridge Bicycle Counter began operation in October 2012 and records the number of bikes that cross the bridge using the pedestrian/bicycle pathways. Inductive loops on the east and west pathways count passing bicycles regardless of travel direction. The data consists of a date/time field (Date), an east pathway count field (Fremont Bridge NB), and a west pathway count field (Fremont Bridge SB). The count fields represent the total bicycles detected during the specified one-hour period. Direction of travel is not specified, but in general most traffic in the Fremont Bridge NB field is travelling northbound and most traffic in the Fremont Bridge SB field is travelling southbound.
This is a dataset hosted by the City of Seattle. The city has an open data platform found here and they update their information according to the amount of data that is brought in. Explore the City of Seattle using Kaggle and all of the data sources available through the City of Seattle organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by Brina Blum on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a simplified network representation of bike paths across the City of Melbourne. The dataset can be used to create a digital bicycle network with route-modelling capabilities that integrates existing bicycle infrastructure. The network has been created to be used with ArcGIS Network Analyst. The resulting network was connected to the City of Melbourne property layer through centroids created for this project.
The network can assist in multiple modelling tasks, including catchment analysis and route analysis. The download is a zip file containing compressed .json files.
Please see the metadata attached for further information.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The counters consist of two small tube sensors stretching across the street, which are attached to a small metal counting box made by Eco-Counter. The tubes only count people riding bikes. They are very accurate and designed to be used on greenways.
This is a dataset hosted by the City of Seattle. The city has an open data platform found here and they update their information according to the amount of data that is brought in. Explore the City of Seattle using Kaggle and all of the data sources available through the City of Seattle organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by Patrick Hendry on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Bike racks installed in the public right-of-way in the City of Portland. The bicycle parking, streetlight and sidewalk asset datasets are actively maintained. All of PBOT’s asset data is manually digitized from work order diagrams and construction plans after work is completed and accepted. This does lag somewhat behind completion of construction, but is typically within a few weeks or months of the conclusion of construction work.

Additional Information:
- Category: Transportation - Assets
- Purpose: No purpose information available.
- Update Frequency: Weekly
- Metadata Link: https://www.portlandmaps.com/metadata/index.cfm?&action=DisplayLayer&LayerID=52921
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains observed bike counts from sites across the city, collected during "Super Sunday", Australia’s biggest survey of recreational travel. Held annually in mid-November, the count looks at how runners, walkers, bike riders and other recreational users move around.
A large number of fields is captured for this dataset; they have been compiled into an attached metadata document.
Bike Lanes
This dataset falls under the category Individual Transport Street Network Geometries (Geodata).
It contains the following data: Bicycle routes, bike racks (paraciclos), and lanes.
This dataset was scouted on 2022-02-12 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://ippuc.org.br/geodownloads/geo.htm
See URL for data access and license information.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The Spokane Bridge Bicycle Counter records the number of bikes that cross the bridge using the pedestrian/bicycle pathway on the south side. Inductive loops on the pathway count passing bicycles along with their travel direction. The data consists of a date/time field, an east pathway count field, and a west pathway count field. The count fields represent the total bicycles detected during the specified one-hour period.
This is a dataset hosted by the City of Seattle. The city has an open data platform found here and they update their information according to the amount of data that is brought in. Explore the City of Seattle using Kaggle and all of the data sources available through the City of Seattle organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by Kasper Rasmussen on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Public Bicycles
This dataset falls under the category Public Transport Other.
It contains the following data: Origin, destination, time, gender and age of trips made on the City's public bicycle system.
This dataset was scouted on 2022-02-20 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://data.buenosaires.gob.ar/dataset/bicicletas-publicas
Bike Lanes
This dataset falls under the category Planning & Policy Policy.
It contains the following data: Municipal cycling network made up of road interventions dedicated to the exclusive or non-exclusive circulation of bicycles. It is composed of bike lanes, cycle lanes, shared sidewalks, and cycle routes.
This dataset was scouted on 2022-02-10 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: http://dados.prefeitura.sp.gov.br/dataset/ciclovias