Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The New York City bikeshare, Citi Bike, has a real-time, public API. This API conforms to the General Bikeshare Feed Specification (GBFS). As such, it contains information about the number of bikes and docks available at every station in NYC.
Since 2016, I have been pinging the public API every 2 minutes and storing the results. This dataset contains all of these results, from 8/15/2016 through 12/8/2021. The data unfortunately comes in the form of a bunch of CSVs. I recognize that this is not the best format for reading large datasets like this, but a CSV is still a pretty universal format! My suggestion would be to convert these CSVs to Parquet or something similar if you plan to do lots of analysis on lots of files.
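If you do convert, a minimal sketch of the conversion might look like the following. The `data/` directory and `*.csv` pattern are placeholders for wherever you unpack the files, and pyarrow (or fastparquet) is assumed to be installed:

```python
# Minimal conversion sketch. The data/ directory and *.csv pattern are
# placeholders; pyarrow (or fastparquet) must be installed for to_parquet.
from pathlib import Path

import pandas as pd

for csv_path in sorted(Path("data").glob("*.csv")):
    df = pd.read_csv(csv_path, na_values=r"\N", low_memory=False)
    df.to_parquet(csv_path.with_suffix(".parquet"), index=False)
```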
Originally, I set up an EC2 instance and pinged a legacy API (code). In 2019, I switched to pinging this API via a Lambda function (code).
As part of this 2019 switch, I also started pinging the station information API once per week in order to collect information about each station, such as the name, latitude and longitude. While this dataset contains columns for all of the station information, these columns are missing data between 2016 and 8/2019. It would probably be reasonable to backfill that data with the earliest info available for each station, although be warned that this is not guaranteed to be accurate.
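As an illustration, assuming a file has been loaded with the `read_csv` helper shown further down, a pandas sketch of that backfill might be:

```python
# Backfill sketch: after sorting by time, fill each station's missing
# station-information columns from the earliest later record. As noted
# above, the backfilled values are not guaranteed to be accurate.
info_cols = ["station_name", "lat", "lon", "region_id", "capacity", "has_kiosk"]
df = df.sort_values("station_status_last_reported")
df[info_cols] = df.groupby("station_id")[info_cols].bfill()
```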
In order to reduce the individual file size, the full dataset has been bucketed by `station_id` into 50 separate files. All historical data for a given `station_id` are in the same file, and the stations are randomly distributed across the 50 files.
As previously mentioned, station information is missing for all data earlier than 8/2019. I have included a column, `missing_station_information`, to indicate when this information is missing. You may wonder why I don't just create a separate station information file which can be joined to the file containing the time series. The reason is that the station information can technically change over time. When station information is provided in a given row, that information is accurate as of some time within the 7 days prior. This is because I pinged the station information weekly and then had to join it to the time series.
The CSV files are the result of a `CREATE TABLE AS` AWS Athena query using the `TEXTFILE` format. Consequently, null values are demarcated as `\N`. The two timestamp columns, `station_status_last_reported` and `station_information_last_updated`, are in units of POSIX/UNIX time (i.e. seconds since 1970-01-01 00:00:00 UTC). The following code may be helpful to get you started loading the data as a pandas DataFrame.
```python
import pandas as pd


def read_csv(filename: str) -> pd.DataFrame:
    """
    Read a DataFrame from the CSV file ``filename`` and convert it to a
    preferred schema.
    """
    df = pd.read_csv(
        filename,
        sep=",",
        # Nulls are demarcated as \N; use a raw string so Python does not
        # treat the backslash as the start of an escape sequence.
        na_values=r"\N",
        dtype={
            "station_id": str,
            # Use pandas Int16 dtype to allow for nullable integers
            "num_bikes_available": "Int16",
            "num_ebikes_available": "Int16",
            "num_bikes_disabled": "Int16",
            "num_docks_available": "Int16",
            "num_docks_disabled": "Int16",
            "is_installed": "Int16",
            "is_renting": "Int16",
            "is_returning": "Int16",
            "station_status_last_reported": "Int64",
            "station_name": str,
            "lat": float,
            "lon": float,
            "region_id": str,
            "capacity": "Int16",
            # Use pandas boolean dtype to allow for nullable booleans
            "has_kiosk": "boolean",
            "station_information_last_updated": "Int64",
            "missing_station_information": "boolean",
        },
    )
    # Read in timestamps as UNIX/POSIX epochs and then convert to the local
    # bike share timezone.
    df["station_status_last_reported"] = pd.to_datetime(
        df["station_status_last_reported"], unit="s", origin="unix", utc=True
    ).dt.tz_convert("US/Eastern")
    df["station_information_last_updated"] = pd.to_datetime(
        df["station_information_last_updated"], unit="s", origin="unix", utc=True
    ).dt.tz_convert("US/Eastern")
    return df
```
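As a usage sketch, where the file name is hypothetical (substitute one of the 50 bucketed CSVs):

```python
df = read_csv("bucket_00.csv")  # hypothetical file name
print(df.dtypes)
print(df["station_status_last_reported"].agg(["min", "max"]))
```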
The column names come almost directly from the `station_status` and `station_information` APIs. See the [GBFS schema](https://github.com/MobilityData/gbfs...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BCycle, LLC is a proud supporter of the new General Bikeshare Feed Specification (GBFS). This new standard supports publicly available station bike/dock availability information and does not require the use of an API token.
Nextbike Rental Bikes Positions Real Time - Bicycle Rental System
This dataset falls under the category Public Transport On Demand PT.
It contains the following data: The API provides a geo-referenced list of the current locations of parked bicycles of the bicycle rental system (Stadtwerke Bonn and nextbike) in real time for the Bonn city area. Note: Services that retrieve the data provided at intervals of less than 10 minutes are blocked. In addition to the bicycle positions, the rental locations are also available as an API. Further information: https://www.nextbike.de/de/bonn/information/#
This dataset was scouted on 2022-02-17 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://api.nextbike.net/maps/nextbike-live.json?city=547
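As an illustration, a minimal Python sketch for polling this endpoint follows. The response layout (countries -> cities -> places) is an assumption based on the nextbike live feed and should be verified against real output:

```python
# Minimal polling sketch for the Bonn feed (city=547). Per the note above,
# poll no more often than every 10 minutes.
import requests

resp = requests.get(
    "https://api.nextbike.net/maps/nextbike-live.json",
    params={"city": 547},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
# Assumed layout: countries -> cities -> places; verify against real output.
places = data["countries"][0]["cities"][0]["places"]
print(f"{len(places)} rental locations reported")
```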
This dataset shows the number of bikes passing through 3 intersections in Jersey City.

| Intersection Name | Data Available From |
| --- | --- |
| Bergen & Academy | 4/16/2020 |
| Jersey Ave & Grand St | 5/6/2019 |
| Washington & Thomas Gangemi | 4/29/2020 |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dublin Bikes is a docked bike-share scheme in Dublin City. This page includes an API developed according to the General Bikeshare Feed Specification (GBFS), providing information about vehicles, stations, pricing, etc. The current location of the vehicles is updated every minute. In addition, this page includes historical files of bike location data. Disclaimer: please note that some of the historical files are empty due to historical data issues.
A list of the stations where one can pick up and return bicycles from the Divvy bicycle sharing system (http://divvybikes.com/). This dataset contains all stations. For a list of only those stations currently in service, see https://data.cityofchicago.org/d/67g3-8ig8. For real-time status of stations in machine-readable format, see https://gbfs.divvybikes.com/gbfs/gbfs.json.
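For example, a short sketch of GBFS auto-discovery against that real-time endpoint, assuming the feed follows the common GBFS v1/v2 layout of a language-keyed `feeds` list:

```python
# GBFS auto-discovery sketch: look up the station_status feed URL from
# gbfs.json, then fetch real-time status. Assumes an "en" language key.
import requests

gbfs = requests.get("https://gbfs.divvybikes.com/gbfs/gbfs.json", timeout=30).json()
feeds = {f["name"]: f["url"] for f in gbfs["data"]["en"]["feeds"]}
status = requests.get(feeds["station_status"], timeout=30).json()
print(len(status["data"]["stations"]), "stations reporting")
```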
This dataset presents the different bike lanes across Jersey City.
Last Updated: November 2020
Bike Facilities Full Color Map
Jersey City Bike Facilities_2023
The dataset contains the measurement results of the bicycle counts from the permanent counting stations in the city of Konstanz (Constance). The data shows exactly when and how many cyclists passed these locations; measurements are recorded every 15 minutes. The counting points serve as an important data basis for urban cycling policy.

The city of Konstanz currently operates five permanent counting stations. The oldest bicycle counting station is at the end of the bicycle bridge at Herosépark (further information and pictures here). Since 2023 there are four more: one at the old Rhine bridge, one at Fürstenberg station, one at Petershausen station, and one on Friedrichstraße. In addition, the city runs mobile traffic censuses from time to time at different locations (Mobile traffic censuses: cycling and foot traffic, Open Data Konstanz, offenedaten-konstanz.de). There is also a dashboard: Bicycle counting stations Konstanz (arcgis.com).

The statistics are provided on this portal annually in CSV format. More up-to-date data can be requested if required, and access to the live API data can be granted on a case-by-case basis. If you want to retrieve the data of the Konstanz bicycle counting stations for a specific day, you can do so on an interactive map from the manufacturer Eco-Counter (http://eco-public.com/ParcPublic/?id=4586).

The Bicycle Data Initiative offers an interactive tool with the following features (note: it currently only works for data up to 08.11.2020):

- Download raw data for the intervals you choose: https://www.bicycle-data.de/bicycles-data
- Create automated, standardized analyses of the data (e.g. averages by day, month, or weather conditions): https://www.bicycle-data.de/city-analysis
- Compare Konstanz with other cities: https://www.bicycle-data.de/city-comparison

For this dataset we also published a blog post. (Source: City of Konstanz, Office for Urban Planning and Environment)
Presentation Of The City Bikes Program
This dataset falls under the category Individual Transport Other.
It contains the following data: Presentation of the bicycle city project
This dataset was scouted on 2022-02-14 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: http://www.planmob.salvador.ba.gov.br/index.php/13-estudos-projetos-e-programas?ml=1
See URL for data access and license information.
This layer shows bike paths within the City of Markham.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is provided by the City of Sydney and contains bicycle parking locations. There are more than 3000 public bike parking spaces in the City of Sydney's area. There are 32 free bicycle parking spaces at Kings Cross car park level 5. Goulburn Street car park has 9 individual bike cages for casual use, as well as a free secure cage with 24 spaces that can be accessed with a pass. The API provides data in GeoJSON format; for more information, visit City of Sydney.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The Fremont Bridge Bicycle Counter began operation in October 2012 and records the number of bikes that cross the bridge using the pedestrian/bicycle pathways. Inductive loops on the east and west pathways count passing bicycles regardless of travel direction. The data consists of a date/time field (Date), an east pathway count field (Fremont Bridge NB), and a west pathway count field (Fremont Bridge SB). The count fields represent the total bicycles detected during the specified one-hour period. Direction of travel is not specified, but in general most traffic in the Fremont Bridge NB field is travelling northbound and most traffic in the Fremont Bridge SB field is travelling southbound.
This is a dataset hosted by the City of Seattle. The city has an open data platform found here and they update their information according to the amount of data that is brought in. Explore the City of Seattle using Kaggle and all of the data sources available through the City of Seattle organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by Brina Blum on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a simplified network representation of bike paths across the City of Melbourne. The dataset can be used to create a digital bicycle network with route-modelling capabilities that integrates existing bicycle infrastructure. The network has been created to be used with ArcGIS Network Analyst. The resulting network was connected to the City of Melbourne property layer through centroids created for this project.
The network can assist in multiple modelling tasks, including catchment analysis and route analysis. The download is a zip file containing compressed .json files.
Please see the metadata attached for further information.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The counters consist of two small tube sensors stretching across the street, which are attached to a small metal counting box made by Eco-Counter. The tubes only count people riding bikes. They are very accurate and designed to be used on greenways.
This is a dataset hosted by the City of Seattle. The city has an open data platform found here and they update their information according to the amount of data that is brought in. Explore the City of Seattle using Kaggle and all of the data sources available through the City of Seattle organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by Patrick Hendry on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Bike racks installed in the public right-of-way in the City of Portland. The bicycle parking, streetlight and sidewalk asset datasets are actively maintained. All of PBOT’s asset data is manually digitized from work order diagrams and construction plans after work is completed and accepted. This does lag somewhat behind completion of construction, but is typically within a few weeks or months of the conclusion of construction work.

Additional Information:
- Category: Transportation - Assets
- Purpose: No purpose information available.
- Update Frequency: Weekly
- Metadata Link: https://www.portlandmaps.com/metadata/index.cfm?&action=DisplayLayer&LayerID=52921
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains observed bike counts from sites across the city, collected during "Super Sunday", Australia’s biggest survey of recreational travel. Held annually in mid-November, the count looks at how runners, walkers, bike riders and other recreational users move around.
A large number of fields is captured for this dataset; they have been compiled into an attached metadata document.
Bike Lanes
This dataset falls under the category Individual Transport Street Network Geometries (Geodata).
It contains the following data: Bicycle routes, bike racks (paraciclos), and lanes.
This dataset was scouted on 2022-02-12 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://ippuc.org.br/geodownloads/geo.htm
See URL for data access and license information.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The Spokane Bridge Bicycle Counter records the number of bikes that cross the bridge using the pedestrian/bicycle pathway on the south side. Inductive loops on the pathway count passing bicycles along with their travel direction. The data consists of a date/time field, an east pathway count field, and a west pathway count field. The count fields represent the total bicycles detected during the specified one-hour period.
This is a dataset hosted by the City of Seattle. The city has an open data platform found here and they update their information according to the amount of data that is brought in. Explore the City of Seattle using Kaggle and all of the data sources available through the City of Seattle organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by Kasper Rasmussen on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Public Bicycles
This dataset falls under the category Public Transport Other.
It contains the following data: Origin, destination, time, gender and age of trips made on the City's public bicycle system.
This dataset was scouted on 2022-02-20 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://data.buenosaires.gob.ar/dataset/bicicletas-publicas
Bike Lanes
This dataset falls under the category Planning & Policy Policy.
It contains the following data: Municipal cycling network made up of road interventions dedicated to the exclusive or non-exclusive circulation of bicycles. It is composed of bike lanes, cycle lanes, shared sidewalks, and cycle routes.
This dataset was scouted on 2022-02-10 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: http://dados.prefeitura.sp.gov.br/dataset/ciclovias