These records are generated from the trip record submissions made by yellow taxi Technology Service Providers (TSPs). Each row represents a single trip in a yellow taxi. The trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off taxi zone locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
A Dispatch Service Provider (DSP) can dispatch trips on behalf of the FHV (For-Hire-Vehicle) Base.
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP).
Column Description
Data is obtained from NYCTaxi & Limousine Commission website. https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
This dataset includes trip records from all trips completed in green taxis in NYC in 2014. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Livery Passenger Enhancement Program (LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
This dataset is collected by the NYC Taxi and Limousine Commission (TLC) and includes trip records from all trips completed in Yellow and Green taxis in NYC, and all trips in for-hire vehicles (FHV) in the last 5 years. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. For detailed information about this dataset, go to TOC Trip Record Data This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The NYC Yellow Taxi dataset is a publicly available collection of yellow trip records across the five boroughs of New York City. It captures details about each ride including pickup and drop-off times, pickup and drop-off locations, trip distances, itemized fares, passenger count and payment types. The trip records are published monthly resulting in fragmented and large files for one full year.
In this dataset, all the raw monthly files for the year 2024 were merged and aggregated to reduce size of the 2024 dataset to a manageable level. The trip records were grouped by date, hour, passenger count, payment type, pickup and drop off boroughs. Each grouped row contains a summary of the trip count and the total sum of the distance travelled, time spent and fares charged by all the taxis in that group.
To provide a summary of the number of trips by zones, a separate file was created which aggregates the number of trips by month, boroughs and zones.
The cleaning steps involved removing trips with unrealistic travel distance, duration and fare. Feel free to view the merging and the cleaning code here.
Original data was sourced from the NYC Taxi & Limousine Commission (TLC).
The dataset is based on the 2015 NYC Yellow Cab trip record data published by the NYC Taxi and Limousine Commission (TLC). The train data was randomly sampled which is 2% of the whole year dataset. The test data is non overlapping with the train data and is 1% of the whole year dataset.
File descriptions
train.csv - the training set (contains 2874263 trip records)
test.csv - the testing set (contains 1446517 trip records)
Update:
I have uploaded the train and test dataset with geodesic distance calculated by library geopy (from the coordinates in the dataset) to save up the time for calculation.
Monthly report including total dispatched trips, total dispatched shared trips, and unique dispatched vehicles aggregated by FHV (For-Hire Vehicle) base. These have been tabulated from raw trip record submissions made by bases to the NYC Taxi and Limousine Commission (TLC).
This dataset is typically updated monthly on a two-month lag, as bases have until the conclusion of the following month to submit a month of trip records to the TLC. In example, a base has until Feb 28 to submit complete trip records for January. Therefore, the January base aggregates will appear in March at the earliest. The TLC may elect to defer updates to the FHV Base Aggregate Report if a large number of bases have failed to submit trip records by the due date.
Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total amount of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.
PLEASE NOTE: This dataset, which includes all TLC Licensed Drivers who are in good standing and able to drive, is updated every day in the evening between 4-7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date' should show either today or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_medallion_drivers_active.csv
This is a list of drivers with a current TLC Driver License, which authorizes drivers to operate NYC TLC licensed yellow and green taxicabs and for-hire vehicles (FHVs). This list is accurate as of the date and time shown in the Last Date Updated and Last Time Updated fields. Questions about the contents of this dataset can be sent by email to: licensinginquiries@tlc.nyc.gov.
These records are generated from the trip record submissions made by green taxi Technology Service Providers (TSPs). Each row represents a single trip in a green taxi. The trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off taxi zone locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
This dataset includes trip records from all trips completed in green taxis in NYC in 2015. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Livery Passenger Enhancement Program (LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
TaxisVIS1M
TaxisVIS1M is a subset of New York City’s 2015 Yellow Taxi trip records containing 1 million trips. It is sampled from the original TLC Trip Record Data to enable fast processing and visualization while preserving the temporal and spatial patterns of taxi mobility across the city.
Source Data: NYC Taxi & Limousine Commission (TLC) 2015 Yellow Taxi Trip Records (TLC Trip Record Data ).
Preprocessing:
Download the original Parquet files from TLC.
Load the data using… See the full description on the dataset page: https://huggingface.co/datasets/oscur/taxisvis1M.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission. The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
This dataset includes trip records from all trips completed in green taxis in NYC in 2015. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Livery Passenger Enhancement Program (LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
THIS DATASET IS UPDATED SEVERAL TIMES PER DAY. TLC Driver application status check for applicants who had applied for a new TLC driver’s license. For more information and to upload missing requirements, visit www.nyc.gov/tlcup
For historical/archived data of past application statuses, please see- https://data.cityofnewyork.us/Transportation/Historical-Driver-Application-Status/p32s-yqxq
This dataset includes trip records from all trips completed in green taxis in NYC from January to June 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Livery Passenger Enhancement Program (LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
These records are generated from the trip record submissions made by High Volume For-Hire Vehicle (FHV) bases. On August 14, 2018, Mayor de Blasio signed Local Law 149 of 2018, creating a new license category for TLC-licensed FHV businesses that currently dispatch or plan to dispatch more than 10,000 FHV trips in New York City per day under a single brand, trade, or operating name, referred to as High-Volume For-Hire Services (HVFHS). This law went into effect on Feb 1, 2019. Each row represents a single trip in a FHV dispatched by a high volume base. The trip records include fields capturing the high volume license number, the pickup and drop-off date, time, and taxi zone location ID, which correspond with the NYC Taxi Zones open dataset.
This dataset includes trip records from all trips completed in green taxis in NYC from January to June 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Livery Passenger Enhancement Program (LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
NYC TLC Licensed FHV drivers that are currently active and in good standing. This list is accurate to the date and time represented in the Last Date Updated and Last Time Updated fields.
This dataset includes trip records from all trips completed in green taxis in NYC in 2015. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Livery Passenger Enhancement Program (LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
These records are generated from the trip record submissions made by yellow taxi Technology Service Providers (TSPs). Each row represents a single trip in a yellow taxi. The trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off taxi zone locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.