21 datasets found

NYC Yellow Taxi Trip Data

kaggle.com

zip

Updated Dec 9, 2021

Cite

Elemento (2021). NYC Yellow Taxi Trip Data [Dataset]. https://www.kaggle.com/datasets/elemento/nyc-yellow-taxi-trip-data

Available download formats: zip (1,915,626,894 bytes)

Dataset updated

Dec 9, 2021

Authors

Elemento

License

https://www.usa.gov/government-works/

Area covered

New York

Description

Context

The New York City (NYC) Taxi & Limousine Commission (TLC) keeps data from all its cabs and makes it freely available for download on its official website. The TLC primarily keeps and manages data for 4 different types of vehicles:
- Yellow Taxi (Yellow Medallion Taxicabs): These are the famous NYC yellow taxis that provide transportation exclusively through street hails. The number of taxicabs is limited by a finite number of medallions issued by the TLC. You access this mode of transportation by standing in the street and hailing an available taxi with your hand. The pickups are not pre-arranged.
- Green Taxi (Street Hail Livery): The SHL program allows livery vehicle owners to license and outfit their vehicles with green borough taxi branding, meters, credit card machines, and ultimately the right to accept street hails in addition to pre-arranged rides.
- For-Hire Vehicles (FHVs): FHV transportation is accessed by pre-arrangement with a dispatcher or limo company. FHVs are not permitted to pick up passengers via street hails, as those rides are not considered pre-arranged.

Complementary Kernel

I have made a Kernel especially for this dataset that uses clustering, regression, and time-series techniques. You can check it out here.

Important Points

In this dataset, we consider only the Yellow Taxi data, for January 2015 and January to March 2016.
If you go to the NYC TLC website and download any of the CSV files, you will find that they use a different format. This is because the TLC regularly adds more data, alongside updating the existing data.
One of the key changes is that, instead of providing pickup and dropoff coordinates, the TLC has divided NYC into regions, indexed those regions, and provided these indices in the CSV files.
For this reason, I have built this dataset from the previous version of the CSV files, which lets me practice clustering alongside time-series analysis.
If you want to skip the clustering part, just go to the TLC website and download the new CSV files.
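The clustering idea above (grouping raw pickup coordinates into regions, much as the TLC's zone indices now do) can be illustrated with plain k-means. This is a minimal sketch, not the author's Kernel: the function name `kmeans`, the (latitude, longitude) tuple layout, naive seeding with the first k points, and the use of planar Euclidean distance on degrees (only a rough approximation at city scale) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude), assumed layout


def kmeans(points: Sequence[Point], k: int, max_iter: int = 50) -> Tuple[List[Point], List[int]]:
    """Plain k-means; returns (centroids, cluster label for each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids: List[Point] = list(points[:k])  # naive deterministic seeding
    labels: List[int] = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lon = sum(m[1] for m in members) / len(members)
                updated.append((lat, lon))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, labels
```

In practice a library implementation with better seeding (for example k-means++) would usually be preferred; the loop here only shows the assign-then-update cycle.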

Attributes

...

Field Name	Description
VendorID	A code indicating the TPEP provider that provided the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc.
tpep_pickup_datetime	The date and time when the meter was engaged.
tpep_dropoff_datetime	The date and time when the meter was disengaged.
Passenger_count	The number of passengers in the vehicle. This is a driver-entered value.
Trip_distance	The elapsed trip distance in miles reported by the taximeter.
Pickup_longitude	Longitude where the meter was engaged.
Pickup_latitude	Latitude where the meter was engaged.
RateCodeID	The final rate code in effect at the end of the trip. 1= Standard rate 2= JFK 3= Newark 4= Nassau or Westchester 5= Negotiated fare 6= Group ride
Store_and_fwd_flag	This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store and forward trip
Dropoff_longitude	Longitude where the meter was disengaged.
Dropoff_latitude	Latitude where the meter was disengaged.
Payment_type	A numeric code signifying how the passenger paid for the trip. 1= Credit card 2= Cash 3= No charge 4= Dispute 5= Unknown 6= Voided trip
Fare_amount	The time-and-distance fare calculated by the meter.
Extra	Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges.
MTA_tax	$0.50 MTA tax that is automatically triggered based on the metered rate in use.
Improvement_surcharge	$0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015.
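Since VendorID, RateCodeID and Payment_type are stored as numeric codes, a small lookup helps when exploring the CSV files. The sketch below copies the code tables from the TLC field descriptions quoted in this listing; the helper `decode` and its "Unrecognized" fallback are hypothetical choices, and accepting values like "2.0" is an assumption about how the codes may appear in a CSV export.

```python
from typing import Dict, Optional, Union

# Code tables copied from the TLC field descriptions in this listing.
VENDOR_ID: Dict[int, str] = {1: "Creative Mobile Technologies, LLC", 2: "VeriFone Inc."}
RATE_CODE_ID: Dict[int, str] = {
    1: "Standard rate", 2: "JFK", 3: "Newark",
    4: "Nassau or Westchester", 5: "Negotiated fare", 6: "Group ride",
}
PAYMENT_TYPE: Dict[int, str] = {
    1: "Credit card", 2: "Cash", 3: "No charge",
    4: "Dispute", 5: "Unknown", 6: "Voided trip",
}


def decode(raw: Optional[Union[str, int, float]], table: Dict[int, str]) -> str:
    """Translate a raw value such as "2", 2 or "2.0" into its label."""
    try:
        return table[int(float(raw))]
    except (TypeError, ValueError, KeyError):
        # Missing, non-numeric, or out-of-range codes.
        return "Unrecognized"
```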

2023 Yellow Taxi Trip Data
catalog.data.gov
data.cityofnewyork.us
Updated Jul 20, 2024
Cite
data.cityofnewyork.us (2024). 2023 Yellow Taxi Trip Data [Dataset]. https://catalog.data.gov/dataset/2023-yellow-taxi-trip-data
Dataset updated
Jul 20, 2024
Dataset provided by
data.cityofnewyork.us
Description
These records are generated from the trip record submissions made by yellow taxi Technology Service Providers (TSPs). Each row represents a single trip in a yellow taxi. The trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off taxi zone locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
Taxi Trips (2013-2023)
data.cityofchicago.org
catalog.data.gov
csv, xlsx, xml
Updated Feb 7, 2024
Cite
City of Chicago (2024). Taxi Trips (2013-2023) [Dataset]. https://data.cityofchicago.org/Transportation/Taxi-Trips-2013-2023-/wrvz-psew
Available download formats: xml, xlsx, csv
Dataset updated
Feb 7, 2024
Dataset authored and provided by
City of Chicago
Description
This dataset ends with 2023. Please see the Featured Content link below for the dataset that starts in 2024.

Taxi trips from 2013 to 2023 reported to the City of Chicago in its role as a regulatory agency. To protect privacy but allow for aggregate analyses, the Taxi ID is consistent for any given taxi medallion number but does not show the number, Census Tracts are suppressed in some cases, and times are rounded to the nearest 15 minutes.

Due to the data reporting process, not all trips are reported, but the City believes that most are.
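The 15-minute rounding limits how precisely times can be used: if both the start and end times of a trip are rounded this way, a duration computed from them can be off by up to about 15 minutes. The sketch below shows nearest-quarter-hour rounding; the City's exact tie-breaking rule is not stated here, so rounding ties up (10:07:30 becomes 10:15) is an assumption, and `round_to_quarter_hour` is a hypothetical helper.

```python
from datetime import datetime, timedelta

QUARTER_HOUR = 15 * 60  # seconds


def round_to_quarter_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest 15 minutes (ties round up, assumed)."""
    hour_start = ts.replace(minute=0, second=0, microsecond=0)
    offset = (ts - hour_start).total_seconds()
    # Adding half an interval before flooring gives round-half-up behavior.
    quarters = int((offset + QUARTER_HOUR / 2) // QUARTER_HOUR)
    return hour_start + timedelta(seconds=quarters * QUARTER_HOUR)
```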
Uber NYC for-hire vehicles trip data (2021)
kaggle.com
zip
Updated Feb 2, 2023
Cite
shuheng_mo (2023). Uber NYC for-hire vehicles trip data (2021) [Dataset]. https://www.kaggle.com/datasets/shuhengmo/uber-nyc-forhire-vehicles-trip-data-2021
Available download formats: zip (4,539,471,170 bytes)
Dataset updated
Feb 2, 2023
Authors
shuheng_mo
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
New York
Description
In New York City, all taxi vehicles are managed by the TLC (Taxi and Limousine Commission). Here is a brief description of the TLC:

The New York City Taxi and Limousine Commission (TLC), created in 1971, is the agency responsible for licensing and regulating New York City's Medallion (Yellow) taxi cabs, for-hire vehicles (community-based liveries, black cars and luxury limousines), commuter vans, and paratransit vehicles. The Commission's Board consists of nine members, eight of whom are unsalaried Commissioners. The salaried Chair/Commissioner presides over regularly scheduled public commission meetings and is the head of the agency, which maintains a staff of approximately 600 TLC employees. Over 200,000 TLC licensees complete approximately 1,000,000 trips each day. To operate for hire, drivers must first undergo a background check, have a safe driving record, and complete 24 hours of driver training. TLC-licensed vehicles are inspected for safety and emissions at TLC's Woodside Inspection Facility.

The NYC TLC has released its Trip Record data to the public for research and study purposes. There are three main taxi types in NYC. Yellow taxis are traditionally hailed by signaling to a driver who is on duty and seeking a passenger (street hail), but they may now also be hailed using an e-hail app like Curb or Arro. Yellow taxis are the only vehicles permitted to respond to a street hail from a passenger in all five boroughs. Green taxis, also known as boro taxis and street-hail liveries, were introduced in August 2013 to improve taxi service and availability in the boroughs. Green taxis may respond to street hails, but only in the areas indicated in green on the map (i.e. above W 110 St/E 96th St in Manhattan and in the boroughs). FHV data includes trip data from high-volume for-hire vehicle bases (bases for companies dispatching 10,000+ trips per day, meaning Uber, Lyft, Via, and Juno), community livery bases, luxury limousine bases, and black car bases. As one of the biggest ride-hailing service providers, Uber also has its trip records collected in the High Volume For-Hire Vehicle Trip Records.

Based on this dataset, there are some business goals we want to achieve to improve Uber's ride-hailing service:
- Exploratory data analysis: study fhvhv_tripdata_2021 and find the underlying trip patterns in 2021.
- Using fhvhv_tripdata_2021 and the weather data, build a model to predict peak footfall.
- Explore Uber's user portrait in NYC (which orders are urgent, and which users should be given higher priority?).

Some useful tips about this dataset:
- The for-hire vehicle trip data files are named like fhvhv_tripdata_2021-0X.parquet.
- For descriptions of the trip data columns, refer to data_dictionary_trip_records_hvfhs.pdf.
- The taxi_zones folder contains the geospatial data of NYC taxi zones (geopandas would be helpful).
- taxi_zone_lookup.csv stores the taxi zones' zip codes and other relevant information.
- nyc 2021-01-01 to 2021-12-31.csv records the weather data for 2021, and taxi+_zone_lookup.csv stores the zone information for all taxis.
- Data files ending in .parquet can be read with the pyarrow package and converted to a pandas DataFrame.
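Because the trip records identify pickup and dropoff areas only by zone ID, a common first step is to join them with the zone lookup file. The standard-library sketch below assumes the lookup CSV has `LocationID`, `Borough` and `Zone` columns and that trips carry `PULocationID`/`DOLocationID`; verify these names against the files and data_dictionary_trip_records_hvfhs.pdf. The helpers `load_zone_lookup` and `label_trip` are hypothetical. Reading the `.parquet` trip files themselves still needs a Parquet reader such as pyarrow.

```python
import csv
from typing import Dict, Iterable, Mapping


def load_zone_lookup(lines: Iterable[str]) -> Dict[int, str]:
    """Build {LocationID: "Borough / Zone"} from zone lookup CSV lines."""
    reader = csv.DictReader(lines)
    return {int(row["LocationID"]): f'{row["Borough"]} / {row["Zone"]}' for row in reader}


def label_trip(trip: Mapping[str, str], zones: Dict[int, str]) -> Dict[str, str]:
    """Return a copy of a trip row with readable pickup/dropoff zone names."""
    out = dict(trip)
    for col in ("PULocationID", "DOLocationID"):
        # Unknown IDs are kept visible instead of raising.
        out[col + "_zone"] = zones.get(int(trip[col]), "Unknown zone")
    return out
```

For a real file, open it with `open("taxi_zone_lookup.csv", newline="")` and pass the file object to `load_zone_lookup`, since `csv.DictReader` accepts any iterable of lines.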

If you find this dataset helpful, please upvote; more high-quality datasets will be published in the future!❤️
Taxi Data Set
kaggle.com
Updated Jul 28, 2023
Cite
Mick Hirsh (2023). Taxi Data Set [Dataset]. https://www.kaggle.com/datasets/mickhirsh/taxi-data-set
Dataset updated
Jul 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mick Hirsh
Description
First, I give credit to Raviiloveyou, who created the original Taxi Trip Fare Predictor dataset. I modified the Taxi Set to include taxi fares from Philadelphia, PA. The following costs and calculations have been updated in the dataset to include all taxi fares:
- First 1/10 mile (flag drop) or fraction thereof: $2.70
- Each additional 1/10 mile or fraction thereof: $0.25
- Each 37.6 seconds of wait time: $0.25
The dataset also includes the speed of the taxis in KPH (kilometers per hour).

The columns are as follows:

Trip Duration in seconds (part of the original dataset)

Trip Duration in minutes

Trip Duration in hours

Distance Traveled in kilometers (part of the original dataset)

KPH: speed of the taxi in kilometers per hour

Wait Time Cost: $0.25 for each 37.6 seconds of wait time, i.e. the taxi time used to get the person to the location

Distance Cost: $0.25 for each additional 1/10 mile (0.1 mile = 0.160934 km) or fraction thereof

Fare w Flag: the $2.70 starting cost (flag drop) plus Wait Time Cost plus Distance Cost

TIP: the tip the taxi driver received for the trip (part of the original dataset)

Miscellaneous fees: (part of the original dataset)

Total Fare New: the total cost of the trip

Num of passengers: the number of passengers. Note that there is no additional cost per passenger for Philadelphia, PA taxis.

surge applied: (part of the original dataset)
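The fare rules listed above can be written out as a small calculation of the "Fare w Flag" value. This is a sketch under stated assumptions, not the dataset author's code: distance is given in kilometers (as in the original data) and converted at 1.60934 km per mile; the first 1/10 mile is covered by the $2.70 flag drop and every further started 1/10 mile costs $0.25; only complete 37.6-second wait intervals are charged (the description does not say how partial intervals are handled); and wait time is a separate input because its relation to trip duration is not spelled out. `philly_fare_cents` is a hypothetical name, and amounts are kept in integer cents to avoid floating-point drift.

```python
import math

KM_PER_MILE = 1.60934
FLAG_DROP_CENTS = 270          # first 1/10 mile or fraction thereof
PER_TENTH_MILE_CENTS = 25      # each additional 1/10 mile or fraction thereof
PER_WAIT_INTERVAL_CENTS = 25   # each 37.6 seconds of wait time
WAIT_INTERVAL_S = 37.6


def philly_fare_cents(distance_km: float, wait_seconds: float) -> int:
    """Flag drop + distance cost + wait-time cost, in integer cents."""
    miles = distance_km / KM_PER_MILE
    # round() guards against float noise such as 0.9999999999 tenths.
    tenths = math.ceil(round(miles * 10, 9))
    extra_tenths = max(tenths - 1, 0)  # the first tenth is in the flag drop
    # Assumption: only complete 37.6 s intervals are charged.
    wait_intervals = math.floor(round(wait_seconds / WAIT_INTERVAL_S, 9))
    return (FLAG_DROP_CENTS
            + extra_tenths * PER_TENTH_MILE_CENTS
            + wait_intervals * PER_WAIT_INTERVAL_CENTS)
```

For example, a 1-mile trip (1.60934 km) with no waiting comes to 270 + 9 × 25 = 495 cents, i.e. $4.95.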
Cab Services Drivers Info
kaggle.com
zip
Updated Mar 31, 2024
Cite
Akash (2024). Cab Services Drivers Info [Dataset]. https://www.kaggle.com/datasets/akashpawar10/cab-services-drivers-info
Available download formats: zip (193,328 bytes)
Dataset updated
Mar 31, 2024
Authors
Akash
Description
Recruiting and retaining drivers is seen by industry watchers as a tough battle for XYZCab. Churn among drivers is high and it’s very easy for drivers to stop working for the service on the fly or jump to Uber depending on the rates.

As the company grows, the high churn could become a bigger problem. To find new drivers, XYZCab is casting a wide net, including people who don’t have cars for jobs. But this acquisition is very costly: losing drivers frequently hurts the morale of the organization, and acquiring new drivers is more expensive than retaining existing ones.

You are working as a data scientist with the Analytics Department of XYZCab, focused on driver team attrition. You are provided with monthly information for a segment of drivers for 2019 and 2020 and are tasked with predicting whether a driver will leave the company, based on attributes like:
• Demographics (city, age, gender, etc.)
• Tenure information (joining date, last date)
• Historical data regarding the performance of the driver (quarterly rating, monthly business acquired, grade, income)

Column Profiling:
1. MMMM-YY: Reporting date (monthly)
2. Driver_ID: Unique ID for drivers
3. Age: Age of the driver
4. Gender: Gender of the driver – Male: 0, Female: 1
5. City: City code of the driver
6. Education_Level: Education level – 0 for 10+, 1 for 12+, 2 for graduate
7. Income: Monthly average income of the driver
8. Date Of Joining: Joining date for the driver
9. LastWorkingDate: Last working date for the driver
10. Joining Designation: Designation of the driver at the time of joining
11. Grade: Grade of the driver at the time of reporting
12. Total Business Value: The total business value acquired by the driver in a month (negative business indicates cancellation/refund or car EMI adjustments)
13. Quarterly Rating: Quarterly rating of the driver: 1, 2, 3, 4, 5 (higher is better)
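A usual first step for this task is deriving a per-driver target from the monthly rows. The sketch below assumes a driver has left if any of their monthly records has a non-empty LastWorkingDate, and that an empty value means the driver is still active; the helper `churn_labels` and the sample date format are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Mapping


def churn_labels(rows: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Return {Driver_ID: 1 if the driver left, else 0} from monthly rows."""
    labels: Dict[str, int] = {}
    for row in rows:
        driver = row["Driver_ID"]
        left = 1 if (row.get("LastWorkingDate") or "").strip() else 0
        # Churned if any monthly record carries a last working date.
        labels[driver] = max(labels.get(driver, 0), left)
    return labels


if __name__ == "__main__":
    sample = io.StringIO(
        "MMMM-YY,Driver_ID,LastWorkingDate\n"
        "01/01/19,1,\n"
        "02/01/19,1,03/11/19\n"
        "01/01/19,2,\n"
    )
    print(churn_labels(csv.DictReader(sample)))  # {'1': 1, '2': 0}
```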
Taxi Trips
data.sfgov.org
catalog.data.gov
csv, xlsx, xml
Updated Apr 8, 2025
Cite
(2025). Taxi Trips [Dataset]. https://data.sfgov.org/Transportation/Taxi-Trips/m8hk-2ipk
Available download formats: xml, csv, xlsx
Dataset updated
Apr 8, 2025
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
A. SUMMARY This dataset contains information on taxi trips, including pickup location, destination, and fare. Additional fields have been integrated into the raw data through automated and manual procedures to facilitate easier data analysis. Those fields are indicated in the column metadata.

B. HOW THE DATASET IS CREATED As required by the Transportation Code, all taxi companies permitted to operate in the City and County of San Francisco transmit digital records of their fleet’s activity to SFMTA in real time through the SFMTA Taxi Application Programming Interface (API).

C. UPDATE PROCESS This dataset will be updated monthly with new taxi trip information.

D. HOW TO USE THIS DATASET This dataset is useful for tracking average daily taxi trip counts and monitoring the impact of the Taxi Upfront Pricing Pilot program on driver income.

E. RELATED DATASETS
Taxi Medallion Holders

New York City Taxi and Limousine project

kaggle.com

zip

Updated Apr 22, 2024

Cite

Ramin Huseyn (2024). New York City Taxi and Limousine project [Dataset]. https://www.kaggle.com/datasets/raminhuseyn/new-york-city-taxi-and-limousine-project

Available download formats: zip (1,043,839 bytes)

Dataset updated

Apr 22, 2024

Authors

Ramin Huseyn

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

New York

Description

The New York City Taxi and Limousine Commission (TLC) oversees the licensing and regulation of taxi cabs and for-hire vehicles in the city. The TLC gathers data from over 200,000 license holders, including taxi drivers and limousine operators, who collectively complete around one million trips each day.

Note: The dataset used for this project was designed for educational purposes and may not accurately represent the behavior of taxi cab riders in New York City.

Column name	Description
ID	Trip identification number
VendorID	A code indicating the TPEP provider that provided the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc.
tpep_pickup_datetime	The date and time when the meter was engaged
tpep_dropoff_datetime	The date and time when the meter was disengaged
Passenger_count	The number of passengers in the vehicle. This is a driver-entered value
Trip_distance	The elapsed trip distance in miles reported by the taximeter
RateCodeID	The final rate code in effect at the end of the trip. 1= Standard rate 2=JFK 3=Newark 4=Nassau or Westchester 5=Negotiated fare 6=Group ride
Store_and_fwd_flag	This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store and forward trip
PULocationID	TLC Taxi Zone in which the taximeter was engaged
DOLocationID	TLC Taxi Zone in which the taximeter was disengaged
Payment_type	A numeric code signifying how the passenger paid for the trip. 1= Credit card 2= Cash 3= No charge 4= Dispute 5= Unknown 6= Voided trip
Fare_amount	The time-and-distance fare calculated by the meter
Extra	Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges
MTA_tax	$0.50 MTA tax that is automatically triggered based on the metered rate in use
Tip_amount	Tip amount – This field is automatically populated for credit card tips. Cash tips are not included
Tolls_amount	Total amount of all tolls paid in trip
Improvement_surcharge	$0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015
Total_amount	The total amount charged to passengers. Does not include cash tips
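The charge columns above allow a quick consistency check on each record. This sketch assumes that, for this column set, Total_amount should equal the sum of Fare_amount, Extra, MTA_tax, Tip_amount, Tolls_amount and Improvement_surcharge within a cent; newer TLC files add further surcharge columns, so the component list would need extending for them. `total_is_consistent` is a hypothetical helper.

```python
from typing import Mapping

# Charge components from the column table (assumed exhaustive for this set).
COMPONENTS = ("Fare_amount", "Extra", "MTA_tax", "Tip_amount",
              "Tolls_amount", "Improvement_surcharge")


def total_is_consistent(row: Mapping[str, str], tolerance: float = 0.01) -> bool:
    """Check that Total_amount matches the sum of its listed components."""
    expected = sum(float(row[col]) for col in COMPONENTS)
    return abs(expected - float(row["Total_amount"])) <= tolerance
```

Rows that fail the check are worth inspecting rather than dropping outright, since the note above says this dataset was prepared for educational purposes.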

Taxi Trajectory Data

kaggle.com

zip

Updated Apr 12, 2018

Cite

Chris Cross (2018). Taxi Trajectory Data [Dataset]. https://www.kaggle.com/crailtap/taxi-trajectory

Available download formats: zip (540,159,049 bytes)

Dataset updated

Apr 12, 2018

Authors

Chris Cross

Description

Context

Technology has many effects on the transportation industry.

Content

We have provided an accurate dataset describing a complete year (from 01/07/2013 to 30/06/2014) of the trajectories for all 442 taxis running in the city of Porto, Portugal (i.e. one CSV file named "train.csv"). These taxis operate through a taxi dispatch central, using mobile data terminals installed in the vehicles. We categorize each ride into three categories: A) taxi central based, B) stand-based, or C) non-taxi central based. For the first, we provide an anonymized id when such information is available from the telephone call. The last two categories refer to services that were requested directly from the taxi drivers at a taxi stand (B) or on a random street (C).

Each data sample corresponds to one completed trip. It contains a total of 9 (nine) features, described as follows:

TRIP_ID: (String) It contains a unique identifier for each trip;
CALL_TYPE: (char) It identifies the way this service was requested. It may contain one of three possible values: ‘A’ if this trip was dispatched from the central; ‘B’ if this trip was requested directly from a taxi driver at a specific stand; ‘C’ otherwise (i.e. a trip requested on a random street).
ORIGIN_CALL: (integer) It contains a unique identifier for each phone number which was used to request at least one service. It identifies the trip’s customer if CALL_TYPE=’A’. Otherwise, it assumes a NULL value;
ORIGIN_STAND: (integer) It contains a unique identifier for the taxi stand. It identifies the starting point of the trip if CALL_TYPE=’B’. Otherwise, it assumes a NULL value;
TAXI_ID: (integer) It contains a unique identifier for the taxi driver that performed each trip;
TIMESTAMP: (integer) Unix Timestamp (in seconds). It identifies the trip’s start;
DAYTYPE: (char) It identifies the daytype of the trip’s start. It assumes one of three possible values: ‘B’ if this trip started on a holiday or any other special day (i.e. extending holidays, floating holidays, etc.); ‘C’ if the trip started on a day before a type-B day; ‘A’ otherwise (i.e. a normal day, workday or weekend).
MISSING_DATA: (Boolean) It is FALSE when the GPS data stream is complete and TRUE whenever one (or more) locations are missing;
POLYLINE: (String) It contains a list of GPS coordinates (i.e. WGS84 format) mapped as a string. The beginning and the end of the string are identified with brackets (i.e. [ and ], respectively). Each pair of coordinates is also identified by the same brackets as [LONGITUDE, LATITUDE]. This list contains one pair of coordinates for each 15 seconds of trip. The last list item corresponds to the trip’s destination while the first one represents its start;

The total travel time of the trip (the prediction target of this competition) is defined as the (number of points-1) x 15 seconds. For example, a trip with 101 data points in POLYLINE has a length of (101-1) * 15 = 1500 seconds. Some trips have missing data points in POLYLINE, indicated by MISSING_DATA column, and it is part of the challenge how you utilize this knowledge.
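The travel-time definition above translates directly into code. The sketch assumes each POLYLINE value is valid JSON (a bracketed list of [LONGITUDE, LATITUDE] pairs, as described), and the helper names `parse_polyline` and `travel_time_seconds` are hypothetical. For rows where MISSING_DATA is TRUE, fewer points are recorded, so this formula will tend to undercount the real duration.

```python
import json
from typing import List


def parse_polyline(polyline: str) -> List[List[float]]:
    """Parse a POLYLINE string like "[[-8.61,41.14],[-8.62,41.15]]"."""
    return json.loads(polyline)


def travel_time_seconds(polyline: str, interval_s: int = 15) -> int:
    """(number of points - 1) x 15 s; an empty polyline gives 0."""
    points = parse_polyline(polyline)
    return max(len(points) - 1, 0) * interval_s
```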

Acknowledgements

Data from ECML/PKDD 15: Taxi Trip Time Prediction (II) Competition

Inspiration

I added this dataset because competition datasets do not appear in the dataset search, and it can help with learning basic methods in geospatial analysis and trajectory handling.

Replication Data for: Heat Causes Large Earnings Losses for Informal-Sector...

dataverse.harvard.edu

Updated Oct 18, 2024

Cite

E. Somanathan; Saudamini Das (2024). Replication Data for: Heat Causes Large Earnings Losses for Informal-Sector Workers in India [Dataset]. http://doi.org/10.7910/DVN/1Q5HZD

Unique identifier

https://doi.org/10.7910/DVN/1Q5HZD

Dataset updated

Oct 18, 2024

Dataset provided by

Harvard Dataverse

Authors

E. Somanathan; Saudamini Das

License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Area covered

India

Description

This is a dataset on 400 workers, collected daily in May and June 2019 in Delhi. These workers were working as launderers, construction workers, painters, coolies (manual laborers in transport or other sectors), cycle rickshaw drivers, electric rickshaw drivers, auto (three-wheeled taxi) drivers, taxi drivers, food vendors, street vendors, rag pickers, petty traders, fruit sellers, waste and scrap dealers, roadside barbers, cobblers, roadside cycle/auto mechanics, and others. We collected data on their earnings, expenditure, and health. The data was merged with temperature data from the meteorological station at Delhi Airport.

Taxi Licences - Dataset - York Open Data

data.yorkopendata.org

Updated May 22, 2017

Cite

(2017). Taxi Licences - Dataset - York Open Data [Dataset]. https://data.yorkopendata.org/dataset/taxi-licenses

Dataset updated

May 22, 2017

License

Open Government Licence 2.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/)
License information was derived automatically

Area covered

York

Description

• Hackney carriage vehicles on 1st of June
• Private hire vehicles on 1st of November

A list of all Hackney Carriage and Private Hire vehicle licences issued by City of York Council. The list can be filtered to identify Wheelchair Accessible Vehicles as per Section 167 of the Equality Act 2010. For further information please visit City of York Council's website.

Newyork Yellow Taxi Trip Data

kaggle.com

zip

Updated Jul 25, 2021

Cite

Sripathi Mohanasundaram (2021). Newyork Yellow Taxi Trip Data [Dataset]. https://www.kaggle.com/microize/newyork-yellow-taxi-trip-data-2020-2019

Available download formats: zip (1,938,408,118 bytes)

Dataset updated

Jul 25, 2021

Authors

Sripathi Mohanasundaram

License

Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically

Area covered

New York

Description

Context

The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP).

Content

Column Description

VendorID : A code indicating the TPEP provider that provided the record. 1 = Creative Mobile Technologies, LLC; 2 = VeriFone Inc.
tpep_pickup_datetime : The date and time when the meter was engaged.
tpep_dropoff_datetime : The date and time when the meter was disengaged.
Passenger_count : The number of passengers in the vehicle (this is a driver-entered value).
Trip_distance : The elapsed trip distance in miles reported by the taximeter.
PULocationID : TLC Taxi Zone in which the taximeter was engaged.
DOLocationID : TLC Taxi Zone in which the taximeter was disengaged.
RateCodeID : The final rate code in effect at the end of the trip. 1 = Standard rate; 2 = JFK; 3 = Newark; 4 = Nassau or Westchester; 5 = Negotiated fare; 6 = Group ride
Store_and_fwd_flag : This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y = store and forward trip; N = not a store and forward trip
Payment_type : A numeric code signifying how the passenger paid for the trip. 1 = Credit card; 2 = Cash; 3 = No charge; 4 = Dispute; 5 = Unknown; 6 = Voided trip
Fare_amount : The time-and-distance fare calculated by the meter.
Extra : Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges.
MTA_tax : $0.50 MTA tax that is automatically triggered based on the metered rate in use.
Improvement_surcharge : $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015.
Tip_amount : Tip amount – This field is automatically populated for credit card tips. Cash tips are not included.
Tolls_amount : Total amount of all tolls paid in trip.
Total_amount : The total amount charged to passengers. Does not include cash tips.
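Trip duration and average speed are not stored directly but can be derived from the pickup/dropoff timestamps and Trip_distance. The sketch assumes timestamps look like `2020-01-01 00:28:15` (check the actual files), and `trip_metrics` is a hypothetical helper that returns None for speed when the duration is not positive.

```python
from datetime import datetime
from typing import Optional, Tuple

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed CSV timestamp layout


def trip_metrics(pickup: str, dropoff: str, trip_distance: float) -> Tuple[float, Optional[float]]:
    """Return (duration in minutes, average speed in mph or None)."""
    start = datetime.strptime(pickup, TIMESTAMP_FORMAT)
    end = datetime.strptime(dropoff, TIMESTAMP_FORMAT)
    minutes = (end - start).total_seconds() / 60
    # Guard against zero or negative durations, which make speed undefined.
    mph = trip_distance / (minutes / 60) if minutes > 0 else None
    return minutes, mph
```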

Acknowledgements

Data is obtained from the NYC Taxi & Limousine Commission website: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Taxi Trip Fare Prediction Challenge

kaggle.com

zip

Updated Nov 17, 2022

Cite

Gaurav Dutta (2022). Taxi Trip Fare Prediction Challenge [Dataset]. https://www.kaggle.com/gauravduttakiit/taxi-trip-fare-prediction-challenge

Available download formats: zip (1,082,094 bytes)

Dataset updated

Nov 17, 2022

Authors

Gaurav Dutta

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Overview

Through a real-world challenge, this hackathon aims to enhance competitors' data science and innovative analytical thinking abilities. Get an opportunity to work on a remarkable data science technology by competing with the best brains in this area at this point in time, where artificial intelligence and machine learning are at the forefront of attention, and find out how you stack up!

This hackathon tries to address a challenge faced by taxi operators: quoting the right fare to customers before the trip starts. Even when the trip details are shared with the taxi drivers or operators, they find it difficult to quote the right fare because of uncertainties and calculation complexities. Passengers face the same issue, because the fares quoted to them are inaccurate or irrelevant. To address this, the hackathon provides participants with a historical dataset of taxi trip details and the fares of those trips. Using this dataset, participants need to build machine learning models that predict the trip fare from the other features of the trip.

Overall, it involves using a dataset, finding the best set of features from the dataset, building a machine learning model to predict trip fare based on other trip features and evaluating the predictions using mean squared error and finally submitting the predictions in the given template.
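The evaluation metric named above is the mean of squared differences between actual and predicted fares, MSE = (1/n) * sum((y_i - yhat_i)^2). The hackathon's own scoring script is not shown, so the sketch below is the textbook definition; scikit-learn's `sklearn.metrics.mean_squared_error` computes the same quantity.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between actual and predicted fares."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty prediction set")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because errors are squared, a few badly mispredicted long or airport trips can dominate the score, which is worth keeping in mind when choosing features.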

Data description:

Trip_distance: The elapsed trip distance in miles reported by the taximeter.
Rate_code: The final rate code in effect at the end of the trip. 1= Standard rate, 2= JFK, 3= Newark, 4= Nassau or Westchester, 5= Negotiated fare, 6= Group ride
Storeandfwd_flag: This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, i.e. whether the trip was stored and then forwarded to the vendor. Y= store and forward trip, N= not a store and forward trip
Payment_type: A numeric code signifying how the passenger paid for the trip. 1= Credit card, 2= Cash, 3= No charge, 4= Dispute, 5= Unknown, 6= Voided trip
Fare_amount: The time-and-distance fare calculated by the meter.
Extra: Miscellaneous extras and surcharges.
Mta_tax: $0.50 MTA tax that is automatically triggered based on the metered rate in use.
Tip_amount: Tip amount credited to the driver for credit card transactions.
Tolls_amount: Total amount of all tolls paid in the trip.
Imp_surcharge: $0.30 extra charge added automatically to all rides.
Total_amount: The total amount charged to passengers. Does not include cash tips.
Pickuplocationid: TLC Taxi Zone in which the taximeter was engaged.
Dropofflocationid: TLC Taxi Zone in which the taximeter was disengaged.
Year: The year in which the taxi trip was taken.
Month: The month in which the taxi trip was taken.
Day: The day on which the taxi trip was taken.
Day_of_week: The day of the week on which the taxi trip was taken.
Hour_of_day: The hour of the day, in 24-hour format.
Trip_duration: The total duration of the trip in seconds.
calculated_total_amount: The total amount the customer has to pay for the taxi.

NYC Yellow Taxi Trip Records

kaggle.com

zip

Updated Jun 18, 2023

Cite

psv (2023). NYC Yellow Taxi Trip Records [Dataset]. https://www.kaggle.com/datasets/psvishnu/nyc-yellow-taxi-trip-records

Available download formats: zip (29,733,373,878 bytes)

Dataset updated

Jun 18, 2023

Authors

psv

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Area covered

New York

Description

About TLC Trip Record Data

Yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemised fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorised under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.

For-Hire Vehicle (“FHV”) trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record submissions made by bases. Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total number of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.

Data Source: TLC

https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Data dictionary

https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf

Sr no.	Field Name	Description
1.	VendorID	A code indicating the TPEP provider that provided the record. 1 = Creative Mobile Technologies, LLC; 2 = VeriFone Inc.
2.	tpep_pickup_datetime	The date and time when the meter was engaged.
3.	tpep_dropoff_datetime	The date and time when the meter was disengaged.
4.	Passenger_count	The number of passengers in the vehicle. (Driver-entered value)
5.	Trip_distance	The elapsed trip distance in miles reported by the taximeter.
6.	PULocationID	TLC Taxi Zone in which the taximeter was engaged.
7.	DOLocationID	TLC Taxi Zone in which the taximeter was disengaged.
8.	RateCodeID	The final rate code in effect at the end of the trip. 1 = Standard rate; 2 = JFK; 3 = Newark; 4 = Nassau or Westchester; 5 = Negotiated fare; 6 = Group ride
9.	Store_and_fwd_flag	This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor. Y = store and forward trip; N = not a store and forward trip
10.	Payment_type	A numeric code signifying how the passenger paid for the trip. 1 = Credit card; 2 = Cash; 3 = No charge; 4 = Dispute; 5 = Unknown; 6 = Voided trip
11.	Fare_amount	The time-and-distance fare calculated by the meter.
12.	Extra	Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges.
13.	MTA_tax	$0.50 MTA tax that is automatically triggered based on the metered rate in use.
14.	Improvement_surcharge	$0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015.
15.	Tip_amount	Tip amount – This field is automatically populated for credit card tips. Cash tips are not included.
16.	Tolls_amount	Total amount of all tolls paid in the trip.
17.	Total_amount	The total amount charged to passengers. Does not include cash tips.
18.	Congestion_Surcharge	Total amount collected in the trip for the NYS congestion surcharge.
19.	Airport_fee	$1.25 for pickups only at LaGuardia and John F. Kennedy airports.
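Several fields in this dictionary are numeric codes, so one of the first analysis steps is usually decoding them. The sketch below copies the mappings from the table above. `decode` and `tip_rate` are hypothetical helper names. Because Tip_amount only covers credit-card tips, `tip_rate` returns None for every other payment type instead of reporting a misleading zero.

```python
# Code-to-label mappings copied from the data dictionary above
RATE_CODES = {
    1: "Standard rate",
    2: "JFK",
    3: "Newark",
    4: "Nassau or Westchester",
    5: "Negotiated fare",
    6: "Group ride",
}
PAYMENT_TYPES = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip",
}
VENDORS = {1: "Creative Mobile Technologies, LLC", 2: "VeriFone Inc."}


def decode(code, mapping, default="Unrecognised code"):
    """Map a numeric code to its label.

    Raw files may store codes as floats (1.0), strings ("1") or missing values.
    """
    try:
        return mapping[int(code)]
    except (TypeError, ValueError, KeyError):
        return default


def tip_rate(fare_amount, tip_amount, payment_type):
    """Tip as a fraction of the fare, only where tips are recorded (credit cards)."""
    if decode(payment_type, PAYMENT_TYPES) != "Credit card" or fare_amount <= 0:
        return None
    return tip_amount / fare_amount
```

With this helper, a cash trip is excluded from the tip analysis instead of counting as a trip with no tip.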

Photo by Mourad Saadi on Unsplash

Gett Taxi Interview Assignment

kaggle.com

zip

Updated Nov 2, 2024

Cite

Abilash Reddy (2024). Gett Taxi Interview Assignment [Dataset]. https://www.kaggle.com/datasets/datadoodler/gett-taxi-interview-assignment

Available download formats: zip (3196714 bytes)

Dataset updated

Nov 2, 2024

Authors

Abilash Reddy

Description

Gett, previously known as GetTaxi, is an Israeli-developed technology platform focused solely on corporate Ground Transportation Management (GTM). Its application lets clients order taxis and lets drivers accept their rides (offers). When the client clicks the Order button in the application, the matching system searches for the most relevant drivers and offers them the order. In this task, we would like to investigate some matching metrics for orders that did not complete successfully, i.e., the customer didn't end up getting a car.

Assignment

Please complete the following tasks.

1. Build up the distribution of orders according to reasons for failure: cancellations before and after driver assignment, and reasons for order rejection. Analyse the resulting plot. Which category has the highest number of orders?
2. Plot the distribution of failed orders by hour. Is there a trend for certain hours to have an abnormally high proportion of one category or another? Which hours have the most failures? How can this be explained?
3. Plot the average time to cancellation with and without a driver, by hour. If there are any outliers in the data, it would be better to remove them. Can we draw any conclusions from this plot?
4. Plot the distribution of average ETA by hour. How can this plot be explained?
5. BONUS: Hexagons. Using the h3 and folium packages, calculate how many size-8 hexes contain 80% of all orders from the original data sets, and visualise the hexes on a map, colouring them by the number of fails.
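For the bonus task, suppose every order already has a size-8 hex ID, for example computed with the h3 package. The h3 function name differs between its major versions, so that step is not shown. The counting then reduces to this: sort hexes by order count, busiest first, and keep the smallest prefix whose cumulative count reaches 80% of all orders. The following is a standard-library sketch. `hexes_covering` is a hypothetical name, and its input is assumed to be one hex ID per order.

```python
from collections import Counter


def hexes_covering(hex_ids, share=0.8):
    """Return the fewest hexes whose orders add up to at least `share` of all orders.

    hex_ids: one hex identifier per order (assumed precomputed, e.g. h3 resolution 8).
    """
    counts = Counter(hex_ids)
    target = share * sum(counts.values())
    selected = []
    running = 0
    # most_common() lists the busiest hexes first, so the chosen prefix is minimal
    for hex_id, n in counts.most_common():
        if running >= target:
            break
        selected.append(hex_id)
        running += n
    return selected
```

The answer to the task is `len(hexes_covering(order_hex_ids))`. The selected hexes are the ones to draw and colour by fail count on the folium map.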

We have two data sets: data_orders and data_offers, both stored in CSV format. The data_orders data set contains the following columns:

1. order_datetime - time of the order
2. origin_longitude - longitude of the order
3. origin_latitude - latitude of the order
4. m_order_eta - time before order arrival
5. order_gk - order number
6. order_status_key - status, an enumeration with the following mapping: 4 - cancelled by client, 9 - cancelled by system, i.e., a reject
7. is_driver_assigned_key - whether a driver has been assigned
8. cancellation_time_in_seconds - how many seconds passed before cancellation
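Task 1 combines two of these columns. order_status_key separates client cancellations (4) from system rejects (9), and is_driver_assigned_key separates failures before and after driver assignment. The sketch below labels and counts each combination. It assumes is_driver_assigned_key is 1 when a driver was assigned and 0 otherwise, because the description above only says "whether a driver has been assigned". Both function names are hypothetical.

```python
from collections import Counter

# Status mapping taken from the column description above
STATUS_LABELS = {4: "cancelled by client", 9: "cancelled by system (reject)"}


def failure_category(order_status_key, is_driver_assigned_key):
    """Combine status and driver assignment into one failure label.

    Assumption: is_driver_assigned_key == 1 means a driver was assigned.
    """
    status = STATUS_LABELS.get(order_status_key, "unknown status")
    assigned = "driver assigned" if is_driver_assigned_key == 1 else "no driver assigned"
    return f"{status}, {assigned}"


def count_failures(orders):
    """orders: iterable of (order_status_key, is_driver_assigned_key) pairs."""
    return Counter(failure_category(status, assigned) for status, assigned in orders)
```

`count_failures(...).most_common(1)` then answers "Which category has the highest number of orders?" directly.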

The data_offers data set is a simple map with 2 columns:

1. order_gk - order number, associated with the same column from the data_orders data set
2. offer_id - ID of an offer
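Since data_offers maps each order_gk to zero or more offer_ids, the number of offers sent for an order comes from a left join on order_gk. With pandas that would be `data_orders.merge(data_offers, on="order_gk", how="left")`. The standard-library sketch below shows the same counting logic. `offers_per_order` is a hypothetical name.

```python
from collections import Counter


def offers_per_order(order_gks, offers):
    """Count offers per order by joining on order_gk.

    order_gks: order numbers from data_orders.
    offers: (order_gk, offer_id) pairs from data_offers.
    Orders without any matching offer get 0, as in a left join.
    """
    counts = Counter(order_gk for order_gk, _offer_id in offers)
    return {order_gk: counts.get(order_gk, 0) for order_gk in order_gks}
```

Offers that match no order (such as order 4 in a test like `offers_per_order([1, 2, 3], [(1, "a"), (4, "d")])`) are dropped, exactly as a left join from data_orders would drop them.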

Taxi trip data NYC

kaggle.com

zip

Updated Jun 11, 2022

Cite

Anandaram Ganapathi (2022). Taxi trip data NYC [Dataset]. https://www.kaggle.com/datasets/anandaramg/taxi-trip-data-nyc/discussion?sort=undefined

Available download formats: zip (1710447 bytes)

Dataset updated

Jun 11, 2022

Authors

Anandaram Ganapathi

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

New York

Description

NYC Cabs

If you live in a mid-to-large-sized city and take taxis, you have probably already tried Uber. What you may not know is that the transportation app has different rates in each city. New York City is arguably the taxi capital of America and home to the classic yellow taxicab.

They do have some similarities—both conventional taxis and Uber charge fares based on a combination of time and distance. Both also charge passengers for any bridge or road tolls in addition to the fare. However, there are also significant differences between Uber and taxis in New York City. Which is the quickest and most economical ride in New York City, and what are the differences between Uber and Yellow Cab?

Key Takeaways

Both conventional taxis and Uber charge fares based on a combination of time and distance. Taxis do not have surge pricing, but riders might have to wait longer when demand exceeds supply. Uber does not differentiate between cruising and stop-and-go traffic, while taxis do charge different rates based on speed.

In addition, Uber raises prices during times of high demand, while taxis add rush-hour fees. Uber does provide fare estimates within the Uber app, but it does not guarantee the final fare because road conditions can change during the ride.

Uber is accessible only through an up-to-date smartphone. If you do not own a smartphone, your smartphone is out of date, or you forgot your phone, you will not be able to use Uber. New York City regulations also prohibit street hails for private ride services (also called livery services).

Yellow Cabs

Getting into a taxi in an unfamiliar city can be nerve-wracking. You have no idea how much the trip should cost or if the driver is taking the most direct route. In New York City, taxi riders cannot get an advance estimate for taxi fares. The NYC Taxi and Limousine Commission’s official stance is that “it is impossible to pre-calculate a fare because the meter rate depends on traffic, construction, weather, and route to the destination.”

Yellow cabs accept street hails anywhere in New York City. Green Boro Taxis, which operate in the outer boroughs and parts of Manhattan north of certain streets, can either be prearranged or hailed on the street.

Uber Cabs

Uber has something called surge pricing, which refers to the higher fares it imposes during times of high rider demand. Surge pricing can take effect during rush hour, during a natural disaster, or during a random spike of requests on a Saturday afternoon. Uber claims these price increases are meant to encourage more Uber drivers to get out on the road, and that prices revert to normal when supply and demand even out—capitalism at its finest. The Uber app notifies users of surge pricing when they request a ride.

Uber used to offer a $60 flat rate between Manhattan and JFK but dropped that option. Rates are now calculated based on time and distance.

Taxis do not have surge pricing, but riders might have to wait longer when demand exceeds supply. Taxis do, however, add a $0.50 overnight surcharge (8:00 p.m. to 6:00 a.m.) and a $1 surcharge during weekday rush hour (4:00 p.m. to 8:00 p.m., Monday through Friday). If Uber's surge pricing is in effect, you will probably pay a lot less by taking a cab, if you can get one. Surge pricing can double your usual fare or more, and Uber has reported charging customers as much as $39 per mile. A New York City councilman introduced a bill in January 2015 proposing to limit surge pricing to twice the usual rate.
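The time-of-day taxi surcharges described above reduce to a simple rule on the pickup time. The sketch below encodes the figures quoted in this text, which are historical; current TLC rates may differ. It further assumes that the surcharge depends on the pickup time, that the $0.50 overnight charge applies every day, that the $1 rush-hour charge applies Monday to Friday only, and that holidays can be ignored. `time_surcharge` is a hypothetical name.

```python
from datetime import datetime


def time_surcharge(pickup: datetime) -> float:
    """Time-of-day yellow-cab surcharge in dollars, using the figures quoted above.

    Assumptions: based on pickup time; overnight charge every day;
    rush-hour charge Monday to Friday only; holidays ignored.
    """
    hour = pickup.hour
    if hour >= 20 or hour < 6:
        return 0.50  # overnight: 8:00 p.m. to 6:00 a.m.
    if 16 <= hour < 20 and pickup.weekday() < 5:  # weekday(): Monday = 0 ... Friday = 4
        return 1.00  # weekday rush hour: 4:00 p.m. to 8:00 p.m.
    return 0.0
```

Because the two windows do not overlap (rush hour ends at 8:00 p.m., exactly when the overnight window starts), the order of the checks does not change the result.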

Yellow cabs have regulated fares to and from the Newark International and John F. Kennedy International airports. For trips between Newark International Airport and New York City, the price is the regular metered fare, plus a $17.50 surcharge, plus tolls. For trips between John F. Kennedy International Airport and Manhattan, it is a flat fare of $52 plus tolls. The regular metered fare applies to all trips to and from LaGuardia Airport.
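The airport rules above can be written as a small fare function, which shows why the JFK fare ignores the meter while Newark builds on it. This sketch uses the figures quoted in the text, which may be out of date. The route labels and the name `airport_fare` are hypothetical. It assumes tolls are added on LaGuardia trips as on any other metered trip, and it excludes tips, taxes and other surcharges.

```python
def airport_fare(route: str, metered_fare: float, tolls: float) -> float:
    """Yellow-cab airport fare in dollars, using the figures quoted above.

    route: hypothetical labels "JFK-Manhattan", "Newark" or "LaGuardia".
    Tips, taxes and other surcharges are not included.
    """
    if route == "JFK-Manhattan":
        return 52.00 + tolls  # flat fare; the meter reading is ignored
    if route == "Newark":
        return metered_fare + 17.50 + tolls  # metered fare plus Newark surcharge
    if route == "LaGuardia":
        return metered_fare + tolls  # regular metered fare (tolls assumed as usual)
    raise ValueError(f"unknown route: {route}")
```
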

Payments and Tipping

Before you can call an Uber, you must download the app onto your smartphone and register a credit card or PayPal account to your Uber account. Uber automatically charges your account at the end of the ride. When you take a cab, you can pay with cash, credit card, or a payment app on your phone, like Apple Pay.

Tipping is different with each service, too. Uber lets riders tip their driver in the app after rating a completed ride. You have 30 days after the ride to add a tip.

NYC cab drivers are required to accept MasterCard, Visa, Discover, and American Express credit cards and MasterCard and Visa debit cards with no minimum fare requirement. Passengers pay for rides by swiping their card through a card reader a...

Encoded shortest path sequences for NYC taxi trip

kaggle.com

zip

Updated Sep 8, 2017

Cite

Lem (2017). Encoded shortest path sequences for NYC taxi trip [Dataset]. https://www.kaggle.com/tongjiyiming/encoded-shortest-path-sequences-for-nyc-taxi-trip

Available download formats: zip (140239784 bytes)

Dataset updated

Sep 8, 2017

Authors

Lem

License

http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

Area covered

New York

Description

Get the closest approximation of the real trip trace

NYC taxi trip data contain only start and end coordinates, which makes it hard to explore how conditions vary across roads. This dataset takes OSM road data and breaks it into small directed segments. Each segment runs from one intersection (node) to an adjacent intersection (node). Segments are directed, so a two-way road results in two segments and a one-way road results in one segment.
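The segmentation rule above, where a two-way road gives two directed segments and a one-way road gives one, can be sketched in a few lines. This sketch is not the author's code, which used osmnx and networkx. The input format of `(from_node, to_node, oneway)` tuples and the function name are hypothetical simplifications.

```python
def directed_segments(roads):
    """Turn road pieces between adjacent intersections into directed segments.

    roads: iterable of (from_node, to_node, oneway) tuples (hypothetical format).
    A two-way road yields one segment per direction; a one-way road yields one.
    """
    segments = []
    for u, v, oneway in roads:
        segments.append((u, v))
        if not oneway:
            segments.append((v, u))  # reverse direction for two-way roads
    return segments


segments = directed_segments([("A", "B", False), ("B", "C", True)])
# Each directed segment gets its own column in the encoded matrix
column_of = {segment: i for i, segment in enumerate(segments)}
```
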

What you get

SciPy's .npz format

141505 columns: each column encodes a small segment. Each value is an indicator: 1 means the taxi would travel through this segment, 0 means it would not. As a result, the matrix is very sparse.
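A sparse matrix stores only the 1s. In CSR form that means three arrays. `indices` holds the column numbers of the segments each row passes through, `indptr` marks where each row starts, and `data` holds the stored values. SciPy accepts exactly these arrays in `scipy.sparse.csr_matrix((data, indices, indptr), shape=...)`, and `scipy.sparse.load_npz` reads a saved sparse matrix back from an .npz file. The pure-Python sketch below builds the three arrays from per-trip segment lists, assuming one row per trip. `encode_paths` is a hypothetical name.

```python
def encode_paths(paths, n_segments):
    """Encode each trip as one 0/1 row in CSR form: (indptr, indices, data).

    paths: one list per trip (assumed one row per trip) of segment column indices.
    Only the 1s are stored, which keeps a 141505-column matrix small.
    """
    indptr = [0]
    indices = []
    for path in paths:
        cols = sorted(set(path))  # indicator values: a repeated segment is stored once
        if cols and (cols[0] < 0 or cols[-1] >= n_segments):
            raise ValueError("segment index out of range")
        indices.extend(cols)
        indptr.append(len(indices))  # row i spans indices[indptr[i]:indptr[i + 1]]
    data = [1] * len(indices)
    return indptr, indices, data
```

An empty trip is still a valid row. It simply repeats the previous `indptr` value.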

Some insights

This is inspired by the ECML/PKDD 15: Taxi Trajectory Prediction competition. With more accurate trip trajectories, we create a space in which information from one trip can be shared with many others. If we only had start and end points, the similarity of two trips would depend only on a clustering of those points, which we would hope gives an accurate approximation of similarity (and which also depends heavily on how many clusters we define). With path sequences, however, we can see that two quite different trips may share common but important road sections, such as motorways. This is closer to real life. More importantly, we can now learn the condition of those road segments from many different trips, given a suitable machine learning algorithm. As with the winners of ECML/PKDD 15, this dataset makes it possible to apply deep learning.
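To make the point about shared road sections concrete, two trips can be compared by the segments they have in common rather than by their endpoints. The sketch uses the Jaccard index: shared segments divided by all segments used by either trip. This is an illustrative choice; the dataset description does not prescribe a similarity measure. `path_similarity` is a hypothetical name.

```python
def path_similarity(path_a, path_b):
    """Jaccard similarity of two trips' segment sets (0 = disjoint, 1 = identical)."""
    a, b = set(path_a), set(path_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two trips with different start and end segments that share segments 3 and 4,
# e.g. a common motorway stretch
similarity = path_similarity([1, 2, 3, 4], [3, 4, 5, 6])
```

Endpoint clustering would treat these two trips as unrelated. The segment view gives them a similarity of 1/3.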

The original road data is from OSM. The osmnx and networkx libraries are used to store the road graph. Speed-limit data comes primarily from NYC's DOT. A Java shortest-path library developed by Arizona State University computes the shortest paths using Dijkstra's algorithm, and Pyjnius is used to call this Java library from Python. Additionally, some multithreaded code in both Python and Java speeds up the whole execution.
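The shortest-path step itself uses Dijkstra's algorithm. The standard-library version below is a stand-in for illustration, not the Java library the author used. It assumes the graph is a dict mapping each node to `(neighbour, cost)` pairs with non-negative costs. A cost such as travel time (segment length divided by speed limit) is a plausible but assumed weighting. Node labels must be comparable (strings or integers), because ties in the heap fall back to comparing nodes.

```python
import heapq


def dijkstra(graph, source, target):
    """Shortest path by Dijkstra's algorithm.

    graph: dict mapping node -> list of (neighbour, cost) with non-negative costs.
    Returns (total_cost, [source, ..., target]), or (inf, []) if unreachable.
    """
    dist = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry: a shorter route to this node was already settled
        visited.add(node)
        if node == target:
            break
        for neighbour, cost in graph.get(node, []):
            new_d = d + cost
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                previous[neighbour] = node
                heapq.heappush(heap, (new_d, neighbour))
    if target not in visited:
        return float("inf"), []
    path = [target]
    while path[-1] != source:  # walk predecessors back to the source
        path.append(previous[path[-1]])
    return dist[target], path[::-1]
```

Mapping the returned node path onto directed segments, i.e. consecutive node pairs, gives exactly the columns that are set to 1 for a trip.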

The initial idea was actually to compute the top-K paths, to provide probabilistic information about which route a taxi driver takes. However, running Yen's top-K algorithm was too slow.

Time-dependent linkage might also help. However, linkage between different segments is not considered, since I have no idea how to map that information to a useful feature space.

Note that this data uses exactly the same information as the New York City Taxi with OSRM dataset. The difference is that that dataset gives only the name of each road, whereas this dataset encodes each small segment. However, the total time from that dataset has also proved to be useful. Unfortunately, my code did not record the trip time. We will see if anyone asks.

So, have fun with this dataset, Kagglers!


Namma Yatri Cab Bookings Bangalore Open Data

kaggle.com

zip

Updated Jan 12, 2024

Cite

Nishant Singhal (2024). Namma Yatri Cab Bookings Bangalore Open Data [Dataset]. https://www.kaggle.com/datasets/stacknishant/namma-yatri-cab-bookings-bangalore-open-data

Available download formats: zip (17393 bytes)

Dataset updated

Jan 12, 2024

Authors

Nishant Singhal

License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Area covered

Bengaluru

Description

The cab bookings data comes from Namma Yatri ride-hailing services within the Bangalore region. It was downloaded from nammayatri.in.

Namma Yatri has become Bengaluru's most loved auto app since its formal launch in January 2023. It is a Direct-to-Driver app with no commission or middlemen: what one pays goes 100% to the driver and his family.

Here is the github page: https://github.com/nammayatri

Rides sample

kaggle.com

zip

Updated Oct 21, 2019

Cite

Easy (2019). Rides sample [Dataset]. https://www.kaggle.com/datasets/easytaxi/week-18-rides-sample

Available download formats: zip (93924512 bytes)

Dataset updated

Oct 21, 2019

Dataset authored and provided by

Easy

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Context

Easy (Taxi) is a mobile E-hailing application available in many countries in Latin America. The app allows users to book a taxi and track it in real time.

This dataset contains a sample of rides that were requested by Easy's passengers.

This dataset is being used in the Easy selection process for the Data Engineering Team. We will evaluate the following topics from your solution:

Code legibility and understandability;
Code and solution structure, with the perspective of future maintenance and evolution;
Solution compatibility with a Big Data stack.

Content

This data is a sample collected in the 18th week of 2018 and anonymized for privacy purposes.

You are going to find the following schema in the rides.csv file (inside the .zip file):

ride_id: Unique ride identifier
city_code: The IATA code representing where the ride was requested.
country_code: The associated ISO-3166 code for the city_code.
passenger_id: Unique passenger identifier
requested_at: The timestamp of the ride request event
payment_created_at: The timestamp of payment date
boarded_at: This is filled when the passenger boards the requested ride
driver_id: Unique driver identifier
payment_final_value: This is populated at the end of the ride, with the monetary value paid by the passenger in local currency.
rating_stars: The stars value given by the passenger after the end of the ride.

Acknowledgements

This dataset was collected from Easy's data and has never been distributed before.

Inspiration

By looking at the past, we can better understand how urban mobility works in cities and then make better plans for the future.

We hope you can answer at least one of the following questions:

What is the average ride payment value?
How many rides were completed in the period?
What is the ride conversion rate? The ride conversion rate is the ratio of rides actually completed to rides requested.
Who are the 10 best drivers (the highest rated)?
What is the average number of rides per driver?
What is our capacity to serve rides, considering all drivers available in the given sample?
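Several of these questions reduce to counting and averaging over rows of rides.csv. The sketch below answers the first five, under assumptions the schema leaves open: a ride counts as done when boarded_at is filled in, the conversion rate is computed as done divided by requested, and the "best" drivers are those with the highest average rating_stars, with ties broken by number of rides. `ride_metrics` is a hypothetical name. Rows are assumed to be dicts such as those produced by `csv.DictReader`, which yields empty strings for empty cells.

```python
from collections import Counter, defaultdict
from statistics import mean


def _present(value):
    """True when a cell holds a value (csv.DictReader gives "" for empty cells)."""
    return value not in (None, "")


def ride_metrics(rides):
    """Answer several of the questions above from rows of rides.csv.

    Assumption: a ride counts as done when boarded_at is filled in.
    """
    done = [r for r in rides if _present(r.get("boarded_at"))]
    payments = [float(r["payment_final_value"]) for r in done if _present(r.get("payment_final_value"))]
    rides_per_driver = Counter(r["driver_id"] for r in done if _present(r.get("driver_id")))
    stars_by_driver = defaultdict(list)
    for r in done:
        if _present(r.get("driver_id")) and _present(r.get("rating_stars")):
            stars_by_driver[r["driver_id"]].append(float(r["rating_stars"]))
    avg_rating = {driver: mean(stars) for driver, stars in stars_by_driver.items()}
    # Highest average rating first; ties broken by number of rides done
    top_drivers = sorted(avg_rating, key=lambda d: (avg_rating[d], rides_per_driver[d]), reverse=True)[:10]
    return {
        "rides_requested": len(rides),
        "rides_done": len(done),
        "conversion_rate": len(done) / len(rides) if rides else 0.0,  # done / requested
        "avg_payment": mean(payments) if payments else None,
        "avg_rides_per_driver": mean(rides_per_driver.values()) if rides_per_driver else None,
        "top_drivers": top_drivers,
    }
```

Because payment_final_value is in local currency, averaging across countries mixes currencies. Grouping the rows by country_code before calling the function is probably the safer reading of the first question.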

NLU-NLG Dataset

kaggle.com

zip

Updated Jul 7, 2025

Cite

Kaushik T.D. Roy (2025). NLU-NLG Dataset [Dataset]. https://www.kaggle.com/datasets/kaushiktdroy/nlu-nlg-dataset

Available download formats: zip (9079410 bytes)

Dataset updated

Jul 7, 2025

Authors

Kaushik T.D. Roy

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

Customer Service Natural Language Generation and Understanding Dataset

Overview

This dataset contains cleaned and processed customer service conversations designed for both Natural Language Generation (NLG) and Natural Language Understanding (NLU) tasks. The data focuses on customer inquiries across various service categories including refunds, bookings, and cancellations, with corresponding human agent responses and detailed annotations.

Dataset Structure

NLG Component

The Natural Language Generation portion contains instruction-following examples for customer service response generation:

Fields:
- instruction: Task description specifying the customer query type and emotional state
- context: The actual customer message/inquiry
- response: The appropriate agent response

Format example:

{
  "instruction": "A customer has a query about refund. They are feeling NEGATIVE. Draft a helpful response.",
  "context": "who do i send my taxi receipt to for reimbursement please? (hersham to walton at 12.20)",
  "response": "what day did you travel please?"
}
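For instruction fine-tuning, each NLG record is typically turned into a prompt and a target. The sketch below parses one record, checks the three fields described above, and builds a (prompt, target) pair. Two parts of it are assumptions: that records are stored one JSON object per line (the description says only "JSON format"), and the prompt template itself, which is not part of the dataset. `to_training_pair` is a hypothetical name.

```python
import json

NLG_FIELDS = ("instruction", "context", "response")


def to_training_pair(line):
    """Parse one NLG record (a JSON string) into a (prompt, target) pair.

    The prompt template below is a hypothetical choice, not part of the dataset.
    """
    record = json.loads(line)
    missing = [field for field in NLG_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    prompt = f"{record['instruction']}\n\nCustomer: {record['context']}\nAgent:"
    return prompt, record["response"]
```

Keeping the instruction, which carries the query type and emotional state, at the head of the prompt lets the model condition its response on the annotated sentiment.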

NLU Component

The Natural Language Understanding portion provides comprehensive annotations for customer messages:

Fields:
- text: The customer's original message
- intents: List of identified intents/purposes (e.g., "refund", "booking", "cancellation")
- sentiment: Sentiment classification (e.g., "NEGATIVE", "POSITIVE")
- entities: Named entities extracted from the text (currently empty arrays, i.e., entity extraction has not yet been applied)

Format example:

{
  "text": "who do i send my taxi receipt to for reimbursement please? (hersham to walton at 12.20)",
  "intents": ["refund"],
  "sentiment": ["NEGATIVE"],
  "entities": []
}

Data Characteristics

Intent Categories

Refund: Customer inquiries about reimbursements, receipt submissions, and payment issues
Booking: Reservation-related queries, modifications, and booking assistance
Cancellation: Service cancellation requests and related issues
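Because intents is a list, one message can carry several intents. Multi-label classifiers therefore usually take a multi-hot target vector rather than a single class. The sketch below encodes intents against the three categories listed above. It assumes these three are the complete label set and raises an error on anything else, so that unseen labels are not silently dropped. `multi_hot` is a hypothetical name.

```python
# Intent categories listed in this description (assumed to be the full label set)
INTENTS = ("refund", "booking", "cancellation")


def multi_hot(intents, labels=INTENTS):
    """Encode a record's list of intents as a 0/1 vector for multi-label classification."""
    present = set(intents)
    unknown = present - set(labels)
    if unknown:
        raise ValueError(f"unknown intents: {sorted(unknown)}")
    return [1 if label in present else 0 for label in labels]
```

The sentiment field is also stored as a list in the example record, so the same encoding applies to it with labels ("NEGATIVE", "POSITIVE").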

Sentiment Distribution

NEGATIVE: Customer frustration, complaints, or urgent requests
POSITIVE: Appreciation, thanks, or satisfied interactions

Domain Context

The dataset appears to focus on transportation/travel services, with references to:
- Taxi receipts and reimbursements
- Flight bookings and upgrades
- Location-based services (e.g., "hersham to walton", "man-eus")

Use Cases

NLG Applications

Customer service chatbot response generation
Automated agent assistance tools
Response quality evaluation and training
Instruction-following model fine-tuning

NLU Applications

Intent classification systems
Sentiment analysis in customer service
Multi-label classification tasks
Customer query routing and prioritization

Data Quality

Cleaned: Processed to remove inconsistencies and formatting issues
Anonymized: Personal details appear to be removed or genericized
Balanced: Includes both positive and negative sentiment examples
Realistic: Contains authentic customer service language patterns

Technical Notes

All text is in English
JSON format for easy integration with ML pipelines
Consistent schema across all records
Ready for immediate use in training/evaluation workflows

Potential Applications

Training conversational AI systems
Benchmarking NLU models
Customer service automation research
Sentiment analysis in business contexts
Multi-task learning experiments combining NLG and NLU

Limitations

Limited entity annotations (entities field is empty)
Moderate dataset size
Specific to customer service domain
May require additional preprocessing for certain applications

This dataset serves as a valuable resource for researchers and practitioners working on customer service automation, conversational AI, and natural language processing applications in business contexts.

NYC Yellow Taxi Trip Data

Context

Complimentary Kernel

Important Points

Attributes

2023 Yellow Taxi Trip Data

Taxi Trips (2013-2023)

Uber NYC for-hire vehicles trip data (2021)

Taxi Data Set

Cab Services Drivers Info

Taxi Trips

New York City Taxi and Limousine project

Taxi Trajectory Data

Context

Content

Acknowledgements

Inspiration

Replication Data for: Heat Causes Large Earnings Losses for Informal-Sector...

Taxi Licences - Dataset - York Open Data

Newyork Yellow Taxi Trip Data

Context

Content

Acknowledgements

Taxi Trip Fare Prediction Challenge

Overview

Data description:

NYC Yellow Taxi Trip Records

About TLC Trip Record Data

Data Source: TLC

Data dictionary

Gett Taxi Interview Assignment

Taxi trip data NYC

NYC Cabs

Key Takeaways

Yellow Cabs

Uber Cabs

Payments and Tipping

Encoded shortest path sequences for NYC taxi trip

Get the closest approximation of the real trip trace

What you get

Some insights

Acknowledgements

Inspiration

Namma Yatri Cab Bookings Bangalore Open Data

Rides sample

Context

Content

Acknowledgements

Inspiration

NLU-NLG Dataset

Customer Service Natural Language Generation and Understanding Dataset

Overview

Dataset Structure

NLG Component

NLU Component

Data Characteristics

Intent Categories

Sentiment Distribution

Domain Context

Use Cases

NLG Applications

NLU Applications

Data Quality

Technical Notes

Potential Applications

Limitations

NYC Yellow Taxi Trip Data

Practice your ML skills on this Time-Series Dataset!

Context

Complimentary Kernel

Important Points

Attributes