https://www.usa.gov/government-works/
The New York City (NYC) Taxi & Limousine Commission (TLC) keeps data from all its cabs, and the data is freely available to download from its official website. The TLC keeps and manages data for several types of vehicles, including:
- Yellow Taxi (Yellow Medallion Taxicabs): These are the famous NYC yellow taxis that provide transportation exclusively through street hails. The number of taxicabs is limited by a finite number of medallions issued by the TLC. You access this mode of transportation by standing in the street and hailing an available taxi with your hand. The pickups are not pre-arranged.
- Green Taxi (Street Hail Livery, SHL): The SHL program will allow livery vehicle owners to license and outfit their vehicles with green borough taxi branding, meters, credit card machines, and ultimately the right to accept street hails in addition to pre-arranged rides.
- For-Hire Vehicles (FHVs): FHV transportation is accessed by a pre-arrangement with a dispatcher or limo company. These FHVs are not permitted to pick up passengers via street hails, as those rides are not considered pre-arranged.
| Field Name | Description |
|---|---|
| VendorID | A code indicating the TPEP provider that provided the record. |
| tpep_pickup_datetime | The date and time when the meter was engaged. |
| tpep_dropoff_datetime | The date and time when the meter was disengaged. |
| Passenger_count | The number of passengers in the vehicle. This is a driver-entered value. |
| Trip_distance | The elapsed trip distance in miles reported by the taximeter. |
| Pickup_longitude | Longitude where the meter was engaged. |
| Pickup_latitude | Latitude where the meter was engaged. |
| RateCodeID | The final rate code in effect at the end of the trip. |
| Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store and forward trip |
| Dropoff_longitude | Longitude where the meter was disengaged. |
| Dropoff_latitude | Latitude where the meter was disengaged. |
| Payment_type | A numeric code signifying how the passenger paid for the trip. |
| Fare_amount | The time-and-distance fare calculated by the meter. |
| Extra | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges. |
| MTA_tax | $0.50 MTA tax that is automatically triggered based on the metered rate in use. |
| Improvement_surcharge | $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015. |

These records are generated from the trip record submissions made by yellow taxi Technology Service Providers (TSPs). Each row represents a single trip in a yellow taxi. The trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off taxi zone locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
This dataset ends with 2023. Please see the Featured Content link below for the dataset that starts in 2024.
Taxi trips from 2013 to 2023 reported to the City of Chicago in its role as a regulatory agency. To protect privacy but allow for aggregate analyses, the Taxi ID is consistent for any given taxi medallion number but does not show the number, Census Tracts are suppressed in some cases, and times are rounded to the nearest 15 minutes.
Due to the data reporting process, not all trips are reported but the City believes that most are.
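The 15-minute rounding described above can be illustrated with a short sketch. This is only an illustration: the source does not say how the City breaks ties or handles sub-second values, so the function name `round_to_quarter_hour` and the choice to round exact midpoints upward are assumptions.

```python
from datetime import datetime, timedelta


def round_to_quarter_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest 15-minute boundary.

    Assumption: exact midpoints (e.g. 08:07:30) round up; microseconds are ignored.
    """
    interval = 15 * 60  # 15 minutes in seconds
    seconds_since_midnight = ts.hour * 3600 + ts.minute * 60 + ts.second
    rounded = ((seconds_since_midnight + interval // 2) // interval) * interval
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # Adding a timedelta lets 23:55 roll over to 00:00 of the next day.
    return midnight + timedelta(seconds=rounded)
```

Because published times are already rounded this way, trip durations derived from them are only accurate to within about 15 minutes.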
https://creativecommons.org/publicdomain/zero/1.0/
In New York City, all taxi vehicles are managed by the TLC (Taxi and Limousine Commission). Here is a brief description of the TLC:
The New York City Taxi and Limousine Commission (TLC), created in 1971, is the agency responsible for licensing and regulating New York City's Medallion (Yellow) taxi cabs, for-hire vehicles (community-based liveries, black cars and luxury limousines), commuter vans, and paratransit vehicles. The Commission's Board consists of nine members, eight of whom are unsalaried Commissioners. The salaried Chair/ Commissioner presides over regularly scheduled public commission meetings and is the head of the agency, which maintains a staff of approximately 600 TLC employees. Over 200,000 TLC licensees complete approximately 1,000,000 trips each day. To operate for hire, drivers must first undergo a background check, have a safe driving record, and complete 24 hours of driver training. TLC-licensed vehicles are inspected for safety and emissions at TLC's Woodside Inspection Facility.
The NYC TLC has released its Trip Record data to the public for research and study purposes. There are three main taxi types in NYC:
- Yellow taxis are traditionally hailed by signaling to a driver who is on duty and seeking a passenger (street hail), but now they may also be hailed using an e-hail app like Curb or Arro. Yellow taxis are the only vehicles permitted to respond to a street hail from a passenger in all five boroughs.
- Green taxis, also known as boro taxis and street-hail liveries, were introduced in August of 2013 to improve taxi service and availability in the boroughs. Green taxis may respond to street hails, but only in the areas indicated in green on the map (i.e. above W 110 St/E 96th St in Manhattan and in the boroughs).
- FHV data includes trip data from high-volume for-hire vehicle bases (bases for companies dispatching 10,000+ trips per day, meaning Uber, Lyft, Via, and Juno), community livery bases, luxury limousine bases, and black car bases. Uber is one of the biggest ride-hailing service providers, so its trip records are also collected in the High Volume For-Hire Vehicle Trip Records.
Based on this dataset, there are some business goals we want to achieve to improve Uber's ride-hailing service:
- Exploratory data analysis: study fhvhv_tripdata_2021 and figure out the underlying trip patterns in 2021.
- Based on fhvhv_tripdata_2021 and the weather data, build a model to predict peak footfall.
- Explore Uber's user portrait in NYC (which orders are urgent, and what kind of users should be given higher priority?).
Some useful tips about this dataset:
- The trip data files for the for-hire vehicles are named like fhvhv_tripdata_2021-0X.parquet.
- For column descriptions of the trip data, please refer to data_dictionary_trip_records_hvfhs.pdf.
- The taxi_zones folder contains the geospatial data of NYC taxi zones (geopandas would be helpful).
- taxi_zone_lookup.csv stores the taxi zones' zip codes and other relevant information.
- nyc 2021-01-01 to 2021-12-31.csv records the weather data for 2021.
- taxi+_zone_lookup.csv stores the zone information for all taxis.
- Data files ending with .parquet can be processed with the pyarrow package and converted to a Pandas DataFrame.
If you find this dataset helpful, please up-vote and more high-quality datasets will be published in future!❤️
First, credit goes to Raviiloveyou, who created the original Taxi Trip Fare Predictor dataset.
This version modifies that dataset to include taxi fares from Philadelphia, PA.
The cost calculations in the dataset have been updated to include all of the following taxi fares:
First 1/10 mile (flag drop) or fraction thereof: $2.70
Each additional 1/10 mile or fraction thereof: $0.25
Each 37.6 seconds of wait time: $0.25
The dataset also includes the speed of each taxi in KPH (kilometers per hour).
The columns are the following:
Trip Duration in seconds (part of the original data set)
Trip Duration in minutes
Trip Duration in Hours
Distance Traveled in Kilometers (part of the original data set)
KPH speed of the taxis in Kilometers per Hour
Wait Time Cost: $0.25 for each 37.6 seconds of wait time, applied to the taxi time used to get the person to the location
Distance Cost: $0.25 for each additional 1/10 mile (0.1 mile = 0.160934 km) or fraction thereof
Fare w Flag: the $2.70 starting cost added to the Wait Time Cost plus the Distance Cost
TIP: how much money the taxi driver received as a tip for the trip (part of the original data set)
Miscellaneous fees: part of the original data set
Total Fare New: the total cost of the trip
Num of passengers: the number of passengers. Note that there is no additional cost per passenger for Philadelphia, PA taxis.
surge applied: (part of the original data set)
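The fare rules listed above can be sketched in a few lines. This is a rough illustration, not the dataset author's actual calculation: the function names are hypothetical, "or fraction thereof" is read as rounding distance units up, and wait time is assumed to be charged only for completed 37.6-second units because the source does not say otherwise.

```python
import math

FLAG_DROP_CENTS = 270          # first 1/10 mile or fraction thereof
UNIT_CENTS = 25                # each additional 1/10 mile, or each wait unit
WAIT_UNIT_SECONDS = 37.6
KM_PER_TENTH_MILE = 0.160934   # conversion given in the column description


def philadelphia_fare(distance_km: float, wait_seconds: float) -> float:
    """Return the fare in dollars (flag drop + distance cost + wait time cost)."""
    # "or fraction thereof": a partial tenth of a mile is charged as a full one.
    tenths = math.ceil(distance_km / KM_PER_TENTH_MILE)
    extra_tenths = max(tenths - 1, 0)  # the first tenth is covered by the flag drop
    # Assumption: only completed wait units are charged.
    wait_units = math.floor(wait_seconds / WAIT_UNIT_SECONDS)
    cents = FLAG_DROP_CENTS + UNIT_CENTS * (extra_tenths + wait_units)
    return cents / 100


def speed_kph(distance_km: float, duration_seconds: float) -> float:
    """Average speed in kilometers per hour."""
    return distance_km / (duration_seconds / 3600)
```

Working in whole cents avoids accumulating floating-point error when many $0.25 units are added together.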
Recruiting and retaining drivers is seen by industry watchers as a tough battle for XYZCab. Churn among drivers is high and it’s very easy for drivers to stop working for the service on the fly or jump to Uber depending on the rates.
As the companies get bigger, the high churn could become a bigger problem. To find new drivers, XYZCab is casting a wide net, including people who don’t have cars for jobs. But this acquisition is really costly. Losing drivers frequently impacts the morale of the organization and acquiring new drivers is more expensive than retaining existing ones.
You are working as a data scientist with the Analytics Department of XYZCab, focused on driver team attrition. You are provided with the monthly information for a segment of drivers for 2019 and 2020 and tasked to predict whether a driver will be leaving the company or not based on their attributes like:
• Demographics (city, age, gender etc.)
• Tenure information (joining date, Last Date)
• Historical data regarding the performance of the driver (Quarterly rating, Monthly business acquired, grade, Income)
Column Profiling:
1. MMMM-YY : Reporting Date (Monthly)
2. Driver_ID : Unique id for drivers
3. Age : Age of the driver
4. Gender : Gender of the driver – Male : 0, Female: 1
5. City : City Code of the driver
6. Education_Level : Education level – 0 for 10+, 1 for 12+, 2 for graduate
7. Income : Monthly average Income of the driver
8. Date Of Joining : Joining date for the driver
9. LastWorkingDate : Last date of working for the driver
10. Joining Designation : Designation of the driver at the time of joining
11. Grade : Grade of the driver at the time of reporting
12. Total Business Value : The total business value acquired by the driver in a month (negative business indicates cancellation/refund or car EMI adjustments)
13. Quarterly Rating : Quarterly rating of the driver: 1,2,3,4,5 (higher is better)
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset contains information on taxi trips including pickup location, destination, and fare. Additional fields have been integrated to the raw data through automated and manual procedures to facilitate easier data analysis. Those fields are indicated in the column metadata.
B. HOW THE DATASET IS CREATED As required by the Transportation Code, all taxi companies permitted to operate in the City and County of San Francisco transmit digital records of their fleet’s activity to SFMTA in real time through the SFMTA Taxi Application Programming Interface (API).
C. UPDATE PROCESS This dataset will be updated monthly with new taxi trip information.
D. HOW TO USE THIS DATASET This dataset is useful for tracking average daily taxi trip counts and monitoring the impact of the Taxi Upfront Pricing Pilot program on driver income.
E. RELATED DATASETS
https://creativecommons.org/publicdomain/zero/1.0/
The New York City Taxi and Limousine Commission (TLC) oversees the licensing and regulation of taxi cabs and for-hire vehicles in the city. The TLC gathers data from over 200,000 license holders, including taxi drivers and limousine operators, who collectively complete around one million trips each day.
Note: The dataset used for this project was designed for educational purposes and may not accurately represent the behavior of taxi cab riders in New York City.
| Column name | Description |
|---|---|
| ID | Trip identification number |
| VendorID | A code indicating the TPEP provider that provided the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc. |
| tpep_pickup_datetime | The date and time when the meter was engaged |
| tpep_dropoff_datetime | The date and time when the meter was disengaged |
| Passenger_count | The number of passengers in the vehicle. This is a driver-entered value |
| Trip_distance | The elapsed trip distance in miles reported by the taximeter |
| RateCodeID | The final rate code in effect at the end of the trip. 1= Standard rate 2=JFK 3=Newark 4=Nassau or Westchester 5=Negotiated fare 6=Group ride |
| Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store and forward trip |
| PULocationID | TLC Taxi Zone in which the taximeter was engaged |
| DOLocationID | TLC Taxi Zone in which the taximeter was disengaged |
| Payment_type | A numeric code signifying how the passenger paid for the trip. 1= Credit card 2= Cash 3= No charge 4= Dispute 5= Unknown 6= Voided trip |
| Fare_amount | The time-and-distance fare calculated by the meter |
| Extra | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges |
| MTA_tax | $0.50 MTA tax that is automatically triggered based on the metered rate in use |
| Tip_amount | Tip amount – This field is automatically populated for credit card tips. Cash tips are not included |
| Tolls_amount | Total amount of all tolls paid in trip |
| Improvement_surcharge | $0.30 improvement surcharge assessed trips at the flag drop. The improvement surcharge began being levied in 2015 |
| Total_amount | The total amount charged to passengers. Does not include cash tips |
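
The numeric codes in the table above are easier to read once mapped back to their labels. The following is a minimal sketch that assumes each trip is available as a plain dictionary keyed by the column names; the function name `decode_trip` and the fallback label are illustrative choices, not part of the TLC data dictionary.

```python
# Code-to-label mappings copied from the data dictionary above.
RATE_CODES = {
    1: "Standard rate",
    2: "JFK",
    3: "Newark",
    4: "Nassau or Westchester",
    5: "Negotiated fare",
    6: "Group ride",
}
PAYMENT_TYPES = {
    1: "Credit card",
    2: "Cash",
    3: "No charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip",
}


def decode_trip(row: dict) -> dict:
    """Return a copy of the trip record with coded fields replaced by labels."""
    decoded = dict(row)  # leave the caller's record untouched
    decoded["RateCodeID"] = RATE_CODES.get(row["RateCodeID"], "Unrecognized code")
    decoded["Payment_type"] = PAYMENT_TYPES.get(row["Payment_type"], "Unrecognized code")
    return decoded
```

Since cash tips are not recorded, it is usually worth filtering on the decoded payment type before analysing `Tip_amount`.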

Technology has many effects on the transportation industry.
We have provided an accurate dataset describing a complete year (from 01/07/2013 to 30/06/2014) of the trajectories for all 442 taxis running in the city of Porto, in Portugal (i.e. one CSV file named "train.csv"). These taxis operate through a taxi dispatch central, using mobile data terminals installed in the vehicles. We categorize each ride into three categories: A) taxi central based, B) stand-based or C) non-taxi central based. For the first, we provide an anonymized id, when such information is available from the telephone call. The last two categories refer to services that were requested directly from the taxi drivers, either B) at a taxi stand or C) on a random street.
Each data sample corresponds to one completed trip. It contains a total of 9 (nine) features, described as follows:
TRIP_ID: (String) It contains a unique identifier for each trip;
CALL_TYPE: (char) It identifies the way used to demand this service. It may contain one of three possible values: ‘A’ if this trip was dispatched from the central; ‘B’ if this trip was demanded directly to a taxi driver on a specific stand; ‘C’ otherwise (i.e. a trip demanded on a random street).
ORIGIN_CALL: (integer) It contains a unique identifier for each phone number which was used to demand, at least, one service. It identifies the trip’s customer if CALL_TYPE=’A’. Otherwise, it assumes a NULL value;
ORIGIN_STAND: (integer): It contains a unique identifier for the taxi stand. It identifies the starting point of the trip if CALL_TYPE=’B’. Otherwise, it assumes a NULL value;
TAXI_ID: (integer): It contains a unique identifier for the taxi driver that performed each trip;
TIMESTAMP: (integer) Unix Timestamp (in seconds). It identifies the trip’s start;
DAYTYPE: (char) It identifies the daytype of the trip’s start. It assumes one of three possible values: ‘B’ if this trip started on a holiday or any other special day (i.e. extending holidays, floating holidays, etc.); ‘C’ if the trip started on a day before a type-B day; ‘A’ otherwise (i.e. a normal day, workday or weekend).
MISSING_DATA: (Boolean) It is FALSE when the GPS data stream is complete and TRUE whenever one (or more) locations are missing
POLYLINE: (String): It contains a list of GPS coordinates (i.e. WGS84 format) mapped as a string. The beginning and the end of the string are identified with brackets (i.e. [ and ], respectively). Each pair of coordinates is also identified by the same brackets as [LONGITUDE, LATITUDE]. This list contains one pair of coordinates for each 15 seconds of trip. The last list item corresponds to the trip’s destination while the first one represents its start;
The total travel time of the trip (the prediction target of this competition) is defined as the (number of points-1) x 15 seconds. For example, a trip with 101 data points in POLYLINE has a length of (101-1) * 15 = 1500 seconds. Some trips have missing data points in POLYLINE, indicated by MISSING_DATA column, and it is part of the challenge how you utilize this knowledge.
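The travel-time definition above can be sketched directly from the POLYLINE string. This assumes the string is valid JSON (a bracketed list of [LONGITUDE, LATITUDE] pairs, as described); the function name `travel_time_seconds` is illustrative, and treating an empty POLYLINE as zero seconds is an assumption.

```python
import json

SECONDS_PER_POINT = 15  # one GPS point is recorded every 15 seconds


def travel_time_seconds(polyline: str) -> int:
    """Compute (number of points - 1) x 15 seconds from a POLYLINE string."""
    points = json.loads(polyline)  # list of [longitude, latitude] pairs
    # Assumption: an empty POLYLINE is treated as a trip of zero seconds.
    return max(len(points) - 1, 0) * SECONDS_PER_POINT
```

For trips flagged in MISSING_DATA, this value underestimates the real duration, because the missing points are simply absent from the list.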
Data from ECML/PKDD 15: Taxi Trip Time Prediction (II) Competition
Added this dataset because competition datasets do not appear in the dataset search and this dataset could help learn basic methods in the area of geo-spatial analysis and trajectory handling
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a dataset of 400 workers collected daily in the months of May and June of 2019 in Delhi. These workers were working as launderers, construction workers, painters, coolies (manual laborers in transport or other sectors), cycle rickshaw drivers, electric rickshaw drivers, auto (three-wheeled taxi) drivers, taxi drivers, food vendors, street vendors, rag pickers, petty traders, fruit sellers, waste and scrap dealers, roadside barbers, cobblers, roadside cycle/auto mechanics, and others. We collect data on their earnings, expenditure and health. The data was merged with temperature data from the meteorological station at Delhi Airport.
Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
License information was derived automatically
• Hackney carriage vehicles on 1st of June
• Private hire vehicles on 1st of November
A list of all Hackney Carriage and Private Hire vehicle licences issued by City of York Council. The list can be filtered to identify Wheelchair Accessible Vehicles as per Section 167 of the Equality Act 2010. For further information please visit City of York Council's website.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP).
Column Description
Data is obtained from the NYC Taxi & Limousine Commission website: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
https://creativecommons.org/publicdomain/zero/1.0/
Through a real-world challenge, this hackathon aims to enhance competitors' data science and innovative analytical thinking abilities. Get an opportunity to work on a remarkable data science technology by competing with the best brains in this area at this point in time, where artificial intelligence and machine learning are at the forefront of attention, and find out how you stack up!
This hackathon tries to address the challenges faced by taxi operators in quoting the right fare to customers before starting the trip. Although the trip details are shared with taxi drivers or operators, they find it difficult to quote the right fare because of uncertainties and calculation complexities. Passengers face the same issue because of inaccurate or irrelevant fares quoted. To find a solution, this hackathon provides participants with a historical dataset that includes records of taxi trip details and the fares of those trips. Using this dataset, the participants need to build machine learning models for predicting the trip fare based on the other useful features of the trip.
Overall, it involves using a dataset, finding the best set of features from the dataset, building a machine learning model to predict trip fare based on other trip features and evaluating the predictions using mean squared error and finally submitting the predictions in the given template.
Trip_distance: The elapsed trip distance in miles reported by the taximeter.
Rate_code: The final rate code in effect at the end of the trip. 1= Standard rate, 2= JFK, 3= Newark, 4= Nassau or Westchester, 5= Negotiated fare, 6= Group ride
Storeandfwd_flag: This flag indicates whether the trip record was held in vehicle memory before sending it to the vendor and determines if the trip was stored in the server and forwarded to the vendor. Y= store and forward trip, N= not a store and forward trip
Payment_type: A numeric code signifying how the passenger paid for the trip. 1= Credit card, 2= Cash, 3= No charge, 4= Dispute, 5= Unknown, 6= Voided trip
Fare_amount: The time-and-distance fare calculated by the meter.
Extra: Miscellaneous extras and surcharges.
Mta_tax: $0.50 MTA tax that is automatically triggered based on the metered rate in use.
Tip_amount: Tip amount credited to the driver for credit card transactions.
Tolls_amount: Total amount of all tolls paid in the trip.
Imp_surcharge: $0.30 extra charge added automatically to all rides.
Total_amount: The total amount charged to passengers. Does not include cash tips.
Pickuplocationid: TLC Taxi Zone in which the taximeter was engaged.
Dropofflocationid: TLC Taxi Zone in which the taximeter was disengaged.
Year: The year in which the taxi trip was taken.
Month: The month in which the taxi trip was taken.
Day: The day on which the taxi trip was taken.
Day_of_week: The day of the week on which the taxi trip was taken.
Hour_of_day: The hour of the day, in 24-hour format, at which the taxi trip was taken.
Trip_duration: The total duration of the trip in seconds.
calculated_total_amount: The total amount the customer has to pay for the taxi.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemised fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorised under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
For-Hire Vehicle (“FHV”) trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record submissions made by bases. Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total amount of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.
https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
| Sr no. | Field Name | Description |
|---|---|---|
| 1. | VendorID | A code indicating the TPEP provider that provided the record. 1 = Creative Mobile Technologies, LLC 2 = VeriFone Inc. |
| 2. | tpep_pickup_datetime | The date and time when the meter was engaged. |
| 3. | tpep_dropoff_datetime | The date and time when the meter was disengaged. |
| 4. | Passenger_count | The number of passengers in the vehicle. (Driver-entered value) |
| 5. | Trip_distance | The elapsed trip distance in miles reported by the taximeter. |
| 6. | PULocationID | TLC Taxi Zone in which the taximeter was engaged. |
| 7. | DOLocationID | TLC Taxi Zone in which the taximeter was disengaged. |
| 8. | RateCodeID | The final rate code in effect at the end of the trip. 1 = Standard rate 2 = JFK 3 = Newark 4 = Nassau or Westchester 5 = Negotiated fare 6 = Group ride |
| 9. | Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor. Y = store and forward trip N = not a store and forward trip |
| 10. | Payment_type | A numeric code signifying how the passenger paid for the trip. 1 = Credit card 2 = Cash 3 = No charge 4 = Dispute 5 = Unknown 6 = Voided trip |
| 11. | Fare_amount | The time-and-distance fare calculated by the meter. |
| 12. | Extra | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges. |
| 13. | MTA_tax | $0.50 MTA tax that is automatically triggered based on the metered rate in use. |
| 14. | Improvement_surcharge | $0.30 improvement surcharge assessed trips at the flag drop. The improvement surcharge began being levied in 2015. |
| 15. | Tip_amount | Tip amount – This field is automatically populated for credit card tips. Cash tips are not included. |
| 16. | Tolls_amount | Total amount of all tolls paid in trip. |
| 17. | Total_amount | The total amount charged to passengers. Does not include cash tips. |
| 18. | Congestion_Surcharge | Total amount collected in trip for NYS congestion surcharge. |
| 19. | Airport_fee | $1.25 for pick up only at LaGuardia and John F. Kennedy Airports. |
Photo by Mourad Saadi on Unsplash
Gett, previously known as GetTaxi, is an Israeli-developed technology platform solely focused on corporate Ground Transportation Management (GTM). They have an application where clients can order taxis, and drivers can accept their rides (offers). At the moment, when the client clicks the Order button in the application, the matching system searches for the most relevant drivers and offers them the order. In this task, we would like to investigate some matching metrics for orders that were not completed successfully, i.e., the customer didn't end up getting a car.
Assignment
Please complete the following tasks.
1. Build up the distribution of orders according to reasons for failure: cancellations before and after driver assignment, and reasons for order rejection. Analyse the resulting plot. Which category has the highest number of orders?
2. Plot the distribution of failed orders by hours. Is there a trend that certain hours have an abnormally high proportion of one category or another? What hours are the biggest fails? How can this be explained?
3. Plot the average time to cancellation with and without driver, by the hour. If there are any outliers in the data, it would be better to remove them. Can we draw any conclusions from this plot?
4. Plot the distribution of average ETA by hours. How can this plot be explained?
5. BONUS Hexagons. Using the h3 and folium packages, calculate how many size-8 hexes contain 80% of all orders from the original data sets and visualise the hexes, colouring them by the number of fails on the map.
We have two data sets: data_orders and data_offers, both stored in CSV format. The data_orders data set contains the following columns:
1. order_datetime - time of the order
2. origin_longitude - longitude of the order
3. origin_latitude - latitude of the order
4. m_order_eta - time before order arrival
5. order_gk - order number
6. order_status_key - status, an enumeration with the following mapping: 4 - cancelled by client, 9 - cancelled by system, i.e., a reject
7. is_driver_assigned_key - whether a driver has been assigned
8. cancellation_time_in_seconds - how many seconds passed before cancellation
The data_offers data set is a simple map with 2 columns:
1. order_gk - order number, associated with the same column from the data_orders data set
2. offer_id - ID of an offer
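As a starting point for the first task, failed orders can be grouped by status and driver assignment. This is only a sketch: it assumes each order has already been loaded as a dictionary with integer values, and that is_driver_assigned_key uses 1 for "assigned" and 0 for "not assigned", which the column description does not state. The function names are hypothetical.

```python
from collections import Counter

# Mapping taken from the order_status_key description above.
STATUS_LABELS = {4: "cancelled by client", 9: "cancelled by system (reject)"}


def failure_category(order: dict) -> str:
    """Build a readable failure category from one data_orders record."""
    status = STATUS_LABELS.get(order["order_status_key"], "unknown status")
    # Assumption: 1 means a driver was assigned, anything else means none was.
    driver = "driver assigned" if order["is_driver_assigned_key"] == 1 else "no driver assigned"
    return f"{status}, {driver}"


def count_failures(orders: list) -> Counter:
    """Count failed orders per category."""
    return Counter(failure_category(order) for order in orders)
```

Calling `count_failures(orders).most_common(1)` then gives the category with the highest number of orders.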
https://creativecommons.org/publicdomain/zero/1.0/
If you live in a mid-to-large-sized city and take taxis, you have probably already tried Uber. What you may not know is that the transportation app has different rates in each city. New York City is arguably the taxi capital of America and home to the classic yellow taxicab.
They do have some similarities—both conventional taxis and Uber charge fares based on a combination of time and distance. Both also charge passengers for any bridge or road tolls in addition to the fare. However, there are also significant differences between Uber and taxis in New York City. Which is the quickest and most economical ride in New York City, and what are the differences between Uber and Yellow Cab?
Uber does not differentiate between cruising and stop-and-go traffic, while taxis do charge different rates based on speed. In addition, Uber has price hikes during times of high demand, while taxis have extra rush hour fees. Uber does provide fare estimates within the Uber app, but it does not guarantee the final fare because road conditions can change during the ride.
The service is only accessible through an up-to-date smartphone. If you do not own a smartphone, your smartphone is not up to date, or you forgot your phone, you will not be able to use Uber. New York City regulations prohibit street hails for private ride services (also called livery services).
Getting into a taxi in an unfamiliar city can be nerve-wracking. You have no idea how much the trip should cost or if the driver is taking the most direct route. In New York City, taxi riders cannot get an advance estimate for taxi fares. The NYC Taxi and Limousine Commission’s official stance is that “it is impossible to pre-calculate a fare because the meter rate depends on traffic, construction, weather, and route to the destination.”
Yellow cabs accept street hails anywhere in New York City. Green Boro Taxis, which operate in the outer boroughs and parts of Manhattan north of certain streets, can either be prearranged or hailed on the street.
Uber has something called surge pricing, which refers to the higher fares it imposes during times of high rider demand. Surge pricing can take effect during rush hour, during a natural disaster, or during a random spike of requests on a Saturday afternoon. Uber claims these price increases are meant to encourage more Uber drivers to get out on the road, and that prices revert to normal when supply and demand even out—capitalism at its finest. The Uber app notifies users of surge pricing when they request a ride.
Uber used to offer a $60 flat rate between Manhattan and JFK but dropped that option. Rates are now calculated based on time and distance.
Taxis do not have surge pricing, but riders might have to wait longer when demand exceeds supply. Taxis do, however, add a $0.50 surcharge in the evening (8:00 p.m. to 6:00 a.m.) and a $1 surcharge during rush hour (4:00 p.m. to 8:00 p.m.), Monday through Friday. If Uber’s surge pricing is in effect, you will probably pay a lot less by taking a cab, if you can get one. Surge pricing will at least double your usual fare, and Uber has reported charging customers as much as $39 per mile. A New York City councilman introduced a bill in January 2015 proposing to limit surge pricing to twice the usual rate.
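The taxi surcharge rule in the paragraph above can be written out as a small function. This is a sketch under stated assumptions: "Monday through Friday" is read as applying to the rush hour surcharge only, and 8:00 p.m. is treated as the end of rush hour and the start of the evening period. The function name `taxi_surcharge` is illustrative.

```python
from datetime import datetime


def taxi_surcharge(pickup: datetime) -> float:
    """Return the time-of-day surcharge in dollars for a yellow cab pickup."""
    hour = pickup.hour
    is_weekday = pickup.weekday() < 5  # Monday=0 ... Friday=4
    # Assumption: the $1 rush hour surcharge applies on weekdays only.
    if is_weekday and 16 <= hour < 20:
        return 1.00
    # Evening/overnight surcharge: 8:00 p.m. to 6:00 a.m.
    if hour >= 20 or hour < 6:
        return 0.50
    return 0.0
```

Unlike surge pricing, this amount depends only on the pickup time, so it can be known before the trip starts.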
Yellow cabs have regulated fares to and from the Newark International and John F. Kennedy International airports. For trips between Newark International Airport and New York City, the price is the regular metered fare, plus a $17.50 surcharge, plus tolls. For trips between John F. Kennedy International Airport and Manhattan, it is a flat fare of $52 plus tolls. The regular metered fare applies to all trips to and from LaGuardia International Airport.
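As a rough sketch, the airport rules above reduce to a few lines of arithmetic. The function name and the airport codes are hypothetical labels chosen for this example. The JFK branch covers only trips between JFK and Manhattan, and adding tolls to LaGuardia trips is an assumption.

```python
JFK_FLAT_FARE = 52.00      # flat fare between JFK and Manhattan
NEWARK_SURCHARGE = 17.50   # added on top of the metered fare


def airport_fare(airport: str, metered_fare: float, tolls: float = 0.0) -> float:
    """Estimate a yellow cab airport fare (hypothetical helper)."""
    if airport == "JFK":
        # Flat fare plus tolls; the meter reading is ignored.
        return JFK_FLAT_FARE + tolls
    if airport == "EWR":
        # Regular metered fare, plus the Newark surcharge, plus tolls.
        return metered_fare + NEWARK_SURCHARGE + tolls
    if airport == "LGA":
        # Regular metered fare; adding tolls here is an assumption.
        return metered_fare + tolls
    raise ValueError(f"unknown airport code: {airport}")


jfk_total = airport_fare("JFK", metered_fare=70.00, tolls=5.00)
newark_total = airport_fare("EWR", metered_fare=60.00, tolls=10.00)
```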
Before you can call an Uber, you must download the app onto your smartphone and register a credit card or PayPal account to your Uber account. Uber automatically charges your account at the end of the ride. When you take a cab, you can pay with cash, credit card, or a payment app on your phone, like Apple Pay.
Tipping is different with each service, too. Uber allows riders to tip their driver through the app after they have rated their ride, once complete. You have 30 days to add a tip once your ride is complete.
NYC cab drivers are required to accept MasterCard, Visa, Discover, and American Express credit cards and MasterCard and Visa debit cards with no minimum fare requirement. Passengers pay for rides by swiping their card through a card reader a...
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
An NYC taxi trip record contains only start and end coordinates, which makes it hard to explore how conditions vary across different roads. This dataset takes OSM road data and breaks it into small directed segments. Each segment runs from one intersection (node) to an adjacent intersection (node). Because segments are directed, a two-way road yields two segments and a one-way road yields one.
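The segment rule can be illustrated with a toy example. This is a minimal sketch, not the author's code: the input format (node pairs with a one-way flag) and the function name are assumptions made for illustration.

```python
def directed_segments(edges):
    """Expand road edges into directed segments.

    `edges` is an iterable of (from_node, to_node, oneway) tuples,
    a hypothetical input format chosen for this sketch.
    """
    segments = []
    for u, v, oneway in edges:
        segments.append((u, v))      # travel in the stored direction
        if not oneway:
            segments.append((v, u))  # two-way roads also get the reverse segment
    return segments


toy_edges = [("A", "B", False), ("B", "C", True)]
segments = directed_segments(toy_edges)

# Each directed segment can then be assigned one column index.
column_of = {segment: index for index, segment in enumerate(segments)}
```

Here the two-way edge A-B contributes two segments and the one-way edge B-C contributes one, so the toy network ends up with three columns.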
SciPy's .npz format
141505 columns: each column encodes one small segment. Its value is an indicator: 1 means the taxi would travel through this segment, 0 means it would not. This results in a very sparse matrix.
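A file in this format can probably be read with `scipy.sparse.load_npz`, assuming it was written with `scipy.sparse.save_npz`. The sketch below builds a tiny stand-in matrix (3 trips by 6 segments) so that it runs without the real file. The file name and the toy values are made up.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Toy indicator matrix: rows are trips, columns are road segments.
rows = np.array([0, 0, 1, 1, 2])
cols = np.array([0, 2, 2, 3, 5])
data = np.ones(len(rows), dtype=np.int8)
trips = sparse.csr_matrix((data, (rows, cols)), shape=(3, 6))

# Round-trip through the .npz format (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "toy_trips.npz")
sparse.save_npz(path, trips)
loaded = sparse.load_npz(path)

# Fraction of non-zero entries; for the real data this would be very small.
density = loaded.nnz / (loaded.shape[0] * loaded.shape[1])

# Column indices of the segments that trip 0 passes through.
segments_of_trip_0 = sorted(loaded[0].indices.tolist())
```

Keeping the matrix in a sparse format matters here: a dense array with 141505 columns per trip would mostly store zeros.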
This is inspired by ECML/PKDD 15: Taxi Trajectory Prediction. With a more accurate trajectory for each trip, we create a space in which information from one trip can be shared with many others. If we only have the start and end points, the similarity of two trips depends only on a clustering of those points, which we hope gives a reasonable similarity approximation (and which depends heavily on how many clusters you define). With path sequences, however, we can see that two quite different trips may share some common but important stretches of road, such as motorways. This is closer to real life. More importantly, we can now learn the conditions on those road segments from many different trips, as long as we have a suitable machine learning algorithm. As with the winners of ECML/PKDD 15, this dataset allows deep learning to be applied.
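To make the idea of shared road segments concrete, here is a small sketch that compares two trips by the segments they pass through. Jaccard similarity is used purely as an illustration and is not something the dataset author prescribes. The segment indices are invented.

```python
def shared_segments(trip_a, trip_b):
    """Return the segment indices that both trips pass through."""
    return set(trip_a) & set(trip_b)


def jaccard_similarity(trip_a, trip_b):
    """Share of segments in common, relative to all segments either trip uses."""
    a, b = set(trip_a), set(trip_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two hypothetical trips with different endpoints that share a stretch of road.
trip_a = [1, 2, 3, 4]
trip_b = [3, 4, 5, 6]

common = shared_segments(trip_a, trip_b)
similarity = jaccard_similarity(trip_a, trip_b)
```

Two trips whose start and end points fall in different clusters can still score above zero here, which is the extra signal the path encoding provides.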
The original road data is from OSM. The osmnx and networkx libraries are used to store the road graph. Speed limit data comes primarily from NYC's DOT. A shortest-path library in Java developed by Arizona State University computes shortest paths using Dijkstra's algorithm, and Pyjnius is used to call the Java library from Python. Some multithreaded code in both Python and Java speeds up the whole execution.
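The dataset itself was built with a Java shortest-path library, which is not reproduced here. As a stand-in, the following pure-Python sketch shows how Dijkstra's algorithm turns a start and end node into a sequence of directed segments. The graph, node names, and edge costs are hypothetical. In practice the costs might be travel times derived from segment length and speed limit, but that is an assumption.

```python
import heapq


def dijkstra_path(graph, source, target):
    """Return the cheapest node path from source to target, or None.

    `graph` maps each node to a list of (neighbor, cost) pairs.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for neighbor, cost in graph.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path


toy_graph = {
    "A": [("B", 1.0), ("C", 5.0)],
    "B": [("C", 1.0)],
    "C": [("A", 1.0)],
}
node_path = dijkstra_path(toy_graph, "A", "C")
segment_path = list(zip(node_path, node_path[1:]))
```

Each pair in `segment_path` corresponds to one directed segment, that is, one column set to 1 in the trip's row.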
The initial idea was to get the top K paths, which would provide probabilistic information about the route a taxi driver takes. Running Yen's top-K algorithm turned out to be too slow.
Time-dependent linkage might also help. However, linkage between different segments is not considered, since I have no idea how to map that information to a useful feature space.
Note that this data uses exactly the same information as New York City Taxi with OSRM. The difference is that that dataset only gives the name of each road, whereas this dataset encodes each small segment. The total time from that dataset has also proved to be useful. Unfortunately, my code did not record the trip time. We will see if anyone asks.
So, have fun with this dataset, Kagglers!
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The cab bookings data is from the Namma Yatri ride-hailing service within the Bangalore region. It was downloaded from nammayatri.in.
Namma Yatri has become Bengaluru's most loved auto app since its formal launch in January 2023. It is a direct-to-driver app with no commission or middlemen: what one pays goes 100% to the driver and his family.
Here is the GitHub page: https://github.com/nammayatri
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Easy (Taxi) is a mobile E-hailing application available in many countries in Latin America. The app allows users to book a taxi and track it in real time.
This dataset contains a sample of rides that were requested by Easy's passengers.
This dataset is being used in the Easy selection process for the Data Engineering Team. We will evaluate the following topics from your solution:
This data is a sample collected in the 18th week of 2018 and anonymized for privacy purposes.
You are going to find the following schema in the rides.csv file (inside the .zip file):
This dataset was collected from Easy's data and never distributed before.
By looking at the past, we can better understand how urban mobility works in cities and make better plans for the future.
We hope you can answer at least one of the following questions:
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains cleaned and processed customer service conversations designed for both Natural Language Generation (NLG) and Natural Language Understanding (NLU) tasks. The data focuses on customer inquiries across various service categories including refunds, bookings, and cancellations, with corresponding human agent responses and detailed annotations.
The Natural Language Generation portion contains instruction-following examples for customer service response generation:
Fields:
- instruction: Task description specifying the customer query type and emotional state
- context: The actual customer message/inquiry
- response: The appropriate agent response
Format Example:
json
{
"instruction": "A customer has a query about refund. They are feeling NEGATIVE. Draft a helpful response.",
"context": "who do i send my taxi receipt to for reimbursement please? (hersham to walton at 12.20)",
"response": "what day did you travel please?"
}
The Natural Language Understanding portion provides comprehensive annotations for customer messages:
Fields:
- text: The customer's original message
- intents: List of identified intents/purposes (e.g., "refund", "booking", "cancellation")
- sentiment: Sentiment classification (e.g., "NEGATIVE", "POSITIVE")
- entities: Named entities extracted from the text (currently empty arrays, indicating entity extraction preprocessing)
Format Example:
json
{
"text": "who do i send my taxi receipt to for reimbursement please? (hersham to walton at 12.20)",
"intents": ["refund"],
"sentiment": ["NEGATIVE"],
"entities": []
}
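The two formats appear to be related: the NLG instruction in the example can be rebuilt from the NLU annotations. The sketch below parses the NLU example and fills a template. The template is inferred from the single example shown, so treat it as an assumption. `build_instruction` is a hypothetical helper, not part of the dataset.

```python
import json

nlu_json = """
{
  "text": "who do i send my taxi receipt to for reimbursement please? (hersham to walton at 12.20)",
  "intents": ["refund"],
  "sentiment": ["NEGATIVE"],
  "entities": []
}
"""


def build_instruction(record):
    """Build an NLG-style instruction from an NLU record (assumed template)."""
    intents = ", ".join(record["intents"])
    sentiment = record["sentiment"][0]  # sentiment is stored as a one-element list
    return (
        f"A customer has a query about {intents}. "
        f"They are feeling {sentiment}. Draft a helpful response."
    )


record = json.loads(nlu_json)
nlg_example = {
    "instruction": build_instruction(record),
    "context": record["text"],
}
```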
The dataset appears to focus on transportation/travel services, with references to:
- Taxi receipts and reimbursements
- Flight bookings and upgrades
- Location-based services (e.g., "hersham to walton", "man-eus")
This dataset serves as a valuable resource for researchers and practitioners working on customer service automation, conversational AI, and natural language processing applications in business contexts.
https://www.usa.gov/government-works/
New York City (NYC) Taxi & Limousine Commission (TLC) keeps data from all its cabs, and it is freely available to download from its official website. You can access it here. Now, the TLC primarily keeps and manages data for 4 different types of vehicles:
- Yellow Taxi: Yellow Medallion Taxicabs: These are the famous NYC yellow taxis that provide transportation exclusively through street hails. The number of taxicabs is limited by a finite number of medallions issued by the TLC. You access this mode of transportation by standing in the street and hailing an available taxi with your hand. The pickups are not pre-arranged.
- Green Taxi: Street Hail Livery: The SHL program will allow livery vehicle owners to license and outfit their vehicles with green borough taxi branding, meters, credit card machines, and ultimately the right to accept street hails in addition to pre-arranged rides.
- For-Hire Vehicles (FHVs): FHV transportation is accessed by a pre-arrangement with a dispatcher or limo company. These FHVs are not permitted to pick up passengers via street hails, as those rides are not considered pre-arranged.
| Field Name | Description |
|---|---|
| VendorID | A code indicating the TPEP provider that provided the record. |
| tpep_pickup_datetime | The date and time when the meter was engaged. |
| tpep_dropoff_datetime | The date and time when the meter was disengaged. |
| Passenger_count | The number of passengers in the vehicle. This is a driver-entered value. |
| Trip_distance | The elapsed trip distance in miles reported by the taximeter. |
| Pickup_longitude | Longitude where the meter was engaged. |
| Pickup_latitude | Latitude where the meter was engaged. |
| RateCodeID | The final rate code in effect at the end of the trip. |
| Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip N= not a store and forward trip |
| Dropoff_longitude | Longitude where the meter was disengaged. |
| Dropoff_latitude | Latitude where the meter was disengaged. |
| Payment_type | A numeric code signifying how the passenger paid for the trip. |
| Fare_amount | The time-and-distance fare calculated by the meter. |
| Extra | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges. |
| MTA_tax | $0.50 MTA tax that is automatically triggered based on the metered rate in use. |
| Improvement_surcharge | $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015. |