https://creativecommons.org/publicdomain/zero/1.0/
This directory contains data on over 4.5 million Uber pickups in New York City from April to September 2014, and 14.3 million more Uber pickups from January to June 2015. Trip-level data on 10 other for-hire vehicle (FHV) companies, as well as aggregated data for 329 FHV companies, is also included. All the files are as they were received on August 3, Sept. 15 and Sept. 22, 2015.
FiveThirtyEight obtained the data from the NYC Taxi & Limousine Commission (TLC) by submitting a Freedom of Information Law request on July 20, 2015. The TLC has sent us the data in batches as it continues to review trip data Uber and other FHV companies have submitted to it. The TLC's correspondence with FiveThirtyEight is included in the files TLC_letter.pdf, TLC_letter2.pdf and TLC_letter3.pdf. TLC records requests can be made here.
This data was used for four FiveThirtyEight stories: Uber Is Serving New York’s Outer Boroughs More Than Taxis Are, Public Transit Should Be Uber’s New Best Friend, Uber Is Taking Millions Of Manhattan Rides Away From Taxis, and Is Uber Making NYC Rush-Hour Traffic Worse?.
The dataset contains, roughly, four groups of files:
There are six files of raw data on Uber pickups in New York City from April to September 2014. The files are separated by month and each has the following columns:
Date/Time: The date and time of the Uber pickup
Lat: The latitude of the Uber pickup
Lon: The longitude of the Uber pickup
Base: The TLC base company code affiliated with the Uber pickup
These files are named:
uber-raw-data-apr14.csv
uber-raw-data-aug14.csv
uber-raw-data-jul14.csv
uber-raw-data-jun14.csv
uber-raw-data-may14.csv
uber-raw-data-sep14.csv
Also included is the file uber-raw-data-janjune-15.csv. This file has the following columns:
Dispatching_base_num: The TLC base company code of the base that dispatched the Uber
Pickup_date: The date and time of the Uber pickup
Affiliated_base_num: The TLC base company code affiliated with the Uber pickup
locationID: The pickup location ID affiliated with the Uber pickup
The Base codes are for the following Uber bases:
B02512: Unter
B02598: Hinter
B02617: Weiter
B02682: Schmecken
B02764: Danach-NY
B02765: Grun
B02835: Dreist
B02836: Drinnen
For coarse-grained location information from these pickups, the file taxi-zone-lookup.csv shows the taxi Zone (essentially, neighborhood) and Borough for each locationID.
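As a rough illustration of how the 2014 files could be read, the sketch below counts pickups per base and translates base codes into the base names listed above. The sample rows are made up, and the timestamp format string is an assumption about how Date/Time is written; adjust it if the real files differ.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Base code -> base name, as listed in the description above.
UBER_BASES = {
    "B02512": "Unter",
    "B02598": "Hinter",
    "B02617": "Weiter",
    "B02682": "Schmecken",
    "B02764": "Danach-NY",
    "B02765": "Grun",
    "B02835": "Dreist",
    "B02836": "Drinnen",
}

# Made-up rows in the documented column layout (not real data).
SAMPLE_CSV = '''"Date/Time","Lat","Lon","Base"
"4/1/2014 0:11:00",40.769,-73.9549,"B02512"
"4/1/2014 0:17:00",40.7267,-74.0345,"B02512"
"4/1/2014 7:45:00",40.7316,-73.9873,"B02598"
'''


def pickups_per_base(csv_text, time_format="%m/%d/%Y %H:%M:%S"):
    """Count pickups per base name; time_format is an assumed layout."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Parsing the timestamp doubles as a validity check on the row.
        datetime.strptime(row["Date/Time"], time_format)
        # Fall back to the raw code if a base is not in the lookup table.
        counts[UBER_BASES.get(row["Base"], row["Base"])] += 1
    return dict(counts)


result = pickups_per_base(SAMPLE_CSV)
```

To run this against a real monthly file, the text of that file would be passed in place of SAMPLE_CSV.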
The dataset also contains 10 files of raw data on pickups from 10 for-hire vehicle (FHV) companies. The trip information varies by company, but can include day of trip, time of trip, pickup location, driver's for-hire license number, and vehicle's for-hire license number.
These files are named:
American_B01362.csv
Diplo_B01196.csv
Highclass_B01717.csv
Skyline_B00111.csv
Carmel_B00256.csv
Federal_02216.csv
Lyft_B02510.csv
Dial7_B00887.csv
Firstclass_B01536.csv
Prestige_B01338.csv
There is also a file other-FHV-data-jan-aug-2015.csv containing daily pickup data for 329 FHV companies from January 2015 through August 2015.
The file Uber-Jan-Feb-FOIL.csv contains aggregated daily Uber trip statistics in January and February 2015.
In the fourth quarter of 2023, Uber's ridership worldwide totaled 2.6 billion trips. This compares to 2.1 billion trips in the fourth quarter of 2022, representing an increase of 24 percent year-on-year.
A brief overview of Uber Technologies
Uber Technologies Corporation started as a ridesharing company to disrupt the traditional taxi services industry. Anticipating the global profitability of the sharing economy, Uber expanded its business profile to reshape the entire transportation industry, from food delivery and logistics to transport of people. As a result of strategic market positioning, the company experienced strong growth. The net revenue of Uber increased roughly 75-fold in ten years, up from 0.5 billion U.S. dollars in 2014 to 37.3 billion U.S. dollars in 2023. Uber Technologies reported being profitable for the first time since 2018, posting a net profit of roughly 1.9 billion U.S. dollars during the fiscal year of 2023.
Competition in the sharing economy
Uber has been operating in a highly competitive environment since it introduced its first differentiated cab services. One of the major competitors of Uber Technologies is the San Francisco-based Lyft. Although Lyft was a latecomer to the ride-sharing business, it progressively worked on weaknesses exhibited by Uber to strengthen its position against Uber and other competitors. Lyft is also one of the major innovators in the sharing economy along with Uber Technologies. In 2022, Lyft Corporation invested nearly 556 million U.S. dollars into research and development globally, which has been scaled back in recent years. Lyft generated 4.4 billion U.S. dollars in global revenue during 2023.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘My Uber Drives’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/zusmani/uberdrives on 13 February 2022.
--- Dataset description provided by original source is as follows ---
My Uber Drives (2016)
Here are the details of my Uber Drives of 2016. I am sharing this dataset for the data science community to learn from the behavior of an ordinary Uber customer.
Geography: USA, Sri Lanka and Pakistan
Time period: January - December 2016
Unit of analysis: Drives
Total Drives: 1,155
Total Miles: 12,204
Dataset: The dataset contains Start Date, End Date, Start Location, End Location, Miles Driven and Purpose of drive (Business, Personal, Meals, Errands, Meetings, Customer Support etc.)
Users are allowed to use, download, copy, distribute and cite the dataset for their pet projects and training. Please cite it as follows: “Zeeshan-ul-hassan Usmani, My Uber Drives Dataset, Kaggle Dataset Repository, March 23, 2017.”
Uber TLC FOIL Response - The dataset contains over 4.5 million Uber pickups in New York City from April to September 2014, and 14.3 million more Uber pickups from January to June 2015 https://github.com/fivethirtyeight/uber-tlc-foil-response
1.1 Billion Taxi Pickups from New York - http://toddwschneider.com/posts/analyzing-1-1-billion-nyc-taxi-and-uber-trips-with-a-vengeance/
What you can do with this data - a good example by Yao-Jen Kuo - https://yaojenkuo.github.io/uber.html
Some ideas worth exploring:
• What is the average length of the trip?
• Average number of rides per week or per month?
• Total tax savings based on traveled business miles?
• Percentage of business miles vs personal vs. Meals
• How much money can be saved by a typical customer using Uber, Careem, or Lyft versus regular cab service?
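A couple of the ideas above (average trip length, share of miles by purpose) can be sketched in a few lines. The drives below are hypothetical (miles, purpose) pairs rather than rows from the dataset, and the function name is made up for illustration.

```python
from collections import defaultdict

# Hypothetical drives as (miles, purpose); not taken from the dataset.
drives = [
    (10.0, "Business"),
    (5.0, "Personal"),
    (2.5, "Meals"),
    (7.5, "Business"),
]


def miles_share_by_purpose(rows):
    """Return the percentage of total miles driven for each purpose."""
    totals = defaultdict(float)
    for miles, purpose in rows:
        totals[purpose] += miles
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no miles recorded, so no meaningful shares
    return {p: 100.0 * m / grand_total for p, m in totals.items()}


shares = miles_share_by_purpose(drives)
average_miles = sum(miles for miles, _ in drives) / len(drives)
```

With the real file, the pairs would come from the Miles Driven and Purpose fields described above.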
--- Original source retains full ownership of the source dataset ---
This dataset ends with 2022. Please see the Featured Content link below for the dataset that starts in 2023.
All trips, from November 2018 to December 2022, reported by Transportation Network Providers (sometimes called rideshare companies) to the City of Chicago as part of routine reporting required by ordinance.
Census Tracts are suppressed in some cases, and times are rounded to the nearest 15 minutes. Fares are rounded to the nearest $2.50 and tips are rounded to the nearest $1.00.
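To make the rounding described above concrete, here is a minimal sketch of rounding a fare to the nearest $2.50, a tip to the nearest $1.00, and a timestamp to the nearest 15 minutes. The tie-breaking rule (halves round up) and the helper names are assumptions; the source does not say how ties are handled.

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP


def round_to_step(amount, step):
    """Round a money amount to the nearest multiple of step (ties round up)."""
    amount = Decimal(amount)
    step = Decimal(step)
    multiples = (amount / step).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return multiples * step


def round_to_quarter_hour(ts):
    """Round a datetime to the nearest 15-minute mark (ties round up)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (ts - midnight).total_seconds()
    quarters = int(seconds / 900 + 0.5)  # 900 seconds = 15 minutes
    return midnight + timedelta(minutes=15 * quarters)


fare = round_to_step("11.30", "2.50")
tip = round_to_step("1.40", "1.00")
pickup = round_to_quarter_hour(datetime(2022, 6, 1, 8, 52, 10))
```

Because published values are already rounded like this, sums and averages computed from the dataset are approximations of the underlying trips.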
For a discussion of the approach to privacy in this dataset, please see https://data.cityofchicago.org/stories/s/82d7-i4i2.
This dataset contains lists of restaurants in the USA that are partnered with Uber Eats, along with their menus. Data was collected via web scraping using Python libraries.
*This dataset is dedicated to the awesome delivery drivers of Uber Eats, hence the cover image
kaggle API Command
!kaggle datasets download -d ahmedshahriarsakib/uber-eats-usa-restaurants-menus
The dataset has two CSV files:
restaurants.csv (40k+ entries, 11 columns), with price range coded as $ = Inexpensive, $$ = Moderately expensive, $$$ = Expensive, $$$$ = Very Expensive (Source: stackoverflow)
restaurant-menus.csv (3.71M entries, 5 columns)
Data was scraped from https://www.ubereats.com, an online food ordering and delivery platform launched by Uber in 2014. Users can read menus, reviews, and ratings, and can order and pay for food from participating restaurants using an application on the iOS or Android platforms, or through a web browser. Users are also able to tip for delivery. Payment is charged to a card on file with Uber. Meals are delivered by couriers using cars, scooters, or bikes, or on foot. It is operational in over 6,000 cities across 45 countries.
The data and information in the data set provided here are intended for educational use only. I do not own any of the data and all rights are reserved to the respective owners.
This is a dataset of Uber rides provided in one of the Upgrad case studies, where the problem is the cancellation of rides by drivers and the non-availability of cabs for users travelling between the airport and the city. Let's look for insights and try to find out which issues the company needs to resolve to avoid loss of business.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Uber Movement provides anonymized data from over two billion trips to help urban planning around the world.
Data retrieved from Uber Movement, (c) 2017 Uber Technologies, Inc., https://movement.uber.com
Over the past six and a half years, Uber has learned a lot about the future of urban mobility and what it means for cities and the people who live in them. Uber has gotten consistent feedback from cities they partner with that access to their aggregated data will inform decisions about how to adapt existing infrastructure and invest in future solutions to make our cities more efficient. Uber hopes Uber Movement can play a role in helping cities grow in a way that works for everyone.
With the taxi sector booming exponentially in the country, the ride hailing industry has been the source of employment for a number of people across India. The market is dominated by two players, Uber and Ola. The number of employees in OlaCabs was over 500 thousand as of July 2016. This snowballing growth of the cab industry has been creating problems for local rickshaw and auto drivers with people opting to take a ride in an online taxi as opposed to an auto-rickshaw.
Battle of the Giants
Even after the arrival of the San Francisco-based Uber, it is the native company doing the heavy lifting in the market. Ola held the highest share of taxi apps installed across the country in 2017, whereas Uber suffered more de-installations in the same time frame.
A cab wherever you are
High penetration is presumably one of the major factors for the success of the native company. Compared with its main competitor, OlaCabs had a reach of an additional 20 percent among smartphone users in tier 1 cities in 2017. The firm operates in more than 100 cities, twice as many as its counterpart, which contributed to this development. Despite the differences in their services and revenue streams, both companies still seem to strive for greater success with new developments in the now fast-moving economy of India. With the announcement of an outpost in Australia, the home-grown startup from India does not seem willing to stop at just one destination.
RESULTS EXPECTED:
Visually identify the most pressing problems for Uber.
1. Hint: Create plots to visualise the frequency of requests that get cancelled or show 'no cars available'; identify the most problematic types of requests (city to airport / airport to city etc.) and the time slots (early mornings, late evenings etc.) using plots
1.a. Find out the gap between supply and demand and show the same using plots.
2. Find the time slots when the highest gap exists
2.a. Find the types of requests (city-airport or airport-city) for which the gap is the most severe in the identified time slots
3. What do you think is the reason for this issue for the supply-demand gap? Write the answer in less than 100 words. You may accompany the write-up with plot(s).
4. Recommend some ways to resolve the supply-demand gap.
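As a starting point for steps 1.a to 2.a, the sketch below computes the supply-demand gap (requests made minus trips completed) per time slot and pickup point. The records, the status labels, and the time-slot boundaries are all hypothetical; the real case-study file may name and encode them differently.

```python
from collections import Counter

# Hypothetical request records: (hour of request, pickup point, status).
requests = [
    (5, "City", "Cancelled"),
    (5, "City", "Trip Completed"),
    (6, "City", "Cancelled"),
    (18, "Airport", "No Cars Available"),
    (19, "Airport", "No Cars Available"),
    (20, "Airport", "No Cars Available"),
    (19, "Airport", "Trip Completed"),
    (13, "City", "Trip Completed"),
]


def time_slot(hour):
    """Map an hour (0-23) to a coarse slot; the boundaries are assumptions."""
    if 4 <= hour < 10:
        return "Early morning"
    if 17 <= hour < 22:
        return "Evening"
    return "Other"


def gap_by_slot_and_pickup(rows):
    """Gap = demand (all requests) minus supply (completed trips)."""
    demand = Counter()
    supply = Counter()
    for hour, pickup, status in rows:
        key = (time_slot(hour), pickup)
        demand[key] += 1
        if status == "Trip Completed":
            supply[key] += 1
    return {key: demand[key] - supply[key] for key in demand}


gaps = gap_by_slot_and_pickup(requests)
worst = max(gaps, key=gaps.get)  # slot and pickup point with the largest gap
```

The resulting dictionary can be fed directly into a bar chart to produce the plots the brief asks for.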
These resources offer non-emergency transportation to medical appointments, grocery stores, social activities, etc., and may be government or private services. This list includes programs supported through District funding for older adults, as well as services that offer support inclusive to individuals with dementia. Some of these services are available to all older adults, while others are restricted to members of organizations, such as the DC Villages. This resource guide does not include private rideshare companies such as Lyft, Uber, or Go Go Grandparent.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains four datasets about the number of active users of selected mobile apps purchased from Selectivv company (https://selectivv.com/). Details regarding the data may be found below:
How data was collected: Selectivv uses programmatic advertisement systems that collect information on about 24 million smartphone users in Poland
Apps:
Transportation: Uber, Bolt Driver, FREE NOW, iTaxi;
Delivery: Glover, Takeaway, Bolt Courier, Wolt;
Unit: an active user of a given app. Active = used the given app for at least 1 minute in a given period (e.g. 1 unit during a whole month or half-year).
Period: 2018-2021; monthly and half-year data
Spatial aggregation: country level, city level, functional area level, voivodeship level. Functional area is defined as here https://stat.gov.pl/en/regional-statistics/regional-surveys/urban-audit/larger-urban-zones-luz/
Activity time: measured by activity time of given app (in hours; average and standard deviation)
Datasets:
gig-table1-monthly-counts-stats.csv -- the monthly number of active users;
gig-table2-halfyear-demo-stats.csv -- the half-year number of active users by socio-demographic variables;
gig-table3-halfyear-region-stats.csv -- the half-year number of active users by spatial aggregation;
gig-table4-halfyear-activity-stats.csv -- the half-year activity time by working week, weekend, day (8-18) and night (18-8).
Detailed description:
Structure of gig-table1-monthly-counts-stats.csv:
month - YYYY-MM-DD -- we set all dates to 15th of given month but actually the data is about the whole month (active users in whole period); 2018-01-15 to 2021-12-15
app -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
number_of_users -- the number of active users
category -- Transportation, Delivery
Structure of gig-table2-halfyear-demo-stats.csv:
gender -- men, women
age -- 18-30, 31-50, 51-64
country -- Poland, Ukraine, Other
period -- 2018.1, 2018.2, 2019.1, 2019.2, 2020.1, 2021.2
apps -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
number_of_users -- the number of active users
students -- the share of students within a given row
parents_of_children_0_4_years -- the share of parents of 0-4 years children in a given row
parents_of_children_5_10_years -- the share of parents of 5-10 years children in a given row
women_planning_a_baby -- the share of women planning a baby in a given row
standard -- the share of standard smartphones in a given row
premium_i_phone -- the share of iPhone smartphones in a given row
other_premium -- the share of other premium smartphones in a given row
category -- Transportation, Delivery
Structure of gig-table3-halfyear-region-stats.csv:
group -- Voivodeship, Functional Area, Cities
period -- 2018.1, 2018.2, 2019.1, 2019.2, 2020.1, 2021.2
region_name:
Cities -- Białystok, Bydgoszcz, Gdańsk, Gdynia, Gorzów Wielkopolski, Katowice, Kielce, Kraków, Łódź, Lublin, Olsztyn, Opole, Poznań, Rzeszów, Sopot, Szczecin, Toruń, Warszawa, Wrocław, Zielona Góra
Functional Area -- Functional area - Białystok, Functional area - Bydgoszcz, Functional area - Gorzów Wielkopolski, Functional area - GZM, Functional area - GZM2, Functional area - Kielce, Functional area - Kraków, Functional area - Łódź, Functional area - Lublin, Functional area - Olsztyn, Functional area - Opole, Functional area - Poznań, Functional area - Rzeszów, Functional area - Szczecin, Functional area - Toruń, Functional area - Trójmiasto, Functional area - Warszawa, Functional area - Wrocław, Functional area - Zielona Góra
Voivodeship -- dolnośląskie, kujawsko-pomorskie, łódzkie, lubelskie, lubuskie, małopolskie, mazowieckie, opolskie, podkarpackie, podlaskie, pomorskie, śląskie, świętokrzyskie, warmińsko-mazurskie, wielkopolskie, zachodniopomorskie
apps -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
number_of_users -- the number of active users
category -- Transportation, Delivery
Please note that:
the number of active users in a given functional area = number of active users in a city and a functional area of this city
the number of active users in voivodeship = number of active users in a city, its functional area and the rest of the voivodeship where this city and functional area is located
More details here: https://stat.gov.pl/en/regional-statistics/regional-surveys/urban-audit/larger-urban-zones-luz/
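Reading the two notes above as saying that the counts are nested (city within functional area within voivodeship), the non-overlapping parts can be recovered by subtraction. This is a sketch under that reading; the numbers and the function name are hypothetical.

```python
def split_nested_counts(city, functional_area, voivodeship):
    """Split nested active-user counts into non-overlapping parts.

    Assumes the functional-area count already includes the city, and the
    voivodeship count already includes the functional area.
    """
    if not city <= functional_area <= voivodeship:
        raise ValueError("expected city <= functional area <= voivodeship")
    return {
        "city": city,
        "functional_area_outside_city": functional_area - city,
        "rest_of_voivodeship": voivodeship - functional_area,
    }


# Hypothetical counts for one city and its surrounding areas.
parts = split_nested_counts(city=1200, functional_area=1500, voivodeship=2100)
```

Summing rows across the three group levels without this adjustment would count the same users more than once.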
Structure of gig-table4-halfyear-activity-stats.csv:
period -- 2018.1, 2018.2, 2019.1, 2019.2, 2020.1, 2021.2
apps -- app name (Uber, Bolt Driver, FREE NOW, iTaxi, Glover, Takeaway, Bolt Courier, Wolt)
day -- Mondays-Thursdays, Fridays-Sundays
hour -- day (8-18), night (18-8)
activity_time -- in hours
statistic -- Average, Std.Dev. (standard deviation)
category -- Transportation, Delivery
This is NOT a raw population dataset. We use our proprietary stack to combine detailed 'WorldPop' UN-adjusted, sex and age structured population data with a spatiotemporal OD matrix.
The result is a dataset where each record indicates how many people can be reached in a fixed timeframe (90 Mins in this case) from that record's location.
The dataset is broken down by sex and by age bands at 5-year intervals, e.g. male 25-29 (m_25), and also contains a set of features giving the percentage of the total that each count represents.
The dataset provides 76174 records, one for each sampled location. These are labelled with an h3 index at resolution 7, which allows easy plotting and filtering in Kepler.gl / Deck.gl / Mapbox, or easy conversion to a centroid (lat/lng) or the representative geometry of the hexagonal cell for integration with your geospatial applications and analyses.
An h3 resolution of 7 corresponds to a hexagonal cell area of ~1.9928 sq miles (~5.1613 sq km).
Higher resolutions or alternate geographies are available on request.
More information on the h3 system is available here: https://eng.uber.com/h3/
WorldPop data provides for a population count using a grid of 1 arc second intervals and is available for every geography.
More information on the WorldPop data is available here: https://www.worldpop.org/
One of the main use cases historically has been in prospecting for site selection, comparative analysis and network validation by asset investors and logistics companies. The data structure makes it very simple to filter on a requirement, such as being able to access 70% of the German population within 4 hours by truck, and show only the areas that meet it.
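The kind of threshold filter described above can be sketched as follows. The field names (h3, reachable_pct) and the cell labels are placeholders chosen for illustration, not the dataset's actual column names or real h3 indexes.

```python
def cells_meeting_threshold(rows, min_pct):
    """Return the h3 labels of records whose reachable share is at least min_pct."""
    return [row["h3"] for row in rows if row["reachable_pct"] >= min_pct]


# Hypothetical records: one per sampled location, with the percentage of the
# national population reachable within the time budget.
records = [
    {"h3": "cell-a", "reachable_pct": 72.4},
    {"h3": "cell-b", "reachable_pct": 69.9},
    {"h3": "cell-c", "reachable_pct": 70.0},
]

selected = cells_meeting_threshold(records, 70.0)
```

The surviving labels could then be passed to a mapping tool to display only the qualifying hexagons.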
Clients often combine different datasets either for different timeframes of interest, or to understand different populations, such as that of the unemployed, or those with particular qualifications within areas reachable as a commute.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the data on roadkill vertebrates from the citizen science project Roadkill collected in the years 2014-2023, which correspond to quality level 2. Quality level 1 data are published via GBIF and can be found here: https://www.gbif.org/dataset/d0d5ef85-71b2-4da6-b6f6-c1c3d60987d3.
Every other day, data entered into the Roadkill project were validated by members of the project team to correct incorrect or inconsistent entries via the backend of the website or, if the record cannot be corrected, to delete the record. Correction of data was done (i) by the project team itself when errors were obvious (e.g., animal in the submitted image does not match the species identification listed) or (ii) by the participants themselves after being informed by the project team that a correction is needed (e.g., if the roadkill is not on a road). Since the participants collected the data during their daily routine, the submitted data are so-called presence-only data.
To ensure the quality of the data, we used a stepwise selection process that allowed us to classify the submitted data into three quality levels at the end of this process:
Quality level 1: Records with correct species identification (either by experts or by images) and consistent data
Quality level 2: Records with consistent data, but no possible validation of the species
Deleted: Records with inconsistent data and no possible validation of the species
We thank all the citizen scientists who reported the data and helped identify the species. Without the voluntary work of the citizen scientists this project would not be possible.
https://spdx.org/licenses/CC0-1.0.html
This research developed EV Explorer 2.0, an online vehicle cost calculator (VCC) to meet the requirements of transportation network company (TNC) drivers considering acquiring an electric vehicle (EV). The tool was built to also support the needs of other users considering an EV, including other types of gig economy drivers as well as the general population of non-professional drivers.
EV Explorer 2.0 includes several important features and functionalities to support the TNC driver use case that are not found in any other available tool: (1) It allows users to estimate total cost of ownership (TCO) for used vehicles as well as new (others only estimate TCO for new vehicles); (2) Outputs include ridehail-driving income estimates, accounting for EV trip bonuses offered by Uber, and net driving costs; (3) Estimates of total cost of driving (TCD) include charging network membership fees and charging session fees (in addition to electricity prices). It also includes key features found in other leading tools, such as presenting and tailoring EV purchase/lease incentive estimates (based on a database we developed), and innovative features to benefit all users, such as animations conveying the social and environmental impacts of vehicle choice. Design features were informed and validated in user testing with TNC drivers who had expressed interest in EV adoption.
Methods
Federal incentives sourced from fueleconomy.gov
State and local incentives sourced from AFDC.energy.gov
Maintenance itemized costs sourced from afleet.es.anl.gov
Lyft upgraded ride-eligible PEVs sourced from help.lyft.com/hc/en-us/all/articles/115012923147-Lyft-Lux-Lux-Black-and-Lux-Black-XL-rides-for-drivers#eligible
Uber upgraded ride-eligible PEVs sourced from uber.com/global/en/eligible-vehicles/?city=san-francisco; NOTE: Uber provides vehicle eligibility by city, so we used those eligible in San Francisco, CA, as the reference.
https://creativecommons.org/publicdomain/zero/1.0/
Overview
BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery you can query terabytes of data without having any infrastructure to manage or needing a database administrator.
BigQuery Machine Learning (BQML) lets data analysts create, train, evaluate, and predict with machine learning models with minimal coding.
In this exercise you will explore millions of New York City yellow taxi cab trips available in a BigQuery Public Dataset. You will create a machine learning model inside BigQuery to predict the fare of a cab ride given your model inputs, evaluate the performance of your model, and make predictions with it.
You will perform the following tasks:
Query and explore the public taxi cab dataset.
Create a training and evaluation dataset to be used for batch prediction.
Create a forecasting (linear regression) model in BQML.
Evaluate the performance of your machine learning model.
There are several model types to choose from:
Forecasting numeric values, like next month's sales, with Linear Regression (linear_reg).
Binary or Multiclass Classification, like spam or not spam email, by using Logistic Regression (logistic_reg).
k-Means Clustering for when you want unsupervised learning for exploration (kmeans).
Note: There are many additional model types used in Machine Learning (like Neural Networks and decision trees) and available using libraries like TensorFlow. At this time, BQML supports the three listed above. Follow the BQML roadmap for more information.
For reference, we have also released a notebook with this dataset; try exploring from there. You can use AutoML foundational models to automatically select important features from the dataset and to choose a model.
You can also try spectral clustering algorithms. Of course, this is not an unsupervised task, but the approach is related. Visualize the trip fare prices so that cab drivers can easily identify fares in their respective locations.
Build a forecasting model that helps cab drivers (for example, Uber or Rapido) reach their customers easily and in a short time.
Dataset:
⏱️ 'trip_duration': How long did the journey last? [in Seconds]
🛣️ 'distance_traveled': How far did the taxi travel? [in Km]
🧑🤝🧑 'num_of_passengers': How many passengers were in the taxi?
💵 'fare': What's the base fare for the journey? [In INR]
💲 'tip': How much did the driver receive in tips? [In INR]
🎀 'miscellaneous_fees': Were there any additional charges during the trip, e.g. tolls, convenience fees, GST etc.? [In INR]
💰 'total_fare': The grand total for the ride (this is your prediction target!). [In INR]
⚡ 'surge_applied': Was there a surge pricing applied? Yes or no?
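The train, evaluate, and predict workflow described above can be tried offline before moving to BQML. The sketch below fits a one-feature least-squares line predicting total_fare from distance_traveled in plain Python; it is a stand-in for illustration, not BQML's linear_reg, and the data points are hypothetical values placed on an exact line.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(pairs, slope, intercept):
    """Root mean squared error of the fitted line on (x, y) pairs."""
    errors = [(intercept + slope * x - y) ** 2 for x, y in pairs]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical (distance_traveled in km, total_fare in INR) pairs.
train = [(1.0, 60.0), (2.0, 80.0), (4.0, 120.0), (5.0, 140.0)]
evaluation = [(3.0, 100.0), (6.0, 160.0)]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
eval_rmse = rmse(evaluation, slope, intercept)
predicted_fare = intercept + slope * 10.0  # prediction for a 10 km trip
```

Keeping the evaluation pairs out of the fit mirrors the separate training and evaluation datasets the task list calls for.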
IF IT IS USEFUL UPVOTE THE DATASET. THANK YOU!