This dataset was created by Vishnu Murali
You have access to two datasets: one exclusively containing car ratings, and the other containing detailed car features. These datasets provide an opportunity to work with real data, enabling you to practice various data analytics techniques such as data visualization, regression analysis for predicting prices, and classification tasks such as brand classification.
This dataset contains the file of vehicle, snowmobile and boat registrations in NYS. Registrations that expired more than 2 years ago are excluded. Records that have a scofflaw, revocation and/or suspension are included with indicators specifying those kinds of records.
License: https://creativecommons.org/publicdomain/zero/1.0/
In New York City, all taxi vehicles are regulated by the TLC (Taxi and Limousine Commission). Here is a brief description of the TLC:
The New York City Taxi and Limousine Commission (TLC), created in 1971, is the agency responsible for licensing and regulating New York City's Medallion (Yellow) taxi cabs, for-hire vehicles (community-based liveries, black cars and luxury limousines), commuter vans, and paratransit vehicles. The Commission's Board consists of nine members, eight of whom are unsalaried Commissioners. The salaried Chair/ Commissioner presides over regularly scheduled public commission meetings and is the head of the agency, which maintains a staff of approximately 600 TLC employees. Over 200,000 TLC licensees complete approximately 1,000,000 trips each day. To operate for hire, drivers must first undergo a background check, have a safe driving record, and complete 24 hours of driver training. TLC-licensed vehicles are inspected for safety and emissions at TLC's Woodside Inspection Facility.
The NYC TLC has released its Trip Record data to the public for research and study purposes. There are three main taxi types in NYC:
- Yellow taxis are traditionally hailed by signaling to a driver who is on duty and seeking a passenger (street hail), but they may now also be hailed using an e-hail app like Curb or Arro. Yellow taxis are the only vehicles permitted to respond to a street hail from a passenger in all five boroughs.
- Green taxis, also known as boro taxis and street-hail liveries, were introduced in August 2013 to improve taxi service and availability in the boroughs. Green taxis may respond to street hails, but only in the areas indicated in green on the map (i.e. above W 110 St/E 96th St in Manhattan and in the boroughs).
- FHV data includes trip data from high-volume for-hire vehicle bases (bases for companies dispatching 10,000+ trips per day, meaning Uber, Lyft, Via, and Juno), community livery bases, luxury limousine bases, and black car bases.
Uber is one of the biggest ride-hailing service providers, so its trip records are collected in the High Volume For-Hire Vehicle Trip Records as well.
Based on this dataset, there are some business goals we want to achieve to improve Uber's ride-hailing service:
- Exploratory data analysis: study fhvhv_tripdata_2021 and identify the underlying trip patterns in 2021.
- Based on fhvhv_tripdata_2021 and the weather data, build a model to predict peak footfall.
- Explore Uber's user portrait in NYC (which orders are urgent, and which kinds of users should be given higher priority?).
Some useful tips about this dataset:
- The for-hire vehicle trip data files are named like fhvhv_tripdata_2021-0X.parquet.
- For descriptions of the trip data columns, refer to data_dictionary_trip_records_hvfhs.pdf.
- The taxi_zones folder contains the geospatial data of NYC taxi zones (geopandas would be helpful).
- taxi_zone_lookup.csv stores taxi zone zip codes and other relevant information.
- nyc 2021-01-01 to 2021-12-31.csv records the weather data for 2021.
- taxi+_zone_lookup.csv stores the zone information for all taxis.
- Data files ending in .parquet can be read with the pyarrow package and converted to a pandas DataFrame.
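The file-naming and loading tips above can be sketched roughly as follows. This is a minimal sketch, not the dataset author's notebook: the helper names `monthly_trip_files` and `load_month` are hypothetical, the listing only shows the `2021-0X` pattern so the two-digit suffix for months 10 to 12 is an assumption, and `load_month` assumes pandas and pyarrow are installed.

```python
from pathlib import Path


def monthly_trip_files(year: int = 2021) -> list[str]:
    """Return the expected monthly file names, e.g. fhvhv_tripdata_2021-03.parquet.

    Assumption: every month uses a zero-padded two-digit suffix.
    """
    return [f"fhvhv_tripdata_{year}-{month:02d}.parquet" for month in range(1, 13)]


def load_month(data_dir: str, month: int, year: int = 2021):
    """Read one month of trip records into a pandas DataFrame.

    pandas is imported lazily so the file-name helper works without it.
    """
    import pandas as pd  # requires pandas with the pyarrow engine installed

    path = Path(data_dir) / f"fhvhv_tripdata_{year}-{month:02d}.parquet"
    return pd.read_parquet(path, engine="pyarrow")


if __name__ == "__main__":
    # Print the first three expected file names.
    for name in monthly_trip_files()[:3]:
        print(name)
```

Loading one month at a time keeps memory use bounded; whether a full year fits in memory depends on the machine.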
If you find this dataset helpful, please up-vote and more high-quality datasets will be published in future!❤️
This is the EVENT data captured from the New York City CV Pilot project that was processed by the independent evaluators at Volpe. Additional data collected and data dictionary are in the attachments. Each EVENT record documents the details of one application warning that occurred on an Aftermarket Safety Device (ASD) in an equipped host vehicle and includes CV messages from a defined recording time both before and after the warning was generated by the host ASD. Messages in the recording time window include the Basic Safety Messages (BSM) of the host vehicle that received the warning, as well as other BSMs received from the warning target equipped vehicle (for V2V applications) or other nearby equipped vehicles. Depending on the application warning type, MAP messages, Signal Phase and Timing (SPaT) messages, and Traveler Information Messages (TIM) that were heard by the host vehicle may also be included in the event record.
PLEASE NOTE: This dataset, which includes all TLC licensed for-hire vehicles which are in good standing and able to drive, is updated every day in the evening between 4-7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date' should show either today's or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_for_hire_vehicle_active_and_inactive.csv
TLC authorized For-Hire vehicles that are active. This list is accurate to the date and time represented in the Last Date Updated and Last Time Updated fields. For inquiries about the contents of this dataset, please email licensinginquiries@tlc.nyc.gov.
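The freshness check described in the note above can be expressed as a small helper. This is a sketch under assumptions: the function name `is_list_current` is hypothetical, and it assumes the 'Last Update Date' value has already been parsed into a `datetime.date`.

```python
from datetime import date, timedelta


def is_list_current(last_update: date, today: date) -> bool:
    """Return True when last_update is today's or yesterday's date."""
    return last_update in (today, today - timedelta(days=1))


if __name__ == "__main__":
    today = date(2024, 5, 10)  # illustrative date, not taken from the dataset
    print(is_list_current(date(2024, 5, 9), today))
    print(is_list_current(date(2024, 5, 7), today))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test.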
This dataset represents the total number of automobiles imported and exported annually through maritime terminals located within Port Authority property in the Port of New York and New Jersey, in vehicle units, beginning in 2000.
TLC authorized For-Hire vehicles that are active or inactive. This list is accurate to the date and time represented in the Last Date Updated and Last Time Updated fields. For inquiries about the contents of this dataset, please email licensinginquiries@tlc.nyc.gov.
License: https://creativecommons.org/publicdomain/zero/1.0/
Contains real-time traffic information from locations where NYCDOT picks up sensor feeds within the five NYC boroughs between 2018 and 2022, mostly on major arterials and highways.
This dataset contains a one-month sample of flattened EVENT data records from the New York City (NYC) Connected Vehicle (CV) Pilot that have undergone obfuscation of precise time and location details as well as other vehicle identifiers. The full unflattened event data from NYC CV pilot can be requested from the ITS DataHub Sandbox. Each EVENT record documents the details of one application warning that occurred on an Aftermarket Safety Device (ASD) in an equipped host vehicle and includes CV messages from a defined recording time both before and after the warning was generated by the host ASD. Messages in the recording time window include the Basic Safety Messages (BSM) of the host vehicle that received the warning, as well as other BSMs received from the warning target equipped vehicle (for V2V applications) or other nearby equipped vehicles. Depending on the application warning type, MAP messages, Signal Phase and Timing (SPaT) messages, and Traveler Information Messages (TIM) that were heard by the host vehicle may also be included in the event record.
License: https://creativecommons.org/publicdomain/zero/1.0/
The New York City Taxi and Limousine Commission (TLC) oversees the licensing and regulation of taxi cabs and for-hire vehicles in the city. The TLC gathers data from over 200,000 license holders, including taxi drivers and limousine operators, who collectively complete around one million trips each day.
Note: The dataset used for this project was designed for educational purposes and may not accurately represent the behavior of taxi cab riders in New York City.
| Column name | Description |
|---|---|
| ID | Trip identification number |
| VendorID | A code indicating the TPEP provider that provided the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc. |
| tpep_pickup_datetime | The date and time when the meter was engaged |
| tpep_dropoff_datetime | The date and time when the meter was disengaged |
| Passenger_count | The number of passengers in the vehicle. This is a driver-entered value |
| Trip_distance | The elapsed trip distance in miles reported by the taximeter |
| RateCodeID | The final rate code in effect at the end of the trip. 1= Standard rate; 2= JFK; 3= Newark; 4= Nassau or Westchester; 5= Negotiated fare; 6= Group ride |
| Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before being sent to the vendor, aka “store and forward,” because the vehicle did not have a connection to the server. Y= store and forward trip; N= not a store and forward trip |
| PULocationID | TLC Taxi Zone in which the taximeter was engaged |
| DOLocationID | TLC Taxi Zone in which the taximeter was disengaged |
| Payment_type | A numeric code signifying how the passenger paid for the trip. 1= Credit card; 2= Cash; 3= No charge; 4= Dispute; 5= Unknown; 6= Voided trip |
| Fare_amount | The time-and-distance fare calculated by the meter |
| Extra | Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges |
| MTA_tax | $0.50 MTA tax that is automatically triggered based on the metered rate in use |
| Tip_amount | Tip amount – This field is automatically populated for credit card tips. Cash tips are not included |
| Tolls_amount | Total amount of all tolls paid in trip |
| Improvement_surcharge | $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015 |
| Total_amount | The total amount charged to passengers. Does not include cash tips |
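The coded columns in the table above can be turned into readable labels with plain lookup tables. The sketch below is illustrative only: the function name `decode_trip` and the sample row are made up for the example, and it assumes the datetime columns arrive as ISO-style strings.

```python
from datetime import datetime

# Code tables copied from the column descriptions above.
VENDORS = {1: "Creative Mobile Technologies, LLC", 2: "VeriFone Inc."}
RATE_CODES = {
    1: "Standard rate", 2: "JFK", 3: "Newark",
    4: "Nassau or Westchester", 5: "Negotiated fare", 6: "Group ride",
}
PAYMENT_TYPES = {
    1: "Credit card", 2: "Cash", 3: "No charge",
    4: "Dispute", 5: "Unknown", 6: "Voided trip",
}


def decode_trip(row: dict) -> dict:
    """Map coded fields to labels and derive the trip duration in minutes."""
    pickup = datetime.fromisoformat(row["tpep_pickup_datetime"])
    dropoff = datetime.fromisoformat(row["tpep_dropoff_datetime"])
    return {
        "vendor": VENDORS.get(row["VendorID"], "Unrecognized code"),
        "rate": RATE_CODES.get(row["RateCodeID"], "Unrecognized code"),
        "payment": PAYMENT_TYPES.get(row["Payment_type"], "Unrecognized code"),
        "duration_minutes": (dropoff - pickup).total_seconds() / 60,
    }


if __name__ == "__main__":
    # Hypothetical row for illustration; the values are not from the dataset.
    sample = {
        "VendorID": 2,
        "tpep_pickup_datetime": "2017-03-25 08:55:00",
        "tpep_dropoff_datetime": "2017-03-25 09:09:30",
        "RateCodeID": 1,
        "Payment_type": 1,
    }
    print(decode_trip(sample))
```

Because Tip_amount excludes cash tips, any tip analysis built on decoded rows is probably best restricted to credit card payments.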
This dataset contains the file of vehicle, snowmobile and boat registrations in NYS. Expired registrations are excluded. Records that have a scofflaw, revocation and/or suspension are included with indicators specifying this.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset represents the total number of Automobiles Imported and Exported annually through maritime terminals located within Port Authority property in the Port of New York and New Jersey in vehicle units beginning 2000
This is a dataset hosted by the State of New York. The state has an open data platform and updates its information according to the amount of data that is brought in. Explore New York State using Kaggle and all of the data sources available through the State of New York organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
License: https://creativecommons.org/publicdomain/zero/1.0/
More details about each file are in the individual file descriptions.
This is a dataset hosted by the City of New York. The city has an open data platform and updates its information according to the amount of data that is brought in. Explore New York City using Kaggle and all of the data sources available through the City of New York organization page!
This dataset is distributed under the following licenses: Public Domain
License: https://creativecommons.org/publicdomain/zero/1.0/
The Uber Ride Dataset for New York City contains detailed information about every Uber ride in the city, including the TLC Base License Number of the base that dispatched the trip, the date and time of the trip pick-up and drop-off, the TLC Taxi Zone in which the trip began and ended, and whether the trip was a part of a shared ride chain offered by a High Volume FHV company (e.g. Uber Pool, Lyft Line).
For shared trips, the SR_Flag field indicates a value of 1, while for non-shared rides, this field is null. However, it is important to note that for most High Volume FHV companies, only shared rides that were requested and matched to another shared-ride request over the course of the journey are flagged.
For Lyft (base license numbers B02510 + B02844), trip records with SR_Flag=1 could indicate EITHER a first trip in a shared trip chain OR a trip for which a shared ride was requested but never matched. As a result, there may be an overcount of successfully shared trips completed by Lyft.
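The SR_Flag caveats above can be captured in a small classifier. This is a sketch, not the TLC's own logic: the function name `shared_ride_status` and the returned labels are hypothetical, and the caller is assumed to pass the dispatching base license number alongside the flag.

```python
# Lyft base license numbers as listed in the description above.
LYFT_BASES = {"B02510", "B02844"}


def shared_ride_status(sr_flag, base_license: str) -> str:
    """Interpret SR_Flag, allowing for Lyft's broader use of the flag."""
    # A null flag (None or NaN) means the trip is not flagged as shared.
    if sr_flag != 1:
        return "not flagged as shared"
    # For Lyft, a flagged trip may be a shared request that was never matched.
    if base_license in LYFT_BASES:
        return "shared requested, possibly unmatched"
    # For most other high-volume companies, the flag implies a matched shared ride.
    return "shared and matched"


if __name__ == "__main__":
    print(shared_ride_status(1, "B02510"))
    print(shared_ride_status(None, "B02510"))
```

Keeping the Lyft case as its own label lets an analysis report shared-trip counts with and without the potentially overcounted records.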
This comprehensive dataset can be used for a variety of research and analysis purposes, including understanding the popularity and effectiveness of shared ride services in New York City, analyzing trip patterns by TLC Taxi Zone, and evaluating the performance of different HVFHS bases in the city.
The datasets are broken down by month and formatted in parquet. To use the parquet formatted files in pandas, there is an example in my notebook in the code section. If you need more details, look at the pdfs in the datasets. The data is originally from https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset captures the park name, address, and county in which a manufactured home park is located; the site capacity and number of occupied sites; and the name and contact number for the park owner/operator. New York State Homes and Community Renewal’s (HCR) Division of Housing and Community Renewal (DHCR) oversees the registration of these parks in accordance with NYS Real Property Law Section 233 sub-section (v.), which requires owners of manufactured home parks with three or more homes to register their park with DHCR.
This is a dataset hosted by the State of New York. The state has an open data platform and updates its information according to the amount of data that is brought in. Explore New York State using Kaggle and all of the data sources available through the State of New York organization page!
The Motor Vehicle Collisions vehicle table contains details on each vehicle involved in the crash. Each row represents a motor vehicle involved in a crash. The data in this table goes back to April 2016, when crash reporting switched to an electronic system. The Motor Vehicle Collisions data tables contain information from all police reported motor vehicle collisions in NYC. The police report (MV104-AN) is required to be filled out for collisions where someone is injured or killed, or where there is at least $1000 worth of damage (https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/documents/ny_overlay_mv-104an_rev05_2004.pdf). It should be noted that the data is preliminary and subject to change when the MV-104AN forms are amended based on revised crash details.
Due to the success of the CompStat program, NYPD began to ask how to apply the CompStat principles to other problems. Other than homicides, the fatal incidents with which police have the most contact with the public are fatal traffic collisions. Therefore, in April 1998, the Department implemented TrafficStat, which uses the CompStat model to work towards improving traffic safety. Police officers complete form MV-104AN for all vehicle collisions. The MV-104AN is a New York State form that has all of the details of a traffic collision. Before implementing TrafficStat, there was no uniform traffic safety data collection procedure for all of the NYPD precincts. Therefore, the Police Department implemented the Traffic Accident Management System (TAMS) in July 1999 in order to collect traffic data in a uniform method across the City. TAMS required the precincts to manually enter a few selected MV-104AN fields to collect very basic intersection traffic crash statistics, which included the number of accidents, injuries and fatalities. As the years progressed, there grew a need for additional traffic data so that more detailed analyses could be conducted.
The Citywide traffic safety initiative, Vision Zero, started in 2014. Vision Zero further emphasized the need for the collection of more traffic data in order to work towards the Vision Zero goal, which is to eliminate traffic fatalities. Therefore, in March 2016 the Department replaced TAMS with the new Finest Online Records Management System (FORMS). FORMS enables police officers to enter all of the MV-104AN data fields electronically, using a Department cellphone or computer, and stores all of the MV-104AN data fields in the Department’s crime data warehouse. Since all of the MV-104AN data fields are now stored for each traffic collision, detailed traffic safety analyses can be conducted as applicable.
This dataset provides the volume of loaded freight rail cars transported between NY and NJ by New York New Jersey Rail, a carfloat operation owned by the Port Authority of NY & NJ. Total volume, including empty cars, is estimated at twice the volume of loaded cars. The 2013 entry is the year-to-date volume at this time.
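The estimate of total volume described above is simple arithmetic. The sketch below uses a hypothetical helper name and a made-up input value purely for illustration.

```python
def estimated_total_cars(loaded_cars: int) -> int:
    """Estimate total rail cars (loaded plus empty) as twice the loaded count."""
    return 2 * loaded_cars


if __name__ == "__main__":
    # 1500 is an illustrative figure, not a value from the dataset.
    print(estimated_total_cars(1500))
```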
The Motor Vehicle Collisions crash table contains details on the crash event. Each row represents a crash event. The Motor Vehicle Collisions data tables contain information from all police reported motor vehicle collisions in NYC. The police report (MV104-AN) is required to be filled out for collisions where someone is injured or killed, or where there is at least $1000 worth of damage (https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/documents/ny_overlay_mv-104an_rev05_2004.pdf). It should be noted that the data is preliminary and subject to change when the MV-104AN forms are amended based on revised crash details. For the most accurate, up to date statistics on traffic fatalities, please refer to the NYPD Motor Vehicle Collisions page (updated weekly) or Vision Zero View (updated monthly).
Due to the success of the CompStat program, NYPD began to ask how to apply the CompStat principles to other problems. Other than homicides, the fatal incidents with which police have the most contact with the public are fatal traffic collisions. Therefore, in April 1998, the Department implemented TrafficStat, which uses the CompStat model to work towards improving traffic safety. Police officers complete form MV-104AN for all vehicle collisions. The MV-104AN is a New York State form that has all of the details of a traffic collision. Before implementing TrafficStat, there was no uniform traffic safety data collection procedure for all of the NYPD precincts. Therefore, the Police Department implemented the Traffic Accident Management System (TAMS) in July 1999 in order to collect traffic data in a uniform method across the City. TAMS required the precincts to manually enter a few selected MV-104AN fields to collect very basic intersection traffic crash statistics, which included the number of accidents, injuries and fatalities. As the years progressed, there grew a need for additional traffic data so that more detailed analyses could be conducted. The Citywide traffic safety initiative, Vision Zero, started in 2014. Vision Zero further emphasized the need for the collection of more traffic data in order to work towards the Vision Zero goal, which is to eliminate traffic fatalities. Therefore, in March 2016 the Department replaced TAMS with the new Finest Online Records Management System (FORMS). FORMS enables police officers to enter all of the MV-104AN data fields electronically, using a Department cellphone or computer, and stores all of the MV-104AN data fields in the Department’s crime data warehouse. Since all of the MV-104AN data fields are now stored for each traffic collision, detailed traffic safety analyses can be conducted as applicable.
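The reporting rule stated in the description (a report is required when someone is injured or killed, or when damage is at least $1000) can be written as a one-line predicate. The function and parameter names below are hypothetical and do not correspond to column names in the dataset.

```python
def report_required(injured: int, killed: int, damage_usd: float) -> bool:
    """Return True when a collision meets the MV-104AN reporting rule."""
    return injured > 0 or killed > 0 or damage_usd >= 1000


if __name__ == "__main__":
    # Illustrative inputs only.
    print(report_required(injured=0, killed=0, damage_usd=1000))
    print(report_required(injured=0, killed=0, damage_usd=250))
```

One consequence for analysis is that minor property-damage collisions below the threshold may be absent from the tables.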