This dataset was created by neelaydoshi7
This dataset was created by sodagar rajiv
This dataset was created by Khabibullokhon Bakhshilloev
Released under Data files © Original Authors
This dataset was created by Master Sniffer
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Shopping Cart Database’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ruchi798/shopping-cart-database on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains synthetic data generated by me for one of my courses at Carnegie Mellon University.
Several deductions and analyses can be drawn from this data, including:
- Which products were sold the most in the last month?
- How have sales and revenue changed over the past few quarters?
- Understanding customer demographics and their preferences
--- Original source retains full ownership of the source dataset ---
The GigaFlexhicle dataset contains many cut-out photos of brands and models of vehicles (cars/trucks/buses), divided by generations. All models of vehicles produced in the same generation belong to the same class, regardless of the body, restyling or special edition. For example, Opel Astra J hatchback, station wagon, GTC are one class, regardless of the change in appearance during restyling. Opel Astra J and Opel Astra H are different classes.
Dataset structure: The images directory contains all vehicle brands. Each brand folder contains folders for that brand's models, and each model folder contains the generations of that model. Generations are usually listed as 1st, 2nd, etc., but generation names may also appear. Any deeper subdirectories are treated as part of the same generation. The annotation.csv file contains the class number and the relative path of each image. A class was assigned only if its directory contains 3 or more pictures. The number of classes is 9548. Directories with 2 or fewer pictures were left for further expansion and a search for similar pictures.
For example, suppose I need to look at all the photos of the 2nd-generation BMW 2-series. This folder will contain such 2nd-generation models as the 2-series, 2-series Active Tourer and 2-series Gran Tourer.
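The class-assignment rule described above can be sketched in Python. This is only an illustration, not the author's tooling: it assumes relative paths of the form brand/model/generation/..., and the function name `assign_classes` and the example paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

MIN_IMAGES_PER_CLASS = 3  # threshold stated in the dataset description


def assign_classes(image_paths):
    """Map each relative image path to a class id, or to None when its
    generation folder holds fewer than MIN_IMAGES_PER_CLASS images.

    Assumption: the first three path components are brand/model/generation,
    and anything deeper belongs to the same generation.
    """
    groups = defaultdict(list)
    for path in image_paths:
        generation_key = PurePosixPath(path).parts[:3]
        groups[generation_key].append(path)

    labels = {}
    next_class = 0
    for key in sorted(groups):
        members = groups[key]
        if len(members) >= MIN_IMAGES_PER_CLASS:
            for path in members:
                labels[path] = next_class
            next_class += 1
        else:
            # Too few pictures: left unlabeled for later expansion.
            for path in members:
                labels[path] = None
    return labels


# Hypothetical paths, for illustration only.
example_paths = [
    "Opel/Astra/J/hatchback_1.jpg",
    "Opel/Astra/J/wagon_1.jpg",
    "Opel/Astra/J/GTC/gtc_1.jpg",
    "Opel/Astra/H/hatchback_1.jpg",
    "Opel/Astra/H/hatchback_2.jpg",
]
labels = assign_classes(example_paths)
```

Here the three Astra J pictures share one class even though one sits in a deeper subfolder, while the two Astra H pictures stay unlabeled.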
This dataset is not final: its structure will be improved and it will be supplemented over time. The dataset may include motorcycles, but their presence is not assumed.
The data used in this dataset has been copied from the website PlatesMania.com
This dataset was created by DataWizard20
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The OTTO session dataset is a large-scale dataset intended for multi-objective recommendation research. We collected the data from anonymized behavior logs of the OTTO webshop and the app. The mission of this dataset is to serve as a benchmark for session-based recommendations and foster research in the multi-objective and session-based recommender systems area. We also launched a Kaggle competition with the goal to predict clicks, cart additions, and orders based on previous events in a user session.
For additional background, please see the published OTTO Recommender Systems Dataset GitHub.
Events are of type clicks, carts and orders, and the data is provided in .jsonl format.

Dataset | #sessions | #items | #events | #clicks | #carts | #orders | Density [%] |
---|---|---|---|---|---|---|---|
Train | 12.899.779 | 1.855.603 | 216.716.096 | 194.720.954 | 16.896.191 | 5.098.951 | 0.0005 |
Test | 1.671.803 | 1.019.357 | 13.851.293 | 12.340.303 | 1.155.698 | 355.292 | 0.0005 |
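As a rough sketch of how per-type counts like those in the table could be tallied from a .jsonl file, the snippet below reads one JSON object per line. The field names (`session`, `events`, `aid`, `ts`, `type`) and the sample lines are assumptions made for illustration; check the dataset's own documentation for the exact schema.

```python
import json
from collections import Counter

# Hypothetical sample lines; the field names are assumptions.
sample_jsonl = "\n".join([
    '{"session": 1, "events": [{"aid": 10, "ts": 1000, "type": "clicks"},'
    ' {"aid": 10, "ts": 1005, "type": "carts"}]}',
    '{"session": 2, "events": [{"aid": 11, "ts": 2000, "type": "clicks"},'
    ' {"aid": 12, "ts": 2010, "type": "clicks"},'
    ' {"aid": 12, "ts": 2020, "type": "orders"}]}',
])


def count_event_types(lines):
    """Count events per type across all sessions, one JSON object per line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        session = json.loads(line)
        for event in session["events"]:
            counts[event["type"]] += 1
    return counts


counts = count_event_types(sample_jsonl.splitlines())
```

For a real file, the same function can be fed an open file handle, since iterating over it yields one line at a time.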
Since we want to evaluate a model's performance in the future, as would be the case when we deploy such a system in an actual webshop, we choose a time-based validation split. Our train set consists of observations from 4 weeks, while the test set contains user sessions from the following week. Furthermore, we trimmed train sessions overlapping with the test period, as depicted in the following diagram, to prevent information leakage from the future:
Figure: Train/Test Split (https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4621388%2F94cead9aec2ef687490b1212e40f409a%2Ftrain_test_split.png?generation=1676645044801713&alt=media)
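A time-based split with trimming can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure OTTO actually ran: sessions are modeled as a dict of `(timestamp, item)` tuples, and `time_based_split` and `split_ts` are hypothetical names.

```python
def time_based_split(sessions, split_ts):
    """Split sessions into train and test around split_ts.

    Assumptions: sessions maps a session id to a list of (timestamp, item)
    events. Sessions that start before split_ts go to train and are trimmed
    so that no event at or after split_ts remains; sessions that start at or
    after split_ts go to test.
    """
    train, test = {}, {}
    for session_id, events in sessions.items():
        if not events:
            continue
        events = sorted(events, key=lambda event: event[0])
        if events[0][0] >= split_ts:
            test[session_id] = events
        else:
            # Drop the part of the session that overlaps the test period.
            train[session_id] = [e for e in events if e[0] < split_ts]
    return train, test


# Hypothetical sessions with integer timestamps.
sessions = {
    "a": [(1, "x"), (5, "y"), (12, "z")],  # overlaps the test period
    "b": [(11, "x")],                      # starts in the test period
    "c": [(3, "y")],                       # entirely in the train period
}
train, test = time_based_split(sessions, split_ts=10)
```

Trimming session "a" removes its event at timestamp 12, so nothing from the test period leaks into training.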
This dataset was created by Faisal Jamil
Released under Other (specified in description)
The Donner Party were a group of emigrants moving to start a new life in California. But between 1846 and 1847, 45 out of the 87 people on the wagon train would die from sickness, starvation, murder, and cannibalism.
The dataset is based on information gathered from Daniel James Brown's The Indifferent Stars Above and from general online reading.
Brown, Daniel James. The Indifferent Stars Above. New York: HarperCollins Publishers, 2015.
While researching the Donner Party, I wanted to know which factors (age, sex, etc.) influenced a person's chances of survival. This dataset was made in an attempt to answer that question.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Cart Vista
Released under CC0: Public Domain
This dataset includes all the variables (choice, college, hsg2, coml5, typez, fuelz, pricez, speedz, pollutionz, sizez) from 2016 to 2018. I scraped this data from www.qed.econ.queensu.ca.
Attribute Information:
McFadden, Daniel and Kenneth Train (2000) “Mixed MNL models for discrete response”, Journal of Applied Econometrics, 15(5), 447–470.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
This dataset was created by Satya Mishra
https://creativecommons.org/publicdomain/zero/1.0/
This dataset covers vehicles sold in the US market for the 2024-2025 model years. It compares horsepower, torque, weight, and ratios across all makes and models sold in that market.
Data is taken from manufacturer websites and from Car & Driver where applicable.
I only compared data with vehicles designed, marketed, and sold as sedans or lower. Wagons were included where applicable. The Mercedes E-class wagon was excluded due to lack of data found. Data excludes vehicles sold and marketed as CUV and above (CUVs, SUVs, Trucks, Vans, etc.)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains traffic violation information from electronic traffic violations issued in the US. Any information that can be used to uniquely identify the vehicle, the vehicle owner or the officer issuing the violation is not included. Some features were removed from the original dataset and all remaining character features were recoded as nominal factor variables. All punctuation characters were removed from factor levels. The variable 'Violation.Type' is used as target by default. The smaller target categories 'SERO' and 'ESERO' were collapsed into one category labeled 'SERO'. Unused factor levels and a few almost constant features were dropped.
- Description: Text description of the specific charge
- Belts: Whether seat belts were in use in accident cases
- Personal Injury: Whether the traffic violation involved personal injury
- Property Damage: Whether the traffic violation involved property damage
- Commercial License: Whether the driver holds a Commercial Driver's License
- Commercial Vehicle: Whether the vehicle committing the traffic violation is a commercial vehicle
- State: State issuing the vehicle registration
- VehicleType: Type of vehicle (Examples: Automobile, Station Wagon, Heavy Duty Truck, etc.)
- Year: Year the vehicle was made
- Make: Manufacturer of the vehicle (Examples: Ford, Chevy, Honda, Toyota, etc.)
- Model: Model of the vehicle
- Color: Color of the vehicle
- Charge: Alphanumeric code for the specific charge
- Contributed To Accident: Whether the traffic violation was a contributing factor in an accident
- Race: Race of the driver (Example: Asian, Black, White, Other, etc.)
- Gender: Gender of the driver (F = Female, M = Male)
- Driver City: City of the driver’s home address
- Driver State: State of the driver’s home address
- DL State: State issuing the Driver’s License
- Arrest Type: Type of Arrest (A = Marked, B = Unmarked, etc.)
- Violation Type: Type of Violation (Examples: Warning, Citation, SERO)
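Two of the preprocessing steps described above, stripping punctuation from factor levels and collapsing 'ESERO' into 'SERO', can be sketched in Python. This is an illustration only, not the code used to build the dataset; the function names are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_level(value):
    """Remove punctuation characters from a factor level."""
    return value.translate(_PUNCT_TABLE).strip()


def recode_violation_type(value):
    """Collapse the small 'ESERO' category into 'SERO'; keep other levels."""
    cleaned = clean_level(value)
    return "SERO" if cleaned in {"SERO", "ESERO"} else cleaned


# Hypothetical raw values, for illustration only.
raw_types = ["Citation", "Warning", "ESERO", "SERO"]
recoded = [recode_violation_type(v) for v in raw_types]
```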
Please, provide an upvote👍if the dataset was useful for your task. It would be much appreciated😄
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Gender Recognition by Voice and Speech Analysis
This database was created to identify a voice as male or female, based upon acoustic properties of the voice and speech. The dataset consists of 3,168 recorded voice samples, collected from male and female speakers. The voice samples are pre-processed by acoustic analysis in R using the seewave and tuneR packages, with an analyzed frequency range of 0 Hz to 280 Hz (human vocal range).
The following acoustic properties of each voice are measured and included within the CSV:
Accuracy figures reported for the evaluated models: 50% / 50%, 97% / 98%, 96% / 97%, 100% / 98%, 100% / 99%, 100% / 99%.
An original analysis of the data-set can be found in the following article:
Identifying the Gender of a Voice using Machine Learning
The best model achieves 99% accuracy on the test set. According to a CART model, looking at the mean fundamental frequency might be enough to accurately classify a voice. However, some male voices use a higher frequency, even though their resonance differs from female voices, and may be incorrectly classified as female. To the human ear, there is apparently more than simple frequency that determines a voice's gender.
Figure: CART model (http://i.imgur.com/Npr2U7O.png)
Mean fundamental frequency appears to be an indicator of voice gender, with a threshold of 140 Hz separating male from female classifications.
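The single-threshold rule can be written as a one-line classifier. This is a sketch rather than the article's CART model: it assumes the mean fundamental frequency is given in Hz, that values below the threshold are labeled male, and that a value exactly at the threshold is labeled female. The function name is hypothetical.

```python
THRESHOLD_HZ = 140.0  # threshold quoted in the description


def classify_by_mean_f0(mean_f0_hz, threshold_hz=THRESHOLD_HZ):
    """Label a voice from its mean fundamental frequency alone.

    Assumption: frequencies below the threshold are labeled "male" and
    all others "female"; the boundary handling is a choice made here.
    """
    return "male" if mean_f0_hz < threshold_hz else "female"


# Hypothetical measurements in Hz.
predictions = [classify_by_mean_f0(f0) for f0 in (110.0, 150.0, 210.0)]
```

A male speaker with a mean fundamental frequency of 150 Hz would be labeled female by this rule, which is the misclassification case the description mentions.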
The Harvard-Haskins Database of Regularly-Timed Speech
Telecommunications & Signal Processing Laboratory (TSP) Speech Database at McGill University, Home
Festvox CMU_ARCTIC Speech Database at Carnegie Mellon University
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset provides comprehensive details on used car listings, including vehicle specifications, features, pricing, and more. It's valuable for analyzing car prices, trends, and customer preferences in the automotive market.
This dataset is ideal for machine learning, data analysis, and business intelligence applications in the automotive industry.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset covers cars from 1985. It is raw and messy.
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as compared to other cars. The second rating corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is more risky (or less), this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process "symboling". A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. The third factor is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/specialty, etc.), and represents the average loss per car per year.
Note: Several of the attributes in the database could be used as a "class" attribute.
Number of Instances: 205
Number of Attributes: 26 total
-- 15 continuous
-- 1 integer
-- 10 nominal
| Attribute | Attribute Range
"Automobile Data Set" from the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data.
I used these data for data cleaning/analysis purposes.
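A typical first cleaning step for a raw file like this is to load the comma-separated rows and count missing values per column. The sketch below assumes the file has no header row and marks missing values with `?`; the three column names and the sample rows are made up for illustration and cover only a fraction of the 26 attributes.

```python
import csv
import io

MISSING = "?"  # assumed missing-value marker

# Made-up, abbreviated rows: symboling, normalized losses, make.
sample = "3,?,alfa-romero\n2,164,audi\n-1,?,volvo\n"
columns = ["symboling", "normalized_losses", "make"]  # hypothetical names


def parse_rows(text, column_names):
    """Parse headerless CSV text into dicts, mapping the marker to None."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip empty lines
        rows.append({
            name: (None if value == MISSING else value)
            for name, value in zip(column_names, record)
        })
    return rows


def count_missing(rows):
    """Count None values per column; columns without gaps are omitted."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                counts[name] = counts.get(name, 0) + 1
    return counts


rows = parse_rows(sample, columns)
missing = count_missing(rows)
```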