43 datasets found

Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Available download formats: zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on the itemsets they are most likely to purchase. I was given a dataset from a retailer; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rules are most useful when you want to find associations between different objects in a set, such as frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both.

- Rule: bought Computer Mouse => bought Mat for Mouse
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support / P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(Mat for Mouse) = 0.80/0.09 = 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
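The three metrics can be checked with a few lines of Python. This is a minimal sketch using the standard definitions for a rule antecedent => consequent (confidence divides by the antecedent's frequency, lift divides confidence by the consequent's frequency); the function name rule_metrics is an illustrative choice, not part of the dataset.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(antecedent & consequent)
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence / P(consequent)
    return support, confidence, lift


# 100 customers, 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift well above 1 suggests the two items are bought together far more often than they would be if the purchases were independent.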

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of rules

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of Rows: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each one is described briefly below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
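The original walkthrough does this step in R. As a rough illustration of the same idea in Python, the sketch below drops rows with missing values and groups item names by bill number. The column names BillNo and Itemname come from the dataset description; the rows themselves are made-up placeholders, and to_transactions is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows mimicking the BillNo and Itemname columns; not real data.
rows = [
    {"BillNo": "100001", "Itemname": "mug"},
    {"BillNo": "100001", "Itemname": "teapot"},
    {"BillNo": "100002", "Itemname": "mug"},
    {"BillNo": "100002", "Itemname": None},    # missing item name: dropped
    {"BillNo": None, "Itemname": "candle"},    # missing bill number: dropped
    {"BillNo": "100001", "Itemname": "mug"},   # repeated item: kept once
]


def to_transactions(rows):
    """Group item names by bill number, skipping rows with missing values."""
    baskets = defaultdict(set)
    for row in rows:
        bill, item = row.get("BillNo"), row.get("Itemname")
        if not bill or not item:
            continue
        baskets[bill].add(item.strip())
    # One sorted item list per invoice, ready for an association-rule miner.
    return {bill: sorted(items) for bill, items in baskets.items()}


transactions = to_transactions(rows)
print(transactions)
```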
Accuracy and AUC value of ML algorithms using three hyper parameter tuning...
plos.figshare.com
xls
Updated Jan 24, 2025
Cite
Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie (2025). Accuracy and AUC value of ML algorithms using three hyper parameter tuning techniques. [Dataset]. http://doi.org/10.1371/journal.pone.0316452.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316452.t003
Dataset updated
Jan 24, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Accuracy and AUC value of ML algorithms using three hyper parameter tuning techniques.
market_basket_optimization
kaggle.com
zip
Updated Feb 11, 2023
Cite
Rupak Roy/ Bob (2023). market_basket_optimization [Dataset]. https://www.kaggle.com/rupakroy/market-basket-optimization
Available download formats: zip (47991 bytes)
Dataset updated
Feb 11, 2023
Authors
Rupak Roy/ Bob
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset is specially curated for association rule learning with Apriori and Eclat in Python to predict shopping behavior.

Apriori is a powerful algorithm for understanding associations among products. Take the example of a supermarket where most people who buy eggs also buy milk and baking soda. The reason is probably that they want to bake a cake for New Year's Eve.

So we can see there is an association between eggs, milk, and baking soda. Knowing this, we can place all three items together on the shelf, which may well increase sales.

Let’s perform Apriori with the help of an example.
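The dataset's own example is not included in this listing, so the following is only a minimal, unoptimized sketch of the Apriori idea in plain Python. The baskets are made-up placeholders and the 0.4 support threshold is an arbitrary assumption; a real analysis would more likely rely on an established library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical baskets for illustration only.
baskets = [
    {"eggs", "milk", "baking soda"},
    {"eggs", "milk"},
    {"eggs", "milk", "baking soda"},
    {"bread", "milk"},
    {"eggs", "bread"},
]
frequent = apriori(baskets, min_support=0.4)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

The prune step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.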
LLM-SE Python Wheel
kaggle.com
zip
Updated Oct 7, 2023
Cite
Ranchantan (2023). LLM-SE Python Wheel [Dataset]. https://www.kaggle.com/ranchantan/llm-se-python-wheel
Available download formats: zip (146397 bytes)
Dataset updated
Oct 7, 2023
Authors
Ranchantan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Ranchantan

Released under CC0: Public Domain

Contents
Socio-demographic characteristics among adolescent girls in Ethiopia, 2016...
figshare.com
datasetcatalog.nlm.nih.gov
+1 more
xls
Updated Jan 24, 2025
Cite
Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie (2025). Socio-demographic characteristics among adolescent girls in Ethiopia, 2016 EDHS. [Dataset]. http://doi.org/10.1371/journal.pone.0316452.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316452.t001
Dataset updated
Jan 24, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ethiopia
Description
Socio-demographic characteristics among adolescent girls in Ethiopia, 2016 EDHS.
Dataset for "Neural embedding of beliefs reveals the role of relative...
figshare.com
zip
Updated Feb 6, 2025
Cite
Byunghwee Lee (2025). Dataset for "Neural embedding of beliefs reveals the role of relative dissonance in human decision-making". [Dataset]. http://doi.org/10.6084/m9.figshare.28327019.v3
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.28327019.v3
Dataset updated
Feb 6, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Byunghwee Lee
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This project contains the dataset used to generate the results of the study "Neural embedding of beliefs reveals the role of relative dissonance in human decision-making" (arXiv:2408.07237).

Authors: Byunghwee Lee (1), Rachith Aiyappa (1), Yong-Yeol Ahn (1), Haewoon Kwak (1), Jisun An (1)

(1) Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, USA, 47408

DDO_dataset.zip (original Debate.org dataset)

This archive contains the original raw Debate.org dataset, which was obtained from the publicly accessible website (https://esdurmus.github.io/ddo.html), maintained by Esin Durmus [1,2]. All credit for this dataset belongs entirely to the original authors, Esin Durmus and Claire Cardie. We do not claim any authorship or modifications to this dataset. It is provided here solely for reproducibility and reference in our study.

The dataset includes the following three files:

- debates.json: This JSON file contains a Python dictionary that assigns a debate name (a unique name for each debate) to debate information
- users.json: This JSON file includes a Python dictionary containing user information
- readme.md file from the authors (Esin Durmus and Claire Cardie)

When using this dataset, please reference Debate.org and cite the following works:

[1] Esin Durmus and Claire Cardie. 2019. A Corpus for Modeling User and Language Effects in Argumentation on Online Debating. In Proceedings of the 57th Conference of the Association for Computational Linguistics. Florence, Italy. Association for Computational Linguistics.

[2] Esin Durmus and Claire Cardie. 2018. Exploring the Role of Prior Beliefs for Argument Persuasion. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL).

df_ddo_including_only_truebeliefs_nodup(N192307).p

This file contains a pre-processed dataset used in our project (arXiv:2408.07237). The dataset includes records of user participation in debates (both as debaters and voters) as well as voting records across various debates. The belief triplet dataset used for fine-tuning a Sentence-BERT model was generated based on this pre-processed dataset. Detailed explanations of the pre-processing procedure are provided in the Methods section of the paper.

When using this pre-processed dataset, please cite the following reference (in addition to the two papers mentioned above):

[3] Lee, B., Aiyappa, R., Ahn, Y. Y., Kwak, H., & An, J. (2024). Neural embedding of beliefs reveals the role of relative dissonance in human decision-making. arXiv preprint arXiv:2408.07237.

model_full_data.zip

This zip file contains five fine-tuned S-BERT models trained using a 5-fold belief triplet dataset. After unzipping the files, users can import the models using the 'sentence_transformers' Python library (https://sbert.net/).
Association analysis of high-high cluster road intersection crashes within...
zivahub.uct.ac.za
xlsx
Updated Jun 7, 2024
+ more versions
Cite
Simone Vieira; Simon Hull; Roger Behrens (2024). Association analysis of high-high cluster road intersection crashes within the CoCT in 2017, 2018, 2019 and 2021 [Dataset]. http://doi.org/10.25375/uct.25975285.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.25375/uct.25975285.v2
Dataset updated
Jun 7, 2024
Dataset provided by
University of Cape Town
Authors
Simone Vieira; Simon Hull; Roger Behrens
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
City of Cape Town
Description
This dataset provides comprehensive information on road intersection crashes recognised as "high-high" clusters within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 5% of the total "high-high" cluster road intersection crashes for the years 2017, 2018, 2019, and 2021. The dataset is meticulously organised according to support metric values, ranging from 0,05 to 0,0235, with entries presented in descending order.

Data Specifics

Data Type: Geospatial-temporal categorical data
File Format: Excel document (.xlsx)
Size: 499 KB
Number of Files: The dataset contains a total of 7186 association rules
Date Created: 23rd May 2024

Methodology

Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Python
Processing Steps: Following the spatio-temporal analyses and the derivation of "high-high" cluster fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes that occurred within the "high-high" cluster fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,05 support metric value. Consequently, commonly occurring crash attributes among at least 5% of the "high-high" cluster road intersection crashes were extracted for inclusion in this dataset.

Geospatial Information

Spatial Coverage:
West Bounding Coordinate: 18°20'E
East Bounding Coordinate: 19°05'E
North Bounding Coordinate: 33°25'S
South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

Temporal Information

Temporal Coverage:
Start Date: 01/01/2017
End Date: 31/12/2021 (2020 data omitted)
DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning
data.niaid.nih.gov
Updated May 16, 2023
Cite
Olsen, Alex; Konovalov, Dimitriv A.; Philippa, Bronson; Ridd, Peter; Wood, Jake C.; Johns, Jamie; Banks, Wesley; Girgenti, Benjamin; Kenny, Owen; Whinney, James; Calvert, Brendan; Rahimi Azghadi, Mostafa; White, Ronald D. (2023). DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7939059
Dataset updated
May 16, 2023
Authors
Olsen, Alex; Konovalov, Dimitriv A.; Philippa, Bronson; Ridd, Peter; Wood, Jake C.; Johns, Jamie; Banks, Wesley; Girgenti, Benjamin; Kenny, Owen; Whinney, James; Calvert, Brendan; Rahimi Azghadi, Mostafa; White, Ronald D.
License
http://www.apache.org/licenses/LICENSE-2.0
Description
DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

This repository makes available the source code and public dataset for the work, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning", published with open access by Scientific Reports: https://www.nature.com/articles/s41598-018-38343-3. The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora. In our work, the dataset was classified to an average accuracy of 95.7% with the ResNet50 deep convolutional neural network.

The source code, images and annotations are licensed under CC BY 4.0 license. The contents of this repository are released under an Apache 2 license.

Download the dataset images and our trained models

images.zip (468 MB)

models.zip (477 MB)

Due to the size of the images and models they are hosted outside of the Github repository. The images and models must be downloaded into directories named "images" and "models", respectively, at the root of the repository. If you execute the python script (deepweeds.py), as instructed below, this step will be performed for you automatically.

TensorFlow Datasets

Alternatively, you can access the DeepWeeds dataset with TensorFlow Datasets, TensorFlow's official collection of ready-to-use datasets. DeepWeeds was officially added to the TensorFlow Datasets catalog in August 2019.

Weeds and locations

The selected weed species are local to pastoral grasslands across the state of Queensland. They include: "Chinee apple", "Snake weed", "Lantana", "Prickly acacia", "Siam weed", "Parthenium", "Rubber vine" and "Parkinsonia". The images were collected from weed infestations at the following sites across Queensland: "Black River", "Charters Towers", "Cluden", "Douglas", "Hervey Range", "Kelso", "McKinlay" and "Paluma". The table and figure below break down the dataset by weed, location and geographical distribution.

Data organization

Images are assigned unique filenames that include the date/time the image was photographed and an ID number for the instrument which produced the image. The format is like so: YYYYMMDD-HHMMSS-ID, where the ID is simply an integer from 0 to 3. The unique filenames are strings of 17 characters, such as 20170320-093423-1.
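As a small illustration of the naming scheme, the sketch below splits a filename of the form YYYYMMDD-HHMMSS-ID into a timestamp and an instrument ID. The helper name parse_image_name is hypothetical, and the optional ".jpg" handling is an assumption based on the filenames shown in labels.csv.

```python
from datetime import datetime


def parse_image_name(name):
    """Split 'YYYYMMDD-HHMMSS-ID' (optionally ending in .jpg) into its parts."""
    stem = name[:-4] if name.lower().endswith(".jpg") else name
    date_part, time_part, id_part = stem.split("-")
    taken_at = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return taken_at, int(id_part)


taken_at, instrument_id = parse_image_name("20170320-093423-1")
print(taken_at.isoformat(), instrument_id)
```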

labels

The labels.csv file assigns species labels to each image. It is a comma separated text file in the format:

Filename,Label,Species
...
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
...

Note: The specific label subsets of training (60%), validation (20%) and testing (20%) for the five-fold cross validation used in the paper are also provided here as CSV files in the same format as "labels.csv".
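A labels file in this format can be read with Python's standard csv module. The sketch below is only illustrative: it parses an in-memory sample built from the three rows quoted in the description (assuming each filename ends in ".jpg") rather than the real labels.csv, and read_labels is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Sample rows taken from the description; the real file is much longer.
sample = """Filename,Label,Species
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
"""


def read_labels(handle):
    """Map each filename to a (numeric label, species name) pair."""
    reader = csv.DictReader(handle)
    return {row["Filename"]: (int(row["Label"]), row["Species"]) for row in reader}


labels = read_labels(io.StringIO(sample))
species_counts = Counter(species for _, species in labels.values())
print(labels["20170610-123859-1.jpg"], dict(species_counts))
```

To read the actual file, the same function could be given an open file handle, for example open("labels.csv", newline="").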

models

We provide the most successful ResNet50 and InceptionV3 models saved in Keras' hdf5 model format. The ResNet50 model, which provided the best results, has also been converted to UFF format in order to construct a TensorRT inference engine.

resnet.hdf5 inception.hdf5 resnet.uff

deepweeds.py

This Python script trains and evaluates Keras' base implementation of ResNet50 and InceptionV3 on the DeepWeeds dataset, pre-trained with ImageNet weights. The performance of the networks is cross-validated over 5 folds. The final classification accuracy is taken to be the average across the five folds. Similarly, the final confusion matrix from the associated paper aggregates across the five independent folds. The script also provides the ability to measure inference speeds within the TensorFlow environment.

The script can be executed to carry out these computations using the following commands.

To train and evaluate the ResNet50 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model resnet.

To train and evaluate the InceptionV3 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model inception.

To measure inference times for the ResNet50 model, use python3 deepweeds.py inference --model models/resnet.hdf5.

To measure inference times for the InceptionV3 model, use python3 deepweeds.py inference --model models/inception.hdf5.
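The fold aggregation described for deepweeds.py (mean accuracy across folds, confusion matrices combined across folds) can be sketched as follows. This is not the script's actual code: it assumes the matrices are combined by element-wise summation, uses made-up two-class numbers for two folds purely for illustration, and aggregate_folds is a hypothetical name.

```python
def aggregate_folds(accuracies, confusion_matrices):
    """Average per-fold accuracies and sum per-fold confusion matrices."""
    mean_accuracy = sum(accuracies) / len(accuracies)
    size = len(confusion_matrices[0])
    total = [[0] * size for _ in range(size)]
    for matrix in confusion_matrices:
        for i in range(size):
            for j in range(size):
                total[i][j] += matrix[i][j]
    return mean_accuracy, total


# Toy values, not results from the paper: 2 folds, 2 classes, 20 images each.
fold_accuracies = [0.9, 0.8]
fold_matrices = [
    [[9, 1], [1, 9]],
    [[8, 2], [2, 8]],
]
mean_accuracy, combined = aggregate_folds(fold_accuracies, fold_matrices)
print(round(mean_accuracy, 2), combined)
```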

Dependencies

The required Python packages to execute deepweeds.py are listed in requirements.txt.

tensorrt

This folder includes C++ source code for creating and executing a ResNet50 TensorRT inference engine on an NVIDIA Jetson TX2 platform. To build and run on your Jetson TX2, execute the following commands:

cd tensorrt/src
make -j4
cd ../bin
./resnet_inference

Citations

If you use the DeepWeeds dataset in your work, please cite it as:

IEEE style citation: “A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney, B. Calvert, M. Rahimi Azghadi, and R. D. White, “DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning,” Scientific Reports, vol. 9, no. 2058, 2 2019. [Online]. Available: https://doi.org/10.1038/s41598-018-38343-3 ”

BibTeX

@article{DeepWeeds2019,
  author = {Alex Olsen and Dmitry A. Konovalov and Bronson Philippa and Peter Ridd and Jake C. Wood and Jamie Johns and Wesley Banks and Benjamin Girgenti and Owen Kenny and James Whinney and Brendan Calvert and Mostafa {Rahimi Azghadi} and Ronald D. White},
  title = {{DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning}},
  journal = {Scientific Reports},
  year = 2019,
  number = 2058,
  month = 2,
  volume = 9,
  issue = 1,
  day = 14,
  url = "https://doi.org/10.1038/s41598-018-38343-3",
  doi = "10.1038/s41598-018-38343-3"
}
Association analysis of high-high cluster road intersection crashes...
zivahub.uct.ac.za
xlsx
Updated Jun 7, 2024
+ more versions
Cite
Simone Vieira; Simon Hull; Roger Behrens (2024). Association analysis of high-high cluster road intersection crashes involving motorcycles that resulted in injuries within the CoCT in 2017, 2018 and 2019 [Dataset]. http://doi.org/10.25375/uct.25975825.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.25375/uct.25975825.v2
Dataset updated
Jun 7, 2024
Dataset provided by
University of Cape Town
Authors
Simone Vieira; Simon Hull; Roger Behrens
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
City of Cape Town
Description
This dataset provides comprehensive information on road intersection crashes involving motorcycles (Motor tricycle, Motorcycle: under 125cc, Motorcycle: Above 125cc, Quadru-cycle) that have resulted in injuries recognised as "high-high" clusters within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in 33% of the total "high-high" cluster motorcycle road intersection crashes resulting in injuries for the years 2017, 2018 and 2019. The dataset is meticulously organised according to confidence metric values presented in descending order.

Data Specifics

Data Type: Geospatial-temporal categorical data
File Format: Excel document (.xlsx)
Size: 29,8 KB
Number of Files: The dataset contains a total of 576 association rules
Date Created: 23rd May 2024

Methodology

Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Python
Processing Steps: Following the spatio-temporal analyses and the derivation of "high-high" cluster fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes involving a motorcycle resulting in injuries that occurred within the "high-high" cluster fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,30 support metric value. Consequently, commonly occurring crash attributes among at least 33% of the "high-high" cluster road intersection motorcycle crashes resulting in injuries were extracted for inclusion in this dataset.

Geospatial Information

Spatial Coverage:
West Bounding Coordinate: 18°20'E
East Bounding Coordinate: 19°05'E
North Bounding Coordinate: 33°25'S
South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

Temporal Information

Temporal Coverage:
Start Date: 01/01/2017
End Date: 31/12/2019
Data analysis with pandas and python
kaggle.com
zip
Updated Apr 16, 2023
Cite
乡TOBY乡 (2023). Data analysis with pandas and python [Dataset]. https://www.kaggle.com/datasets/toby000/data-analysis-with-pandas-and-python
Available download formats: zip (701073 bytes)
Dataset updated
Apr 16, 2023
Authors
乡TOBY乡
Description
This dataset includes data that is provided in the Udemy course "Data Analysis with Pandas and Python" by Boris Paskhaver.
Cafe ratings and Prices dataset
kaggle.com
zip
Updated Oct 14, 2025
Cite
SevanthiBR (2025). Cafe ratings and Prices dataset [Dataset]. https://www.kaggle.com/datasets/sevanthibr/cafe-ratings-and-prices-dataset
Available download formats: zip (2706 bytes)
Dataset updated
Oct 14, 2025
Authors
SevanthiBR
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains 100 fictional café records generated using the Python Faker library. Each record includes the café’s name, city, average customer rating, price range, and specialty.

Purpose: Designed for learning data analysis, visualization, and basic machine learning.

Source: Synthetic (no real-world data used).

Update Frequency: Static (one-time release).

License: CC0: Public Domain.

Columns:

Cafe_Name: Fictional café name

City: Random city name

Rating: Customer rating between 3.0–5.0

Price_for_Two: Average price for two customers

Specialty: Type of coffee specialty

Opening_Hours: Typical operating hours
Sample Students Grades Dataset Tutorial Notebook
kaggle.com
zip
Updated Jun 7, 2022
Cite
Arya Shah (2022). Sample Students Grades Dataset Tutorial Notebook [Dataset]. https://www.kaggle.com/datasets/aryashah2k/sample-students-grades-dataset-tutorial-notebook/discussion?sort=undefined
Available download formats: zip (360 bytes)
Dataset updated
Jun 7, 2022
Authors
Arya Shah
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a sample Dataset used for the Tutorial Notebook Titled:

Mistakes To Avoid In Data Science⚠️| Python🐍

Follow the Notebook here: https://www.kaggle.com/code/aryashah2k/mistakes-to-avoid-in-data-science-python
Secure VANET Vehicle Dataset
kaggle.com
zip
Updated Jun 19, 2025
Cite
Python Developer (2025). Secure VANET Vehicle Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/secure-vanet-vehicle-dataset
Available download formats: zip (16148 bytes)
Dataset updated
Jun 19, 2025
Authors
Python Developer
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains vehicle data for a Vehicular Ad Hoc Network (VANET) environment, focusing on real-time communication and lightweight cryptographic applications. It includes 500 vehicles with associated parameters such as speed (up to 250 km/h), GPS location, fixed message size (30 KB), frequency of communication, and threat level as the target column.
Gender Bias Text Analysis in Python w/ Datalab
kaggle.com
zip
Updated Sep 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Faseeh Nadeem (2024). Gender Bias Text Analysis in Python w/ Datalab [Dataset]. https://www.kaggle.com/datasets/faseehnadeem/gender-bias-text-analysis-in-python-w-datalab
Available download formats: zip (846531 bytes)
Dataset updated
Sep 10, 2024
Authors
Faseeh Nadeem
Description
This was a code-along project done with Datalab's course resources, with instructors Richie Cotton (Data Evangelist) and Maham Khan (Senior Data Science Content Developer).

The code was written in a DataCamp Datalab workbook.

The purpose of this code-along project was to:

Learn how to apply a web scraper to create a corpus of freelancer reviews

Learn how to label the reviews as masculine or feminine based on pronouns

Identify language (words and phrases) used more often for freelancers with male pronouns versus female pronouns, and study how this language varies by field
Intelligent Classroom 6G Network Slicing Dataset
kaggle.com
zip
Updated Jan 21, 2025
Cite
Python Developer (2025). Intelligent Classroom 6G Network Slicing Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/intelligent-classroom-6g-network-slicing-dataset
Available download formats: zip (59926 bytes)
Dataset updated
Jan 21, 2025
Authors
Python Developer
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is designed for research and development in optimizing resource allocation and joint transmission for intelligent classrooms using 6G network slicing technology.

Features:

Classroom_Type: The type of classroom (Virtual, Hybrid, In-Person, Large Lecture).

Number_of_Devices: The number of connected devices in the classroom.

Bandwidth (Mbps): Network bandwidth available for transmission.

Latency (ms): Delay in data transmission.

Throughput (Mbps): The effective rate of successful data delivery.

Signal_Strength (dBm): Signal quality in decibels.

CPU_Usage (%): Processor usage during operation.

Memory_Usage (%): Memory consumption during operation.

Performance: Target column indicating high (1) or low (0) performance based on network and classroom parameters.
Random Forest Regression for Salaries in Python
kaggle.com
zip
Updated Sep 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Subho117 (2024). Random Forest Regression for Salaries in Python [Dataset]. https://www.kaggle.com/datasets/subho117/random-forest-regression-for-salaries-in-python
Available download formats: zip (313 bytes)
Dataset updated
Sep 7, 2024
Authors
Subho117
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Subho117

Released under MIT

Contents
SuperMarket
kaggle.com
zip
Updated Jul 5, 2023
Cite
Muhammad A B Fayyaz (2023). SuperMarket [Dataset]. https://www.kaggle.com/datasets/muhammadabfayyaz/supermarket
Available download formats: zip (36783 bytes)
Dataset updated
Jul 5, 2023
Authors
Muhammad A B Fayyaz
Description
As part of the High School Project within Manchester Metropolitan University, students are encouraged to explore the field of data visualization using Python. This project aims to introduce students to the fundamental concepts of data visualization and give them practical experience in using the Python programming language to visualize datasets.

Once you have analysed it, try answering these questions:

Q1: Total Customers from the dataset?

Q2: Total Females in the dataset?

Q3: Total Males in the dataset?

Q4: What is the Min Rating?

Q5: What is the Max Rating?

Q6: What is the Average Rating?

Q7: Which product line has the highest rating?

Q8: Do Q7 for maximum and minimum rating as well.

Q9: Which product line has the highest sales?

Q10: Do Q9 for maximum and minimum rating as well.
Meta Kaggle Code
kaggle.com
zip
Updated Nov 27, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Available download formats: zip (167219625372 bytes)
Dataset updated
Nov 27, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
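
As a rough illustration of the layout described above, the following Python sketch maps a version id to its top-level and subfolder numbers. The function name `folder_indices` is hypothetical, and the sketch returns plain integers because the description does not say whether folder names are zero-padded.

```python
def folder_indices(version_id):
    """Return (top_level, sub_folder) numbers for a kernel version id.

    Assumption: folders follow the arithmetic implied by the description,
    i.e. millions select the top-level folder and thousands the subfolder.
    """
    top_level = version_id // 1_000_000          # e.g. 123 for 123,456,789
    sub_folder = (version_id // 1_000) % 1_000   # e.g. 456 for 123,456,789
    return top_level, sub_folder


top, sub = folder_indices(123_456_789)
print(top, sub)
```

Under these assumptions, version 123,456,789 would be looked up under folder 123, subfolder 456.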

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Fuzzy based Smart Manufacturing Dataset
kaggle.com
zip
Updated Feb 18, 2025
Cite
WARNER (2025). Fuzzy based Smart Manufacturing Dataset [Dataset]. https://www.kaggle.com/datasets/s3programmer/fuzzy-based-smart-manufacturing-dataset
Available download formats: zip (74476 bytes)
Dataset updated
Feb 18, 2025
Authors
WARNER
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Fuzzy based Smart Manufacturing Dataset contains 1,000 samples of sensor readings and process parameters collected for optimizing industrial process control using a Fuzzy-PID Controller. This dataset is designed for research and development in smart manufacturing, adaptive control, and Industry 4.0 applications.

Features:
- Temperature_C (°C): Temperature readings from industrial sensors.
- Pressure_Bar (Bar): Pressure levels in the manufacturing process.
- Speed_RPM (RPM): Rotational speed of motors or actuators.
- Error: Deviation between setpoint and measured value.
- Delta_Error: Rate of change of the error signal.
- Load_Variation: Load factor indicating dynamic variations (0.8 - 1.2).
- Ambient_Temp_C (°C): External environmental temperature.
- Energy_Consumption_W (W): Power usage in watts.

Use Cases:
- Tuning and optimizing Fuzzy-PID controllers
- Energy-efficient control system design
- Fault detection and predictive maintenance
- Industrial automation and adaptive process control

This dataset is particularly useful for machine learning, optimization algorithms, and MATLAB/Python simulations in smart manufacturing environments. 🚀
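
To make the Error and Delta_Error features concrete, here is a minimal Python sketch of how such signals are commonly derived in a PID-style control loop. It is an assumption, not the dataset author's method: the sign convention (setpoint minus measurement), the unit time step `dt`, the first delta of 0.0, and the function name `error_signals` are all chosen for illustration.

```python
def error_signals(setpoint, measurements, dt=1.0):
    """Compute error and its rate of change for a sequence of measurements.

    Assumptions: error = setpoint - measurement, and the rate of change is
    the difference between consecutive errors divided by a fixed step dt.
    """
    errors = [setpoint - m for m in measurements]
    # No previous sample exists for the first reading, so its delta is 0.0.
    deltas = [0.0]
    for previous, current in zip(errors, errors[1:]):
        deltas.append((current - previous) / dt)
    return errors, deltas


# Hypothetical readings approaching and overshooting a setpoint of 100.0.
errors, deltas = error_signals(100.0, [90.0, 95.0, 98.0, 101.0])
print(errors)
print(deltas)
```

A fuzzy controller would typically take these two values as its inputs; the dataset's own columns may have been produced differently.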
IPL(Cricket League) Dataset For Beginner Analaysis
kaggle.com
zip
Updated Sep 24, 2020
Cite
Spoorthi U K (2020). IPL(Cricket League) Dataset For Beginner Analaysis [Dataset]. https://www.kaggle.com/datasets/spoorthiuk/iplcricket-league-dataset-for-beginner-analaysis
Available download formats: zip (18022 bytes)
Dataset updated
Sep 24, 2020
Authors
Spoorthi U K
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Indian Premier League, popularly known as the IPL, is one of the most popular leagues in the world. Every year it is watched by billions of cricket fans from around the world. The league has Indian and foreign players, and newcomers get their first taste of what it is like to play against international players. Due to its popularity, it attracts many big-name companies and businessmen to invest in the playing teams. The team names are associated with regions, and the league is usually played around India in various stadiums.

The following dataset consists of IPL data from 2008 to 2019. A total of 764 matches have been played. Note that a few of the teams have either dropped out or changed their names over the years, so it is important to do some fact-checking.

| Columns | Description |
| Team1 | Team #1 playing the match |
| Team2 | Team #2 playing the match |
| Date | The day on which the match was played |
| Year | The year the match was played |
| Time | The matches are usually played in 2 slots, afternoon and evening; this gives the time when the match started |
| Place | Contains the city and the stadium the match was played in |
| Toss | Contains the name of the team that won the toss |
| TossDecision | Gives details on the decision of the team that won the toss |
| Result | Contains the result of the match |
| Tied | Contains information about the tie |
| won_runs | Contains information about the winning team that batted first |
| won_wickets | Contains information about the winning team that bowled first |

The dataset was web scraped from cricbuzz.com using Python. The website contains detailed information on all the cricket matches.

Things to explore:
1. Common traits in the data
2. Which team won the most matches?
3. Which team played the most games?
4. How does winning the toss affect the game result?
5. Use the data to predict future IPL matches

Cite

Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

2 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (23875170 bytes)

Dataset updated

Dec 9, 2021

Authors

Aslan Ahmedov

Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to offer customers itemset suggestions, so that we can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rules are most often used when you want to find associations between different objects in a set. They work by finding frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.
- Rule: bought computer mouse => bought mouse mat
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(computer mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
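
The arithmetic for this example can be checked with a short Python sketch. It uses the standard definitions for a rule A => B, where confidence divides the support by P(A) and lift divides the confidence by P(B); the function name `rule_metrics` is hypothetical and chosen only for illustration.

```python
def rule_metrics(n_total, n_a, n_b, n_both):
    """Support, confidence and lift for the rule A => B from raw counts."""
    support = n_both / n_total           # P(A & B)
    confidence = n_both / n_a            # P(A & B) / P(A)
    lift = confidence / (n_b / n_total)  # confidence / P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse (A), 9 bought a mat (B), 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 suggests the two items are bought together far more often than they would be if the purchases were independent.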

Strategy

Data Import
Data Understanding and Exploration
Transformation of the data, so that it is ready to be consumed by the association rules algorithm
Running association rules
Exploring the rules generated
Filtering the generated rules
Visualization of the rules

Dataset Description

File name: Assignment-1_Data
List name: retaildata
File format: .xlsx
Number of rows: 522065
Number of Attributes: 7
- BillNo: 6-digit number assigned to each transaction. Nominal.
- Itemname: Product name. Nominal.
- Quantity: The quantities of each product per transaction. Numeric.
- Date: The day and time when each transaction was generated. Numeric.
- Price: Product price. Numeric.
- CustomerID: 5-digit number assigned to each customer. Nominal.
- Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each one is described briefly below.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
tidyverse - The tidyverse is an opinionated collection of R packages designed for data science. This package makes it easy to install and load multiple 'tidyverse' packages in a single step.
readxl - Read Excel Files in R.
plyr - Tools for Splitting, Applying and Combining Data.
ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
knitr - Dynamic Report generation in R.
magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
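
The author's workflow uses R, and the description is cut off at this point. As a hedged Python analogue of the idea, the sketch below drops rows with missing values and groups item names by bill number, so that each invoice becomes one basket. The sample rows and the function name `to_transactions` are hypothetical; only the column names BillNo and Itemname come from the dataset description.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group item names by bill number, skipping rows with missing values."""
    baskets = defaultdict(set)
    for row in rows:
        bill = row.get("BillNo")
        item = row.get("Itemname")
        if not bill or not item:
            continue  # drop incomplete rows, mirroring the cleaning step
        baskets[bill].add(item)
    # Sort items so each basket has a stable, readable order.
    return {bill: sorted(items) for bill, items in baskets.items()}


# Hypothetical rows shaped like the dataset (not real records).
rows = [
    {"BillNo": "100001", "Itemname": "item A"},
    {"BillNo": "100001", "Itemname": "item B"},
    {"BillNo": "100002", "Itemname": "item A"},
    {"BillNo": "100002", "Itemname": None},
]
print(to_transactions(rows))
```

Each resulting basket can then be passed to an association rules algorithm such as Apriori.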
