43 datasets found
  1. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    zip(23875170 bytes)Available download formats
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most often used when you want to find associations between different objects in a set, for example frequent patterns in a transaction database. They can tell you which items customers frequently buy together, and they allow the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both.

    • Rule: bought Computer Mouse => bought Mat for Mouse
    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat for Mouse) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
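
    As a rough check of the arithmetic, the three metrics can be computed directly from the counts in the example. This is only a sketch: the variable names are made up for illustration, and the rule direction is assumed to be Computer Mouse => Mat for Mouse, so confidence divides by the share of mouse buyers and lift divides by the share of mat buyers.

```python
# Counts from the example: 100 customers, 10 bought a mouse,
# 9 bought a mouse mat, and 8 bought both.
n_customers = 100
n_mouse = 10
n_mat = 9
n_both = 8

# Rule: bought Computer Mouse => bought Mat for Mouse
support = n_both / n_customers                  # P(Mouse & Mat)
confidence = support / (n_mouse / n_customers)  # P(Mat | Mouse)
lift = confidence / (n_mat / n_customers)       # confidence / P(Mat)

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift well above 1 suggests that the two items are bought together far more often than they would be if the purchases were independent.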

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    After that, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
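
    The description works in R with the arules package; the sketch below only illustrates the same grouping idea in plain Python and is not the author's code. It assumes rows shaped as (BillNo, Itemname) pairs, and the sample values and the to_transactions helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical (BillNo, Itemname) rows; the values are made up for
# illustration and are not taken from the dataset.
rows = [
    ("100001", "TEA SET"),
    ("100001", "MUG"),
    ("100002", "MUG"),
    ("100002", None),        # a row with a missing item name
    ("100003", "LANTERN"),
    ("100003", "TEA SET"),
]


def to_transactions(rows):
    """Group item names by bill number, dropping rows with missing values."""
    baskets = defaultdict(set)
    for bill_no, item in rows:
        if bill_no is None or item is None:
            continue  # skip incomplete rows, mirroring the cleaning step
        baskets[bill_no].add(item)
    return {bill: sorted(items) for bill, items in baskets.items()}


transactions = to_transactions(rows)
for bill, items in transactions.items():
    print(bill, items)
```

    Each resulting basket holds the distinct items of one bill, which is the shape an association rule algorithm expects as input.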

  2. Accuracy and AUC value of ML algorithms using three hyper parameter tuning...

    • plos.figshare.com
    xls
    Updated Jan 24, 2025
    Cite
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie (2025). Accuracy and AUC value of ML algorithms using three hyper parameter tuning techniques. [Dataset]. http://doi.org/10.1371/journal.pone.0316452.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accuracy and AUC value of ML algorithms using three hyper parameter tuning techniques.

  3. market_basket_optimization

    • kaggle.com
    zip
    Updated Feb 11, 2023
    Cite
    Rupak Roy/ Bob (2023). market_basket_optimization [Dataset]. https://www.kaggle.com/rupakroy/market-basket-optimization
    Explore at:
    zip(47991 bytes)Available download formats
    Dataset updated
    Feb 11, 2023
    Authors
    Rupak Roy/ Bob
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset is specially curated for Association Rule Learning using Apriori and Eclat in Python to predict shopping behavior.

    Apriori is a powerful algorithm for understanding associations among products. Take the example of a supermarket where most people who buy eggs also buy milk and baking soda. The likely reason is that they want to bake a cake for New Year's Eve.

    So we can see there is an association between eggs, milk, and baking soda. Knowing this association, we can simply place all three items together on the shelf, which should increase our sales.

    Let’s perform Apriori with the help of an example.
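
    The dataset's own example is not shown in this listing, so the following is only a minimal pure-Python sketch of the Apriori idea (grow candidate itemsets level by level and keep those that meet a minimum support). The baskets, the apriori function name, and the 0.6 threshold are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Made-up baskets echoing the eggs / milk / baking soda example.
baskets = [
    ["eggs", "milk", "baking soda"],
    ["eggs", "milk"],
    ["eggs", "milk", "baking soda"],
    ["bread", "milk"],
    ["eggs", "baking soda"],
]

frequent = apriori(baskets, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

    With these baskets, the pairs eggs + milk and eggs + baking soda reach the threshold, while milk + baking soda does not, so the three-item set is pruned without being counted.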

  4. LLM-SE Python Wheel

    • kaggle.com
    zip
    Updated Oct 7, 2023
    Cite
    Ranchantan (2023). LLM-SE Python Wheel [Dataset]. https://www.kaggle.com/ranchantan/llm-se-python-wheel
    Explore at:
    zip(146397 bytes)Available download formats
    Dataset updated
    Oct 7, 2023
    Authors
    Ranchantan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Ranchantan

    Released under CC0: Public Domain

    Contents

  5. Socio-demographic characteristics among adolescent girls in Ethiopia, 2016...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    xls
    Updated Jan 24, 2025
    Cite
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie (2025). Socio-demographic characteristics among adolescent girls in Ethiopia, 2016 EDHS. [Dataset]. http://doi.org/10.1371/journal.pone.0316452.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ethiopia
    Description

    Socio-demographic characteristics among adolescent girls in Ethiopia, 2016 EDHS.

  6. Dataset for "Neural embedding of beliefs reveals the role of relative...

    • figshare.com
    zip
    Updated Feb 6, 2025
    Cite
    Byunghwee Lee (2025). Dataset for "Neural embedding of beliefs reveals the role of relative dissonance in human decision-making". [Dataset]. http://doi.org/10.6084/m9.figshare.28327019.v3
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Byunghwee Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project contains the dataset used to generate the results of the study "Neural embedding of beliefs reveals the role of relative dissonance in human decision-making" (arXiv:2408.07237).

    Authors: Byunghwee Lee¹, Rachith Aiyappa¹, Yong-Yeol Ahn¹, Haewoon Kwak¹, Jisun An¹
    ¹ Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, USA, 47408

    DDO_dataset.zip (original Debate.org dataset)

    This archive contains the original raw Debate.org dataset, which was obtained from the publicly accessible website (https://esdurmus.github.io/ddo.html), maintained by Esin Durmus [1,2]. All credit for this dataset belongs entirely to the original authors, Esin Durmus and Claire Cardie. We do not claim any authorship or modifications to this dataset. It is provided here solely for reproducibility and reference in our study.

    The dataset includes the following three files:
    • debates.json: This JSON file contains a Python dictionary that assigns a debate name (a unique name for each debate) to debate information
    • users.json: This JSON file includes a Python dictionary containing user information
    • readme.md file from the authors (Esin Durmus and Claire Cardie)

    When using this dataset, please reference Debate.org and cite the following works:
    [1] Esin Durmus and Claire Cardie. 2019. A Corpus for Modeling User and Language Effects in Argumentation on Online Debating. In Proceedings of the 57th Conference of the Association for Computational Linguistics. Florence, Italy. Association for Computational Linguistics.
    [2] Esin Durmus and Claire Cardie. 2018. Exploring the Role of Prior Beliefs for Argument Persuasion. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL).

    df_ddo_including_only_truebeliefs_nodup(N192307).p

    This file contains a pre-processed dataset used in our project (arXiv:2408.07237). The dataset includes records of user participation in debates (both as debaters and voters) as well as voting records across various debates. The belief triplet dataset used for fine-tuning a Sentence-BERT model was generated based on this pre-processed dataset. Detailed explanations of the pre-processing procedure are provided in the Methods section of the paper.

    When using this pre-processed dataset, please cite the following reference (in addition to the two papers mentioned above):
    [3] Lee, B., Aiyappa, R., Ahn, Y. Y., Kwak, H., & An, J. (2024). Neural embedding of beliefs reveals the role of relative dissonance in human decision-making. arXiv preprint arXiv:2408.07237.

    model_full_data.zip

    This zip file contains five fine-tuned S-BERT models trained using a 5-fold belief triplet dataset. After unzipping the files, users can import the models using the 'sentence_transformers' Python library (https://sbert.net/).

  7. Association analysis of high-high cluster road intersection crashes within...

    • zivahub.uct.ac.za
    xlsx
    Updated Jun 7, 2024
    + more versions
    Cite
    Simone Vieira; Simon Hull; Roger Behrens (2024). Association analysis of high-high cluster road intersection crashes within the CoCT in 2017, 2018, 2019 and 2021 [Dataset]. http://doi.org/10.25375/uct.25975285.v2
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    University of Cape Town
    Authors
    Simone Vieira; Simon Hull; Roger Behrens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    City of Cape Town
    Description

    This dataset provides comprehensive information on road intersection crashes recognised as "high-high" clusters within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 5% of the total "high-high" cluster road intersection crashes for the years 2017, 2018, 2019, and 2021. The dataset is meticulously organised according to support metric values, ranging from 0,05 to 0,0235, with entries presented in descending order.

    Data Specifics
    • Data Type: Geospatial-temporal categorical data
    • File Format: Excel document (.xlsx)
    • Size: 499 KB
    • Number of Files: The dataset contains a total of 7186 association rules
    • Date Created: 23rd May 2024

    Methodology
    • Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
    • Software: ArcGIS Pro, Python
    • Processing Steps: Following the spatio-temporal analyses and the derivation of "high-high" cluster fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes that occurred within the "high-high" cluster fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,05 support metric value. Consequently, commonly occurring crash attributes among at least 5% of the "high-high" cluster road intersection crashes were extracted for inclusion in this dataset.

    Geospatial Information
    Spatial Coverage:
    • West Bounding Coordinate: 18°20'E
    • East Bounding Coordinate: 19°05'E
    • North Bounding Coordinate: 33°25'S
    • South Bounding Coordinate: 34°25'S
    • Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

    Temporal Information
    Temporal Coverage:
    • Start Date: 01/01/2017
    • End Date: 31/12/2021 (2020 data omitted)

  8. DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

    • data.niaid.nih.gov
    Updated May 16, 2023
    Cite
    Olsen, Alex; Konovalov, Dimitriv A.; Philippa, Bronson; Ridd, Peter; Wood, Jake C.; Johns, Jamie; Banks, Wesley; Girgenti, Benjamin; Kenny, Owen; Whinney, James; Calvert, Brendan; Rahimi Azghadi, Mostafa; White, Ronald D. (2023). DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7939059
    Explore at:
    Dataset updated
    May 16, 2023
    Authors
    Olsen, Alex; Konovalov, Dimitriv A.; Philippa, Bronson; Ridd, Peter; Wood, Jake C.; Johns, Jamie; Banks, Wesley; Girgenti, Benjamin; Kenny, Owen; Whinney, James; Calvert, Brendan; Rahimi Azghadi, Mostafa; White, Ronald D.
    License

    http://www.apache.org/licenses/LICENSE-2.0

    Description

    DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

    This repository makes available the source code and public dataset for the work, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning", published with open access by Scientific Reports: https://www.nature.com/articles/s41598-018-38343-3. The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora. In our work, the dataset was classified to an average accuracy of 95.7% with the ResNet50 deep convolutional neural network.

    The source code, images and annotations are licensed under CC BY 4.0 license. The contents of this repository are released under an Apache 2 license.

    Download the dataset images and our trained models

    images.zip (468 MB)

    models.zip (477 MB)

    Due to the size of the images and models they are hosted outside of the Github repository. The images and models must be downloaded into directories named "images" and "models", respectively, at the root of the repository. If you execute the python script (deepweeds.py), as instructed below, this step will be performed for you automatically.

    TensorFlow Datasets

    Alternatively, you can access the DeepWeeds dataset with TensorFlow Datasets, TensorFlow's official collection of ready-to-use datasets. DeepWeeds was officially added to the TensorFlow Datasets catalog in August 2019.

    Weeds and locations

    The selected weed species are local to pastoral grasslands across the state of Queensland. They include: "Chinee apple", "Snake weed", "Lantana", "Prickly acacia", "Siam weed", "Parthenium", "Rubber vine" and "Parkinsonia". The images were collected from weed infestations at the following sites across Queensland: "Black River", "Charters Towers", "Cluden", "Douglas", "Hervey Range", "Kelso", "McKinlay" and "Paluma". The table and figure below break down the dataset by weed, location and geographical distribution.

    Data organization

    Images are assigned unique filenames that include the date/time the image was photographed and an ID number for the instrument which produced the image. The format is like so: YYYYMMDD-HHMMSS-ID, where the ID is simply an integer from 0 to 3. The unique filenames are strings of 17 characters, such as 20170320-093423-1.
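
    As an illustration of the naming scheme, a filename can be split back into its timestamp and instrument ID with the standard library. This is a sketch only; the parse_image_name helper is hypothetical and not part of the DeepWeeds repository.

```python
from datetime import datetime


def parse_image_name(name):
    """Split a YYYYMMDD-HHMMSS-ID name into (timestamp, instrument_id)."""
    # Accept names with or without a trailing ".jpg" extension.
    stem = name[:-4] if name.endswith(".jpg") else name
    date_part, time_part, id_part = stem.split("-")
    timestamp = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return timestamp, int(id_part)


timestamp, instrument_id = parse_image_name("20170320-093423-1")
print(timestamp.isoformat(), instrument_id)
```

    For the example name, this yields 20 March 2017 at 09:34:23 and instrument ID 1.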

    labels

    The labels.csv file assigns species labels to each image. It is a comma separated text file in the format:

    Filename,Label,Species
    ...
    20170207-154924-0.jpg,7,Snake weed
    20170610-123859-1.jpg,1,Lantana
    20180119-105722-1.jpg,8,Negative
    ...
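
    A file in this format can be read with Python's csv module. The sketch below uses an in-memory sample instead of the real labels.csv; the sample rows are the three shown above, with each filename written with a .jpg extension, and the read_labels helper is hypothetical.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for labels.csv, limited to three sample rows.
sample = """Filename,Label,Species
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
"""


def read_labels(handle):
    """Return a list of (filename, label, species) tuples."""
    return [
        (row["Filename"], int(row["Label"]), row["Species"])
        for row in csv.DictReader(handle)
    ]


labels = read_labels(io.StringIO(sample))
species_counts = Counter(species for _, _, species in labels)
print(labels[0])
print(species_counts)
```

    To read the real file, the same function could be given an open file handle, for example open("labels.csv", newline="").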

    Note: The specific label subsets of training (60%), validation (20%) and testing (20%) for the five-fold cross validation used in the paper are also provided here as CSV files in the same format as "labels.csv".

    models

    We provide the most successful ResNet50 and InceptionV3 models saved in Keras' hdf5 model format. The ResNet50 model, which provided the best results, has also been converted to UFF format in order to construct a TensorRT inference engine.

    • resnet.hdf5
    • inception.hdf5
    • resnet.uff

    deepweeds.py

    This Python script trains and evaluates Keras' base implementation of ResNet50 and InceptionV3 on the DeepWeeds dataset, pre-trained with ImageNet weights. The performance of the networks is cross-validated over five folds. The final classification accuracy is taken to be the average across the five folds. Similarly, the final confusion matrix from the associated paper aggregates across the five independent folds. The script also provides the ability to measure the inference speeds within the TensorFlow environment.

    The script can be executed to carry out these computations using the following commands.

    To train and evaluate the ResNet50 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model resnet.

    To train and evaluate the InceptionV3 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model inception.

    To measure inference times for the ResNet50 model, use python3 deepweeds.py inference --model models/resnet.hdf5.

    To measure inference times for the InceptionV3 model, use python3 deepweeds.py inference --model models/inception.hdf5.

    Dependencies

    The required Python packages to execute deepweeds.py are listed in requirements.txt.

    tensorrt

    This folder includes C++ source code for creating and executing a ResNet50 TensorRT inference engine on an NVIDIA Jetson TX2 platform. To build and run on your Jetson TX2, execute the following commands:

    cd tensorrt/src
    make -j4
    cd ../bin
    ./resnet_inference

    Citations

    If you use the DeepWeeds dataset in your work, please cite it as:

    IEEE style citation: A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney, B. Calvert, M. Rahimi Azghadi, and R. D. White, “DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning,” Scientific Reports, vol. 9, no. 2058, 2 2019. [Online]. Available: https://doi.org/10.1038/s41598-018-38343-3

    BibTeX

    @article{DeepWeeds2019,
      author = {Alex Olsen and Dmitry A. Konovalov and Bronson Philippa and Peter Ridd and Jake C. Wood and Jamie Johns and Wesley Banks and Benjamin Girgenti and Owen Kenny and James Whinney and Brendan Calvert and Mostafa {Rahimi Azghadi} and Ronald D. White},
      title = {{DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning}},
      journal = {Scientific Reports},
      year = 2019,
      number = 2058,
      month = 2,
      volume = 9,
      issue = 1,
      day = 14,
      url = "https://doi.org/10.1038/s41598-018-38343-3",
      doi = "10.1038/s41598-018-38343-3"
    }

  9. Association analysis of high-high cluster road intersection crashes...

    • zivahub.uct.ac.za
    xlsx
    Updated Jun 7, 2024
    + more versions
    Cite
    Simone Vieira; Simon Hull; Roger Behrens (2024). Association analysis of high-high cluster road intersection crashes involving motorcycles that resulted in injuries within the CoCT in 2017, 2018 and 2019 [Dataset]. http://doi.org/10.25375/uct.25975825.v2
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    University of Cape Town
    Authors
    Simone Vieira; Simon Hull; Roger Behrens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    City of Cape Town
    Description

    This dataset provides comprehensive information on road intersection crashes involving motorcycles (Motor tricycle, Motorcycle: under 125cc, Motorcycle: Above 125cc, Quadru-cycle) that have resulted in injuries recognised as "high-high" clusters within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in 33% of the total "high-high" cluster motorcycle road intersection crashes resulting in injuries for the years 2017, 2018 and 2019. The dataset is meticulously organised according to confidence metric values presented in descending order.

    Data Specifics
    • Data Type: Geospatial-temporal categorical data
    • File Format: Excel document (.xlsx)
    • Size: 29,8 KB
    • Number of Files: The dataset contains a total of 576 association rules
    • Date Created: 23rd May 2024

    Methodology
    • Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
    • Software: ArcGIS Pro, Python
    • Processing Steps: Following the spatio-temporal analyses and the derivation of "high-high" cluster fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes involving a motorcycle resulting in injuries that occurred within the "high-high" cluster fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,30 support metric value. Consequently, commonly occurring crash attributes among at least 33% of the "high-high" cluster road intersection motorcycle crashes resulting in injuries were extracted for inclusion in this dataset.

    Geospatial Information
    Spatial Coverage:
    • West Bounding Coordinate: 18°20'E
    • East Bounding Coordinate: 19°05'E
    • North Bounding Coordinate: 33°25'S
    • South Bounding Coordinate: 34°25'S
    • Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

    Temporal Information
    Temporal Coverage:
    • Start Date: 01/01/2017
    • End Date: 31/12/2019

  10. Data analysis with pandas and python

    • kaggle.com
    zip
    Updated Apr 16, 2023
    Cite
    乡TOBY乡 (2023). Data analysis with pandas and python [Dataset]. https://www.kaggle.com/datasets/toby000/data-analysis-with-pandas-and-python
    Explore at:
    zip(701073 bytes)Available download formats
    Dataset updated
    Apr 16, 2023
    Authors
    乡TOBY乡
    Description

    This dataset includes data that is provided in the Udemy course "Data Analysis with Pandas and Python" by Boris Paskhaver.

  11. Cafe ratings and Prices dataset

    • kaggle.com
    zip
    Updated Oct 14, 2025
    Cite
    SevanthiBR (2025). Cafe ratings and Prices dataset [Dataset]. https://www.kaggle.com/datasets/sevanthibr/cafe-ratings-and-prices-dataset
    Explore at:
    zip(2706 bytes)Available download formats
    Dataset updated
    Oct 14, 2025
    Authors
    SevanthiBR
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains 100 fictional café records generated using the Python Faker library. Each record includes the café’s name, city, average customer rating, price range, and specialty.

    Purpose: Designed for learning data analysis, visualization, and basic machine learning.

    Source: Synthetic (no real-world data used).

    Update Frequency: Static (one-time release).

    License: CC0: Public Domain.

    Columns:

    Cafe_Name: Fictional café name

    City: Random city name

    Rating: Customer rating between 3.0–5.0

    Price_for_Two: Average price for two customers

    Specialty: Type of coffee specialty

    Opening_Hours: Typical operating hours

  12. Sample Students Grades Dataset Tutorial Notebook

    • kaggle.com
    zip
    Updated Jun 7, 2022
    Cite
    Arya Shah (2022). Sample Students Grades Dataset Tutorial Notebook [Dataset]. https://www.kaggle.com/datasets/aryashah2k/sample-students-grades-dataset-tutorial-notebook/discussion?sort=undefined
    Explore at:
    zip(360 bytes)Available download formats
    Dataset updated
    Jun 7, 2022
    Authors
    Arya Shah
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a sample Dataset used for the Tutorial Notebook Titled:

    Mistakes To Avoid In Data Science⚠️| Python🐍

    Follow the Notebook here: https://www.kaggle.com/code/aryashah2k/mistakes-to-avoid-in-data-science-python

  13. Secure VANET Vehicle Dataset

    • kaggle.com
    zip
    Updated Jun 19, 2025
    Cite
    Python Developer (2025). Secure VANET Vehicle Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/secure-vanet-vehicle-dataset
    Explore at:
    zip(16148 bytes)Available download formats
    Dataset updated
    Jun 19, 2025
    Authors
    Python Developer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains vehicle data for a Vehicular Ad Hoc Network (VANET) environment, focusing on real-time communication and lightweight cryptographic applications. It includes 500 vehicles with associated parameters such as speed (up to 250 km/h), GPS location, fixed message size (30 KB), frequency of communication, and threat level as the target column.

  14. Gender Bias Text Analysis in Python w/ Datalab

    • kaggle.com
    zip
    Updated Sep 10, 2024
    Cite
    Faseeh Nadeem (2024). Gender Bias Text Analysis in Python w/ Datalab [Dataset]. https://www.kaggle.com/datasets/faseehnadeem/gender-bias-text-analysis-in-python-w-datalab
    Explore at:
    zip(846531 bytes)Available download formats
    Dataset updated
    Sep 10, 2024
    Authors
    Faseeh Nadeem
    Description

    This was a code-along project using Datalab's course resources, taught by instructors Richie Cotton (Data Evangelist) and Maham Khan (Senior Data Science Content Developer).

    The code was written in a DataCamp Datalab workbook.

    The purpose of this code-along project was to:

    Learn how to apply a web scraper to create a corpus of freelancer reviews

    Learn how to label the reviews as masculine or feminine based on pronouns

    Identify language (words and phrases) used more often for freelancers with male pronouns versus female pronouns, and study how this language varies by field
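
    The pronoun-based labeling step could be sketched roughly as follows. This is not the course's code: the pronoun lists, the label_review helper, and the sample reviews are all assumptions made for illustration, and a real corpus would need more careful handling.

```python
import re

# Assumed pronoun lists; the course material may use different ones.
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def label_review(text):
    """Label a review by whichever pronoun group appears more often."""
    words = re.findall(r"[a-z']+", text.lower())
    masculine_hits = sum(w in MASCULINE for w in words)
    feminine_hits = sum(w in FEMININE for w in words)
    if masculine_hits > feminine_hits:
        return "masculine"
    if feminine_hits > masculine_hits:
        return "feminine"
    return "unknown"  # no pronouns, or a tie


# Made-up reviews, not taken from the scraped corpus.
reviews = [
    "She delivered on time and her work was great.",
    "He was quick, and I would hire him again.",
    "Great work, fast delivery.",
]
review_labels = [label_review(r) for r in reviews]
print(review_labels)
```

    Reviews without pronouns, or with an equal count from both groups, are left as "unknown" rather than forced into either label.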

  15. Intelligent Classroom 6G Network Slicing Dataset

    • kaggle.com
    zip
    Updated Jan 21, 2025
    Cite
    Python Developer (2025). Intelligent Classroom 6G Network Slicing Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/intelligent-classroom-6g-network-slicing-dataset
    Explore at:
    zip(59926 bytes)Available download formats
    Dataset updated
    Jan 21, 2025
    Authors
    Python Developer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed for research and development in optimizing resource allocation and joint transmission for intelligent classrooms using 6G network slicing technology.

    Features:

    • Classroom_Type: The type of classroom (Virtual, Hybrid, In-Person, Large Lecture).
    • Number_of_Devices: The number of connected devices in the classroom.
    • Bandwidth (Mbps): Network bandwidth available for transmission.
    • Latency (ms): Delay in data transmission.
    • Throughput (Mbps): The effective rate of successful data delivery.
    • Signal_Strength (dBm): Signal quality in decibels.
    • CPU_Usage (%): Processor usage during operation.
    • Memory_Usage (%): Memory consumption during operation.
    • Performance: Target column indicating high (1) or low (0) performance based on network and classroom parameters.

  16. Random Forest Regression for Salaries in Python

    • kaggle.com
    zip
    Updated Sep 7, 2024
    Cite
    Subho117 (2024). Random Forest Regression for Salaries in Python [Dataset]. https://www.kaggle.com/datasets/subho117/random-forest-regression-for-salaries-in-python
    Explore at:
    Available download formats: zip (313 bytes)
    Dataset updated
    Sep 7, 2024
    Authors
    Subho117
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Subho117

    Released under MIT

    Contents

  17. SuperMarket

    • kaggle.com
    zip
    Updated Jul 5, 2023
    Cite
    Muhammad A B Fayyaz (2023). SuperMarket [Dataset]. https://www.kaggle.com/datasets/muhammadabfayyaz/supermarket
    Explore at:
    Available download formats: zip (36783 bytes)
    Dataset updated
    Jul 5, 2023
    Authors
    Muhammad A B Fayyaz
    Description

    As part of the High School Project within Manchester Metropolitan University, students are encouraged to explore the field of data visualization using Python. This project aims to introduce students to the fundamental concepts of data visualization and give them practical experience in using the Python programming language to visualize datasets.

    Once you have analysed it, try answering these questions:

    Q1: Total Customers from the dataset?
    Q2: Total Females in the dataset?
    Q3: Total Males in the dataset?
    Q4: What is the Min Rating?
    Q5: What is the Max Rating?
    Q6: What is the Average Rating?
    Q7: Which product line has the highest rating?
    Q8: Do Q7 for maximum and minimum rating as well.
    Q9: Which product line has the highest sales?
    Q10: Do Q9 for maximum and minimum rating as well.

  18. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Nov 27, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (167219625372 bytes)
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
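The two-level layout described above can be illustrated with a small helper that maps a version id to its folder. This is a sketch only: the function name `folder_for_version` is hypothetical, and whether folder names are zero-padded (for example `123/001` versus `123/1`) is not stated in the description, so the unpadded form used here is an assumption.

```python
def folder_for_version(version_id: int) -> str:
    """Return the 'top/sub' folder for a KernelVersions id (assumed unpadded)."""
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    # Top-level folder: one folder per million ids.
    top = version_id // 1_000_000
    # Sub folder: one folder per thousand ids within the top-level folder.
    sub = (version_id // 1_000) % 1_000
    return f"{top}/{sub}"


print(folder_for_version(123_456_789))  # prints "123/456"
```

The example id falls in the range 123,456,000 to 123,456,999, which matches the `123/456` folder given in the description.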

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  19. Fuzzy based Smart Manufacturing Dataset

    • kaggle.com
    zip
    Updated Feb 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WARNER (2025). Fuzzy based Smart Manufacturing Dataset [Dataset]. https://www.kaggle.com/datasets/s3programmer/fuzzy-based-smart-manufacturing-dataset
    Explore at:
    Available download formats: zip (74476 bytes)
    Dataset updated
    Feb 18, 2025
    Authors
    WARNER
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Fuzzy based Smart Manufacturing Dataset contains 1,000 samples of sensor readings and process parameters collected for optimizing industrial process control using a Fuzzy-PID Controller. This dataset is designed for research and development in smart manufacturing, adaptive control, and Industry 4.0 applications.

    Features:

    • Temperature_C (°C): Temperature readings from industrial sensors.
    • Pressure_Bar (Bar): Pressure levels in the manufacturing process.
    • Speed_RPM (RPM): Rotational speed of motors or actuators.
    • Error: Deviation between setpoint and measured value.
    • Delta_Error: Rate of change of the error signal.
    • Load_Variation: Load factor indicating dynamic variations (0.8 - 1.2).
    • Ambient_Temp_C (°C): External environmental temperature.
    • Energy_Consumption_W (W): Power usage in watts.

    Use Cases:

    • Tuning and optimizing Fuzzy-PID controllers
    • Energy-efficient control system design
    • Fault detection and predictive maintenance
    • Industrial automation and adaptive process control

    This dataset is particularly useful for machine learning, optimization algorithms, and MATLAB/Python simulations in smart manufacturing environments. 🚀

  20. IPL(Cricket League) Dataset For Beginner Analaysis

    • kaggle.com
    zip
    Updated Sep 24, 2020
    Cite
    Spoorthi U K (2020). IPL(Cricket League) Dataset For Beginner Analaysis [Dataset]. https://www.kaggle.com/datasets/spoorthiuk/iplcricket-league-dataset-for-beginner-analaysis
    Explore at:
    Available download formats: zip (18022 bytes)
    Dataset updated
    Sep 24, 2020
    Authors
    Spoorthi U K
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Indian Premier League, popularly known as the IPL, is one of the most popular leagues in the world. Every year it is watched by billions of cricket fans from around the world. The league has Indian and foreign players, and newcomers get their first taste of what it's like to play against international players. Due to its popularity, it attracts many big companies and businesspeople to invest in the teams. The team names are associated with regions, and the league is usually played in various stadiums around India.

    The dataset consists of IPL data from 2008 to 2019. A total of 764 matches were played. Note that a few teams have either dropped out or changed their names over the years, so it is important to do some fact-checking.

    | Columns | Description |
    | --- | --- |
    | Team1 | Team #1 playing the match |
    | Team2 | Team #2 playing the match |
    | Date | The day on which the match was played |
    | Year | The year the match was played |
    | Time | The matches are usually played in 2 slots, afternoon and evening; this gives the time when the match started |
    | Place | Contains the city and the stadium the match was played in |
    | Toss | Contains the name of the team that won the toss |
    | TossDecision | Gives details on the decision of the team that won the toss |
    | Result | Contains the result of the match |
    | Tied | Contains information about the tie |
    | won_runs | Contains information about the winning team that batted first |
    | won_wickets | Contains information about the winning team that bowled first |

    The dataset was web scraped from cricbuzz.com using Python. The website contains detailed information on all the cricket matches.

    Things to explore

    1. Common traits in the data
    2. Which team won the most matches?
    3. Which team played the most games?
    4. How does winning the toss affect the game result?
    5. Use the data to predict future IPL matches

Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis

Market Basket Analysis

Analyzing Consumer Behaviour Using MBA Association Rule Mining

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description

Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association rule mining is most often used to find associations between different objects in a set and to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.

  • Rule: bought a computer mouse => bought a mouse mat
  • support = P(mouse & mat) = 8/100 = 0.08
  • confidence = support / P(mouse) = 0.08/0.10 = 0.80
  • lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
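As a quick check of the arithmetic, the three measures can be computed directly from the counts in the example. The sketch below uses the standard definitions for a rule A => B: support = P(A & B), confidence = P(A & B) / P(A), and lift = confidence / P(B). The function name `rule_metrics` and its parameter names are hypothetical, chosen only for this illustration.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for a rule A => B from raw counts."""
    support = n_both / n_total                    # P(A & B)
    confidence = n_both / n_antecedent            # P(A & B) / P(A)
    lift = confidence / (n_consequent / n_total)  # confidence / P(B)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse (A),
# 9 bought a mat (B), and 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 indicates that the two items are bought together far more often than would be expected if the purchases were independent.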

Strategy

  • Data Import
  • Data Understanding and Exploration
  • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
  • Running association rules
  • Exploring the rules generated
  • Filtering the generated rules
  • Visualization of the rules

Dataset Description

  • File name: Assignment-1_Data
  • List name: retaildata
  • File format: .xlsx
  • Number of rows: 522065
  • Number of Attributes: 7

    • BillNo: 6-digit number assigned to each transaction. Nominal.
    • Itemname: Product name. Nominal.
    • Quantity: The quantities of each product per transaction. Numeric.
    • Date: The day and time when each transaction was generated. Numeric.
    • Price: Product price. Numeric.
    • CustomerID: 5-digit number assigned to each customer. Nominal.
    • Country: Name of the country where each customer resides. Nominal.

Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each one is described briefly below.

  • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
  • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
  • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
  • readxl - Read Excel Files in R.
  • plyr - Tools for Splitting, Applying and Combining Data.
  • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • knitr - Dynamic Report generation in R.
  • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
  • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
  • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

Next, we clean the data frame by removing missing values.

Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
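The idea of grouping invoice rows into transactions can be sketched in plain Python. This is not the author's R code (which uses the arules package); it is only a language-neutral illustration. The function name `to_transactions` and the sample rows are hypothetical; only the BillNo and Itemname columns come from the dataset description.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (BillNo, Itemname) rows into one set of items per invoice."""
    baskets = defaultdict(set)
    for bill_no, item_name in rows:
        # Skip rows with missing values, mirroring the cleaning step.
        if bill_no is None or item_name is None:
            continue
        baskets[bill_no].add(item_name)
    return dict(baskets)


# Made-up sample rows, for illustration only.
sample_rows = [
    ("100001", "item A"),
    ("100001", "item B"),
    ("100002", "item A"),
    ("100002", None),
]
transactions = to_transactions(sample_rows)
```

Each resulting set is one basket, which is the form an association rule algorithm such as Apriori expects as input.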
