52 datasets found
  1. Data from: Data Mining Project Dataset

    • kaggle.com
    zip
    Updated Dec 10, 2020
    Cite
    Mark Dobres (2020). Data Mining Project Dataset [Dataset]. https://www.kaggle.com/markdobres/data-mining-project-dataset
    Explore at:
    zip(1552418617 bytes)Available download formats
    Dataset updated
    Dec 10, 2020
    Authors
    Mark Dobres
    Description

    Dataset

    This dataset was created by Mark Dobres

    Contents

  2. Data from: Data Mining Project

    • kaggle.com
    zip
    Updated Nov 30, 2018
    Cite
    Oscar NG (2018). Data Mining Project [Dataset]. https://www.kaggle.com/oscar321a/data-mining-project
    Explore at:
    zip(8083512 bytes)Available download formats
    Dataset updated
    Nov 30, 2018
    Authors
    Oscar NG
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Oscar NG

    Released under CC0: Public Domain

    Contents

  3. Data from: Data mining Project

    • kaggle.com
    zip
    Updated May 27, 2022
    Cite
    Yuxian Chen (2022). Data mining Project [Dataset]. https://www.kaggle.com/datasets/cyanlu/data-mining-project
    Explore at:
    zip(165846374 bytes)Available download formats
    Dataset updated
    May 27, 2022
    Authors
    Yuxian Chen
    Description

    Dataset

    This dataset was created by Yuxian Chen

    Contents

  4. Data Mining Project 1

    • kaggle.com
    zip
    Updated Jan 29, 2024
    Cite
    Will Newt (2024). Data Mining Project 1 [Dataset]. https://www.kaggle.com/datasets/willnewt/data-mining-project-1/data
    Explore at:
    zip(6058765 bytes)Available download formats
    Dataset updated
    Jan 29, 2024
    Authors
    Will Newt
    Description

    Dataset

    This dataset was created by Will Newt

    Contents

  5. Data Mining Project - Boston

    • kaggle.com
    zip
    Updated Nov 25, 2019
    Cite
    SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston
    Explore at:
    zip(59313797 bytes)Available download formats
    Dataset updated
    Nov 25, 2019
    Authors
    SophieLiu
    Area covered
    Boston
    Description

    Context

    To make this a seamless process, I cleaned the data and deleted many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file contains both Lyft and Uber rides, but it is still a cleaned version of the dataset we downloaded from Kaggle.

    Use of Data Files

    You can easily subset the data into the car types that you will be modeling by first loading the CSV file into R. Here is the code for doing this:

    This loads the file into R

    uber_df <- read.csv('uber.csv')

    The next line subsets the data into a specific car type. The example below keeps only the Uber 'Black' car type.

    df_black <- subset(uber_df, name == 'Black')

    Next, write this data frame to a CSV file on your computer so that you can load it back into R later.

    write.csv(df_black, "nameofthefileyouwanttosaveas.csv")

    The file will appear in your working directory. If you are not familiar with your working directory, run this code:

    getwd()

    The output will be the file path to your working directory. You will find the file you just created in that folder.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  6. DATA MINING

    • kaggle.com
    zip
    Updated Dec 3, 2021
    Cite
    chimaralavamshireddy (2021). DATA MINING [Dataset]. https://www.kaggle.com/chimaralavamshireddy/data-mining
    Explore at:
    zip(901512 bytes)Available download formats
    Dataset updated
    Dec 3, 2021
    Authors
    chimaralavamshireddy
    License

    https://www.usa.gov/government-works/

    Description

    Dataset

    This dataset was created by chimaralavamshireddy

    Released under U.S. Government Works

    Contents

  7. Data Mining Project 1 Sapfile

    • kaggle.com
    zip
    Updated Jan 31, 2019
    Cite
    Prutchakorn (2019). Data Mining Project 1 Sapfile [Dataset]. https://www.kaggle.com/prutchakorn/data-mining-project-1-sapfile
    Explore at:
    zip(2244 bytes)Available download formats
    Dataset updated
    Jan 31, 2019
    Authors
    Prutchakorn
    Description

    Dataset

    This dataset was created by Prutchakorn

    Contents

  8. IGDB Dataset for Data Mining Projects

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Cite
    Emir Şahin (2025). IGDB Dataset for Data Mining Projects [Dataset]. https://www.kaggle.com/datasets/emirshn/igdb-dataset-for-data-mining-projects
    Explore at:
    zip(56776900 bytes)Available download formats
    Dataset updated
    Jul 26, 2025
    Authors
    Emir Şahin
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains detailed metadata for over 240,000 video games sourced from the IGDB API. It includes information about each game's release, genres, themes, platforms, developers, publishers, player perspectives, game modes, ratings, summaries, media assets (screenshots, artworks, covers), and more. This dataset is ideal for projects in game recommendation, clustering, tagging, genre analysis, and player preference modeling.

  9. Airbnb Berlin 2020

    • kaggle.com
    zip
    Updated Sep 22, 2020
    Cite
    MrRaghav (2020). Airbnb Berlin 2020 [Dataset]. https://www.kaggle.com/raghavs1003/airbnb-berlin-2020
    Explore at:
    zip(112994067 bytes)Available download formats
    Dataset updated
    Sep 22, 2020
    Authors
    MrRaghav
    Area covered
    Berlin
    Description

    Acknowledgements

    http://insideairbnb.com/get-the-data.html

    Inspiration

    A. Is there seasonality in the prices of properties listed in Airbnb-Berlin?
    B. Which are the popular areas of Berlin among the tourists?
    C. An analysis of reviews – using text mining
    D. Which are the most commonly available amenities in the properties of Berlin?
    E. Can we predict the price of properties in Berlin by analyzing other column values?

  10. Applied Data Mining Final Project - T- Shirt Data

    • kaggle.com
    zip
    Updated Jan 7, 2025
    Cite
    Danizo (2025). Applied Data Mining Final Project - T- Shirt Data [Dataset]. https://www.kaggle.com/datasets/danizo/applied-data-mining-final-project-t-shirt-data/code
    Explore at:
    zip(375026934 bytes)Available download formats
    Dataset updated
    Jan 7, 2025
    Authors
    Danizo
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Danizo

    Released under Apache 2.0

    Contents

  11. Human Activity Recognition Accelerometer data

    • kaggle.com
    zip
    Updated Mar 2, 2022
    Cite
    Sandy3108 (2022). Human Activity Recognition Accelerometer data [Dataset]. https://www.kaggle.com/datasets/sandy3108/human-activity-recognition-accelerometer-data/suggestions?status=pending&yourSuggestions=true
    Explore at:
    zip(650483753 bytes)Available download formats
    Dataset updated
    Mar 2, 2022
    Authors
    Sandy3108
    Description

    Dataset

    This dataset was created by Sandy3108

    Contents

  12. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    zip(23875170 bytes)Available download formats
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets that a customer is most likely to purchase. The dataset I was given contains a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rule mining is mostly used to find associations between different objects in a set. It works by finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(computer mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
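    The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using the standard definitions for a rule X => Y (confidence is the share of X-buyers who also bought Y, and lift divides that by the overall share of Y-buyers); the function name `rule_metrics` is illustrative and does not come from the dataset's notebook.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(X and Y)
    confidence = n_both / n_antecedent            # P(Y | X)
    lift = confidence / (n_consequent / n_total)  # P(Y | X) / P(Y)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse,
# 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

    A lift well above 1 suggests that the two items are bought together far more often than independent purchases would predict.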

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each library is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
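    The conversion from row-per-item data to transactions can be illustrated outside R as well. The sketch below is only an assumption about the general idea (grouping item names by bill number after dropping missing values); it is not the author's code, and the sample rows and the helper name `to_transactions` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the shape (BillNo, Itemname); the real file has 7 columns.
rows = [
    ("536365", "WHITE MUG"),
    ("536365", "RED LANTERN"),
    ("536366", "HAND WARMER"),
    ("536365", "WHITE MUG"),  # repeated item on the same bill
    ("536366", None),         # missing item name, dropped below
]


def to_transactions(rows):
    """Group item names by bill number, skipping rows with missing values."""
    baskets = defaultdict(set)
    for bill_no, item in rows:
        if bill_no is None or item is None:
            continue
        baskets[bill_no].add(item)
    # Sort the items so that each basket has a stable, comparable form.
    return {bill: sorted(items) for bill, items in baskets.items()}


transactions = to_transactions(rows)
print(transactions)
```

    Each basket is then one transaction that an association rule algorithm such as Apriori can consume.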

  13. Diabetes Dataset (Data mining project)

    • kaggle.com
    zip
    Updated Nov 7, 2025
    Cite
    Ahmed Izaz Bhuiyan (2025). Diabetes Dataset (Data mining project) [Dataset]. https://www.kaggle.com/datasets/ahmedizazbhuiyan/diabetes-dataset-data-mining-project
    Explore at:
    zip(9128 bytes)Available download formats
    Dataset updated
    Nov 7, 2025
    Authors
    Ahmed Izaz Bhuiyan
    Description

    Dataset

    This dataset was created by Ahmed Izaz Bhuiyan

    Contents

  14. Beginner Data Mining Datasets

    • kaggle.com
    zip
    Updated May 28, 2022
    Cite
    verdecali (2022). Beginner Data Mining Datasets [Dataset]. https://www.kaggle.com/datasets/verdecali/beginner-data-mining-datasets
    Explore at:
    zip(1672021 bytes)Available download formats
    Dataset updated
    May 28, 2022
    Authors
    verdecali
    Description

    These are artificially made beginner data mining datasets for learning purposes.

    Case study:

    • FEELS LIKE HOME is an interior design company, which has about 100 000 registered customers and provides services for more than 200 000 clients annually.
    • The range of products can be divided into 5 major classes: Decor accessories, Furniture, Textiles, Lighting and Art, with an option to purchase Limited Edition versions for an extra charge. These goods can be distributed through 3 channels: physical stores, yearly catalogs and the company's website.
    • FEELS LIKE HOME has been doing a great job during recent years, achieving decent profits and revenues, but the future remains volatile. In order to solve the problem of instability, the company is planning to launch a new marketing program, especially to improve the accuracy of marketing campaigns.

    The aim of the FeelsLikeHome_Campaign dataset is to create a project in which you build a predictive model (using a sample of 2500 clients' data) forecasting the highest profit from the next marketing campaign, which will indicate the customers who are most likely to accept the offer.

    The aim of the FeelsLikeHome_Cluster dataset is to create a project in which you split the company's customer base into homogeneous clusters (using 5000 clients' data) and propose draft marketing strategies for these groups based on customer behavior and information about their profile.

    The FeelsLikeHome_Score dataset can be used to calculate the total profit from a marketing campaign and to produce a list of customers sorted by the probability of the dependent variable in the predictive modeling problem.

  15. Web Mining for Collaborative Food Delivery

    • kaggle.com
    zip
    Updated Aug 26, 2023
    Cite
    Jocelyn Dumlao (2023). Web Mining for Collaborative Food Delivery [Dataset]. https://www.kaggle.com/datasets/jocelyndumlao/web-mining-for-collaborative-food-delivery
    Explore at:
    zip(396903 bytes)Available download formats
    Dataset updated
    Aug 26, 2023
    Authors
    Jocelyn Dumlao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description

    This is the main data set that was built for the work titled: "A Web Mining Approach to Collaborative Consumption of Food Delivery Services" which is the official institutional research project of Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz.

    Categories

    Urban Transportation, Consumer, e-Commerce Retail

    Acknowledgements & Source

    Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz


  16. Open-Pit Mining Block Model Dataset

    • kaggle.com
    zip
    Updated Jul 15, 2025
    Cite
    Ziya (2025). Open-Pit Mining Block Model Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/open-pit-mining-block-model-dataset/data
    Explore at:
    zip(1812380 bytes)Available download formats
    Dataset updated
    Jul 15, 2025
    Authors
    Ziya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a generated representation of an open-pit mining block model, designed to reflect realistic geological, spatial, and economic conditions found in large-scale mineral extraction projects. It contains 75,000 individual blocks, each representing a unit of earth material with associated attributes that influence decision-making in mine planning and resource evaluation.

    The dataset includes essential parameters such as ore grade, tonnage, economic value, and operational costs. A calculated profit value and a corresponding binary target label indicate whether a block is considered economically viable for extraction. This setup supports various types of analysis, such as profitability assessments, production scheduling, and resource categorization.

    🔑 Key Features

    Block_ID: Unique identifier for each block in the model.

    Spatial Coordinates (X, Y, Z): 3D location data representing the layout of the deposit.

    Rock Type: Geological classification of each block (e.g., Hematite, Magnetite, Waste).

    Ore Grade (%): Iron content percentage for ore-bearing blocks; set to 0% for waste.

    Tonnage (tonnes): Total mass of the block, used in volume and value calculations.

    Ore Value (¥/tonne): Estimated revenue based on grade and market assumptions.

    Mining Cost (¥): Estimated extraction cost per block.

    Processing Cost (¥): Cost associated with refining ore-bearing blocks.

    Waste Flag: Indicates whether a block is classified as waste material (1 = Waste, 0 = Ore).

    Profit (¥): Net value after subtracting mining and processing costs from potential revenue.

    Target: Label indicating whether a block is economically profitable (1 = Yes, 0 = No).

    This dataset is ideal for applications related to mineral resource evaluation, production planning, and profitability analysis. It can also be used for teaching and demonstration purposes in mining engineering and resource management contexts.
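    The relationship between the cost, value, and label columns described above can be sketched as follows. This is an illustration under stated assumptions, not the dataset author's formula: it assumes that revenue is tonnage times ore value per tonne, that both costs are per block, and that a block is labeled profitable when profit is strictly positive. The `Block` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: str
    tonnage: float          # tonnes
    ore_value: float        # yen per tonne
    mining_cost: float      # yen per block
    processing_cost: float  # yen per block


def profit(block):
    # Assumption: potential revenue = tonnage * ore value per tonne.
    revenue = block.tonnage * block.ore_value
    return revenue - block.mining_cost - block.processing_cost


def target(block):
    # Assumption: a block is "profitable" when its profit is strictly positive.
    return 1 if profit(block) > 0 else 0


ore = Block("B1", tonnage=1000.0, ore_value=50.0,
            mining_cost=20000.0, processing_cost=10000.0)
waste = Block("B2", tonnage=1000.0, ore_value=0.0,
              mining_cost=20000.0, processing_cost=0.0)

for b in (ore, waste):
    print(b.block_id, profit(b), target(b))
```

    Comparing values recomputed this way against the provided Profit and Target columns is one way to check whether these assumptions match how the dataset was generated.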

  17. Turkish TV Series

    • kaggle.com
    zip
    Updated Dec 19, 2022
    Cite
    Furkan Erhan (2022). Turkish TV Series [Dataset]. https://www.kaggle.com/datasets/furkanerhan/turkish-tv-series
    Explore at:
    zip(162949 bytes)Available download formats
    Dataset updated
    Dec 19, 2022
    Authors
    Furkan Erhan
    License

    https://cdla.io/sharing-1-0/

    Description

    This dataset was created for use in the IZU YAM433 - Data Mining Course Project. You must read the column descriptions to understand the data.

  18. Hospital Database Management System SQL Project

    • kaggle.com
    zip
    Updated May 9, 2024
    Cite
    Andrew Dolcimascolo-Garrett (2024). Hospital Database Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/andrewdolcigarrett/hospital-database-management-system-sql-project
    Explore at:
    zip(1487278 bytes)Available download formats
    Dataset updated
    May 9, 2024
    Authors
    Andrew Dolcimascolo-Garrett
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Andrew Dolcimascolo-Garrett

    Released under MIT

    Contents

  19. Titanic Datamining project Yousef

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Cite
    Dr THABIT FURSAN (2023). Titanic Datamining project Yousef [Dataset]. https://www.kaggle.com/datasets/drthabitfursan/titanic-datamining-project-yousefib
    Explore at:
    zip(22544 bytes)Available download formats
    Dataset updated
    Dec 19, 2023
    Authors
    Dr THABIT FURSAN
    Description

    Dataset

    This dataset was created by Dr THABIT FURSAN

    Contents

  20. Pattern Mining project

    • kaggle.com
    zip
    Updated Mar 9, 2021
    Cite
    Zahid Ali (2021). Pattern Mining project [Dataset]. https://www.kaggle.com/datasets/zahidmahar/pattern-mining-project/code
    Explore at:
    zip(621097 bytes)Available download formats
    Dataset updated
    Mar 9, 2021
    Authors
    Zahid Ali
    Description

    Context

    Sequential pattern mining is the discovery of subsequences that are frequent in a set of sequences. The process is similar to frequent itemset mining, except that the input database is ordered. A sequential pattern mining algorithm outputs a set of frequent sequential patterns, which are subsequences whose frequency in the database is greater than or equal to the user-specified minimum support. Consider the data set shown in Table 1, where each event in a tuple is accompanied by its instant of occurrence. (Image of table: https://pasteboard.co/JRNB4rH.png)

    We can note that, for a fixed threshold equal to 1, the pattern < A, B, C > is considered as frequent because its support (the number of occurrences in the database) is equal to 2.
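    Counting support for a sequential pattern can be sketched in a few lines. Table 1 is only available as an image, so the database below is hypothetical; it is chosen so that the pattern < A, B, C > has support 2, as in the text. The function names are illustrative.

```python
def contains_subsequence(sequence, pattern):
    """True if the pattern's events occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    # "event in it" advances the iterator, so each match must come
    # after the previous one.
    return all(event in it for event in pattern)


def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains_subsequence(seq, pattern) for seq in database)


# Hypothetical database of event sequences (not the contents of Table 1).
database = [
    ["A", "B", "C"],
    ["A", "D", "B", "C"],
    ["B", "A", "C"],
]

min_support = 1
pattern = ["A", "B", "C"]
count = support(database, pattern)
print(count, count >= min_support)  # prints: 2 True
```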

    Content

    Problematic and Goal:

    Let us assume the example given in Table 1. < A, B, C > is considered a frequent sequential pattern. It shows that events A, B, and C frequently occurred in sequence, but it does not provide any additional information about the gaps between them. For instance, we do not know when B would happen, knowing that A already did. Therefore, we ask you to provide a richer pattern where time constraints are considered. In our data set example, we can deduce that A, B, and C occur sequentially, that B occurs at least 1 instant and at most 5 instants after A, and that C occurs after B within the interval [2, 4] of instants. We represent our pattern as A[1,5]B and B[2,4]C. It is a directed graph where nodes are events and edges are labeled with the instant intervals, denoted by time constraints as shown in Figure 1. (Image: https://pasteboard.co/JRNBWWL.png)

    Formally:

    Definition (Event). An event is a couple (e, t) where e ∈ E is the type of the event and t ∈ T is its time.

    Definition (Sequence). Let E be a set of event types and T a time domain such that T ⊆ R. E is assumed totally ordered and is denoted #
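    One simple way to derive such time constraints is to record, for every sequence that contains the pattern, the gap between consecutive matched events, and then take the minimum and maximum gap per pair. The sketch below does this with a greedy first-occurrence match, which is a simplification and not necessarily the method the project expects. The timestamps are hypothetical (Table 1 is only available as an image) and are chosen to reproduce A[1,5]B and B[2,4]C.

```python
def match_times(sequence, pattern):
    """Timestamps of the first (greedy) in-order occurrence of pattern, or None."""
    times = []
    idx = 0
    for event, t in sequence:
        if idx < len(pattern) and event == pattern[idx]:
            times.append(t)
            idx += 1
    return times if idx == len(pattern) else None


def gap_intervals(database, pattern):
    """For each consecutive pair in pattern, return (min_gap, max_gap) over matches."""
    gaps = [[] for _ in pattern[1:]]
    for seq in database:
        times = match_times(seq, pattern)
        if times is None:
            continue  # this sequence does not contain the pattern
        for i in range(len(times) - 1):
            gaps[i].append(times[i + 1] - times[i])
    return [(min(g), max(g)) for g in gaps if g]


# Hypothetical sequences of (event, instant) pairs.
database = [
    [("A", 1), ("B", 2), ("C", 4)],
    [("A", 2), ("B", 7), ("C", 11)],
]

intervals = gap_intervals(database, ["A", "B", "C"])
print(intervals)  # prints: [(1, 5), (2, 4)]
```

    A fuller solution would need to consider every occurrence of the pattern in a sequence rather than only the first one.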
