This dataset was created by Mark Dobres
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Oscar NG
Released under CC0: Public Domain
This dataset was created by Yuxian Chen
This dataset was created by Will Newt
To make this a seamless process, I cleaned the data and deleted many variables that I thought were not relevant to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file contains both Lyft and Uber trips; it is the cleaned version of the dataset we downloaded from Kaggle.
You can easily subset the data into the car types you will be modeling by first loading the CSV into R. Here is the code for how to do this:
# load the cleaned Uber data
df <- read.csv('uber.csv')
# keep only the rows for the 'Black' car type
df_black <- subset(df, df$name == 'Black')
# write the subset out; row.names = FALSE avoids an extra index column
write.csv(df_black, "nameofthefileyouwanttosaveas.csv", row.names = FALSE)
# getwd() shows the working directory where the file was saved
getwd()
https://www.usa.gov/government-works/
This dataset was created by chimaralavamshireddy
Released under U.S. Government Works
This dataset was created by Prutchakorn
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed metadata for over 240,000 video games sourced from the IGDB API. It includes information about each game's release, genres, themes, platforms, developers, publishers, player perspectives, game modes, ratings, summaries, media assets (screenshots, artworks, covers), and more. This dataset is ideal for projects in game recommendation, clustering, tagging, genre analysis, and player preference modeling.
http://insideairbnb.com/get-the-data.html
A. Is there seasonality in the prices of properties listed on Airbnb Berlin?
B. Which areas of Berlin are popular among tourists?
C. An analysis of reviews, using text mining
D. Which amenities are most commonly available in Berlin properties?
E. Can we predict the price of properties in Berlin from the other column values?
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Danizo
Released under Apache 2.0
This dataset was created by Sandy3108
Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a retailer's dataset; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow its business by suggesting itemsets to customers, which lets us increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem with Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most useful when you want to discover associations between different objects in a set, that is, frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.8
- lift = confidence / P(mat) = 0.8/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
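As a quick check, the same numbers can be computed directly in base R (a minimal sketch using only the counts from the example above):

n <- 100        # total customers
n_mouse <- 10   # bought a computer mouse
n_mat <- 9      # bought a mouse mat
n_both <- 8     # bought both
support <- n_both / n                  # 0.08
confidence <- support / (n_mouse / n)  # 0.8
lift <- confidence / (n_mat / n)       # ~8.9
c(support = support, confidence = confidence, lift = lift)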
Number of Attributes: 7
[Screenshot: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png]
First, we need to load the required libraries. I briefly describe each library below.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png]
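The exact library list is only visible in the screenshot above; as a sketch, a typical set for this workflow might be:

library(readxl)    # read the .xlsx data file
library(dplyr)     # data cleaning and manipulation
library(arules)    # association rule mining (Apriori)
library(arulesViz) # plotting association rules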
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png]
[Screenshot: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png]
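A minimal sketch of this step, assuming the workbook sits in the working directory:

# read the Excel workbook into a data frame
data <- readxl::read_excel("Assignment-1_Data.xlsx")
# inspect the first rows and the column types
head(data)
str(data)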
After that, we will clean our data frame by removing rows with missing values.
[Screenshot: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png]
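A minimal sketch of the cleaning step (the original may also drop unneeded columns):

# keep only complete rows, i.e., drop rows with missing values
data <- data[complete.cases(data), ]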
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice end up in a single transaction.
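A sketch of the conversion and the Apriori call with the arules package; the invoice and item column names (BillNo, Itemname) are assumptions about the workbook, and the thresholds are examples to tune:

library(arules)
# one transaction per invoice: all items bought together on a bill
trans <- as(split(data$Itemname, data$BillNo), "transactions")
# mine association rules with example support/confidence thresholds
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.8))
# show the strongest rules by lift
inspect(head(sort(rules, by = "lift")))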
This dataset was created by Ahmed Izaz Bhuiyan
These are artificially made beginner data mining datasets for learning purposes.
Case study:
The aim of the FeelsLikeHome_Campaign dataset is a project in which you build a predictive model (using a sample of 2,500 clients' data) forecasting the highest profit from the next marketing campaign, i.e., indicating the customers who will be most likely to accept the offer.
The aim of the FeelsLikeHome_Cluster dataset is a project in which you split the company's customer base into homogeneous clusters (using 5,000 clients' data) and propose draft marketing strategies for these groups based on customer behavior and profile information.
The FeelsLikeHome_Score dataset can be used to calculate the total profit from a marketing campaign and to produce a list of customers sorted by the predicted probability of the dependent variable in the predictive modeling problem.
https://creativecommons.org/publicdomain/zero/1.0/
This is the main data set built for the work titled "A Web Mining Approach to Collaborative Consumption of Food Delivery Services", the official institutional research project of Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz.
Urban Transportation, Consumer, e-Commerce Retail
Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a generated representation of an open-pit mining block model, designed to reflect realistic geological, spatial, and economic conditions found in large-scale mineral extraction projects. It contains 75,000 individual blocks, each representing a unit of earth material with associated attributes that influence decision-making in mine planning and resource evaluation.
The dataset includes essential parameters such as ore grade, tonnage, economic value, and operational costs. A calculated profit value and a corresponding binary target label indicate whether a block is considered economically viable for extraction. This setup supports various types of analysis, such as profitability assessments, production scheduling, and resource categorization.
🔑 Key Features
Block_ID: Unique identifier for each block in the model.
Spatial Coordinates (X, Y, Z): 3D location data representing the layout of the deposit.
Rock Type: Geological classification of each block (e.g., Hematite, Magnetite, Waste).
Ore Grade (%): Iron content percentage for ore-bearing blocks; set to 0% for waste.
Tonnage (tonnes): Total mass of the block, used in volume and value calculations.
Ore Value (¥/tonne): Estimated revenue based on grade and market assumptions.
Mining Cost (¥): Estimated extraction cost per block.
Processing Cost (¥): Cost associated with refining ore-bearing blocks.
Waste Flag: Indicates whether a block is classified as waste material (1 = Waste, 0 = Ore).
Profit (¥): Net value after subtracting mining and processing costs from potential revenue.
Target: Label indicating whether a block is economically profitable (1 = Yes, 0 = No).
This dataset is ideal for applications related to mineral resource evaluation, production planning, and profitability analysis. It can also be used for teaching and demonstration purposes in mining engineering and resource management contexts.
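To illustrate how Profit and Target relate to the other fields, here is a minimal sketch in R; the column names and the revenue formula are assumptions based on the feature descriptions above, not taken from the dataset itself:

# revenue per block: ore value per tonne times tonnage (assumed formula)
blocks$Revenue <- blocks$Ore_Value * blocks$Tonnage
# profit: revenue minus mining and processing costs
blocks$Profit <- blocks$Revenue - blocks$Mining_Cost - blocks$Processing_Cost
# target label: 1 if the block is economically profitable
blocks$Target <- as.integer(blocks$Profit > 0)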
https://cdla.io/sharing-1-0/
This dataset was created for use in the IZU YAM433 - Data Mining course project. You must read the column descriptions to understand the data.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Andrew Dolcimascolo-Garrett
Released under MIT
This dataset was created by Dr THABIT FURSAN
Sequential pattern mining is the discovery of subsequences that are frequent in a set of sequences. The process is similar to frequent itemset mining, except that the input database is ordered. As its output, a sequential pattern mining algorithm generates the set of frequent sequential patterns: the subsequences whose frequency in the database is greater than or equal to the user-specified minimum support.
Consider the data set shown in Table 1, where the events in each tuple are accompanied by their instants of occurrence.
[Table 1: https://pasteboard.co/JRNB4rH.png]
We can note that, for a fixed threshold equal to 1, the pattern <A, B, C> is considered frequent because its support (its number of occurrences in the database) is equal to 2.
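As a minimal sketch (the data from Table 1 is only available as an image, so the toy database below is assumed for illustration), here is how the support of a sequential pattern can be counted in R:

# TRUE if 'pattern' occurs as a subsequence of 'sequence' (order preserved)
is_subsequence <- function(pattern, sequence) {
  pos <- 0
  for (ev in pattern) {
    idx <- which(sequence == ev & seq_along(sequence) > pos)
    if (length(idx) == 0) return(FALSE)
    pos <- idx[1]
  }
  TRUE
}

# toy database of three sequences
db <- list(c("A", "B", "C"), c("A", "C", "B", "C"), c("B", "A", "C"))
# support of <A, B, C>: number of sequences containing it (here 2)
sum(vapply(db, function(s) is_subsequence(c("A", "B", "C"), s), logical(1)))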
Let us return to the example given in Table 1. <A, B, C> is a frequent sequential pattern: it shows that events A, B, and C frequently occurred in sequence, but it provides no additional information about the gap between them. For instance, we do not know when B will happen, knowing that A already did. Therefore, we ask you to provide a richer pattern in which time constraints are considered. In our example data set, we can deduce that A, B, and C occur sequentially, that B occurs after A by at least 1 instant and at most 5 instants, and that C occurs after B within the interval [2, 4] of instants. We represent this pattern as A[1,5]B and B[2,4]C. It is a directed graph whose nodes are events and whose edges are labeled with instant intervals, denoting the time constraints, as shown in Figure 1.
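As an illustrative sketch (the representation of the timestamped sequences is assumed, since Table 1 is only available as an image), the [min, max] gap annotation for an edge such as A[1,5]B can be derived in R like this:

# each sequence: a data frame of events with their instants of occurrence
seqs <- list(data.frame(event = c("A", "B", "C"), time = c(1, 2, 6)),
             data.frame(event = c("A", "B", "C"), time = c(3, 8, 10)))

# gaps between each occurrence of 'from' and the next occurrence of 'to'
gap_interval <- function(seqs, from, to) {
  gaps <- unlist(lapply(seqs, function(s) {
    sapply(s$time[s$event == from], function(tf) {
      later <- s$time[s$event == to & s$time > tf]
      if (length(later) > 0) min(later) - tf else NA
    })
  }))
  range(gaps, na.rm = TRUE)  # c(min gap, max gap)
}

gap_interval(seqs, "A", "B")  # returns c(1, 5) here, i.e., A[1,5]B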
[Figure 1: https://pasteboard.co/JRNBWWL.png]
Formally:
Definition (Event). An event is a couple (e, t), where e ∈ E is the type of the event and t ∈ T is its time.
Definition (Sequence). Let E be a set of event types and T a time domain such that T ⊆ ℝ. E is assumed to be totally ordered and is denoted ...