97 datasets found

Data from: Data Mining Project
kaggle.com
zip
Updated Nov 30, 2018
Cite
Oscar NG (2018). Data Mining Project [Dataset]. https://www.kaggle.com/oscar321a/data-mining-project
Available download formats: zip (8083512 bytes)
Dataset updated
Nov 30, 2018
Authors
Oscar NG
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Oscar NG

Released under CC0: Public Domain

Data from: Data Mining Project Dataset
kaggle.com
zip
Updated Dec 10, 2020
Cite
Mark Dobres (2020). Data Mining Project Dataset [Dataset]. https://www.kaggle.com/markdobres/data-mining-project-dataset
Available download formats: zip (1552418617 bytes)
Dataset updated
Dec 10, 2020
Authors
Mark Dobres
Description
Dataset

This dataset was created by Mark Dobres

Data from: The use of project portfolios in effective strategy execution to...
researchdata.up.ac.za
zip
Updated May 31, 2023
+ more versions
Cite
Palesa Agnes Ramashala (2023). The use of project portfolios in effective strategy execution to improve business value [Dataset]. http://doi.org/10.25403/UPresearchdata.13280141.v3
Available download formats: zip
Unique identifier
https://doi.org/10.25403/UPresearchdata.13280141.v3
Dataset updated
May 31, 2023
Dataset provided by
University of Pretoria
Authors
Palesa Agnes Ramashala
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Qualitative data gathered from interviews conducted with four case organisations. The data was analysed with a qualitative data analysis tool (Atlas.ti) to code the responses and generate network diagrams; software such as Atlas.ti 8 Windows is useful for viewing these results. The respondents' detailed answers are captured in tabular form, and graphs were created to identify trends. The study also includes a desktop review of the case organisations, based on published annual reports covering a period of more than seven years. The analysis was done given the scope of the project and its constructs.
Data from: A large-scale comparative analysis of Coding Standard conformance...
figshare.com
application/x-gzip
Updated Oct 4, 2021
Cite
Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.12377237.v3
Dataset updated
Oct 4, 2021
Dataset provided by
Figshare http://figshare.com/
Authors
Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.
results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.
notebooks_out.tar.gz: Tables and figures generated by notebooks.
source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.
The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositories
Published in ESEM 2020: https://doi.org/10.1145/3382494.3410680
Preprint: https://arxiv.org/abs/2007.08978
Data-Mining-Final-Project-Data
search.dataone.org
Updated Sep 24, 2024
Cite
Anderson, Ty Julian (2024). Data-Mining-Final-Project-Data [Dataset]. http://doi.org/10.7910/DVN/8ETVW9
Unique identifier
https://doi.org/10.7910/DVN/8ETVW9
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Anderson, Ty Julian
Description
Financial News Headlines. Visit https://dataone.org/datasets/sha256%3Ade01b1cf5318d53f0296b475ff28734d90acd6240a76f1eee1df39fefda07ef0 for complete metadata about this dataset.
Data Mining Project 1
kaggle.com
zip
Updated Jan 29, 2024
Cite
Will Newt (2024). Data Mining Project 1 [Dataset]. https://www.kaggle.com/datasets/willnewt/data-mining-project-1/data
Available download formats: zip (6058765 bytes)
Dataset updated
Jan 29, 2024
Authors
Will Newt
Description
Dataset

This dataset was created by Will Newt

Data Mining Dataset
universe.roboflow.com
zip
Updated Aug 4, 2023
Cite
ilham project (2023). Data Mining Dataset [Dataset]. https://universe.roboflow.com/ilham-project/data-mining-n52lu/model/1
Available download formats: zip
Dataset updated
Aug 4, 2023
Dataset authored and provided by
ilham project
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Variables measured
Uangrupiah Bounding Boxes
Description
Data Mining

## Overview
Data Mining is a dataset for object detection tasks - it contains Uangrupiah annotations for 692 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Data Mining Kel 11 Dataset
universe.roboflow.com
zip
Updated Oct 29, 2025
Cite
Data Mining (2025). Data Mining Kel 11 Dataset [Dataset]. https://universe.roboflow.com/data-mining-mtwls/data-mining-kel-11-zp4xe
Available download formats: zip
Dataset updated
Oct 29, 2025
Dataset authored and provided by
Data Mining
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Beras
Description
Data Mining Kel 11

## Overview
Data Mining Kel 11 is a dataset for classification tasks - it contains Beras annotations for 59,785 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project
tandf.figshare.com
xls
Updated Jun 3, 2023
Cite
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos (2023). Enhancing the Human Health Status Prediction: The ATHLOS Project [Dataset]. http://doi.org/10.6084/m9.figshare.14798079.v1
Available download formats: xls
Unique identifier
https://doi.org/10.6084/m9.figshare.14798079.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis https://taylorandfrancis.com/
Authors
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originating from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are utilized as part of solving these challenges, respectively. Toward this direction, we focus on the development of a complete methodology for the ATHLOS Project – funded by the European Union’s Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we mainly focus on the HealthStatus (HS) score, an index that estimates the human status of health, aiming to examine the effect of various data imputation models on the prediction power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine’s crucial role.
Locations and numbers of past producing metal and coal mining projects
catalog.data.gov
s.cnmilf.com
Updated Aug 14, 2022
Cite
U.S. EPA Office of Research and Development (ORD) (2022). Locations and numbers of past producing metal and coal mining projects [Dataset]. https://catalog.data.gov/dataset/locations-and-numbers-of-past-producing-metal-and-coal-mining-projects
Dataset updated
Aug 14, 2022
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Description
Locations and numbers of past producing metal and coal mining projects in NW US and Canada. This dataset is associated with the following publication: Sergeant, C., E. Sexton, J. Moore, A. Westwood, S. Nagorski, J. Ebersole, D.M. Chambers, S.L. O'Neal, R.L. Malison, R. Hauer, D.C. Whited, J. Weitz, J. Caldwell, M. Capito, M. Connor, C.A. Frissell, G. Knox, E.D. Lowery, R. Macnair, V. Marlatt, J. McIntyre, M.V. McPhee, and N. Skuce. Risks of mining to salmonid-bearing watersheds. Science Advances. American Association for the Advancement of Science (AAAS), Washington, DC, USA, 8(26): eabn0929, (2022).
Data from: Data mining Project
kaggle.com
zip
Updated May 27, 2022
+ more versions
Cite
Yuxian Chen (2022). Data mining Project [Dataset]. https://www.kaggle.com/datasets/cyanlu/data-mining-project
Available download formats: zip (165846374 bytes)
Dataset updated
May 27, 2022
Authors
Yuxian Chen
Description
Dataset

This dataset was created by Yuxian Chen

GitHub training and test data-sets
data.mendeley.com
Updated Jul 31, 2019
+ more versions
Cite
Youcef Bouziane (2019). GitHub training and test data-sets [Dataset]. http://doi.org/10.17632/gt3f4jnbvn.1
Unique identifier
https://doi.org/10.17632/gt3f4jnbvn.1
Dataset updated
Jul 31, 2019
Authors
Youcef Bouziane
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the SQL tables of the training and test datasets used in our experimentation. These tables contain the preprocessed textual data (in a form of tokens) extracted from each training and test project. Besides the preprocessed textual data, this dataset also contains meta-data about the projects, GitHub topics, and GitHub collections. The GitHub projects are identified by the tuple “Owner” and “Name”. The descriptions of the table fields are attached to their respective data descriptions.
A&I - Safety Programs - Data Mining Tool
data.virginia.gov
data.transportation.gov
+5 more
html
Updated May 24, 2024
Cite
U.S Department of Transportation (2024). A&I - Safety Programs - Data Mining Tool [Dataset]. https://data.virginia.gov/dataset/ai-safety-programs-data-mining-tool
Available download formats: html
Dataset updated
May 24, 2024
Dataset provided by
Federal Motor Carrier Safety Administration https://www.fmcsa.dot.gov/
Authors
U.S Department of Transportation
Description
This area of the website provides information on three of the safety programs established by FMCSA to support this mission. The three programs covered by this area include reviews, roadside inspections of commercial vehicles and drivers, and traffic enforcement stops of CMVs operating in an unsafe manner. Each program is implemented in conjunction with the states and devoted to improving motor carrier safety by reducing the number and severity of crashes involving large trucks and buses.
Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry...
catalog.data.gov
data.openei.org
+3 more
Updated Nov 2, 2023
Cite
Davis Energy (2023). Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry Climate [Dataset]. https://catalog.data.gov/dataset/community-scale-attic-retrofit-and-home-energy-upgrade-data-mining-hot-dry-climate
Dataset updated
Nov 2, 2023
Dataset provided by
Davis Energy
Description
Retrofitting is an essential element of any comprehensive strategy for improving residential energy efficiency. The residential retrofit market is still developing, and program managers must develop innovative strategies to increase uptake and promote economies of scale. Residential retrofitting remains a challenging proposition to sell to homeowners, because awareness levels are low and financial incentives are lacking. The U.S. Department of Energy's Building America research team, Alliance for Residential Building Innovation (ARBI), implemented a project to increase residential retrofits in Davis, California. The project used a neighborhood-focused strategy for implementation and a low-cost retrofit program that focused on upgraded attic insulation and duct sealing. ARBI worked with a community partner, the not-for-profit Cool Davis Initiative, as well as selected area contractors to implement a strategy that sought to capitalize on the strong local expertise of partners and the unique aspects of the Davis, California, community. Working with community partners also allowed ARBI to collect and analyze data about effective messaging tactics for community-based retrofit programs. ARBI expected this project, called Retrofit Your Attic, to achieve higher uptake than other retrofit projects, because it emphasized a low-cost, one-measure retrofit program. However, this was not the case. The program used a strategy that focused on attics (including air sealing, duct sealing, and attic insulation) as a low-cost entry for homeowners to complete home retrofits. The price was kept below $4,000 after incentives; both contractors in the program offered the same price. The program completed only five retrofits. Interestingly, none of those homeowners used the one-measure strategy. All five homeowners were concerned about cost, comfort, and energy savings and included additional measures in their retrofits.
The low-cost, one-measure strategy did not increase the uptake among homeowners, even in a well-educated, affluent community such as Davis. This project has two primary components. One is to complete attic retrofits on a community scale in the hot-dry climate of Davis, CA. Sufficient data will be collected on these projects to include them in the BAFDR. Additionally, ARBI is working with contractors to obtain building and utility data from a large set of retrofit projects in CA (hot-dry). These projects are to be uploaded into the BAFDR.
Digital Data Analytics, Public Engagement and the Social Life of Methods
orda.shef.ac.uk
docx
Updated May 30, 2023
Cite
Helen Kennedy; Giles Moss; Stylianos Moshanas; Chris Birchall (2023). Digital Data Analytics, Public Engagement and the Social Life of Methods [Dataset]. http://doi.org/10.15131/shef.data.5194993.v1
Available download formats: docx
Unique identifier
https://doi.org/10.15131/shef.data.5194993.v1
Dataset updated
May 30, 2023
Dataset provided by
The University of Sheffield
Authors
Helen Kennedy; Giles Moss; Stylianos Moshanas; Chris Birchall
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Interview and workshop transcripts from EPSRC Digital Transformations Communities and Cultures Network + (http://www.communitiesandculture.org/) project Digital Data Analytics, Public Engagement and the Social Life of Methods (http://www.communitiesandculture.org/projects/digital-data-analysis/). Methodology described in papers available at the above link.
Beginner Data Mining Datasets
kaggle.com
zip
Updated May 28, 2022
Cite
verdecali (2022). Beginner Data Mining Datasets [Dataset]. https://www.kaggle.com/datasets/verdecali/beginner-data-mining-datasets
Available download formats: zip (1672021 bytes)
Dataset updated
May 28, 2022
Authors
verdecali
Description
These are artificially made beginner data mining datasets for learning purposes.

Case study:

FEELS LIKE HOME is an interior design company that has about 100 000 registered customers and provides services for more than 200 000 clients annually.

The product range can be divided into 5 major classes: Decor accessories, Furniture, Textiles, Lighting and Art, with an option to purchase Limited Edition versions for an extra charge. These goods are distributed through 3 channels: physical stores, yearly catalogs and the company’s website.

FEELS LIKE HOME has been doing a great job during recent years, achieving decent profits and revenues, but the future remains volatile. To address this instability, the company is planning to launch a new marketing program, in particular to improve the accuracy of its marketing campaigns.

The aim of the FeelsLikeHome_Campaign dataset is to create a project in which you build a predictive model (using a sample of 2500 clients’ data) that forecasts the highest profit from the next marketing campaign and indicates the customers who are most likely to accept the offer.

The aim of the FeelsLikeHome_Cluster dataset is to create a project in which you split the company’s customer base into homogeneous clusters (using 5000 clients’ data) and propose draft marketing strategies for these groups based on customer behavior and information about their profile.

The FeelsLikeHome_Score dataset can be used to calculate the total profit from a marketing campaign and to produce a list of customers sorted by the predicted probability of the dependent variable in the predictive model problem.
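
The scoring step described above can be illustrated with a minimal sketch. It assumes each customer already has a predicted acceptance probability; the customer IDs, the probabilities, and the `PROFIT_PER_ACCEPT` and `COST_PER_CONTACT` values are all hypothetical and are not taken from the dataset itself.

```python
# Hypothetical scored customers: (customer_id, predicted acceptance probability).
# These values are made up for illustration; they are not from the dataset.
scored = [("C001", 0.82), ("C002", 0.15), ("C003", 0.47), ("C004", 0.64)]

PROFIT_PER_ACCEPT = 11.0  # assumed revenue when an offer is accepted
COST_PER_CONTACT = 3.0    # assumed cost of contacting one customer


def rank_customers(rows):
    """Return rows sorted by predicted probability, highest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def expected_profit(rows, top_n):
    """Expected profit from contacting only the top_n ranked customers."""
    targeted = rank_customers(rows)[:top_n]
    return sum(p * PROFIT_PER_ACCEPT - COST_PER_CONTACT for _, p in targeted)


ranked = rank_customers(scored)
print([cid for cid, _ in ranked])            # customers in contact order
print(round(expected_profit(scored, 2), 2))  # expected profit for the top 2
```

Varying `top_n` shows how the expected profit changes as the campaign reaches further down the ranked list, which is one way to choose a contact cutoff.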
project.json
figshare.com
txt
Updated Oct 3, 2019
Cite
Sophia Tintori (2019). project.json [Dataset]. http://doi.org/10.6084/m9.figshare.9933785.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9933785.v1
Dataset updated
Oct 3, 2019
Dataset provided by
Figshare http://figshare.com/
Authors
Sophia Tintori
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Configuration file for DrEdGE website
Africa - PowerMining Projects Database
energydata.info
cloud.csiss.gmu.edu
Updated Jul 23, 2024
+ more versions
Cite
(2024). Africa - PowerMining Projects Database [Dataset]. https://energydata.info/dataset/africa-powermining-projects-database-2014
Dataset updated
Jul 23, 2024
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
"The Africa Power–Mining Database 2014 shows ongoing and forthcoming mining projects in Africa categorized by the type of mineral, ore grade, size of the project. The database draws on basic mining data from Infomine surveys, the United States Geological Survey, annual reports, technical reports, feasibility studies, investor presentations, sustainability reports on property-owner websites or filed in public domains, and mining websites (Mining Weekly, Mining Journal, Mbendi, Mining-technology, and Miningmx). Comprising 455 projects in 28 SSA countries with each project’s ore reserve value assessed at more than $250 million, the database collates publicly available and proprietary information. It also provides a panoramic view of projects operating in 2000–12 and anticipated demand in 2020. The analysis is presented over three timeframes: pre-2000, 2001–12, and 2020 (each containing the projects from the previous period except for those closing during that previous period)."
Meta-study water and mining conflicts
data.niaid.nih.gov
zenodo.org
Updated Feb 17, 2023
Cite
Schoderer, Mirja; Ott, Marlen (2023). Meta-study water and mining conflicts [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5151474
Dataset updated
Feb 17, 2023
Dataset provided by
Philipps-Universität Marburg
Deutsches Institut für Entwicklungspolitik
Authors
Schoderer, Mirja; Ott, Marlen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises the raw data and R Script for the following published article: Schoderer, M., & Ott, M. (2022). Contested water-and miningscapes–Explaining the high intensity of water and mining conflicts in a meta-study. World Development, 154, 105888. The article seeks to better understand the dynamics of mining and water conflicts, specifically under which (combinations of) conditions environmental defenders step outside the legal framework in their contestation of mining projects, according to existing case study-based research. More information on the methodology is available in the paper.

The file Water and mining conflicts full dataset includes the qualitative information extracted from published articles, the scoring scheme and the normalized scores used in the R analysis.
The R Script QCA_Preventive water and mining conflicts describes the fuzzy-set, two-step Qualitative Comparative Analysis conducted to understand under which conditions environmental defenders choose non-legal means in conflicts that occur in the planning or licensing stage of a mining project.
The CSV file Normalized scores_preventive is the raw data used in the R Script QCA_Preventive water and mining conflicts.
The R Script QCA_Reactive water and mining conflicts describes the fuzzy-set, two-step Qualitative Comparative Analysis conducted to understand under which conditions environmental defenders choose non-legal means in conflicts that occur when the mining project is already in operation.
The CSV file Normalized scores_reactive is the raw data used in the R Script QCA_Reactive water and mining conflicts.
Geese Counting Project #2 Dataset
universe.roboflow.com
zip
Updated Nov 13, 2025
+ more versions
Cite
Data Mining (2025). Geese Counting Project #2 Dataset [Dataset]. https://universe.roboflow.com/data-mining-2qphn/geese-counting-project-2-fipqg/model/3
Available download formats: zip
Dataset updated
Nov 13, 2025
Dataset authored and provided by
Data Mining
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Geese Bounding Boxes
Description
Geese Counting Project #2

## Overview
Geese Counting Project #2 is a dataset for object detection tasks - it contains Geese annotations for 647 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).