100+ datasets found

Data from: Data Mining Project
kaggle.com
Updated May 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Khanh Vương (2022). Data Mining Project [Dataset]. https://www.kaggle.com/khanhvng/data-mining-project/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 14, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Khanh Vương
Description
Dataset

This dataset was created by Khanh Vương

Contents
d
Data-Mining-Final-Project-Data
search.dataone.org
dataverse.harvard.edu
Updated Sep 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anderson, Ty Julian (2024). Data-Mining-Final-Project-Data [Dataset]. http://doi.org/10.7910/DVN/8ETVW9
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/8ETVW9
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Anderson, Ty Julian
Description
Financial News Headlines. Visit https://dataone.org/datasets/sha256%3Ade01b1cf5318d53f0296b475ff28734d90acd6240a76f1eee1df39fefda07ef0 for complete metadata about this dataset.
u
Data from: The use of project portfolios in effective strategy execution to...
researchdata.up.ac.za
zip
Updated May 31, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Palesa Agnes Ramashala (2023). The use of project portfolios in effective strategy execution to improve business value [Dataset]. http://doi.org/10.25403/UPresearchdata.13280141.v3
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.25403/UPresearchdata.13280141.v3
Dataset updated
May 31, 2023
Dataset provided by
University of Pretoria
Authors
Palesa Agnes Ramashala
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Qualitative data gathered from interviews that were conducted with case organisations. The data is analysed using a qualitative data analysis tool (AtlasTi) to code and generate network diagrams. Software such as Atlas.ti 8 Windows will be a great advantage to use in order to view these results. Interviews were conducted with four case organisations. The details of the responses from the respondents from case organisations are captured. The data gathered during the interview sessions is captured in a tabular form and graphs were also created to identify trends. Also in this study is desktop review of the case organisations that formed part of the study. The desktop study was done using published annual reports over a period of more than seven years. The analysis was done given the scope of the project and its constructs.
s
Data and source code for "Automating Intention Mining"
researchdata.smu.edu.sg
zip
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY (2023). Data and source code for "Automating Intention Mining" [Dataset]. http://doi.org/10.25440/smu.21261408.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.25440/smu.21261408.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY
License
http://rightsstatements.org/vocab/InC/1.0/http://rightsstatements.org/vocab/InC/1.0/
Description
The dataset and source code for paper "Automating Intention Mining".

The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.

By default, the code uses Tensorflow 0.12. Some errors might be reported when using other versions of Tensorflow due to the incompatibility of some APIs.

Running 'online_prediction.py', you can input any sentence and check the classification result produced by a pre-trained CNN model. The model uses all sentences of the four Github projects as training data.

Running 'play.py', you can get the evaluation result of cross-project prediction. Please check the code for more details of the configuration. By default, it will use the four Github projects as training data to predict the sentences in DECA dataset, and in this setting, the category 'aspect evaluation' and 'others' are dropped since DECA dataset does not contain these two categories.
ghtorrent-projects Dataset
zenodo.org
data.niaid.nih.gov
bin, txt
Updated Jul 17, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Marios Papachristou; Marios Papachristou (2021). ghtorrent-projects Dataset [Dataset]. http://doi.org/10.5281/zenodo.5111043
Explore at:
txt, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5111043
Dataset updated
Jul 17, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Marios Papachristou; Marios Papachristou
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A hypergraph dataset mined from the GHTorrent project is presented. The dataset contains two files

1. project_members.txt: Contains GitHub projects with at least 2 contributors and the corresponding contributors (as a hyperedge). The format of the data is:

2. num_followers.txt: Contains all GitHub users and their number of followers.

The artifact also contains the SQL queries used to obtain the data from GHTorrent (schema).
m
Educational Attainment in North Carolina Public Schools: Use of statistical...
data.mendeley.com
Updated Nov 14, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
Explore at:
Unique identifier
https://doi.org/10.17632/6cm9wyd5g5.1
Dataset updated
Nov 14, 2018
Authors
Scott Herford
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The purpose of data mining analysis is always to find patterns of the data using certain kind of techiques such as classification or regression. It is not always feasible to apply classification algorithms directly to dataset. Before doing any work on the data, the data has to be pre-processed and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance has not improved much. The reason why it has not improved could be the features we selected to perform clustering are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics. From the dimensionality reduction perspective: It is different from Principle Component Analysis which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique of reducing the data dimension will lose a lot of information since clustering techniques are based a metric of 'distance'. At high dimensions euclidean distance loses pretty much all meaning. Therefore using clustering as a "Reducing" dimensionality by mapping data points to cluster numbers is not always good since you may lose almost all the information. From the creating new features perspective: Clustering analysis creates labels based on the patterns of the data, it brings uncertainties into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, then affect the performance of classification. If the part of features we use clustering techniques on is very suited for it, it might increase the overall performance on classification. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not lock in the clustering outputs using a random_state in the effort to see if they were stable. Our assumption was that if the results vary highly from run to run which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering to the data preprocessing. Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model real world effectiveness and also to continue to revise the models from time to time as things change.
f
Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project
tandf.figshare.com
xls
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos (2023). Enhancing the Human Health Status Prediction: The ATHLOS Project [Dataset]. http://doi.org/10.6084/m9.figshare.14798079.v1
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14798079.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis
Authors
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originated from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are utilized a part of solving these challenges, respectively. Toward this direction, we focus on the development of a complete methodology for the ATHLOS Project – funded by the European Union’s Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we mainly focus on the HealthStatus (HS) score, an index that estimates the human status of health, aiming to examine the effect of various data imputation models to the prediction power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine’s crucial role.
d
Data from: DATA MINING THE GALAXY ZOO MERGERS
catalog.data.gov
gimi9.com
+3more
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). DATA MINING THE GALAXY ZOO MERGERS [Dataset]. https://catalog.data.gov/dataset/data-mining-the-galaxy-zoo-mergers
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
DATA MINING THE GALAXY ZOO MERGERS STEVEN BAEHR, ARUN VEDACHALAM, KIRK BORNE, AND DANIEL SPONSELLER Abstract. Collisions between pairs of galaxies usually end in the coalescence (merger) of the two galaxies. Collisions and mergers are rare phenomena, yet they may signal the ultimate fate of most galaxies, including our own Milky Way. With the onset of massive collection of astronomical data, a computerized and automated method will be necessary for identifying those colliding galaxies worthy of more detailed study. This project researches methods to accomplish that goal. Astronomical data from the Sloan Digital Sky Survey (SDSS) and human-provided classifications on merger status from the Galaxy Zoo project are combined and processed with machine learning algorithms. The goal is to determine indicators of merger status based solely on discovering those automated pipeline-generated attributes in the astronomical database that correlate most strongly with the patterns identified through visual inspection by the Galaxy Zoo volunteers. In the end, we aim to provide a new and improved automated procedure for classification of collisions and mergers in future petascale astronomical sky surveys. Both information gain analysis (via the C4.5 decision tree algorithm) and cluster analysis (via the Davies-Bouldin Index) are explored as techniques for finding the strongest correlations between human-identified patterns and existing database attributes. Galaxy attributes measured in the SDSS green waveband images are found to represent the most influential of the attributes for correct classification of collisions and mergers. Only a nominal information gain is noted in this research, however, there is a clear indication of which attributes contribute so that a direction for further study is apparent.
O
Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry...
data.openei.org
datasets.ai
+2more
data, image_document
Updated Apr 27, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jenifer Jackson; M Berman; P Smith; Jenifer Jackson; M Berman; P Smith (2016). Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry Climate [Dataset]. http://doi.org/10.25984/2204258
Explore at:
data, image_documentAvailable download formats
Unique identifier
https://doi.org/10.25984/2204258
Dataset updated
Apr 27, 2016
Dataset provided by
Davis Energy
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
Open Energy Data Initiative (OEDI)
Authors
Jenifer Jackson; M Berman; P Smith; Jenifer Jackson; M Berman; P Smith
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Retrofitting is an essential element of any comprehensive strategy for improving residential energy efficiency. The residential retrofit market is still developing, and program managers must develop innovative strategies to increase uptake and promote economies of scale. Residential retrofitting remains a challenging proposition to sell to homeowners, because awareness levels are low and financial incentives are lacking.

The U.S. Department of Energy's Building America research team, Alliance for Residential Building Innovation (ARBI), implemented a project to increase residential retrofits in Davis, California. The project used a neighborhood-focused strategy for implementation and a low-cost retrofit program that focused on upgraded attic insulation and duct sealing. ARBI worked with a community partner, the not-for-profit Cool Davis Initiative, as well as selected area contractors to implement a strategy that sought to capitalize on the strong local expertise of partners and the unique aspects of the Davis, California, community. Working with community partners also allowed ARBI to collect and analyze data about effective messaging tactics for community-based retrofit programs.

ARBI expected this project, called Retrofit Your Attic, to achieve higher uptake than other retrofit projects, because it emphasized a low-cost, one-measure retrofit program. However, this was not the case. The program used a strategy that focused on attics-including air sealing, duct sealing, and attic insulation-as a low-cost entry for homeowners to complete home retrofits. The price was kept below $4,000 after incentives; both contractors in the program offered the same price. The program completed only five retrofits. Interestingly, none of those homeowners used the one-measure strategy. All five homeowners were concerned about cost, comfort, and energy savings and included additional measures in their retrofits. The low-cost, one-measure strategy did not increase the uptake among homeowners, even in a well-educated, affluent community such as Davis.

This project has two primary components. One is to complete attic retrofits on a community scale in the hot-dry climate on Davis, CA. Sufficient data will be collected on these projects to include them in the BAFDR. Additionally, ARBI is working with contractors to obtain building and utility data from a large set of retrofit projects in CA (hot-dry). These projects are to be uploaded into the BAFDR.
Empirical data for project: Mining user reviews of COVID contact-tracing...
figshare.com
png
Updated Sep 25, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vahid Garousi (2020). Empirical data for project: Mining user reviews of COVID contact-tracing apps [Dataset]. http://doi.org/10.6084/m9.figshare.13010402.v1
Explore at:
pngAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13010402.v1
Dataset updated
Sep 25, 2020
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Vahid Garousi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Empirical data for project: Mining user reviews of COVID contact-tracing apps
Hospital Database Management System SQL Project
kaggle.com
Updated May 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrew Dolcimascolo-Garrett (2024). Hospital Database Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/andrewdolcigarrett/hospital-database-management-system-sql-project/versions/1
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 9, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Andrew Dolcimascolo-Garrett
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Andrew Dolcimascolo-Garrett

Released under MIT

Contents
Data from: A large-scale comparative analysis of Coding Standard conformance...
figshare.com
application/x-gzip
Updated Oct 4, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
Explore at:
application/x-gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12377237.v3
Dataset updated
Oct 4, 2021
Dataset provided by
Figsharehttp://figshare.com/
Authors
Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.notebooks_out.tar.gz: Tables and figures generated by notebooks.source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositoriesPublished in ESEM 2020: https://doi.org/10.1145/3382494.3410680Preprint: https://arxiv.org/abs/2007.08978
Knowledge Graph: tyrolean mining documents 15th and 16th century
zenodo.org
data.niaid.nih.gov
bin
Updated Sep 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gerald Hiebel; Gerald Hiebel; Elisabeth Gruber-Tokić; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Milena Peralta Friedburg; Brigit Danthine; Brigit Danthine (2024). Knowledge Graph: tyrolean mining documents 15th and 16th century [Dataset]. http://doi.org/10.5281/zenodo.6276586
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6276586
Dataset updated
Sep 26, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gerald Hiebel; Gerald Hiebel; Elisabeth Gruber-Tokić; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Milena Peralta Friedburg; Brigit Danthine; Brigit Danthine
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains a Knowledge Graph (.nq file) of two historical mining documents: “Verleihbuch der Rattenberger Bergrichter” ( Hs. 37, 1460-1463) and “Schwazer Berglehenbuch” (Hs. 1587, approx. 1515) stored by the Tyrolean Regional Archive, Innsbruck (Austria). The user of the KG may explore the montanistic network and relations between people, claims and mines in the late medieval Tyrol. The core regions concern the districts Schwaz and Kufstein (Tyrol, Austria).

The ontology used to represent the claims is CIDOC CRM, an ISO certified ontology for Cultural Heritage documentation. Supported by the Karma tool the KG is generated as RDF (Resource Description Framework). The generated RDF data is imported into a Triplestore, in this case GraphDB, and then displayed visually. This puts the data from the early mining texts into a semantically structured context and makes the mutual relationships between people, places and mines visible.

Both documents and the Knowledge Graph were processed and generated by the research team of the project “Text Mining Medieval Mining Texts”. The research project (2019-2022) was carried out at the university of Innsbruck and funded by go!digital next generation programme of the Austrian Academy of Sciences.

Citeable Transcripts of the historical documents are online available:
Hs. 37 DOI: 10.5281/zenodo.6274562
Hs. 1587 DOI: 10.5281/zenodo.6274928
H
Data from: Mining texts to efficiently generate global data on political...
dataverse.harvard.edu
Updated Jul 8, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shahryar Minhas; Jay Ulfelder; Michael D. Ward (2015). Mining texts to efficiently generate global data on political regime types [Dataset]. http://doi.org/10.7910/DVN/8MC1LO
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/8MC1LO
Dataset updated
Jul 8, 2015
Dataset provided by
Harvard Dataverse
Authors
Shahryar Minhas; Jay Ulfelder; Michael D. Ward
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We describe the design and results of an experiment in using text-mining and machine-learning techniques to generate annual measures of national political regime types. Valid and reliable measures of countries’ forms of national government are essential to cross-national and dynamic analysis of many phenomena of great interest to political scientists, including civil war, interstate war, democratization, and coups d’état. Unfortunately, traditional measures of regime type are very expensive to produce, and observations for ambiguous cases are often sharply contested. In this project, we train a series of support vector machine (SVM) classifiers to infer regime type from textual data sources. To train the classifiers, we used vectorized textual reports from Freedom House and the State Department as features for a training set of prelabeled regime type data. To validate our SVM classifiers, we compare their predictions in an out-of-sample context, and the performance results across a variety of metrics (accuracy, precision, recall) are very high. The results of this project highlight the ability of these techniques to contribute to producing real-time data sources for use in political science that can also be routinely updated at much lower cost than human-coded data. To this end, we set up a text-processing pipeline that pulls updated textual data from selected sources, conducts feature extraction, and applies supervised machine learning methods to produce measures of regime type. This pipeline, written in Python, can be pulled from the Github repository associated with this project and easily extended as more data becomes available.
d
Drake Mining Project : request for variations to original E.I.A. submission....
data.gov.au
pdf
Updated Feb 26, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NSW Department of Planning, Housing and Infrastructure (2024). Drake Mining Project : request for variations to original E.I.A. submission. [Dataset]. https://data.gov.au/dataset/ds-nsw-180b445f-260c-4661-8aaf-51dbded0a2a0
Explore at:
pdfAvailable download formats
Dataset updated
Feb 26, 2024
Dataset provided by
Department of Planning, Housing and Infrastructurehttps://www.nsw.gov.au/departments-and-agencies/department-of-planning-housing-and-infrastructure
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Environmental Impact Statement: Drake Mining Project : request for variations to original E.I.A. submission. Environmental Impact Statement: Drake Mining Project : request for variations to original E.I.A. submission.
Z
Meta-study water and mining conflicts
data.niaid.nih.gov
Updated Feb 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ott, Marlen (2023). Meta-study water and mining conflicts [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5151474
Explore at:
Dataset updated
Feb 17, 2023
Dataset provided by
Ott, Marlen
Schoderer, Mirja
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises the raw data and R Script for the following published article: Schoderer, M., & Ott, M. (2022). Contested water-and miningscapes–Explaining the high intensity of water and mining conflicts in a meta-study. World Development, 154, 105888. The article seeks to better understand the dynamics of mining and water conflicts, specifically under which (combinations of) conditions environmental defenders step outside the legal framework in their contestation of mining projects, according to existing case study-based research. More information on the methodology is available in the paper.

The file Water and mining conflicts full dataset includes the qualitative information extracted from published articles, the scoring scheme and the normalized scores used in the R analysis. The R Script QCA_Preventive water and mining conflicts describes the fuzzy-set, two-step Qualitative Comparative Analysis conduct to understand under which conditions environmental defenders choose non-legal means in conflicts that occur in the planning or licensing stage of a mining project The CSV file Normalized scores_preventive is the raw data used in the R Script QCA_Preventive water and mining conflicts The R Script QCA_Reactive water and mining conflicts describes the fuzzy-set, two-step Qualitative Comparative Analysis conduct to understand under which conditions environmental defenders choose non-legal means in conflicts that occur when the mining project is already in operation The CSV file Normalized scores_reactive is the raw data used in the R Script QCA_Reactive water and mining conflicts
d
Tokenized Forms of Jane Austen Novels with Positional Information
search.dataone.org
Updated Sep 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Duckworth, Tyler J (2024). Tokenized Forms of Jane Austen Novels with Positional Information [Dataset]. http://doi.org/10.7910/DVN/24ZURB
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/24ZURB
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Duckworth, Tyler J
Description
This dataset contains tokenized forms of four Jane Austen novels sourced from Project Gutenberg--Emma, Persuasion, Pride and Prejudice, and Sense and Sensibility--that are broken down by chapter (and volume where appropriate). Each file also includes positional data for each row which will be used for further analysis. This was created to hold the data for the final project for COSC426: Introduction to Data Mining, a class at the University of Tennessee.
m
Data extracted from GitHub repositories (training and test data-sets)
data.mendeley.com
Updated Aug 1, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Youcef Bouziane (2019). Data extracted from GitHub repositories (training and test data-sets) [Dataset]. http://doi.org/10.17632/gt3f4jnbvn.3
Explore at:
Unique identifier
https://doi.org/10.17632/gt3f4jnbvn.3
Dataset updated
Aug 1, 2019
Authors
Youcef Bouziane
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the SQL tables of the training and test datasets used in our experimentation. These tables contain the preprocessed textual data (in a form of tokens) extracted from each training and test project. Besides the preprocessed textual data, this dataset also contains meta-data about the projects, GitHub topics, and GitHub collections. The GitHub projects are identified by the tuple “Owner” and “Name”. The descriptions of the table fields are attached to their respective data descriptions.
q
Simulated supermarket transaction data
researchdatafinder.qut.edu.au
researchdata.edu.au
Updated May 31, 2010
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yuefeng Li (2010). Simulated supermarket transaction data [Dataset]. https://researchdatafinder.qut.edu.au/individual/q44
Explore at:
Dataset updated
May 31, 2010
Dataset provided by
Queensland University of Technology (QUT)
Authors
Yuefeng Li
Description
A database of de-identified supermarket customer transactions. This large simulated dataset was created based on a real data sample.
D
Mine Project Approval Boundary
data.nsw.gov.au
researchdata.edu.au
url
Updated Jul 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NSW Resources - Resources Regulator (2024). Mine Project Approval Boundary [Dataset]. https://data.nsw.gov.au/data/dataset/mine-project-approval-boundary
Explore at:
urlAvailable download formats
Dataset updated
Jul 26, 2024
Dataset provided by
NSW Resources - Resources Regulator
Description
The Project Approval Boundary spatial data set provides information on the location of the project approvals granted for each mine in NSW by an approval authority (either NSW Department of Planning or local Council). This information may not align to the mine authorisation (i.e. mine title etc) granted under the Mining Act 1992. This information is created and submitted by each large mine operator to fulfill the Final Landuse and Rehabilitation Plan data submission requirements required under Schedule 8A of the Mining Regulation 2016.

The collection of this spatial data is administered by the Resources Regulator in NSW who conducts reviews of the data submitted for assessment purposes. In some cases, information provided may contain inaccuracies that require adjustment following the assessment process by the Regulator. The Regulator will request data resubmission if issues are identified.

Further information on the reporting requirements associated with mine rehabilitation can be found at https://www.resourcesregulator.nsw.gov.au/rehabilitation/mine-rehabilitation.

Find more information about the data at https://www.seed.nsw.gov.au/project-approvals-boundary-layer

Any data related questions should be directed to nswresourcesregulator@service-now.com