100+ datasets found
  1. Data from: Data Mining Project

    • kaggle.com
    Updated May 14, 2022
    Cite
    Khanh Vương (2022). Data Mining Project [Dataset]. https://www.kaggle.com/khanhvng/data-mining-project/discussion
    Dataset updated
    May 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Khanh Vương
    Description

    Dataset

    This dataset was created by Khanh Vương

    Contents

  2. Data-Mining-Final-Project-Data

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 24, 2024
    Cite
    Anderson, Ty Julian (2024). Data-Mining-Final-Project-Data [Dataset]. http://doi.org/10.7910/DVN/8ETVW9
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Anderson, Ty Julian
    Description

    Financial News Headlines. Visit https://dataone.org/datasets/sha256%3Ade01b1cf5318d53f0296b475ff28734d90acd6240a76f1eee1df39fefda07ef0 for complete metadata about this dataset.

  3. Data from: The use of project portfolios in effective strategy execution to...

    • researchdata.up.ac.za
    zip
    Updated May 31, 2023
    + more versions
    Cite
    Palesa Agnes Ramashala (2023). The use of project portfolios in effective strategy execution to improve business value [Dataset]. http://doi.org/10.25403/UPresearchdata.13280141.v3
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    University of Pretoria
    Authors
    Palesa Agnes Ramashala
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Qualitative data gathered from interviews conducted with four case organisations. The data was analysed with a qualitative data analysis tool (Atlas.ti) to code responses and generate network diagrams; software such as Atlas.ti 8 for Windows is helpful for viewing these results. The respondents' detailed answers are captured in tabular form, and graphs were created to identify trends. The study also includes a desktop review of the case organisations, based on published annual reports covering a period of more than seven years. The analysis was carried out within the scope of the project and its constructs.

  4. Data and source code for "Automating Intention Mining"

    • researchdata.smu.edu.sg
    zip
    Updated Jun 4, 2023
    Cite
    Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY (2023). Data and source code for "Automating Intention Mining" [Dataset]. http://doi.org/10.25440/smu.21261408.v1
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    The dataset and source code for the paper "Automating Intention Mining".

    The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.

    By default, the code uses TensorFlow 0.12. Other versions of TensorFlow may report errors because some APIs are incompatible.

    Running 'online_prediction.py' lets you input any sentence and check the classification result produced by a pre-trained CNN model. The model uses all sentences of the four GitHub projects as training data.

    Running 'play.py' produces the evaluation result of cross-project prediction. Please check the code for more details of the configuration. By default, it uses the four GitHub projects as training data to predict the sentences in the DECA dataset; in this setting, the categories 'aspect evaluation' and 'others' are dropped because the DECA dataset does not contain them.

  5. ghtorrent-projects Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, txt
    Updated Jul 17, 2021
    Cite
    Marios Papachristou (2021). ghtorrent-projects Dataset [Dataset]. http://doi.org/10.5281/zenodo.5111043
    Available download formats: txt, bin
    Dataset updated
    Jul 17, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marios Papachristou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A hypergraph dataset mined from the GHTorrent project is presented. The dataset contains two files:

    1. project_members.txt: Contains GitHub projects with at least 2 contributors and the corresponding contributors (as a hyperedge). The format of the data is:

    2. num_followers.txt: Contains all GitHub users and their number of followers.

    The artifact also contains the SQL queries used to obtain the data from GHTorrent (schema).

  6. Educational Attainment in North Carolina Public Schools: Use of statistical...

    • data.mendeley.com
    Updated Nov 14, 2018
    Cite
    Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
    Dataset updated
    Nov 14, 2018
    Authors
    Scott Herford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The purpose of data mining analysis is to find patterns in the data using techniques such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset. Before doing any work on the data, the data has to be pre-processed, and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. In our project, applying clustering before classification did not improve performance much. One possible reason is that the features we selected for clustering are not well suited to it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics.

    From the dimensionality reduction perspective: clustering differs from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters to reduce the data dimension loses a lot of information, since clustering techniques are based on a metric of 'distance'. In high dimensions, Euclidean distance loses most of its meaning. Therefore, "reducing" dimensionality by mapping data points to cluster numbers is not always a good idea, since you may lose almost all the information.

    From the feature creation perspective: cluster analysis creates labels based on patterns in the data, which brings uncertainty into the data. When clustering precedes classification, the choice of the number of clusters strongly affects the performance of the clustering and, in turn, the performance of the classification. If the subset of features we cluster on is well suited to clustering, it might increase overall classification performance. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not fix the clustering outputs with a random_state, in order to see whether they were stable. Our assumption was that if the results varied widely from run to run, which they definitely did, the data may simply not cluster well with the methods selected. In effect, our results were not much better than random when clustering was applied during data preprocessing.

    Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the models' real-world effectiveness and to revise the models from time to time as things change.
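
    The run-to-run stability check described in this entry can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a plain-Python Lloyd's k-means in place of whatever library they used, treats the seed argument as the analogue of random_state, and scores agreement between runs with the Rand index (an assumption; the description does not name a measure). The toy points are made up.

```python
import random
from itertools import combinations


def kmeans(points, k, seed, iterations=50):
    """Plain Lloyd's k-means; `seed` plays the role of random_state."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (together vs. apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_stability(points, k, seeds):
    """Average pairwise Rand index across runs that differ only in their seed."""
    runs = [kmeans(points, k, s) for s in seeds]
    scores = [rand_index(x, y) for x, y in combinations(runs, 2)]
    return sum(scores) / len(scores)


# Made-up data: two tight groups, then unstructured noise.
rng = random.Random(0)
grouped = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
grouped += [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(30)]
noise = [(rng.random(), rng.random()) for _ in range(60)]

for name, data in [("grouped", grouped), ("noise", noise)]:
    print(name, round(mean_stability(data, k=2, seeds=range(5)), 3))
```

    A mean score close to 1.0 suggests the partition does not depend on the seed; markedly lower scores would correspond to the kind of instability the description reports.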

  7. Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project

    • tandf.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos (2023). Enhancing the Human Health Status Prediction: The ATHLOS Project [Dataset]. http://doi.org/10.6084/m9.figshare.14798079.v1
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originating from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are used, respectively, to address these challenges. To this end, we focus on the development of a complete methodology for the ATHLOS Project, funded by the European Union’s Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we mainly focus on the HealthStatus (HS) score, an index that estimates the human status of health, aiming to examine the effect of various data imputation models on the predictive power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine’s crucial role.

  8. Data from: DATA MINING THE GALAXY ZOO MERGERS

    • catalog.data.gov
    • gimi9.com
    • +3more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). DATA MINING THE GALAXY ZOO MERGERS [Dataset]. https://catalog.data.gov/dataset/data-mining-the-galaxy-zoo-mergers
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    DATA MINING THE GALAXY ZOO MERGERS
    STEVEN BAEHR, ARUN VEDACHALAM, KIRK BORNE, AND DANIEL SPONSELLER

    Abstract. Collisions between pairs of galaxies usually end in the coalescence (merger) of the two galaxies. Collisions and mergers are rare phenomena, yet they may signal the ultimate fate of most galaxies, including our own Milky Way. With the onset of massive collection of astronomical data, a computerized and automated method will be necessary for identifying those colliding galaxies worthy of more detailed study. This project researches methods to accomplish that goal. Astronomical data from the Sloan Digital Sky Survey (SDSS) and human-provided classifications on merger status from the Galaxy Zoo project are combined and processed with machine learning algorithms. The goal is to determine indicators of merger status based solely on discovering those automated pipeline-generated attributes in the astronomical database that correlate most strongly with the patterns identified through visual inspection by the Galaxy Zoo volunteers. In the end, we aim to provide a new and improved automated procedure for classification of collisions and mergers in future petascale astronomical sky surveys. Both information gain analysis (via the C4.5 decision tree algorithm) and cluster analysis (via the Davies-Bouldin Index) are explored as techniques for finding the strongest correlations between human-identified patterns and existing database attributes. Galaxy attributes measured in the SDSS green waveband images are found to represent the most influential of the attributes for correct classification of collisions and mergers. Only a nominal information gain is noted in this research; however, there is a clear indication of which attributes contribute, so a direction for further study is apparent.
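
    The information gain analysis mentioned in this abstract can be illustrated with a small calculation. This is a sketch under stated assumptions, not the authors' pipeline: the attribute names (green_asymmetry, redshift_bin) and the four rows are hypothetical, attributes are assumed to be already discretized, and the gain ratio is included because C4.5 ranks candidate splits by it.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus its weighted entropy after splitting on attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def gain_ratio(rows, attribute, target):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy([row[attribute] for row in rows])
    return information_gain(rows, attribute, target) / split_info if split_info else 0.0


# Hypothetical, hand-made rows; "merger" stands in for the volunteer label.
rows = [
    {"green_asymmetry": "high", "redshift_bin": "near", "merger": True},
    {"green_asymmetry": "high", "redshift_bin": "far", "merger": True},
    {"green_asymmetry": "low", "redshift_bin": "near", "merger": False},
    {"green_asymmetry": "low", "redshift_bin": "far", "merger": False},
]

for attribute in ("green_asymmetry", "redshift_bin"):
    print(attribute, information_gain(rows, attribute, "merger"))
```

    In this toy table the first attribute separates the labels perfectly and the second carries no information about them, which is the kind of contrast an attribute ranking is meant to expose.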

  9. Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry...

    • data.openei.org
    • datasets.ai
    • +2more
    data, image_document
    Updated Apr 27, 2016
    Cite
    Jenifer Jackson; M Berman; P Smith (2016). Community-Scale Attic Retrofit and Home Energy Upgrade Data Mining - Hot Dry Climate [Dataset]. http://doi.org/10.25984/2204258
    Available download formats: data, image_document
    Dataset updated
    Apr 27, 2016
    Dataset provided by
    Davis Energy
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    Open Energy Data Initiative (OEDI)
    Authors
    Jenifer Jackson; M Berman; P Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Retrofitting is an essential element of any comprehensive strategy for improving residential energy efficiency. The residential retrofit market is still developing, and program managers must develop innovative strategies to increase uptake and promote economies of scale. Residential retrofitting remains a challenging proposition to sell to homeowners, because awareness levels are low and financial incentives are lacking.

    The U.S. Department of Energy's Building America research team, Alliance for Residential Building Innovation (ARBI), implemented a project to increase residential retrofits in Davis, California. The project used a neighborhood-focused strategy for implementation and a low-cost retrofit program that focused on upgraded attic insulation and duct sealing. ARBI worked with a community partner, the not-for-profit Cool Davis Initiative, as well as selected area contractors to implement a strategy that sought to capitalize on the strong local expertise of partners and the unique aspects of the Davis, California, community. Working with community partners also allowed ARBI to collect and analyze data about effective messaging tactics for community-based retrofit programs.

    ARBI expected this project, called Retrofit Your Attic, to achieve higher uptake than other retrofit projects, because it emphasized a low-cost, one-measure retrofit program. However, this was not the case. The program used a strategy that focused on attics (including air sealing, duct sealing, and attic insulation) as a low-cost entry for homeowners to complete home retrofits. The price was kept below $4,000 after incentives; both contractors in the program offered the same price. The program completed only five retrofits. Interestingly, none of those homeowners used the one-measure strategy. All five homeowners were concerned about cost, comfort, and energy savings and included additional measures in their retrofits. The low-cost, one-measure strategy did not increase the uptake among homeowners, even in a well-educated, affluent community such as Davis.

    This project has two primary components. One is to complete attic retrofits on a community scale in the hot-dry climate of Davis, CA. Sufficient data will be collected on these projects to include them in the BAFDR. Additionally, ARBI is working with contractors to obtain building and utility data from a large set of retrofit projects in CA (hot-dry). These projects are to be uploaded into the BAFDR.

  10. Empirical data for project: Mining user reviews of COVID contact-tracing...

    • figshare.com
    png
    Updated Sep 25, 2020
    Cite
    Vahid Garousi (2020). Empirical data for project: Mining user reviews of COVID contact-tracing apps [Dataset]. http://doi.org/10.6084/m9.figshare.13010402.v1
    Available download formats: png
    Dataset updated
    Sep 25, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vahid Garousi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Empirical data for project: Mining user reviews of COVID contact-tracing apps

  11. Hospital Database Management System SQL Project

    • kaggle.com
    Updated May 9, 2024
    Cite
    Andrew Dolcimascolo-Garrett (2024). Hospital Database Management System SQL Project [Dataset]. https://www.kaggle.com/datasets/andrewdolcigarrett/hospital-database-management-system-sql-project/versions/1
    Dataset updated
    May 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Andrew Dolcimascolo-Garrett
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Andrew Dolcimascolo-Garrett

    Released under MIT

    Contents

  12. Data from: A large-scale comparative analysis of Coding Standard conformance...

    • figshare.com
    application/x-gzip
    Updated Oct 4, 2021
    Cite
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
    Available download formats: application/x-gzip
    Dataset updated
    Oct 4, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ to traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.

    results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.
    notebooks_out.tar.gz: Tables and figures generated by notebooks.
    source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.

    The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositories
    Published in ESEM 2020: https://doi.org/10.1145/3382494.3410680
    Preprint: https://arxiv.org/abs/2007.08978

  13. Knowledge Graph: tyrolean mining documents 15th and 16th century

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Sep 26, 2024
    Cite
    Gerald Hiebel; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Brigit Danthine (2024). Knowledge Graph: tyrolean mining documents 15th and 16th century [Dataset]. http://doi.org/10.5281/zenodo.6276586
    Available download formats: bin
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gerald Hiebel; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Brigit Danthine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a Knowledge Graph (.nq file) of two historical mining documents: “Verleihbuch der Rattenberger Bergrichter” (Hs. 37, 1460-1463) and “Schwazer Berglehenbuch” (Hs. 1587, approx. 1515) stored by the Tyrolean Regional Archive, Innsbruck (Austria). The user of the KG may explore the montanistic network and relations between people, claims and mines in the late medieval Tyrol. The core regions concern the districts Schwaz and Kufstein (Tyrol, Austria).

    The ontology used to represent the claims is CIDOC CRM, an ISO certified ontology for Cultural Heritage documentation. Supported by the Karma tool the KG is generated as RDF (Resource Description Framework). The generated RDF data is imported into a Triplestore, in this case GraphDB, and then displayed visually. This puts the data from the early mining texts into a semantically structured context and makes the mutual relationships between people, places and mines visible.

    Both documents and the Knowledge Graph were processed and generated by the research team of the project “Text Mining Medieval Mining Texts”. The research project (2019-2022) was carried out at the University of Innsbruck and funded by the go!digital next generation programme of the Austrian Academy of Sciences.

    Citable transcripts of the historical documents are available online:
    Hs. 37 DOI: 10.5281/zenodo.6274562
    Hs. 1587 DOI: 10.5281/zenodo.6274928

  14. Data from: Mining texts to efficiently generate global data on political...

    • dataverse.harvard.edu
    Updated Jul 8, 2015
    Cite
    Shahryar Minhas; Jay Ulfelder; Michael D. Ward (2015). Mining texts to efficiently generate global data on political regime types [Dataset]. http://doi.org/10.7910/DVN/8MC1LO
    Dataset updated
    Jul 8, 2015
    Dataset provided by
    Harvard Dataverse
    Authors
    Shahryar Minhas; Jay Ulfelder; Michael D. Ward
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We describe the design and results of an experiment in using text-mining and machine-learning techniques to generate annual measures of national political regime types. Valid and reliable measures of countries’ forms of national government are essential to cross-national and dynamic analysis of many phenomena of great interest to political scientists, including civil war, interstate war, democratization, and coups d’état. Unfortunately, traditional measures of regime type are very expensive to produce, and observations for ambiguous cases are often sharply contested. In this project, we train a series of support vector machine (SVM) classifiers to infer regime type from textual data sources. To train the classifiers, we used vectorized textual reports from Freedom House and the State Department as features for a training set of prelabeled regime type data. To validate our SVM classifiers, we compare their predictions in an out-of-sample context, and the performance results across a variety of metrics (accuracy, precision, recall) are very high. The results of this project highlight the ability of these techniques to contribute to producing real-time data sources for use in political science that can also be routinely updated at much lower cost than human-coded data. To this end, we set up a text-processing pipeline that pulls updated textual data from selected sources, conducts feature extraction, and applies supervised machine learning methods to produce measures of regime type. This pipeline, written in Python, can be pulled from the Github repository associated with this project and easily extended as more data becomes available.
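
    The out-of-sample metrics named in this description (accuracy, precision, recall) can be computed as in the sketch below. It is an illustration only: the two labels and both label lists are hypothetical, a binary framing is assumed for simplicity, and nothing here reproduces the authors' SVM pipeline or their results.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision and recall for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Guard against empty denominators when a class is never predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out labels and classifier predictions.
held_out = ["democracy", "democracy", "autocracy", "autocracy", "democracy", "autocracy"]
predicted = ["democracy", "autocracy", "autocracy", "autocracy", "democracy", "democracy"]

print(classification_metrics(held_out, predicted, positive="democracy"))
```

    Reporting precision and recall alongside accuracy matters when regime types are unevenly represented, since accuracy alone can look high for a classifier that mostly predicts the majority class.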

  15. Drake Mining Project : request for variations to original E.I.A. submission....

    • data.gov.au
    pdf
    Updated Feb 26, 2024
    + more versions
    Cite
    NSW Department of Planning, Housing and Infrastructure (2024). Drake Mining Project : request for variations to original E.I.A. submission. [Dataset]. https://data.gov.au/dataset/ds-nsw-180b445f-260c-4661-8aaf-51dbded0a2a0
    Available download formats: pdf
    Dataset updated
    Feb 26, 2024
    Dataset provided by
    Department of Planning, Housing and Infrastructure (https://www.nsw.gov.au/departments-and-agencies/department-of-planning-housing-and-infrastructure)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Environmental Impact Statement: Drake Mining Project : request for variations to original E.I.A. submission.

  16. Meta-study water and mining conflicts

    • data.niaid.nih.gov
    Updated Feb 17, 2023
    Cite
    Ott, Marlen (2023). Meta-study water and mining conflicts [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5151474
    Dataset updated
    Feb 17, 2023
    Dataset provided by
    Ott, Marlen
    Schoderer, Mirja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises the raw data and R Script for the following published article: Schoderer, M., & Ott, M. (2022). Contested water-and miningscapes–Explaining the high intensity of water and mining conflicts in a meta-study. World Development, 154, 105888. The article seeks to better understand the dynamics of mining and water conflicts, specifically under which (combinations of) conditions environmental defenders step outside the legal framework in their contestation of mining projects, according to existing case study-based research. More information on the methodology is available in the paper.

    The file Water and mining conflicts full dataset includes the qualitative information extracted from published articles, the scoring scheme and the normalized scores used in the R analysis.

    The R Script QCA_Preventive water and mining conflicts describes the fuzzy-set, two-step Qualitative Comparative Analysis conducted to understand under which conditions environmental defenders choose non-legal means in conflicts that occur in the planning or licensing stage of a mining project. The CSV file Normalized scores_preventive is the raw data used in this script.

    The R Script QCA_Reactive water and mining conflicts describes the fuzzy-set, two-step Qualitative Comparative Analysis conducted to understand under which conditions environmental defenders choose non-legal means in conflicts that occur when the mining project is already in operation. The CSV file Normalized scores_reactive is the raw data used in this script.
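
    For readers unfamiliar with fuzzy-set QCA, the sketch below shows the two standard sufficiency measures (consistency and coverage) that such an analysis typically reports. It is written in Python for illustration only; the dataset's own analysis is in R and is not reproduced here. The membership scores are made up, and it is an assumption that the normalized scores in the CSV files play the role of fuzzy membership values.

```python
def consistency(condition, outcome):
    """Sufficiency consistency: sum of min(x, y) divided by the sum of x."""
    overlap = sum(min(x, y) for x, y in zip(condition, outcome))
    return overlap / sum(condition)


def coverage(condition, outcome):
    """Sufficiency coverage: sum of min(x, y) divided by the sum of y."""
    overlap = sum(min(x, y) for x, y in zip(condition, outcome))
    return overlap / sum(outcome)


# Hypothetical fuzzy membership scores for four cases (values in [0, 1]).
condition_scores = [0.8, 0.6, 0.2, 0.9]  # membership in some causal condition
outcome_scores = [0.9, 0.7, 0.1, 0.6]    # membership in the outcome set

print(round(consistency(condition_scores, outcome_scores), 3))
print(round(coverage(condition_scores, outcome_scores), 3))
```

    High consistency indicates that cases with the condition also tend to show the outcome; coverage indicates how much of the outcome the condition accounts for.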

  17. Tokenized Forms of Jane Austen Novels with Positional Information

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    Duckworth, Tyler J (2024). Tokenized Forms of Jane Austen Novels with Positional Information [Dataset]. http://doi.org/10.7910/DVN/24ZURB
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Duckworth, Tyler J
    Description

    This dataset contains tokenized forms of four Jane Austen novels sourced from Project Gutenberg (Emma, Persuasion, Pride and Prejudice, and Sense and Sensibility), broken down by chapter (and by volume where appropriate). Each file also includes positional data for each row, which will be used for further analysis. The dataset was created to hold the data for the final project for COSC426: Introduction to Data Mining, a class at the University of Tennessee.
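
    A tokenization that keeps positional information might look like the sketch below. The row layout (chapter, token_index, char_offset, token) and the tokenizing rule are assumptions chosen for illustration; the dataset's actual columns and tokenizer are not described in this listing.

```python
import re


def tokenize_with_positions(text, chapter):
    """Split text into lowercase word tokens, recording where each one occurs."""
    rows = []
    for index, match in enumerate(re.finditer(r"[A-Za-z']+", text)):
        rows.append({
            "chapter": chapter,             # hypothetical column name
            "token_index": index,           # 0-based position among tokens
            "char_offset": match.start(),   # 0-based character offset in text
            "token": match.group().lower(),
        })
    return rows


# Opening words of Pride and Prejudice, used here as sample input.
sample = "It is a truth universally acknowledged"
for row in tokenize_with_positions(sample, chapter=1):
    print(row)
```

    Keeping both a token index and a character offset lets later analysis move between word-level statistics and the original text.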

  18. Data extracted from GitHub repositories (training and test data-sets)

    • data.mendeley.com
    Updated Aug 1, 2019
    + more versions
    Cite
    Youcef Bouziane (2019). Data extracted from GitHub repositories (training and test data-sets) [Dataset]. http://doi.org/10.17632/gt3f4jnbvn.3
    Dataset updated
    Aug 1, 2019
    Authors
    Youcef Bouziane
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the SQL tables of the training and test datasets used in our experimentation. These tables contain the preprocessed textual data (in the form of tokens) extracted from each training and test project. Besides the preprocessed textual data, this dataset also contains metadata about the projects, GitHub topics, and GitHub collections. The GitHub projects are identified by the tuple “Owner” and “Name”. The descriptions of the table fields are attached to their respective data descriptions.

  19. Simulated supermarket transaction data

    • researchdatafinder.qut.edu.au
    • researchdata.edu.au
    Updated May 31, 2010
    Cite
    Yuefeng Li (2010). Simulated supermarket transaction data [Dataset]. https://researchdatafinder.qut.edu.au/individual/q44
    Dataset updated
    May 31, 2010
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Yuefeng Li
    Description

    A database of de-identified supermarket customer transactions. This large simulated dataset was created based on a real data sample.

  20. Mine Project Approval Boundary

    • data.nsw.gov.au
    • researchdata.edu.au
    url
    Updated Jul 26, 2024
    Cite
    NSW Resources - Resources Regulator (2024). Mine Project Approval Boundary [Dataset]. https://data.nsw.gov.au/data/dataset/mine-project-approval-boundary
    Available download formats: url
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    NSW Resources - Resources Regulator
    Description

    The Project Approval Boundary spatial data set provides information on the location of the project approvals granted for each mine in NSW by an approval authority (either the NSW Department of Planning or a local Council). This information may not align with the mine authorisation (i.e. mine title etc) granted under the Mining Act 1992. This information is created and submitted by each large mine operator to fulfill the Final Landuse and Rehabilitation Plan data submission requirements under Schedule 8A of the Mining Regulation 2016.

    The collection of this spatial data is administered by the Resources Regulator in NSW who conducts reviews of the data submitted for assessment purposes. In some cases, information provided may contain inaccuracies that require adjustment following the assessment process by the Regulator. The Regulator will request data resubmission if issues are identified.

    Further information on the reporting requirements associated with mine rehabilitation can be found at https://www.resourcesregulator.nsw.gov.au/rehabilitation/mine-rehabilitation.

    Find more information about the data at https://www.seed.nsw.gov.au/project-approvals-boundary-layer

    Any data related questions should be directed to nswresourcesregulator@service-now.com
