36 datasets found
  1. Data supporting the Master thesis "Monitoring von Open Data Praktiken -...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Nov 21, 2024
    Cite
    Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katharina Zinke; Katharina Zinke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dresden
    Description

    Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

    This ZIP file contains the data on which the thesis is based, interim exports of the results, and the R script with all pre-processing, data merging, and analyses carried out. Documentation of the additional exploratory analysis is also included. The PDFs and text files of the scientific papers used are not included, as the papers are published open access.

    The folder structure is shown below with the file names and a brief description of the contents of each file. For details on the analysis approach, please refer to the master's thesis (publication following soon).

    ## Data sources

    Folder 01_SourceData/

    - PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

    - ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

    - ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

    - Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

    ## Automatic classification

    Folder 02_AutomaticClassification/

    - (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

    - (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

    - PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

    - oddpub_results_wDOIs.csv (results file of the ODDPub classification)

    - PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)
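
    The merge files listed above combine two result tables for the publications contained in both. A minimal sketch of such an inner join on DOI follows; it is not the thesis's R script, and the column names (`doi`, `is_open_data`, `plos_data_shared`) and sample rows are hypothetical stand-ins, since the listing does not state the actual CSV headers.

```python
import csv
import io


def merge_on_doi(left_rows, right_rows, key="doi"):
    """Inner-join two lists of dict rows on a normalized DOI column."""

    def norm(value):
        # DOIs are case-insensitive; drop whitespace and a resolver prefix.
        value = value.strip().lower()
        for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
            if value.startswith(prefix):
                value = value[len(prefix):]
        return value

    right_by_doi = {norm(r[key]): r for r in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_doi.get(norm(row[key]))
        if match is not None:  # keep only publications present in both tables
            combined = dict(match)
            combined.update(row)
            combined[key] = norm(row[key])
            merged.append(combined)
    return merged


# Hypothetical in-memory stand-ins for the two CSV files (made-up rows).
oddpub_csv = "doi,is_open_data\n10.1000/A1,TRUE\n10.1000/b2,FALSE\n"
plos_csv = "doi,plos_data_shared\nhttps://doi.org/10.1000/a1,yes\n10.1000/c3,no\n"

oddpub_rows = list(csv.DictReader(io.StringIO(oddpub_csv)))
plos_rows = list(csv.DictReader(io.StringIO(plos_csv)))
merged_rows = merge_on_doi(oddpub_rows, plos_rows)
print(merged_rows)
```

    Normalizing the DOI before joining is a design choice of this sketch: exports from different tools often differ in letter case or carry a resolver prefix, and an exact string match would silently drop such rows.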

    ## Manual coding

    Folder 03_ManualCheck/

    - CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

    - ManualCheck_2023-06-08.csv (Manual coding results file)

    - PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

    ## Exploratory analysis for the discoverability of open data

    Folder 04_FurtherAnalyses/

    - Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the exploratory analysis of the discoverability of open data publications using the example of a researcher; in German)

    ## R-Script

    - Analyses_MA_OpenDataMonitoring.R (R script for preparing, merging, and analyzing the data and for running the ODDPub algorithm)

  2. Experimental data for "Software Data Analytics: Architectural Model...

    • figshare.com
    zip
    Updated Jun 6, 2023
    Cite
    Cong Liu (2023). Experimental data for "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection" [Dataset]. http://doi.org/10.4121/uuid:ca1b0690-d9c5-4626-a067-525ec9d5881b
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Cong Liu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". The data were generated by instrumenting both synthetic and real-life software systems and are formatted according to the IEEE XES format. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for further explanation.
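
    XES logs are XML documents in which a log contains traces and each trace contains events with typed key/value attributes. As a rough illustration of reading such a file, the sketch below counts event names using the standard `concept:name` attribute. The sample log is made up and does not come from this dataset, and the attributes of the software event extension are not used here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up miniature log; real XES files usually declare an XML namespace
# and carry many more attributes per event.
SAMPLE_XES = """<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="A.start()"/></event>
    <event><string key="concept:name" value="B.run()"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case-2"/>
    <event><string key="concept:name" value="A.start()"/></event>
  </trace>
</log>"""


def local(tag):
    """Drop an XML namespace prefix such as '{http://...}event'."""
    return tag.rsplit("}", 1)[-1]


def event_name_counts(xes_text):
    """Count how often each event name occurs across all traces."""
    root = ET.fromstring(xes_text)
    counts = Counter()
    for trace in root:
        if local(trace.tag) != "trace":
            continue
        for event in trace:
            if local(event.tag) != "event":
                continue  # skips trace-level attributes such as the case id
            for attr in event:
                if attr.get("key") == "concept:name":
                    counts[attr.get("value")] += 1
    return counts


counts = event_name_counts(SAMPLE_XES)
print(counts)
```

    Comparing local tag names rather than full tags lets the same function handle files with or without a declared namespace.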

  3. Thesis Code

    • datamed.org
    • dataverse.harvard.edu
    Cite
    Thesis Code [Dataset]. https://datamed.org/display-item.php?repository=0012&id=56d4b8ace4b0e644d3136f59&query=
    Explore at:
    Description

    Code for getting data, mining text, and estimating a VAR model

  4. Data from: Scaling data mining in massively parallel dataflow systems

    • resodate.org
    Updated Feb 5, 2016
    Cite
    Sebastian Schelter (2016). Scaling data mining in massively parallel dataflow systems [Dataset]. http://doi.org/10.14279/depositonce-4982
    Explore at:
    Dataset updated
    Feb 5, 2016
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Sebastian Schelter
    Description

    This thesis lays the ground work for enabling scalable data mining in massively parallel dataflow systems, using large datasets. Such datasets have become ubiquitous. We illustrate common fallacies with respect to scalable data mining: It is in no way sufficient to naively implement textbook algorithms on parallel systems; bottlenecks on all layers of the stack prevent the scalability of such naive implementations. We argue that scalability in data mining is a multi-leveled problem and must therefore be approached on the interplay of algorithms, systems, and applications. We therefore discuss a selection of scalability problems on these different levels. We investigate algorithm-specific scalability aspects of collaborative filtering algorithms for computing recommendations, a popular data mining use case with many industry deployments. We show how to efficiently execute the two most common approaches, namely neighborhood methods and latent factor models on MapReduce, and describe a specialized architecture for scaling collaborative filtering to extremely large datasets which we implemented at Twitter. We turn to system-specific scalability aspects, where we improve system performance during the distributed execution of a special class of iterative algorithms by drastically reducing the overhead required for guaranteeing fault tolerance. Therefore we propose a novel optimistic approach to fault-tolerance which exploits the robust convergence properties of a large class of fixpoint algorithms and does not incur measurable overhead in failure-free cases. Finally, we present work on an application-specific scalability aspect of scalable data mining. A common problem when deploying machine learning applications in real-world scenarios is that the prediction quality of ML models heavily depends on hyperparameters that have to be chosen in advance. 
    We propose an algorithmic framework for an important subproblem occurring during hyperparameter search at scale: efficiently generating samples from block-partitioned matrices in a shared-nothing environment. For every selected problem, we show how to execute the resulting computation automatically in a parallel and scalable manner, and evaluate our proposed solution on large datasets with billions of datapoints.

  5. Previous works comparative table.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Previous works comparative table. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Previous works comparative table.

  6. Canadian Copper Mining Data - D Young Thesis

    • borealisdata.ca
    • search.dataone.org
    Updated Nov 10, 2016
    Cite
    Denise Young (2016). Canadian Copper Mining Data - D Young Thesis [Dataset]. http://doi.org/10.7939/DVN/10950
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 10, 2016
    Dataset provided by
    Borealis
    Authors
    Denise Young
    License

    https://borealisdata.ca/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7939/DVN/10950

    Time period covered
    1953 - 1984
    Area covered
    Canada
    Description

    Mine-level copper data (1953-1984) used in Young, D. (1992), "Cost Specification and Firm Behaviour in a Hotelling Model of Resource Extraction," Canadian Journal of Economics XXV, 41-59. Spreadsheet has 5 tabs (including data and explanatory materials).

  7. Performance of the algorithm.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    + more versions
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance of the algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t008
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of the algorithm.

  8. Categorization of doctoral theses.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Categorization of doctoral theses. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorization of doctoral theses.

  9. Data from: Extending the knowledge base of foresight

    • resodate.org
    Updated Apr 14, 2016
    Cite
    Victoria Kayser (2016). Extending the knowledge base of foresight [Dataset]. http://doi.org/10.14279/depositonce-5098
    Explore at:
    Dataset updated
    Apr 14, 2016
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Victoria Kayser
    Description

    The future is shaped and influenced by decisions made today. These decisions need to be made on a solid ground and diverse information sources should be considered in the decision process. For exploring different futures, foresight offers a wide range of methods for gaining insights. The starting point of this thesis is the observation that recent foresight methods particularly use patent and publication data or rely on expert opinion, but few other data sources are used. In times of big data, many other options exist and, for example, social media or websites are currently not a major part of these deliberations. While the volume of data from heterogeneous sources grows considerably, foresight and its methods rarely benefit from such available data. One attempt to access and systematically examine this data is text mining that processes textual data in a largely automated manner. Therefore, this thesis addresses the contribution of text mining and further textual data sources for foresight and its methods. After clarifying the potential of combining text mining and foresight, four concrete examples are outlined. As the results show, the existing foresight methods are improved as exemplified by roadmapping and scenario development. By exploiting new data sources (e.g., Twitter and web mining), new options evolve for analyzing data. Thus, more actors and views are integrated, and more emphasis is laid on analyzing social changes. Summarized, using text mining enhances the detection and examination of emerging topics and technologies by extending the knowledge base of foresight. Hence, new foresight applications can be designed. And, in particular, text mining is promising for explorative approaches that require a solid base for reflecting on possible futures.

  10. Bitcoin data part three from Jan 2009 to Feb 2018

    • kaggle.com
    zip
    Updated Apr 18, 2020
    + more versions
    Cite
    ZouJiu (2020). Bitcoin data part three from Jan 2009 to Feb 2018 [Dataset]. https://www.kaggle.com/datasets/shiheyingzhe/bitcoin-data-part-three-from-jan-2009-to-feb-2018
    Explore at:
    Available download formats: zip (11036310629 bytes)
    Dataset updated
    Apr 18, 2020
    Authors
    ZouJiu
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    During my senior year at Shandong University, my advisor assigned me a thesis topic on bitcoin transaction data analysis, so I crawled all bitcoin transaction data from January 2009 to February 2018. I performed statistical and quantitative analyses on it. I hope this data is helpful; data mining is interesting and useful not only as a skill but also in everyday life.

    I crawled this data from https://www.blockchain.com/explorer. Each file contains many blocks, and the range of blocks is reflected in the file name; e.g., the file 0-68732.csv covers block 0 (also called the genesis block) through block 68732. Blocks that have no input are not included in the file. There are five columns: the Height column gives the block height, the Input column gives the input address of the block, the Output column gives the output address of the block, the Sum column gives the bitcoin transaction amount corresponding to the Output, and the Time column gives the generation time of the block. A block contains many transactions.

    This page is part three of the full data; the other parts can be found at https://www.kaggle.com/shiheyingzhe/datasets
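
    The file naming and column layout described above can be read with the standard library. The following is a minimal sketch, assuming each CSV has a header row with exactly the five column names given in the description and that `Sum` is a plain decimal number; the sample rows, addresses, and timestamps are made up for illustration.

```python
import csv
import io


def block_range(file_name):
    """Parse a name like '0-68732.csv' into (first_block, last_block)."""
    stem = file_name.rsplit(".", 1)[0]
    start, end = stem.split("-")
    return int(start), int(end)


def sum_by_height(csv_text):
    """Total the Sum column per block height."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        height = int(row["Height"])
        totals[height] = totals.get(height, 0.0) + float(row["Sum"])
    return totals


# Made-up sample rows in the assumed layout (not real chain data).
SAMPLE = (
    "Height,Input,Output,Sum,Time\n"
    "100,addr_in_1,addr_out_1,10.0,2010-01-01 00:00:00\n"
    "100,addr_in_1,addr_out_2,40.0,2010-01-01 00:00:00\n"
    "101,addr_in_2,addr_out_3,30.0,2010-01-01 00:10:00\n"
)

first, last = block_range("0-68732.csv")
totals = sum_by_height(SAMPLE)
print(first, last, totals)
```

    For the full archive (about 11 GB zipped), streaming each file through `csv.DictReader` row by row, as done here, avoids loading whole files into memory.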

  11. Pattern Mining for Label Ranking

    • data.4tu.nl
    • figshare.com
    zip
    Updated May 8, 2017
    Cite
    C.F. (Cláudio) Pinho Rebelo de Sá (2017). Pattern Mining for Label Ranking [Dataset]. http://doi.org/10.4121/uuid:21b1959d-9196-423e-94d0-53883fb0ff21
    Explore at:
    Available download formats: zip
    Dataset updated
    May 8, 2017
    Dataset provided by
    LIACS
    Authors
    C.F. (Cláudio) Pinho Rebelo de Sá
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    Label Ranking datasets used in the PhD thesis "Pattern Mining for Label Ranking"

  12. Data from: Improving Scientific Information Extraction with Text Generation

    • curate.nd.edu
    pdf
    Updated Apr 9, 2025
    Cite
    Qingkai Zeng (2025). Improving Scientific Information Extraction with Text Generation [Dataset]. http://doi.org/10.7274/28571045.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Apr 9, 2025
    Dataset provided by
    University of Notre Dame
    Authors
    Qingkai Zeng
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Description

    As research communities expand, the number of scientific articles continues to grow rapidly, with no signs of slowing. This information overload drives the need for automated tools to identify relevant materials and extract key ideas. Information extraction (IE) focuses on converting unstructured scientific text into structured knowledge (e.g., ontologies, taxonomies, and knowledge graphs), enabling intelligent systems to excel in tasks like document organization, scientific literature retrieval and recommendation, claim verification, and even novel idea or hypothesis generation. To delimit the scope of this thesis, I focus on taxonomic structure as the representation of knowledge in the scientific domain.

    To construct a taxonomy from scientific corpora, traditional methods often rely on pipeline frameworks. These frameworks typically follow a sequence: first, extracting scientific concepts or entities from the corpus; second, identifying hierarchical relationships between the concepts; and finally, organizing these relationships into a cohesive taxonomy. However, such methods encounter several challenges: (1) the quality of the corpus or annotation data, (2) error propagation within the pipeline framework, and (3) limited generalization and transferability to other specific domains. The development of large language models (LLMs) offers promising advancements, as these models have demonstrated remarkable abilities to internalize knowledge and respond effectively to a wide range of inquiries. Unlike traditional pipeline-based approaches, generative methods harness LLMs to achieve (1) better utilization of their internalized knowledge, (2) direct text-to-knowledge conversion, and (3) flexible, schema-free adaptability.

    This thesis explores innovative methods for integrating text generation technologies to improve IE in the scientific domain, with a focus on taxonomy construction. The approach begins with generating entity names and evolves to create or enrich taxonomies directly via text generation. I will explore combining neighborhood structural context, descriptive textual information, and LLMs' internal knowledge to improve output quality. Finally, this thesis will outline future research directions.

  13. Data from: Geospatial Files for the Geologic Map of the Stibnite Mining...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 20, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Geospatial Files for the Geologic Map of the Stibnite Mining Area, Valley County, Idaho [Dataset]. https://catalog.data.gov/dataset/geospatial-files-for-the-geologic-map-of-the-stibnite-mining-area-valley-county-idaho
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Valley County, Stibnite, Idaho
    Description

    These geospatial files are the essential components for the Geologic Map of the Stibnite Mining Area in Valley County, Idaho, which was published by the Idaho Geological Survey in 2022. Three main file types are in this dataset: geographic, geologic, and mining. Geographic files are map extent, lidar base, topographic contours, labels for contours, waterways, and roads. Geologic files are geologic map units, faults, structural lines (axial traces), structural points like bedding strike and dip locations, cross section lines, and drill core sample locations. Lastly, mining files are disturbed ground features including open pit polygons or outlines, and general mining features such as the location of an adit. File formats are shape, layer, or raster. Of the 14 shapefiles, 7 have layer files that provide pre-set symbolization for use in ESRI ArcMap that match up with the Geologic Map of the Stibnite Mining Area in Valley County, Idaho. The lidar data have two similar, but distinct, raster format types (ESRI GRID and TIFF) intended to increase end user accessibility. This dataset is a compilation of both legacy data (from Smitherman’s 1985 master's thesis published in 1988, Midas Gold Corporation employees, the Geologic Map of the Stibnite Quadrangle (Stewart and others, 2016), and Reed S. Lewis of the Idaho Geological Survey) and new data from 2013, 2015, and 2016 field work by Niki E. Wintzer.

  14. Noun tags.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Noun tags. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Noun tags.

  15. Road Segmentation Cctv Merge Dataset

    • universe.roboflow.com
    zip
    Updated Jul 4, 2025
    + more versions
    Cite
    road001 (2025). Road Segmentation Cctv Merge Dataset [Dataset]. https://universe.roboflow.com/road001/road-segmentation-cctv-merge/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    road001
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Road Road AQlm Polygons
    Description
  16. Key indicators.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Key indicators. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Key indicators.

  17. Quality Prediction in a Mining Process

    • kaggle.com
    zip
    Updated Dec 6, 2017
    Cite
    EduardoMagalhãesOliveira (2017). Quality Prediction in a Mining Process [Dataset]. https://www.kaggle.com/datasets/edumagalhaes/quality-prediction-in-a-mining-process/code
    Explore at:
    Available download formats: zip (53386037 bytes)
    Dataset updated
    Dec 6, 2017
    Authors
    EduardoMagalhãesOliveira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    It is not always easy to find databases from real-world manufacturing plants, especially mining plants. So, I would like to share this database with the community, which comes from one of the most important parts of a mining process: a flotation plant!

    The main goal is to use this data to predict how much impurity is in the ore concentrate. As this impurity is measured every hour, if we can predict how much silica (impurity) is in the ore concentrate, we can help the engineers, giving them early information to take actions (empowering!). Hence, they will be able to take corrective actions in advance (reduce impurity, if it is the case) and also help the environment (reducing the amount of ore that goes to tailings as you reduce silica in the ore concentrate).

    Content

    The first column shows the time and date range (from March 2017 until September 2017). Some columns were sampled every 20 seconds; others were sampled on an hourly basis.

    The second and third columns are quality measures of the iron ore pulp right before it is fed into the flotation plant. Columns 4 through 8 are the most important variables affecting ore quality at the end of the process. Columns 9 through 22 contain process data (level and air flow inside the flotation columns), which also affect ore quality. The last two columns are the final iron ore pulp quality measurements from the lab. The target is to predict the last column, which is the % of silica in the iron ore concentrate.
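
    Because some columns are sampled every 20 seconds and others hourly, one common first step is to bring the fast signals onto the hourly grid. The sketch below averages samples within each clock hour; this is only one possible alignment strategy, not necessarily the one used in the related research, and the sample readings are made up.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(samples):
    """Average (timestamp, value) samples within each clock hour."""
    buckets = defaultdict(list)
    for stamp, value in samples:
        # Truncate the timestamp to the start of its hour.
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Made-up 20-second readings for one hypothetical process variable.
samples = [
    (datetime(2017, 3, 10, 1, 0, 0), 2.0),
    (datetime(2017, 3, 10, 1, 0, 20), 4.0),
    (datetime(2017, 3, 10, 1, 0, 40), 6.0),
    (datetime(2017, 3, 10, 2, 0, 0), 8.0),
]

means = hourly_means(samples)
print(means)
```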

    Inspiration

    I have been working on this dataset for at least six months and would like to see if the community can help to answer the following questions:

    • Is it possible to predict % Silica Concentrate every minute?

    • How many steps (hours) ahead can we predict % Silica in Concentrate? This would help engineers act in a predictive and optimized way, mitigating the % of iron that could have gone to tailings.

    • Is it possible to predict % Silica in Concentrate without using the % Iron Concentrate column (as they are highly correlated)?
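
    The last two questions can be framed as building training pairs in which features at hour t are matched with the target several hours later, with the iron column left out. A minimal sketch under stated assumptions: the rows are already hourly, and the keys `air_flow`, `iron_conc`, and `silica_conc` are hypothetical short names with made-up values, not the dataset's real column headers.

```python
def make_supervised(rows, target, horizon, exclude=()):
    """Pair the features at step t with the target value at step t + horizon."""
    drop = set(exclude) | {target}  # the target itself is not used as a feature
    pairs = []
    for t in range(len(rows) - horizon):
        features = {k: v for k, v in rows[t].items() if k not in drop}
        pairs.append((features, rows[t + horizon][target]))
    return pairs


# Made-up hourly rows with hypothetical column names.
rows = [
    {"air_flow": 250.0, "iron_conc": 66.0, "silica_conc": 1.5},
    {"air_flow": 255.0, "iron_conc": 65.0, "silica_conc": 2.0},
    {"air_flow": 260.0, "iron_conc": 64.0, "silica_conc": 2.5},
    {"air_flow": 258.0, "iron_conc": 65.5, "silica_conc": 1.8},
]

pairs = make_supervised(rows, target="silica_conc", horizon=2, exclude=["iron_conc"])
print(pairs)
```

    Repeating this for increasing `horizon` values and comparing model error at each one is a simple way to estimate how many hours ahead a prediction remains useful.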

    Related research using this dataset

    • Research/Conference Papers and Master Thesis:

      • Purities prediction in a manufacturing froth flotation plant: the deep learning techniques
      • Soft Sensor: Traditional Machine Learning or Deep Learning
      • Machine Learning-based Quality Prediction in the Froth Flotation Process of Mining

  18. Data from: Fractured from fracking: examining the health and wellbeing...

    • resodate.org
    Updated Dec 23, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fiona Mactaggart (2021). Fractured from fracking: examining the health and wellbeing implications of unconventional natural gas development in rural communities [Dataset]. http://doi.org/10.14279/depositonce-11936
    Explore at:
    Dataset updated
    Dec 23, 2021
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Fiona Mactaggart
    Description

    Coal seam gas (CSG) is an unconventional natural gas (UNG) that is extracted from wells via coal seams, and reserves are found in Australia, the USA and the UK. Other UNG include shale and tight gas, which are sourced from different geological formations and utilise similar processes to CSG mining, and are extracted in Canada, Europe, Asia, the Middle East and Australia. In recent decades, UNG extraction has grown exponentially, with hydraulic fracturing or ‘fracking’ occurring across regional and rural landscapes and in close proximity to communities. Whilst major development projects can facilitate employment and other opportunities in surrounding communities through population growth and increased demand for services, there is evidence that negative impacts on health and wellbeing can outweigh any benefits. Commonly referred to as the ‘resource curse’, when the costs of extraction and exporting natural resources outweigh the economic benefits, the expansion of CSG activity was often met with trepidation from local communities and the broader public. There was uncertainty around the impacts and consequences of rapid development, particularly in the USA and Australia, stemming from a lack of prior experience, mixed messages in the media, perceived lack of governmental support, and little empirical evidence. Presented with the opportunity to address the gap in the literature, this research explores the broader implications of mining activity on surrounding communities, with a focus on CSG and the social determinants of health and wellbeing. The level of community interaction throughout a project lifecycle is greater in CSG mine settings compared to traditional mining methods (like coal, for example) because of their proximity to communities, and so there is a greater expectation of the mining company to monitor and mitigate impacts on the communities in which they operate. 
    There is emerging evidence that the extractives industry may play a more diverse role in regional communities than previously expected, but the pathways in which they do this in the health sector are not clear. Integral to the provision of health services in regional areas is the integration of services and partnerships – it is common for stakeholders external to the health sector, like transport, police or environmental departments to be involved in the planning and availability of health services. There is a dearth of scientific evidence of the ways in which the extractives industry interacts with the health system in the communities in which they operate; what the costs and benefits of this interaction might be and how the relationship might be optimized to enable long-lasting health improvements. This is particularly important in mining communities, where health outcomes could fluctuate with the various stages of mining activity, and more so in communities where mining activity is soon to cease, leading to uncertainty and economic downturn.

    Objectives: This research was conducted in order to inform the regional and rural health sector, extractives industry, and communities who are undergoing a period of uncertainty with little peer reviewed evidence to provide objective direction. The research aims to: respond to the demand in understanding broader public health and wellbeing outcomes of mining beyond direct, physical and biological outcomes; contribute to the growing evidence base around CSG development and potential community-level impacts; and to comment on the interaction between stakeholders in the health system and the extractives industry at a local level.

    Methods This thesis has been organised into three parts to meet the stated objectives: 1. Two systematic reviews to synthesise the evidence for broader, indirect health and wellbeing implications at community level associated with mining activity in low-, middle- and high-income countries in order to provide a comprehensive account of how communities may be affected by mining; 2. Synthesis of qualitative data collected via a Health Needs Assessment (HNA) in Queensland, Australia to explore the determinants of health and wellbeing in communities living in proximity to CSG developments in order to strengthen understanding of how community and health services can prepare for fluctuations that might come with a mining boom or bust; and 3. A critical review of regional health systems and the interaction between the extractives industry and key stakeholders at a local level in order to compile a set of recommendations that optimise health outcomes for local communities.

    Results Sixteen publications were included in the systematic review of high-income countries, covering studies that took place in the USA, Australia and Canada. The mining studied included coal and mountain-top mining. There was evidence that mining activity can affect the social, physical and economic environment in which communities live, and these factors can in turn have adverse effects on health and wellbeing if not adequately measured and mitigated. Specific examples of self-reported health implications included increased risk of chronic disease and poor overall health, relationship breakdown, lack of social connectedness, and decreased access to health services. Twelve publications were included in the systematic review of low- and middle-income countries, covering studies that took place in Ghana, Namibia, South Africa, Tanzania, India, Brazil, Guatemala and French Guiana. Products mined included gold and silver, iron ore and platinum.
Mining was perceived to influence health behaviours, employment conditions, livelihoods and socio-political factors, which were linked to poorer health outcomes. Family relationships, mental health and community cohesion were negatively associated with mining activity. High-risk health behaviours, population growth and changes in vector ecology from environmental modification were associated with increased infectious disease prevalence. The HNA was implemented in four towns in regional Queensland situated in proximity to CSG development. Eleven focus group discussions, nine in-depth interviews, and forty-five key informant interviews (KIIs) with health and community service providers and community members were conducted. Framework analysis was conducted following a recurrent theme that emerged from the qualitative data around health and wellbeing implications of the CSG industry. CSG mining was deemed a rapid development in the otherwise predominantly agricultural, rural communities. With this rapid development came fluctuations in the local economy, population, social structure and environmental conditions. There were perceived direct and indirect effects of CSG activity at an individual and community level, including impacts on alcohol and drug use; family relationships; social capital and mental health; and social connectedness, civic engagement and trust. Before examining the interaction between the health system and mining sector, it was important to describe the rural health system and its complementary parts. Systems theory underpinned analysis of qualitative data from KIIs to assist in describing the characteristics of the health system and unique influences on its functionality. Results showed that communities are closely interconnected with the health system, and that the rural health systems in the case study were defined by geography, climate and economic fluctuations.
Understanding unique system pressures is important for recognising the impact that policy decisions may have on rural health. Decentralisation of decision making, greater flexibility and predictability of programs will assist in health system strengthening in rural areas. Another key theme emerged from the HNA: the mining sector played a diverse role in health and community service planning and delivery. Key informant transcripts were analysed again using phenomenology theory. Of these, 23 mentioned the presence of CSG or mining activity at least once during the interview without any specific reference to the extractives industry. Mining activity was perceived to influence the ability of service providers to meet demand, recruit and retain staff, and effectively plan and maintain programs. The level of interaction of mining companies with service providers and regulatory bodies varied and was commented on extensively. Several key informants identified pathways for the mining sector to engage with services more effectively, which included strengthening multi-sectoral engagement and enabling transparent, public consultation and evidence-based funding initiatives.

    Conclusion Unconventional natural gas extraction and the implications of mining activity for nearby communities are subjects of major concern internationally. Through the application of core public health theories and methodologies, including the Social Determinants of Health model, complex adaptive systems theory and health needs assessments, this thesis has significantly contributed to the discourse and demonstrated a significant association between mining activity and health. This thesis sought to strengthen the evidence base of the association between the extractives industry and the social determinants of health of surrounding communities, with a focus on the potential impacts of CSG developments.
The hypothesis that there may be broader, direct and indirect impacts on health and wellbeing at an individual or community-level was tested and proven. The secondary aim was to examine the relationship of stakeholders in the local health system with the mining sector, with the intention to develop recommendations that improve measurement, monitoring and response to potential impacts of mining in surrounding communities. This research established that there are both common and unique health and wellbeing issues experienced by communities living in proximity to mining internationally. Our understanding

  19. Plida blade mining analysis

    • researchdata.edu.au
    • acquire.cqu.edu.au
    Updated Aug 8, 2025
    Cite
    Khajidmaa Enkhbazar (2025). Plida blade mining analysis [Dataset]. http://doi.org/10.25946/29266406.V1
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Central Queensland University
    Authors
    Khajidmaa Enkhbazar
    Description

    This research project builds upon my Master of Philosophy thesis, which investigated the relationship between environmental performance, environmental management systems (EMS), ISO 14001 compliance, Sustainable Development Goals (SDGs), and the financial performance of Australian mining firms between 2016 and 2021. In line with the requirements of PhD-level research, this upgraded proposal expands upon that work by introducing new variables, methodologies, and extended datasets to ensure substantial novelty and depth. Building on my current Master's research, the upgraded study will:

    • Utilize PLIDA alongside BLADE datasets to enhance data analysis capabilities and incorporate workforce demographics such as gender diversity, employee turnover, and injury rates (TRIFR/LTIFR).
    • Integrate innovation and R&D expenditure data from the BLADE dataset alongside PLIDA, providing a comprehensive analysis of these variables and their impact on financial outcomes.
    • Extend the analytical timeframe from 2016–2021 to 2012–2023 to allow for longitudinal data insights.
    • Employ advanced statistical approaches, including moderation and interaction effects, and multilevel modelling.
    • Broaden the theoretical underpinning by incorporating Resource-Based View (RBV) and Institutional Theory to explain complex interactions among the variables.

    These enhancements significantly extend my Master's thesis and fulfil doctoral-level expectations for originality and methodological rigor.

  20. Data supporting University of Southampton Doctoral Thesis entitled: Patterns...

    • eprints.soton.ac.uk
    Updated Jul 11, 2024
    Cite
    Drennan, Regan; Glover, Adrian G; Copley, Jon; Linse, Katrin; Dahlgren, Thomas G; Taboada, Sergio; Arias, Maria Belen (2024). Data supporting University of Southampton Doctoral Thesis entitled: Patterns of diversity, connectivity, and evolution in southern ocean and deep-sea annelids [Dataset]. http://doi.org/10.5258/SOTON/D2958
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    University of Southampton
    Authors
    Drennan, Regan; Glover, Adrian G; Copley, Jon; Linse, Katrin; Dahlgren, Thomas G; Taboada, Sergio; Arias, Maria Belen
    Area covered
    Southern Ocean
    Description

    Data supporting University of Southampton Doctoral Thesis entitled: Patterns of diversity, connectivity, and evolution in southern ocean and deep-sea annelids (2024) by Regan Drennan. See README for details of each dataset. Data includes genomic data (Single Nucleotide Polymorphism (SNP) catalogs), held externally on ZENODO due to file size (DOI: 10.5281/zenodo.10606641).

Cite
Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539

Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden"

Available download formats
zip
Dataset updated
Nov 21, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Katharina Zinke
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
Dresden
Description

Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

This ZIP-File contains the data the thesis is based on, interim exports of the results and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.

The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication following soon).

## Data sources

Folder 01_SourceData/

- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

## Automatic classification

Folder 02_AutomaticClassification/

- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS-OSI dataset for the files contained in both)

- oddpub_results_wDOIs.csv (results file of the ODDPub classification)

- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)
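
The "merge ... for the files contained in both" steps above amount to an inner join of two tables on a shared identifier. The following is a minimal Python sketch of that idea, not the thesis's actual code (the dataset's own processing is done in its R script). The column names `DOI`, `Title` and `data_shared`, the sample values, and the DOI normalisation are all assumptions made for illustration; the real CSV files may use different headers.

```python
import csv
import io


def normalise_doi(doi):
    """Lower-case a DOI and strip a resolver prefix so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def inner_merge_on_doi(left_rows, right_rows, key="DOI"):
    """Keep only records whose DOI occurs in both inputs (an inner join)."""
    right_by_doi = {normalise_doi(r[key]): r for r in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_doi.get(normalise_doi(row[key]))
        if match is not None:
            combined = dict(match)   # start from the right-hand record
            combined.update(row)     # left-hand values win on shared columns
            merged.append(combined)
    return merged


# Tiny in-memory stand-ins for the two CSV files (all values are invented).
scopus_csv = "DOI,Title\n10.1000/a,Paper A\n10.1000/b,Paper B\n"
plos_csv = "DOI,data_shared\nhttps://doi.org/10.1000/A,yes\n10.1000/c,no\n"

scopus = list(csv.DictReader(io.StringIO(scopus_csv)))
plos = list(csv.DictReader(io.StringIO(plos_csv)))
matched = inner_merge_on_doi(scopus, plos)
print(matched)
```

Only the record present in both inputs survives the merge. Normalising the DOI before comparison is a design choice in this sketch that guards against case and prefix differences between sources.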

## Manual coding

Folder 03_ManualCheck/

- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

- ManualCheck_2023-06-08.csv (Manual coding results file)

- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

## Explorative analysis for the discoverability of open data

Folder 04_FurtherAnalyses/

- Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher; in German)

## R-Script

Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
