40 datasets found
  1. Data supporting the Master thesis "Monitoring von Open Data Praktiken -...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Nov 21, 2024
    Cite
    Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katharina Zinke; Katharina Zinke
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dresden
    Description

    Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

    This ZIP file contains the data the thesis is based on, interim exports of the results, and the R script with all pre-processing, data merging, and analyses carried out. The documentation of the additional, explorative analysis is also included. The actual PDFs and text files of the scientific papers used are not included, as they are published open access.

    The folder structure is shown below with the file names and a brief description of each file's contents. For details on the analysis approach, please refer to the master's thesis (publication to follow soon).

    ## Data sources

    Folder 01_SourceData/

    - PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

    - ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

    - ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

    - Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

    ## Automatic classification

    Folder 02_AutomaticClassification/

    - (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

    - (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

    - PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

    - oddpub_results_wDOIs.csv (results file of the ODDPub classification)

    - PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

    ## Manual coding

    Folder 03_ManualCheck/

    - CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

    - ManualCheck_2023-06-08.csv (Manual coding results file)

    - PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

    ## Explorative analysis for the discoverability of open data

    Folder 04_FurtherAnalyses/

    - Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

    ## R-Script

    - Analyses_MA_OpenDataMonitoring.R (R script for preparing, merging, and analyzing the data and for running the ODDPub algorithm)
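    For orientation, the DOI-based merge steps described above could be reproduced roughly as follows (a minimal Python sketch standing in for the R script; the join column "doi" and the exact CSV schemas are my assumptions, not documented fields):

    ```python
    # Minimal sketch of the DOI-based merge described above; Python stands in
    # for the R script, and the "doi" column name is an assumption.
    import pandas as pd

    # Load the ODDPub classification results and the PLOS-OSI dataset.
    oddpub = pd.read_csv("02_AutomaticClassification/oddpub_results_wDOIs.csv")
    plos = pd.read_csv("01_SourceData/PLOS-Dataset_v2_Mar23.csv")

    # Normalize DOIs before joining (assumed column name "doi").
    for df in (oddpub, plos):
        df["doi"] = df["doi"].str.lower().str.strip()

    # An inner join keeps only publications contained in both sources,
    # mirroring PLOS_ODDPub.csv.
    merged = oddpub.merge(plos, on="doi", how="inner")
    merged.to_csv("02_AutomaticClassification/PLOS_ODDPub.csv", index=False)
    ```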

  2. Experimental data for "Software Data Analytics: Architectural Model...

    • figshare.com
    zip
    Updated Jun 6, 2023
    Cite
    Cong Liu (2023). Experimental data for "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection" [Dataset]. http://doi.org/10.4121/uuid:ca1b0690-d9c5-4626-a067-525ec9d5881b
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Cong Liu
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". These data were generated by instrumenting both synthetic and real-life software systems, and are formatted according to the IEEE XES format. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for more details.
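    Since XES is plain XML, the logs can be inspected without dedicated process mining tools. A minimal sketch (the file name is hypothetical; only the standard XES attributes concept:name and time:timestamp are assumed):

    ```python
    # Minimal sketch: reading an XES event log with the Python standard library.
    # "example_log.xes" is a hypothetical file name.
    import xml.etree.ElementTree as ET

    def local(tag):
        """Strip the XML namespace, e.g. '{http://www.xes-standard.org/}trace'."""
        return tag.rsplit("}", 1)[-1]

    tree = ET.parse("example_log.xes")
    for trace in tree.getroot():
        if local(trace.tag) != "trace":
            continue
        events = []
        for event in trace:
            if local(event.tag) != "event":
                continue
            # Each event carries key/value attributes such as the activity
            # name (concept:name) and the timestamp (time:timestamp).
            attrs = {a.get("key"): a.get("value") for a in event}
            events.append((attrs.get("concept:name"), attrs.get("time:timestamp")))
        print(events)
    ```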

  3. Thesis Code

    • dataverse.harvard.edu
    • datamed.org
    Updated Dec 30, 2015
    Cite
    weicen zhang (2015). Thesis Code [Dataset]. http://doi.org/10.7910/DVN/TIIJAK
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 30, 2015
    Dataset provided by
    Harvard Dataverse
    Authors
    weicen zhang
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Code for getting data, mining text, and estimating a VAR model

  4. Categorization of doctoral theses.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Categorization of doctoral theses. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorization of doctoral theses.

  5. Data and software accompanying the doctoral dissertation: Automated...

    • data.4tu.nl
    zip
    Updated Feb 28, 2022
    Cite
    Cagri Tekinay (2022). Data and software accompanying the doctoral dissertation: Automated abstraction of discrete-event simulation models using state-trace data [Dataset]. http://doi.org/10.4121/19224927.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 28, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Cagri Tekinay
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    State-trace data and source code accompanying Chapters 3, 4, and 5 and Appendix A of the dissertation: Automated abstraction of discrete-event simulation models using state-trace data.

  6. Data from: Scaling data mining in massively parallel dataflow systems

    • resodate.org
    Updated Feb 5, 2016
    Cite
    Sebastian Schelter (2016). Scaling data mining in massively parallel dataflow systems [Dataset]. http://doi.org/10.14279/depositonce-4982
    Explore at:
    Dataset updated
    Feb 5, 2016
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Sebastian Schelter
    Description

    This thesis lays the groundwork for enabling scalable data mining in massively parallel dataflow systems, using large datasets. Such datasets have become ubiquitous. We illustrate common fallacies with respect to scalable data mining: it is in no way sufficient to naively implement textbook algorithms on parallel systems; bottlenecks on all layers of the stack prevent the scalability of such naive implementations. We argue that scalability in data mining is a multi-leveled problem that must therefore be approached through the interplay of algorithms, systems, and applications, and we discuss a selection of scalability problems on these different levels.

    We investigate algorithm-specific scalability aspects of collaborative filtering algorithms for computing recommendations, a popular data mining use case with many industry deployments. We show how to efficiently execute the two most common approaches, namely neighborhood methods and latent factor models, on MapReduce, and describe a specialized architecture for scaling collaborative filtering to extremely large datasets which we implemented at Twitter.

    We then turn to system-specific scalability aspects, where we improve system performance during the distributed execution of a special class of iterative algorithms by drastically reducing the overhead required for guaranteeing fault tolerance. To this end we propose a novel optimistic approach to fault tolerance which exploits the robust convergence properties of a large class of fixpoint algorithms and does not incur measurable overhead in failure-free cases.

    Finally, we present work on an application-specific scalability aspect of scalable data mining. A common problem when deploying machine learning applications in real-world scenarios is that the prediction quality of ML models heavily depends on hyperparameters that have to be chosen in advance. We propose an algorithmic framework for an important subproblem occurring during hyperparameter search at scale: efficiently generating samples from block-partitioned matrices in a shared-nothing environment.

    For every selected problem, we show how to execute the resulting computation automatically in a parallel and scalable manner, and we evaluate our proposed solutions on large datasets with billions of data points.

  7. Performance of the algorithm.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    + more versions
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance of the algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t008
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of the algorithm.

  8. Electronic Invoicing Event Logs

    • search.datacite.org
    • data.4tu.nl
    • +1 more
    Updated Jun 27, 2018
    Cite
    Almir Djedović (2018). Electronic Invoicing Event Logs [Dataset]. http://doi.org/10.4121/uuid:5a9039b8-794a-4ccd-a5ef-4671f0a258a4
    Explore at:
    Dataset updated
    Jun 27, 2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    4TU.Centre for Research Data
    Authors
    Almir Djedović
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains information about the process execution of electronic invoicing. The electronic invoicing process includes activities such as invoice scanning, invoice approval, and liquidation. The dataset records the event name, the event type, the time of the event's execution, and the participant to whom the event relates. The data is formatted in MXML so that it can be used for process mining analysis with tools such as ProM.
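    MXML is also XML-based, so the fields described above can be read directly. A minimal sketch (the file name is hypothetical; the element names follow the common MXML schema):

    ```python
    # Minimal sketch: reading an MXML event log with the Python standard library.
    # "invoicing_log.mxml" is a hypothetical file name; WorkflowModelElement,
    # EventType, Timestamp, and Originator are the usual MXML element names
    # for the fields described above.
    import xml.etree.ElementTree as ET

    tree = ET.parse("invoicing_log.mxml")
    for instance in tree.getroot().iter("ProcessInstance"):
        for entry in instance.iter("AuditTrailEntry"):
            name = entry.findtext("WorkflowModelElement")  # event name
            etype = entry.findtext("EventType")            # event type
            time = entry.findtext("Timestamp")             # execution time
            who = entry.findtext("Originator")             # participant
            print(instance.get("id"), name, etype, time, who)
    ```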

  9. Canadian Copper Mining Data - D Young Thesis

    • borealisdata.ca
    • search.dataone.org
    Updated Nov 10, 2016
    Cite
    Denise Young (2016). Canadian Copper Mining Data - D Young Thesis [Dataset]. http://doi.org/10.7939/DVN/10950
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 10, 2016
    Dataset provided by
    Borealis
    Authors
    Denise Young
    License

    https://borealisdata.ca/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7939/DVN/10950

    Time period covered
    1953 - 1984
    Area covered
    Canada
    Description

    Mine-level copper data (1953-1984) used in Young, D. (1992), "Cost Specification and Firm Behaviour in a Hotelling Model of Resource Extraction," Canadian Journal of Economics XXV, 41-59. Spreadsheet has 5 tabs (including data and explanatory materials).

  10. Previous works comparative table.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Previous works comparative table. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Previous works comparative table.

  11. Pattern Mining for Label Ranking

    • data.4tu.nl
    • figshare.com
    zip
    Updated May 8, 2017
    Cite
    C.F. (Cláudio) Pinho Rebelo de Sá (2017). Pattern Mining for Label Ranking [Dataset]. http://doi.org/10.4121/uuid:21b1959d-9196-423e-94d0-53883fb0ff21
    Explore at:
    Available download formats: zip
    Dataset updated
    May 8, 2017
    Dataset provided by
    LIACS
    Authors
    C.F. (Cláudio) Pinho Rebelo de Sá
    License

    https://doi.org/10.4121/resource:terms_of_use

    Description

    Label Ranking datasets used in the PhD thesis "Pattern Mining for Label Ranking"

  12. Real-world VRP data with realistic non-standard constraints - parameter...

    • narcis.nl
    • data.4tu.nl
    Updated Dec 14, 2018
    Cite
    Emir Žunić (2018). Real-world VRP data with realistic non-standard constraints - parameter setting problem regression input data [Dataset]. http://doi.org/10.4121/uuid:97006624-d6a3-4a29-bffa-e8daf60699d8
    Explore at:
    Available download formats (media types): application/vnd.ms-excel, text/plain
    Dataset updated
    Dec 14, 2018
    Dataset provided by
    4TU.Centre for Research Data
    Authors
    Emir Žunić
    Description

    This file is in Excel (xls) format and contains data about a regression model for input and output parameters (constants) that can be used for solving real-world vehicle routing problems with realistic non-standard constraints. All data are real and were obtained experimentally by running the VRP algorithm in the production environment of one of the biggest distribution companies in Bosnia and Herzegovina.

  13. Bitcoin data part three from Jan 2009 to Feb 2018

    • kaggle.com
    zip
    Updated Apr 18, 2020
    + more versions
    Cite
    ZouJiu (2020). Bitcoin data part three from Jan 2009 to Feb 2018 [Dataset]. https://www.kaggle.com/datasets/shiheyingzhe/bitcoin-data-part-three-from-jan-2009-to-feb-2018
    Explore at:
    Available download formats: zip (11,036,310,629 bytes)
    Dataset updated
    Apr 18, 2020
    Authors
    ZouJiu
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    During my senior year at Shandong University, my tutor gave me the research direction for my thesis: Bitcoin transaction data analysis. I therefore crawled all Bitcoin transaction data from January 2009 to February 2018 and carried out statistical and quantitative analyses. I hope this data will be of some help; data mining is interesting and helpful, not only as a skill but also in our lives.

    I crawled these data from https://www.blockchain.com/explorer. Each file contains many blocks; the range of blocks is reflected in the file name. For example, the file 0-68732.csv covers the zero block (also called the genesis block) up to block 68732. A block that has no input is not included in this file. As for the columns: there are five. The Height column is the block height, the Input column holds the input addresses of the block, the Output column holds the output addresses of the block, the Sum column is the Bitcoin transaction amount corresponding to the Output, and the Time column is the generation time of the block. A block contains many transactions.
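    Given the five columns above, one block file can be loaded and summarized along these lines (a sketch; it assumes the columns are named exactly Height, Input, Output, Sum, and Time, and that Sum and Time parse as a number and a timestamp):

    ```python
    # Sketch: loading one block file and computing simple per-block statistics.
    # Column names follow the description above; the parsing assumptions
    # (numeric Sum, timestamp-like Time) are mine.
    import pandas as pd

    blocks = pd.read_csv("0-68732.csv")  # file name taken from the description

    blocks["Time"] = pd.to_datetime(blocks["Time"])
    blocks["Sum"] = pd.to_numeric(blocks["Sum"], errors="coerce")

    # Total transaction amount per block height.
    per_block = blocks.groupby("Height")["Sum"].sum()
    print(per_block.describe())
    ```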

    This page holds part three of the full data; the other parts can be found at https://www.kaggle.com/shiheyingzhe/datasets

  14. Data from: Extending the knowledge base of foresight

    • resodate.org
    Updated Apr 14, 2016
    Cite
    Victoria Kayser (2016). Extending the knowledge base of foresight [Dataset]. http://doi.org/10.14279/depositonce-5098
    Explore at:
    Dataset updated
    Apr 14, 2016
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Victoria Kayser
    Description

    The future is shaped and influenced by decisions made today. These decisions need to be made on solid ground, and diverse information sources should be considered in the decision process. For exploring different futures, foresight offers a wide range of methods for gaining insights. The starting point of this thesis is the observation that recent foresight methods mainly use patent and publication data or rely on expert opinion, while few other data sources are used. In times of big data many other options exist; social media and websites, for example, are currently not a major part of these deliberations. While the volume of data from heterogeneous sources grows considerably, foresight and its methods rarely benefit from such available data. One attempt to access and systematically examine this data is text mining, which processes textual data in a largely automated manner. This thesis therefore addresses the contribution of text mining and further textual data sources to foresight and its methods.

    After clarifying the potential of combining text mining and foresight, four concrete examples are outlined. As the results show, existing foresight methods are improved, as exemplified by roadmapping and scenario development. By exploiting new data sources (e.g., Twitter and web mining), new options evolve for analyzing data; more actors and views are integrated, and more emphasis is laid on analyzing social changes. In summary, using text mining enhances the detection and examination of emerging topics and technologies by extending the knowledge base of foresight. Hence, new foresight applications can be designed. In particular, text mining is promising for explorative approaches that require a solid base for reflecting on possible futures.

  15. Mapping vectors to words.

    • figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Mapping vectors to words. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mapping vectors to words.

  16. Beyoglu Preservation Area Building Features Database

    • data.4tu.nl
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated Jun 13, 2016
    Cite
    Ahu Sokmenoglu Sohtorik (2016). Beyoglu Preservation Area Building Features Database [Dataset]. http://doi.org/10.4121/uuid:37ccd095-8523-419f-97e1-b6368a821a4f
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 13, 2016
    Dataset provided by
    TU Delft
    Authors
    Ahu Sokmenoglu Sohtorik
    License

    https://doi.org/10.4121/resource:terms_of_use

    Time period covered
    2008
    Description

    The Beyoglu Preservation Area Building Features Database. A large and quite comprehensive GIS database was constructed in order to implement the data mining analysis, based mainly on the traditional thematic maps of the Master Plan for the Beyoğlu Preservation Area. This database consists of 45 spatial and non-spatial features attributed to the 11,984 buildings located in the Beyoğlu Preservation Area and it is one of the original outputs of the PhD Thesis entitled "A Knowledge Discovery Approach to Urban Analysis: The Beyoglu Preservation Area as a data mine".

  17. Data from: Improving Scientific Information Extraction with Text Generation

    • curate.nd.edu
    pdf
    Updated Apr 9, 2025
    Cite
    Qingkai Zeng (2025). Improving Scientific Information Extraction with Text Generation [Dataset]. http://doi.org/10.7274/28571045.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Apr 9, 2025
    Dataset provided by
    University of Notre Dame
    Authors
    Qingkai Zeng
    License

    Public Domain Mark 1.0, https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Description

    As research communities expand, the number of scientific articles continues to grow rapidly, with no signs of slowing. This information overload drives the need for automated tools to identify relevant materials and extract key ideas. Information extraction (IE) focuses on converting unstructured scientific text into structured knowledge (e.g., ontologies, taxonomies, and knowledge graphs), enabling intelligent systems to excel in tasks like document organization, scientific literature retrieval and recommendation, claim verification, and even novel idea or hypothesis generation. To pinpoint its scope, this thesis focuses on taxonomic structure as the representation of knowledge in the scientific domain.

    To construct a taxonomy from scientific corpora, traditional methods often rely on pipeline frameworks. These frameworks typically follow a sequence: first, extracting scientific concepts or entities from the corpus; second, identifying hierarchical relationships between the concepts; and finally, organizing these relationships into a cohesive taxonomy. However, such methods encounter several challenges: (1) the quality of the corpus or annotation data, (2) error propagation within the pipeline framework, and (3) limited generalization and transferability to other specific domains. The development of large language models (LLMs) offers promising advancements, as these models have demonstrated remarkable abilities to internalize knowledge and respond effectively to a wide range of inquiries. Unlike traditional pipeline-based approaches, generative methods harness LLMs to achieve (1) better utilization of their internalized knowledge, (2) direct text-to-knowledge conversion, and (3) flexible, schema-free adaptability.

    This thesis explores innovative methods for integrating text generation technologies to improve IE in the scientific domain, with a focus on taxonomy construction. The approach begins with generating entity names and evolves to create or enrich taxonomies directly via text generation. I will explore combining neighborhood structural context, descriptive textual information, and LLMs' internal knowledge to improve output quality. Finally, this thesis will outline future research directions.

  18. Key indicators.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Key indicators. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Key indicators.

  19. Data from: Geospatial Files for the Geologic Map of the Stibnite Mining...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 20, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Geospatial Files for the Geologic Map of the Stibnite Mining Area, Valley County, Idaho [Dataset]. https://catalog.data.gov/dataset/geospatial-files-for-the-geologic-map-of-the-stibnite-mining-area-valley-county-idaho
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Idaho, Stibnite, Valley County
    Description

    These geospatial files are the essential components for the Geologic Map of the Stibnite Mining Area in Valley County, Idaho, which was published by the Idaho Geological Survey in 2022. Three main file types are in this dataset: geographic, geologic, and mining. Geographic files are map extent, lidar base, topographic contours, labels for contours, waterways, and roads. Geologic files are geologic map units, faults, structural lines meaning axial traces, structural points like bedding strike and dip locations, cross section lines, and drill core sample locations. Lastly, mining files are disturbed ground features including open pit polygons or outlines, and general mining features such as the location of an adit. File formats are shape, layer, or raster. Of the 14 shapefiles, 7 have layer files that provide pre-set symbolization for use in ESRI ArcMap matching the Geologic Map of the Stibnite Mining Area in Valley County, Idaho. The lidar data have two similar, but distinct, raster format types (ESRI GRID and TIFF) intended to increase end-user accessibility. This dataset is a compilation of both legacy data (from Smitherman's 1985 master's thesis published in 1988, Midas Gold Corporation employees, the Geologic Map of the Stibnite Quadrangle (Stewart and others, 2016), and Reed S. Lewis of the Idaho Geological Survey) and new data from 2013, 2015, and 2016 field work by Niki E. Wintzer.
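    For readers who want to load these files programmatically, a sketch using geopandas and rasterio (common tool choices, not prescribed by the dataset; the file names are placeholders for the names shipped with the download):

    ```python
    # Sketch: opening the vector and raster components of the dataset.
    # geopandas/rasterio are my tool choices, and the file names below are
    # placeholders -- the real names come with the USGS download.
    import geopandas as gpd
    import rasterio

    # Geologic map units arrive as a shapefile with an attribute table.
    units = gpd.read_file("geologic_map_units.shp")
    print(units.crs, len(units), list(units.columns)[:5])

    # The lidar base is provided as a TIFF (an ESRI GRID variant also ships).
    with rasterio.open("lidar_base.tif") as src:
        print(src.crs, src.width, src.height, src.count)
    ```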

  20. Slums and informal settlements detection

    • kaggle.com
    zip
    Updated Jul 13, 2017
    Cite
    Federico Baylé (2017). Slums and informal settlements detection [Dataset]. https://www.kaggle.com/fedebayle/slums-argentina
    Explore at:
    Available download formats: zip (709,576,870 bytes)
    Dataset updated
    Jul 13, 2017
    Authors
    Federico Baylé
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    This dataset comes from a personal project that began with my MSc thesis in Data Mining at Buenos Aires University, where I detected slums and informal settlements in the La Matanza district (Buenos Aires). The algorithm developed there reduces the territory to analyze to 15% of the total.

    After successfully finishing the thesis, I created a map of slums for the whole of Argentina. The map and thesis content are available at fedebayle.github.io/potencialesvya.

    As far as I know, this is the first research of its kind in Argentina, and I think it can help my country contribute to UN Millennium Development Goal 7, Target 11, "Improving the lives of 100 million slum dwellers".

    Content

    This dataset contains georeferenced images of urban slums and informal settlements for two districts in Argentina: Buenos Aires and Córdoba (~15M inhabitants).

    The image of Córdoba was taken on 2017-06-09 and the images of Buenos Aires on 2017-05-04.

    Each image comes from the Sentinel-2 sensor, is 32x32 px, and has 4 bands (bands 2, 3, 4, and 8A, at 10-meter resolution). Those whose prefix is "vya_" contain slums (the positive class). Sentinel-2 is an Earth observation mission developed by ESA as part of the Copernicus Programme to perform terrestrial observations in support of services such as forest monitoring, land cover change detection, and natural disaster management.

    Images are in .tif format.

    Image names consist of:

    (vya_)[tile id]_[raster row start]_[raster row end]_[raster column start]_[raster column end].tif

    This is a highly imbalanced classification problem:

    • Córdoba: 37 of 13,004 images correspond to slums.
    • Buenos Aires: 1,008 of 46,047 images correspond to slums.
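    Given the naming convention and class counts above, labels and the class balance can be derived from file names alone (a sketch; the directory name is a placeholder):

    ```python
    # Sketch: deriving labels and the class balance from the naming convention
    # described above. Standard library only; "slums_images" is a placeholder
    # for wherever the .tif files are unpacked.
    from pathlib import Path

    positives, total = 0, 0
    for tif in Path("slums_images").glob("*.tif"):
        total += 1
        # The "vya_" prefix marks the positive (slum) class; the remaining
        # underscore-separated fields are tile id and raster row/column ranges.
        if tif.name.startswith("vya_"):
            positives += 1
    print(f"{positives} of {total} images correspond to slums")
    ```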

    Acknowledgements

    I would not have been able to create this dataset if the Sentinel program did not exist. Thanks to European Space Agency!

    Inspiration

    The cost of conducting a survey of informal settlements and slums is high and requires copious logistical resources. In Argentina, these surveys have been conducted only every 10 years, with the census.

    Algorithms developed with this data could be used in different countries and help to fight poverty around the world.
