39 datasets found
  1. Data supporting the Master thesis "Monitoring von Open Data Praktiken -...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 21, 2024
    Cite
    Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
    Available download formats: zip
    Dataset updated: Nov 21, 2024
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Katharina Zinke; Katharina Zinke
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

    This ZIP file contains the data the thesis is based on, interim exports of the results, and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, exploratory analysis is also included. The actual PDFs and text files of the scientific papers used are not included, as they are published open access.

    The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication following soon).

    ## Data sources

    Folder 01_SourceData/

    - PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

    - ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

    - ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

    - Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

    ## Automatic classification

    Folder 02_AutomaticClassification/

    - (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

    - (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

    - PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

    - oddpub_results_wDOIs.csv (results file of the ODDPub classification)

    - PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)
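
    The merges listed above keep only the publications present in both inputs. A minimal sketch of such an inner merge keyed on DOI follows. It is an illustration only: the column names (`doi`, `is_open_data`, `data_shared`) and the sample rows are hypothetical, and the thesis itself performs these steps in its R script.

```python
import csv
import io

# Hypothetical miniature stand-ins for the ODDPub results and the PLOS-OSI file.
ODDPUB_CSV = """doi,is_open_data
10.1000/a,TRUE
10.1000/b,FALSE
10.1000/c,TRUE
"""

PLOS_CSV = """doi,data_shared
10.1000/B,yes
10.1000/c,no
10.1000/d,yes
"""


def read_by_doi(text, key="doi"):
    """Index CSV rows by lower-cased DOI (DOIs are case-insensitive)."""
    rows = csv.DictReader(io.StringIO(text))
    return {row[key].strip().lower(): row for row in rows}


def inner_merge(left, right):
    """Keep only DOIs present in both inputs and combine their columns."""
    merged = []
    for doi in sorted(left.keys() & right.keys()):
        merged.append({**left[doi], **right[doi], "doi": doi})
    return merged


merged = inner_merge(read_by_doi(ODDPUB_CSV), read_by_doi(PLOS_CSV))
for row in merged:
    print(row)
```

    Normalizing the DOI before matching is a design choice made here so that differences in letter case do not drop publications from the merge.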

    ## Manual coding

    Folder 03_ManualCheck/

    - CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

    - ManualCheck_2023-06-08.csv (Manual coding results file)

    - PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

    ## Explorative analysis for the discoverability of open data

    Folder 04_FurtherAnalyses/

    - Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

    ## R-Script

    - Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)

  2. Experimental data for "Software Data Analytics: Architectural Model...

    • figshare.com
    • data.4tu.nl
    zip
    Updated Jun 6, 2023
    Cite
    Cong Liu (2023). Experimental data for "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection" [Dataset]. http://doi.org/10.4121/uuid:ca1b0690-d9c5-4626-a067-525ec9d5881b
    Available download formats: zip
    Dataset updated: Jun 6, 2023
    Dataset provided by: 4TU.ResearchData
    Authors: Cong Liu
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". The data were generated by instrumenting both synthetic and real-life software systems, and are formatted according to the IEEE XES format. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for more explanations.

  3. Thesis Code

    • dataverse.harvard.edu
    • datamed.org
    Updated Dec 30, 2015
    Cite
    weicen zhang (2015). Thesis Code [Dataset]. http://doi.org/10.7910/DVN/TIIJAK
    Dataset updated: Dec 30, 2015
    Dataset provided by: Harvard Dataverse
    Authors: weicen zhang
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Code for getting data, mining text, and estimating a VAR model

  4. Gender classification of PA-100K dataset, a Pedestrian Attribute dataset

    • figshare.com
    • data.4tu.nl
    zip
    Updated Jun 1, 2023
    Cite
    P. (Panagiotis) Soilis (2023). Gender classification of PA-100K dataset, a Pedestrian Attribute dataset [Dataset]. http://doi.org/10.4121/uuid:38dab37c-1179-495e-b357-0568b9aaaa7a
    Available download formats: zip
    Dataset updated: Jun 1, 2023
    Dataset provided by: 4TU.ResearchData
    Authors: P. (Panagiotis) Soilis
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is based on the work of Liu et al. and their paper "Hydraplus-net: Attentive deep features for pedestrian analysis". In our work, we structure the images for a gender classification task based on the annotated gender attribute. Moreover, we pre-process the images to a dimension of 75x75 so that they can be used by pre-trained deep learning models.

  5. Categorization of doctoral theses.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Categorization of doctoral theses. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t003
    Available download formats: xls
    Dataset updated: Jun 4, 2023
    Dataset provided by: PLOS ONE
    Authors: Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Categorization of doctoral theses.

  6. Canadian Copper Mining Data - D Young Thesis

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Young, Denise (2023). Canadian Copper Mining Data - D Young Thesis [Dataset]. http://doi.org/10.7939/DVN/10950
    Dataset updated: Dec 28, 2023
    Dataset provided by: Borealis
    Authors: Young, Denise
    Time period covered: Jan 1, 1953 - Jan 1, 1984
    Description

    Mine-level copper data (1953-1984) used in Young, D. (1992), "Cost Specification and Firm Behaviour in a Hotelling Model of Resource Extraction," Canadian Journal of Economics XXV, 41-59. Spreadsheet has 5 tabs (including data and explanatory materials).

  7. Electronic Invoicing Event Logs

    • figshare.com
    • search.datacite.org
    xml
    Updated Jun 18, 2023
    Cite
    Almir Djedović (2023). Electronic Invoicing Event Logs [Dataset]. http://doi.org/10.4121/uuid:5a9039b8-794a-4ccd-a5ef-4671f0a258a4
    Available download formats: xml
    Dataset updated: Jun 18, 2023
    Dataset provided by: 4TU.ResearchData
    Authors: Almir Djedović
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains information about the execution of an electronic invoicing process. The process includes activities such as invoice scanning, invoice approval, and liquidation. For each event, the dataset records the event name, the event type, the time of execution, and the participant who executed it. The data is in MXML format so that it can be used for process mining analysis with tools such as ProM.
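
    As a rough illustration of the four fields mentioned above, the sketch below flattens a small MXML log into one record per event. The element names follow the usual MXML layout (`ProcessInstance`, `AuditTrailEntry`, `WorkflowModelElement`, `EventType`, `Timestamp`, `Originator`); the sample content, the case ids, and the `read_events` helper are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real log would be read from the downloaded XML file.
SAMPLE_MXML = """<WorkflowLog>
  <Process id="e-invoicing">
    <ProcessInstance id="case-1">
      <AuditTrailEntry>
        <WorkflowModelElement>invoice scanning</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2023-01-02T09:00:00</Timestamp>
        <Originator>clerk-1</Originator>
      </AuditTrailEntry>
      <AuditTrailEntry>
        <WorkflowModelElement>approve invoice</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2023-01-02T11:30:00</Timestamp>
        <Originator>manager-1</Originator>
      </AuditTrailEntry>
    </ProcessInstance>
    <ProcessInstance id="case-2">
      <AuditTrailEntry>
        <WorkflowModelElement>invoice scanning</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2023-01-03T08:15:00</Timestamp>
        <Originator>clerk-2</Originator>
      </AuditTrailEntry>
    </ProcessInstance>
  </Process>
</WorkflowLog>
"""


def read_events(xml_text):
    """Flatten an MXML log into one dict per audit trail entry."""
    root = ET.fromstring(xml_text)
    events = []
    for instance in root.iter("ProcessInstance"):
        for entry in instance.findall("AuditTrailEntry"):
            events.append({
                "case": instance.get("id"),
                "activity": entry.findtext("WorkflowModelElement"),
                "event_type": entry.findtext("EventType"),
                "timestamp": entry.findtext("Timestamp"),
                "originator": entry.findtext("Originator"),
            })
    return events


events = read_events(SAMPLE_MXML)
for event in events:
    print(event["case"], event["activity"], event["originator"])
```

    Timestamps are kept as strings here; parsing them is left out because the exact timestamp format in the published file is not stated in the description.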

  8. Pattern Mining for Label Ranking

    • figshare.com
    • data.4tu.nl
    zip
    Updated Jun 19, 2023
    Cite
    C.F. (Cláudio) Pinho Rebelo de Sá (2023). Pattern Mining for Label Ranking [Dataset]. http://doi.org/10.4121/uuid:21b1959d-9196-423e-94d0-53883fb0ff21
    Available download formats: zip
    Dataset updated: Jun 19, 2023
    Dataset provided by: 4TU.ResearchData
    Authors: C.F. (Cláudio) Pinho Rebelo de Sá
    License: https://doi.org/10.4121/resource:terms_of_use

    Description

    Label Ranking datasets used in the PhD thesis "Pattern Mining for Label Ranking"

  9. Real-world VRP data with realistic non-standard constraints - parameter...

    • narcis.nl
    Updated Dec 14, 2018
    Cite
    Emir Žunić (2018). Real-world VRP data with realistic non-standard constraints - parameter setting problem regression input data [Dataset]. http://doi.org/10.4121/uuid:97006624-d6a3-4a29-bffa-e8daf60699d8
    Available download formats (media types): application/vnd.ms-excel, text/plain
    Dataset updated: Dec 14, 2018
    Dataset provided by: 4TU.Centre for Research Data
    Authors: Emir Žunić
    Description

    This file is in Excel (xls) format and contains regression-model data for the input and output parameters (constants) that can be used to solve real-world vehicle routing problems with realistic non-standard constraints. All data are real and were obtained experimentally by running a VRP algorithm in the production environment of one of the biggest distribution companies in Bosnia and Herzegovina.

  10. Performance parameters.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t007
    Available download formats: xls
    Dataset updated: Jun 9, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance parameters.

  11. Bitcoin data part three from Jan 2009 to Feb 2018

    • kaggle.com
    Updated Apr 18, 2020
    + more versions
    Cite
    ZouJiu (2020). Bitcoin data part three from Jan 2009 to Feb 2018 [Dataset]. https://www.kaggle.com/shiheyingzhe/bitcoin-data-part-three-from-jan-2009-to-feb-2018/tasks
    Dataset updated: Apr 18, 2020
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: ZouJiu
    License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    During my senior year at Shandong University, my tutor gave me bitcoin transaction data analysis as the research direction for my university thesis, so I crawled all bitcoin transaction data from January 2009 to February 2018. I carried out statistical and quantitative analyses. I hope this data will be of some help; data mining is interesting and useful, not only as a skill but also in everyday life.

    I crawled these data from the website https://www.blockchain.com/explorer. Each file contains many blocks, and the range of blocks is reflected in the file name; for example, the file 0-68732.csv covers block 0 (also called the genesis block) through block 68732. A block that has no input is not included in the file. There are five columns: Height is the block height, Input is the input address of the block, Output is the output address of the block, Sum is the bitcoin transaction amount corresponding to the Output, and Time is the generation time of the block. A block contains many transactions.

    This page is part three of the full data; the other parts can be found at https://www.kaggle.com/shiheyingzhe/datasets
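
    To make the five-column layout described above concrete, here is a minimal sketch that sums the transferred amount per block height. It assumes a comma-separated file with a header row naming the columns Height, Input, Output, Sum and Time, and one row per output; the sample rows, addresses and the `total_per_block` helper are hypothetical and are not taken from the published files.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample rows; the header row and delimiter are assumptions.
SAMPLE = """Height,Input,Output,Sum,Time
170,addrA,addrB,10.0,2009-01-12 03:30:25
170,addrA,addrA,40.0,2009-01-12 03:30:25
181,addrA,addrC,30.0,2009-01-12 06:02:13
"""


def total_per_block(text):
    """Add up the Sum column for every block height."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(text)):
        totals[int(row["Height"])] += Decimal(row["Sum"])
    return dict(totals)


totals = total_per_block(SAMPLE)
for height, amount in sorted(totals.items()):
    print(height, amount)
```

    `Decimal` is used instead of `float` so that amounts add up without binary rounding error.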

  12. Previous works comparative table.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Previous works comparative table. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t001
    Available download formats: xls
    Dataset updated: Jun 9, 2023
    Dataset provided by: PLOS ONE
    Authors: Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Previous works comparative table.

  13. Software Developer Expertise GitHub and Stack Overflow data sets

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, html, txt
    Updated Apr 24, 2025
    Cite
    Norbert Eke; Olga Baysal; Norbert Eke; Olga Baysal (2025). Software Developer Expertise GitHub and Stack Overflow data sets [Dataset]. http://doi.org/10.5281/zenodo.3696079
    Available download formats: csv, html, bin, txt
    Dataset updated: Apr 24, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Norbert Eke; Olga Baysal; Norbert Eke; Olga Baysal
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cross-Platform Software Developer Expertise Learning by Norbert Eke

    This data set is part of my Master's thesis project on developer expertise learning by mining Stack Overflow (SOTorrent) and GitHub (GHTorrent) data. Check out my portfolio website at norberte.github.io

  14. Data from: Improving Scientific Information Extraction with Text Generation

    • curate.nd.edu
    pdf
    Updated Apr 9, 2025
    Cite
    Qingkai Zeng (2025). Improving Scientific Information Extraction with Text Generation [Dataset]. http://doi.org/10.7274/28571045.v1
    Available download formats: pdf
    Dataset updated: Apr 9, 2025
    Dataset provided by: University of Notre Dame
    Authors: Qingkai Zeng
    License: Public Domain Mark 1.0, https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Description

    As research communities expand, the number of scientific articles continues to grow rapidly, with no signs of slowing. This information overload drives the need for automated tools to identify relevant materials and extract key ideas. Information extraction (IE) focuses on converting unstructured scientific text into structured knowledge (e.g., ontologies, taxonomies, and knowledge graphs), enabling intelligent systems to excel in tasks like document organization, scientific literature retrieval and recommendation, claim verification, and even novel idea or hypothesis generation. To narrow the scope, this thesis focuses on taxonomic structure as the representation of knowledge in the scientific domain.

    To construct a taxonomy from scientific corpora, traditional methods often rely on pipeline frameworks. These frameworks typically follow a sequence: first, extracting scientific concepts or entities from the corpus; second, identifying hierarchical relationships between the concepts; and finally, organizing these relationships into a cohesive taxonomy. However, such methods encounter several challenges: (1) the quality of the corpus or annotation data, (2) error propagation within the pipeline framework, and (3) limited generalization and transferability to other specific domains. The development of large language models (LLMs) offers promising advancements, as these models have demonstrated remarkable abilities to internalize knowledge and respond effectively to a wide range of inquiries. Unlike traditional pipeline-based approaches, generative methods harness LLMs to achieve (1) better utilization of their internalized knowledge, (2) direct text-to-knowledge conversion, and (3) flexible, schema-free adaptability.

    This thesis explores innovative methods for integrating text generation technologies to improve IE in the scientific domain, with a focus on taxonomy construction. The approach begins with generating entity names and evolves to create or enrich taxonomies directly via text generation. I will explore combining neighborhood structural context, descriptive textual information, and LLMs' internal knowledge to improve output quality. Finally, this thesis will outline future research directions.

  15. Geospatial Files for the Geologic Map of the Stibnite Mining Area, Valley...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Geospatial Files for the Geologic Map of the Stibnite Mining Area, Valley County, Idaho [Dataset]. https://catalog.data.gov/dataset/geospatial-files-for-the-geologic-map-of-the-stibnite-mining-area-valley-county-idaho
    Dataset updated: Jul 6, 2024
    Dataset provided by: United States Geological Survey (http://www.usgs.gov/)
    Area covered: Valley County, Stibnite, Idaho
    Description

    These geospatial files are the essential components for the Geologic Map of the Stibnite Mining Area in Valley County, Idaho, which was published by the Idaho Geological Survey in 2022. Three main file types are in this dataset: geographic, geologic, and mining. Geographic files are map extent, lidar base, topographic contours, labels for contours, waterways, and roads. Geologic files are geologic map units, faults, structural lines meaning axial traces, structural points like bedding strike and dip locations, cross section lines, and drill core sample locations. Lastly, mining files are disturbed ground features including open pit polygons or outlines, and general mining features such as the location of an adit. File formats are shape, layer, or raster. Of the 14 shapefiles, 7 have layer files that provide pre-set symbolization for use in ESRI ArcMap that match up with the Geologic Map of the Stibnite Mining Area in Valley County, Idaho. The lidar data have two similar, but distinct, raster format types (ESRI GRID and TIFF) intended to increase end user accessibility. This dataset is a compilation of both legacy data (from Smitherman’s 1985 masters thesis published in 1988, Midas Gold Corporation employees, the Geologic Map of the Stibnite Quadrangle (Stewart and others, 2016) and Reed S. Lewis of the Idaho Geological Survey) and new data from 2013, 2015, and 2016 field work by Niki E. Wintzer.

  16. Key indicators.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Key indicators. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t002
    Available download formats: xls
    Dataset updated: Jun 9, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Key indicators.

  17. Criteria for evaluating and qualifying public datasets obtained from the...

    • data.mendeley.com
    Updated May 19, 2025
    Cite
    Gyslla Vasconcelos (2025). Criteria for evaluating and qualifying public datasets obtained from the Brazilian Federal Government's Open Data Portal - dados.gov [Dataset]. http://doi.org/10.17632/x8sgcykthn.2
    Dataset updated: May 19, 2025
    Authors: Gyslla Vasconcelos
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These criteria (file 1) were drawn up empirically, based on the practical challenges faced during the thesis research and on tests carried out with various datasets applied to process mining tools. They were prepared in order to rank the selected and published datasets (https://doi.org/10.6084/m9.figshare.25514884.v3) according to their score. The criteria are divided into informative (In), importance (I), difficulty (D) and ease (F) of handling (file 2). The datasets were selected (file 3) and, for ranking, calculations were made (file 5) to normalize the values for standardization (file 4). This data is part of a study on the application of process mining techniques to Brazilian public service data, available on the open data portal dados.gov.

  18. Road Segmentation Cctv Merge Dataset

    • universe.roboflow.com
    zip
    Updated Jun 16, 2025
    + more versions
    Cite
    road001 (2025). Road Segmentation Cctv Merge Dataset [Dataset]. https://universe.roboflow.com/road001/road-segmentation-cctv-merge/dataset/1
    Available download formats: zip
    Dataset updated: Jun 16, 2025
    Dataset authored and provided by: road001
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Road Road AQlm Polygons
    Description

  19. Tinkerforge environmental datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Tinkerforge environmental datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1468441
    Dataset updated: Jan 24, 2020
    Dataset authored and provided by: Miguel Yuste Fernández Alonso
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Collection of environmental datasets recorded with Tinkerforge sensors and used in the development of a bachelor thesis on the topic of frequent pattern mining. The data was collected in several locations in the city of Graz, Austria, as well as an additional dataset recorded in Santander, Spain. The following bricklets were used:

    Graz datasets (i12, library_at, mensa_nt, muenzgrabenstrasse, neutorgasse, studienzentrum, vguh, kaiserfeldgasse):

    Barometer Bricklet

    Moisture Bricklet

    Sound Intensity Bricklet

    Ambient Light Bricklet

    Humidity Bricklet

    Temperature Bricklet

    CO2 Bricklet

    Motion Detector Bricklet

    Barometer Bricklet

    Santander dataset:

    Motion Detector Bricklet

    Ambient Light Bricklet 2.0

    Sound Intensity Bricklet

    Temperature Bricklet

    Humidity Bricklet

    CO2 Bricklet

    Accelerometer Bricklet

    Barometer Bricklet (recording also altitude)

    Additionally, the datasets contain the voltage and chip temperature readings of the Master Brick.

    It should be noted that Tinkerforge bricklets occasionally do not manage to write their recorded values in the time window between two recording frames, and they can also suffer from other disruptions. This produces a considerable number of instances that do not include the data of all sensors (incomplete instants), as well as some readings flagged as erroneous, which should be taken into account when working with the datasets.
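
    One way to take the incomplete instants into account is to separate them from complete rows before any analysis. The sketch below is a minimal illustration under stated assumptions: it treats an empty cell as a missing reading, and the column names (`timestamp`, `temperature`, `humidity`, `co2`), the sample rows and the `split_instants` helper are hypothetical. How the published files mark missing or erroneous readings is not stated in the description, so that convention should be checked against the actual data.

```python
import csv
import io

# Hypothetical sample; column names and the empty-cell convention are assumptions.
SAMPLE = """timestamp,temperature,humidity,co2
2019-05-01T10:00:00,21.5,40.1,415
2019-05-01T10:00:05,21.6,,417
2019-05-01T10:00:10,,,
2019-05-01T10:00:15,21.7,40.3,420
"""

SENSOR_COLUMNS = ["temperature", "humidity", "co2"]


def split_instants(text, sensor_columns):
    """Separate rows with a value for every sensor from incomplete ones."""
    complete, incomplete = [], []
    for row in csv.DictReader(io.StringIO(text)):
        has_all = all((row.get(col) or "").strip() for col in sensor_columns)
        (complete if has_all else incomplete).append(row)
    return complete, incomplete


complete, incomplete = split_instants(SAMPLE, SENSOR_COLUMNS)
print(len(complete), "complete,", len(incomplete), "incomplete")
```

    Keeping the incomplete rows in a separate list, rather than discarding them, leaves the choice between dropping and imputing them to the later analysis.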

  20. Performance of the algorithm.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    + more versions
    Cite
    Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López (2023). Performance of the algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0257903.t008
    Available download formats: xls
    Dataset updated: Jun 9, 2023
    Dataset provided by: PLOS ONE
    Authors: Pedro Orgeira-Crespo; Carla Míguez-Álvarez; Miguel Cuevas-Alonso; Elena Rivo-López
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of the algorithm.
