License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023
This ZIP file contains the data the thesis is based on, interim exports of the results, and the R script with all pre-processing, data merging, and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and extracted text files of the scientific papers used are not included, as the papers are published open access.
The folder structure is shown below with the file names and a brief description of the contents of each file. For details on the analysis approach, please refer to the master's thesis (publication to follow soon).
## Data sources
Folder 01_SourceData/
- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)
- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)
- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)
- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)
## Automatic classification
Folder 02_AutomaticClassification/
- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)
- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)
- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)
- oddpub_results_wDOIs.csv (results file of the ODDPub classification)
- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)
## Manual coding
Folder 03_ManualCheck/
- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)
- ManualCheck_2023-06-08.csv (Manual coding results file)
- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)
## Explorative analysis for the discoverability of open data
Folder 04_FurtherAnalyses/
- Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher; in German)
## R-Script
- Analyses_MA_OpenDataMonitoring.R (R script for preparing, merging, and analyzing the data and for running the ODDPub algorithm)
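The DOI-based merges listed above (e.g., PLOS_ODDPub.csv) are produced by the included R script; purely as an illustration, a minimal pandas sketch of that kind of merge could look like this (paths follow the listing above, and the "doi" column name is an assumption):

```python
# Illustration only: the actual merging is done in the included R script
# (Analyses_MA_OpenDataMonitoring.R). The "doi" column name is an assumption.
import pandas as pd

oddpub = pd.read_csv("02_AutomaticClassification/oddpub_results_wDOIs.csv")
plos = pd.read_csv("01_SourceData/PLOS-Dataset_v2_Mar23.csv")

# Keep only publications present in both sources, joined on their DOI.
merged = oddpub.merge(plos, on="doi", how="inner")
merged.to_csv("02_AutomaticClassification/PLOS_ODDPub.csv", index=False)
```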
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". These data were generated by instrumenting both synthetic and real-life software systems, and are formatted according to the IEEE XES format. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for further explanation.
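As a minimal sketch (assuming the pm4py library is installed; the file name is a placeholder), such an XES log can be loaded and inspected as follows:

```python
# Minimal sketch: loading an IEEE XES event log with pm4py (assumed installed).
# Depending on the pm4py version, read_xes returns a pandas DataFrame or an EventLog.
import pm4py

log = pm4py.read_xes("software_events.xes")  # placeholder file name
dfg, start_activities, end_activities = pm4py.discover_dfg(log)  # directly-follows graph
print(len(dfg), "directly-follows relations between recorded events")
```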
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Code for getting data, mining text, and estimating a VAR model.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Categorization of doctoral theses.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
State-trace data and source code accompanying Chapters 3, 4, and 5 and Appendix A of the dissertation "Automated abstraction of discrete-event simulation models using state-trace data".
This thesis lays the groundwork for enabling scalable data mining in massively parallel dataflow systems, using large datasets. Such datasets have become ubiquitous. We illustrate common fallacies with respect to scalable data mining: it is in no way sufficient to naively implement textbook algorithms on parallel systems; bottlenecks on all layers of the stack prevent the scalability of such naive implementations. We argue that scalability in data mining is a multi-leveled problem and must therefore be approached through the interplay of algorithms, systems, and applications. We therefore discuss a selection of scalability problems on these different levels.
We investigate algorithm-specific scalability aspects of collaborative filtering algorithms for computing recommendations, a popular data mining use case with many industry deployments. We show how to efficiently execute the two most common approaches, namely neighborhood methods and latent factor models, on MapReduce, and describe a specialized architecture for scaling collaborative filtering to extremely large datasets which we implemented at Twitter.
We then turn to system-specific scalability aspects, where we improve system performance during the distributed execution of a special class of iterative algorithms by drastically reducing the overhead required for guaranteeing fault tolerance. To this end, we propose a novel optimistic approach to fault tolerance which exploits the robust convergence properties of a large class of fixpoint algorithms and does not incur measurable overhead in failure-free cases.
Finally, we present work on an application-specific scalability aspect of scalable data mining. A common problem when deploying machine learning applications in real-world scenarios is that the prediction quality of ML models heavily depends on hyperparameters that have to be chosen in advance. We propose an algorithmic framework for an important subproblem occurring during hyperparameter search at scale: efficiently generating samples from block-partitioned matrices in a shared-nothing environment. For every selected problem, we show how to execute the resulting computation automatically in a parallel and scalable manner, and evaluate our proposed solution on large datasets with billions of data points.
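As a purely illustrative, single-machine sketch of the item-based neighborhood approach mentioned above (a toy numpy example, not the MapReduce implementation described in the thesis):

```python
# Toy item-based neighborhood recommendation on a small rating matrix.
# Illustrative only; the thesis executes such computations at scale on MapReduce.
import numpy as np

# Rows = users, columns = items; 0 means "not rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
norms[norms == 0] = 1.0            # avoid division by zero for unrated items
S = (R.T @ R) / np.outer(norms, norms)
np.fill_diagonal(S, 0.0)           # an item should not recommend itself

# Score items for each user by similarity-weighted ratings, mask seen items.
scores = R @ S
scores[R > 0] = -np.inf
print(scores.argmax(axis=1))       # best unseen item per user
```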
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Performance of the algorithm.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about the process execution of electronic invoicing. The electronic invoicing process includes activities such as invoice scanning, invoice approval, and liquidation. The dataset records the event name, event type, time of the event's execution, and the participant associated with the event. The data is formatted in the MXML format so that it can be used for process mining analysis with tools such as ProM.
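As a minimal sketch (the file name is hypothetical and the element names follow the standard MXML schema), such a log can be read with Python's standard library; in practice a process mining tool such as ProM would be used instead:

```python
# Minimal sketch for reading an MXML process log with the standard library.
# Adjust element lookups if the log declares an XML namespace.
import xml.etree.ElementTree as ET

tree = ET.parse("electronic_invoicing.mxml")     # hypothetical file name
for instance in tree.getroot().iter("ProcessInstance"):
    case_id = instance.get("id")
    for entry in instance.iter("AuditTrailEntry"):
        activity = entry.findtext("WorkflowModelElement")
        event_type = entry.findtext("EventType")   # e.g. start / complete
        timestamp = entry.findtext("Timestamp")
        originator = entry.findtext("Originator")  # the participant
        print(case_id, activity, event_type, timestamp, originator)
```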
License: https://borealisdata.ca/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7939/DVN/10950
Mine-level copper data (1953-1984) used in Young, D. (1992), "Cost Specification and Firm Behaviour in a Hotelling Model of Resource Extraction," Canadian Journal of Economics XXV, 41-59. The spreadsheet has five tabs (including data and explanatory materials).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Previous works comparative table.
License: https://doi.org/10.4121/resource:terms_of_use
Label Ranking datasets used in the PhD thesis "Pattern Mining for Label Ranking"
This file is in Excel (xls) format and contains regression model data for input and output parameters (constants) that can be used to solve real-world vehicle routing problems with realistic, non-standard constraints. All data are real and were obtained experimentally by applying a VRP algorithm in the production environment of one of the biggest distribution companies in Bosnia and Herzegovina.
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
During my senior year at Shandong University, my tutor gave me the research direction for my thesis: Bitcoin transaction data analysis. I therefore crawled all Bitcoin transaction data from January 2009 to February 2018 and performed statistical and quantitative analyses. I hope this data will be of some help; data mining is interesting and useful, not only as a skill but also in everyday life.
I crawled these data from https://www.blockchain.com/explorer. Each file contains many blocks, and the range of blocks is reflected in the file name; for example, the file 0-68732.csv covers the genesis block (block 0) through block 68732. Blocks without inputs are not included in the file. There are five columns: Height (the block height), Input (the input addresses of the block), Output (the output addresses of the block), Sum (the Bitcoin transaction amount corresponding to the Output), and Time (the generation time of the block). A block contains many transactions.
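As a minimal sketch (assuming pandas is installed and the CSV files have a header row), one of the block files can be loaded and summarized as follows:

```python
# Minimal sketch: loading one block file with pandas; column names follow the
# description above and a header row is assumed.
import pandas as pd

df = pd.read_csv("0-68732.csv")                  # example file covering blocks 0-68732
print(df.columns.tolist())                       # expected: Height, Input, Output, Sum, Time
print(df["Height"].nunique(), "blocks in this file")
print(df.groupby("Height")["Sum"].sum().head())  # total transferred amount per block
```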
This page contains part three of the data; the other parts can be found at https://www.kaggle.com/shiheyingzhe/datasets
The future is shaped and influenced by decisions made today. These decisions need to be made on solid ground, and diverse information sources should be considered in the decision process. For exploring different futures, foresight offers a wide range of methods for gaining insights. The starting point of this thesis is the observation that recent foresight methods mainly use patent and publication data or rely on expert opinion, while few other data sources are used. In times of big data, many other options exist; for example, social media and websites are currently not a major part of these deliberations. While the volume of data from heterogeneous sources grows considerably, foresight and its methods rarely benefit from such available data. One attempt to access and systematically examine this data is text mining, which processes textual data in a largely automated manner. Therefore, this thesis addresses the contribution of text mining and further textual data sources to foresight and its methods. After clarifying the potential of combining text mining and foresight, four concrete examples are outlined. As the results show, existing foresight methods are improved, as exemplified by roadmapping and scenario development. By exploiting new data sources (e.g., Twitter and web mining), new options evolve for analyzing data. Thus, more actors and views are integrated, and more emphasis is laid on analyzing social changes. In summary, using text mining enhances the detection and examination of emerging topics and technologies by extending the knowledge base of foresight. Hence, new foresight applications can be designed. In particular, text mining is promising for explorative approaches that require a solid base for reflecting on possible futures.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Mapping vectors to words.
License: https://doi.org/10.4121/resource:terms_of_use
The Beyoglu Preservation Area Building Features Database. A large and quite comprehensive GIS database was constructed in order to implement the data mining analysis, based mainly on the traditional thematic maps of the Master Plan for the Beyoğlu Preservation Area. This database consists of 45 spatial and non-spatial features attributed to the 11,984 buildings located in the Beyoğlu Preservation Area and it is one of the original outputs of the PhD Thesis entitled "A Knowledge Discovery Approach to Urban Analysis: The Beyoglu Preservation Area as a data mine".
License: Public Domain Mark 1.0, https://creativecommons.org/publicdomain/mark/1.0/
As research communities expand, the number of scientific articles continues to grow rapidly, with no signs of slowing. This information overload drives the need for automated tools to identify relevant materials and extract key ideas. Information extraction (IE) focuses on converting unstructured scientific text into structured knowledge (e.g., ontologies, taxonomies, and knowledge graphs), enabling intelligent systems to excel in tasks like document organization, scientific literature retrieval and recommendation, claim verification, and even novel idea or hypothesis generation. To pinpoint its scope, this thesis focuses on taxonomic structures to represent knowledge in the scientific domain.
To construct a taxonomy from scientific corpora, traditional methods often rely on pipeline frameworks. These frameworks typically follow a sequence: first, extracting scientific concepts or entities from the corpus; second, identifying hierarchical relationships between the concepts; and finally, organizing these relationships into a cohesive taxonomy. However, such methods encounter several challenges: (1) the quality of the corpus or annotation data, (2) error propagation within the pipeline framework, and (3) limited generalization and transferability to other specific domains. The development of large language models (LLMs) offers promising advancements, as these models have demonstrated remarkable abilities to internalize knowledge and respond effectively to a wide range of inquiries. Unlike traditional pipeline-based approaches, generative methods harness LLMs to achieve (1) better utilization of their internalized knowledge, (2) direct text-to-knowledge conversion, and (3) flexible, schema-free adaptability.
This thesis explores innovative methods for integrating text generation technologies to improve IE in the scientific domain, with a focus on taxonomy construction. The approach begins with generating entity names and evolves to create or enrich taxonomies directly via text generation. I will explore combining neighborhood structural context, descriptive textual information, and LLMs' internal knowledge to improve output quality. Finally, this thesis will outline future research directions.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Key indicators.
These geospatial files are the essential components of the Geologic Map of the Stibnite Mining Area in Valley County, Idaho, which was published by the Idaho Geological Survey in 2022. Three main file types are in this dataset: geographic, geologic, and mining. Geographic files are map extent, lidar base, topographic contours, labels for contours, waterways, and roads. Geologic files are geologic map units, faults, structural lines (axial traces), structural points (such as bedding strike and dip locations), cross-section lines, and drill core sample locations. Lastly, mining files are disturbed-ground features, including open pit polygons or outlines, and general mining features such as the location of an adit. File formats are shape, layer, or raster. Of the 14 shapefiles, 7 have layer files that provide pre-set symbolization for use in ESRI ArcMap, matching the Geologic Map of the Stibnite Mining Area in Valley County, Idaho. The lidar data have two similar, but distinct, raster format types (ESRI GRID and TIFF) intended to increase end-user accessibility. This dataset is a compilation of both legacy data (from Smitherman's 1985 master's thesis published in 1988, Midas Gold Corporation employees, the Geologic Map of the Stibnite Quadrangle (Stewart and others, 2016), and Reed S. Lewis of the Idaho Geological Survey) and new data from 2013, 2015, and 2016 field work by Niki E. Wintzer.
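A minimal sketch for inspecting the vector and raster components, assuming geopandas and rasterio are installed; the file names below are hypothetical:

```python
# Minimal sketch for inspecting the shapefiles and the TIFF lidar raster.
# The ESRI layer (.lyr) symbolization files are only usable in ArcMap/ArcGIS.
import geopandas as gpd
import rasterio

faults = gpd.read_file("faults.shp")              # one of the 14 shapefiles
print(faults.crs, len(faults), faults.geometry.geom_type.unique())

with rasterio.open("lidar_base.tif") as src:      # the TIFF variant of the lidar base
    print(src.crs, src.res, src.read(1).shape)
```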
License: Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
This dataset comes from a personal project that began with my MSc thesis in Data Mining at Buenos Aires University, where I detected slums and informal settlements in the La Matanza district (Buenos Aires). The algorithm developed there reduces the territory that needs to be analyzed to 15% of the total.
After successfully finishing the thesis, I created a map of slums for the whole of Argentina. The map and thesis content are available at fedebayle.github.io/potencialesvya.
As far as I know, this is the first research of its kind in Argentina, which I think could help my country contribute to UN Millennium Development Goal 7, Target 11, "Improving the lives of 100 million slum dwellers".
This dataset contains georeferenced images of urban slums and informal settlements for two districts in Argentina: Buenos Aires and Córdoba (~15 million inhabitants).
The image of Córdoba was taken on 2017-06-09 and the images of Buenos Aires on 2017-05-04.
Each image comes from the Sentinel-2 sensor, is 32x32 px, and has 4 bands (bands 2, 3, 4, 8A; 10-meter resolution). Images whose file name is prefixed with "vya_" contain slums (positive class). Sentinel-2 is an Earth observation mission developed by ESA as part of the Copernicus Programme to perform terrestrial observations in support of services such as forest monitoring, land cover change detection, and natural disaster management.
Images are in .tif format.
Image names consist of:
(vya_)[tile id]_[raster row start]_[raster row end]_[raster column start]_[raster column end].tif
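A minimal sketch for parsing these file names and loading an image, assuming rasterio is installed; the example file name below is made up:

```python
# Minimal sketch: parse the naming convention above and load one image.
import re
import rasterio

PATTERN = re.compile(
    r"^(?P<label>vya_)?(?P<tile>.+)_(?P<row_start>\d+)_(?P<row_end>\d+)"
    r"_(?P<col_start>\d+)_(?P<col_end>\d+)\.tif$"
)

name = "vya_21HUB_128_160_256_288.tif"   # hypothetical example file
m = PATTERN.match(name)
is_slum = m.group("label") is not None   # "vya_" prefix marks the positive class

with rasterio.open(name) as src:
    bands = src.read()                   # expected shape: (4, 32, 32) per the description
print(is_slum, m.group("tile"), bands.shape)
```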
This is a highly imbalanced classification problem.
I would not have been able to create this dataset if the Sentinel program did not exist. Thanks to the European Space Agency!
The cost of conducting a survey of informal settlements and slums is high and requires copious logistical resources. In Argentina, these surveys have been conducted only every 10 years, as part of the census.
Algorithms developed with this data could be used in different countries and help to fight poverty around the world.