Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023
This ZIP file contains the data the thesis is based on, interim exports of the results, and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included, as they are published open access.
The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication following soon).
## Data sources
Folder 01_SourceData/
- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)
- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)
- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)
- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)
## Automatic classification
Folder 02_AutomaticClassification/
- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)
- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)
- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)
- oddpub_results_wDOIs.csv (results file of the ODDPub classification)
- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)
## Manual coding
Folder 03_ManualCheck/
- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)
- ManualCheck_2023-06-08.csv (Manual coding results file)
- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)
## Explorative analysis for the discoverability of open data
Folder 04_FurtherAnalyses/
Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German
## R-Script
Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
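The actual processing is done in the R script above; purely as an illustration (in Python, with an assumed DOI column name), a minimal sketch of the DOI-based merges that produce files such as PLOS_ScopusSearch_matched.csv and PLOS_ODDPub.csv:

```python
# Illustration only: the thesis performs these steps in R (Analyses_MA_OpenDataMonitoring.R).
# The "DOI" column name is an assumption, not taken from the actual files.
import pandas as pd

scopus = pd.read_csv("01_SourceData/ScopusSearch_ExportResults.csv")
plos_osi = pd.read_csv("01_SourceData/PLOS-Dataset_v2_Mar23.csv")
oddpub = pd.read_csv("02_AutomaticClassification/oddpub_results_wDOIs.csv")

# Normalize DOIs so that case or whitespace differences do not break the join.
for df in (scopus, plos_osi, oddpub):
    df["doi_norm"] = df["DOI"].str.strip().str.lower()

# Keep only publications present in both sources (inner join), analogous to
# PLOS_ScopusSearch_matched.csv and PLOS_ODDPub.csv.
scopus_plos = scopus.merge(plos_osi, on="doi_norm", how="inner", suffixes=("_scopus", "_plos"))
plos_oddpub = oddpub.merge(plos_osi, on="doi_norm", how="inner", suffixes=("_oddpub", "_plos"))
```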
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes all experimental data used for the PhD thesis of Cong Liu, entitled "Software Data Analytics: Architectural Model Discovery and Design Pattern Detection". These data were generated by instrumenting both synthetic and real-life software systems and are formatted according to the IEEE XES format. See http://www.xes-standard.org/ and https://www.win.tue.nl/ieeetfpm/lib/exe/fetch.php?media=shared:downloads:2017-06-22-xes-software-event-v5-2.pdf for more details.
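As a minimal sketch, one way to load such an XES log is the pm4py library (not part of this dataset; the file name below is a placeholder):

```python
# Sketch: reading an IEEE XES event log with pm4py; any XES-capable tool (e.g. ProM) works as well.
# The file name is a placeholder for one of the logs in this dataset.
import pm4py

log = pm4py.read_xes("software_event_log.xes")  # recent pm4py versions return a pandas DataFrame
print(log.head())
print(log["concept:name"].nunique(), "distinct activities")
```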
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Code for getting data, mining text, and estimating a VAR model.
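The code itself is in the dataset; purely as an illustration of the last step, a sketch of estimating a VAR model with statsmodels (file and variable names are placeholders):

```python
# Illustrative sketch only: fitting a vector autoregression (VAR) with statsmodels.
# The input file and its columns are placeholders, not the dataset's actual variables.
import pandas as pd
from statsmodels.tsa.api import VAR

data = pd.read_csv("timeseries.csv", index_col=0, parse_dates=True)  # placeholder file
model = VAR(data)
results = model.fit(maxlags=8, ic="aic")  # select the lag order by AIC
print(results.summary())
```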
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is based on the work of Liu et al. and their paper "Hydraplus-net: Attentive deep features for pedestrian analysis". In our work, we structure the images for a gender classification task based on the annotated gender attribute. Moreover, we pre-process the images to 75x75 pixels so that they can be used by pre-trained deep learning models.
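A minimal sketch of the resizing step described above (using Pillow; the folder layout is an assumption):

```python
# Sketch of the described pre-processing: resize images to 75x75 pixels.
# Source and target folders are placeholders; the dataset's actual layout may differ.
from pathlib import Path
from PIL import Image

src, dst = Path("images_raw"), Path("images_75x75")
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    img.resize((75, 75)).save(dst / path.name)
```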
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorization of doctoral theses.
Mine-level copper data (1953-1984) used in Young, D. (1992), "Cost Specification and Firm Behaviour in a Hotelling Model of Resource Extraction," Canadian Journal of Economics XXV, 41-59. Spreadsheet has 5 tabs (including data and explanatory materials).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains information about the process execution of electronic invoicing. The electronic invoicing process includes activities such as invoice scanning, invoice approval, and liquidation. The dataset records the event name, the event type, the time of the event's execution, and the participant to whom the event relates. The data is formatted in MXML so that it can be used for process mining analysis with tools such as ProM.
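A minimal sketch of reading the described fields from the MXML log with Python's standard library; the element names follow the usual MXML schema (AuditTrailEntry with WorkflowModelElement, EventType, Timestamp, Originator) and the file name is a placeholder:

```python
# Sketch: extracting event name, event type, timestamp and participant from an MXML log.
# Element names follow the common MXML schema; the file name is a placeholder.
import xml.etree.ElementTree as ET

tree = ET.parse("invoicing_log.mxml")
for instance in tree.getroot().iter("ProcessInstance"):
    case_id = instance.get("id")
    for entry in instance.iter("AuditTrailEntry"):
        event = entry.findtext("WorkflowModelElement")
        event_type = entry.findtext("EventType")
        timestamp = entry.findtext("Timestamp")
        participant = entry.findtext("Originator")
        print(case_id, event, event_type, timestamp, participant)
```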
https://doi.org/10.4121/resource:terms_of_use
Label Ranking datasets used in the PhD thesis "Pattern Mining for Label Ranking"
This file is in Excel (xls) format and contains data for a regression model of input and output parameters (constants) that can be used to solve real-world vehicle routing problems with realistic, non-standard constraints. All data are real and were obtained experimentally by running a VRP algorithm in the production environment of one of the largest distribution companies in Bosnia and Herzegovina.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance parameters.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
During my senior year at Shandong University, my tutor gave me bitcoin transaction data analysis as the research direction for my thesis, so I crawled all bitcoin transaction data from January 2009 to February 2018 and performed statistical and quantitative analyses. I hope this data is helpful; data mining is interesting and useful not only as a skill but also in everyday life.
I crawled the data from https://www.blockchain.com/explorer. Each file contains many blocks, and the block range is reflected in the file name: for example, the file 0-68732.csv covers block 0 (the genesis block) up to block 68732. Blocks without any input are not included in this file. There are five columns: Height (the block height), Input (the input addresses of the block), Output (the output addresses of the block), Sum (the bitcoin transaction amount corresponding to the Output), and Time (the generation time of the block). A block contains many transactions.
This page is part three of the full dataset; the other parts can be found at https://www.kaggle.com/shiheyingzhe/datasets
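A minimal sketch of loading one of these files with pandas, assuming the headers match the column names described above:

```python
# Sketch: loading one block-range file using the columns described above
# (Height, Input, Output, Sum, Time); the header names are assumed to match.
import pandas as pd

cols = ["Height", "Input", "Output", "Sum", "Time"]
df = pd.read_csv("0-68732.csv", usecols=cols)

print(df["Height"].min(), df["Height"].max())        # covered block range
print(df.groupby("Height")["Sum"].sum().describe())  # transferred amount per block
```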
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparative table of previous works.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-Platform Software Developer Expertise Learning by Norbert Eke
This data set is part of my Master's thesis project on developer expertise learning by mining Stack Overflow (SOTorrent) and GitHub (GHTorrent) data. Check out my portfolio website at norberte.github.io.
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
As research communities expand, the number of scientific articles continues to grow rapidly, with no signs of slowing. This information overload drives the need for automated tools to identify relevant materials and extract key ideas. Information extraction (IE) focuses on converting unstructured scientific text into structured knowledge (e.g., ontologies, taxonomies, and knowledge graphs), enabling intelligent systems to excel at tasks like document organization, scientific literature retrieval and recommendation, claim verification, and even novel idea or hypothesis generation. To pinpoint its scope, this thesis focuses on taxonomic structures as the representation of knowledge in the scientific domain.
To construct a taxonomy from scientific corpora, traditional methods often rely on pipeline frameworks. These frameworks typically follow a sequence: first, extracting scientific concepts or entities from the corpus; second, identifying hierarchical relationships between the concepts; and finally, organizing these relationships into a cohesive taxonomy. However, such methods encounter several challenges: (1) the quality of the corpus or annotation data, (2) error propagation within the pipeline framework, and (3) limited generalization and transferability to other specific domains. The development of large language models (LLMs) offers promising advancements, as these models have demonstrated remarkable abilities to internalize knowledge and respond effectively to a wide range of inquiries. Unlike traditional pipeline-based approaches, generative methods harness LLMs to achieve (1) better utilization of their internalized knowledge, (2) direct text-to-knowledge conversion, and (3) flexible, schema-free adaptability.
This thesis explores innovative methods for integrating text generation technologies to improve IE in the scientific domain, with a focus on taxonomy construction. The approach begins with generating entity names and evolves to create or enrich taxonomies directly via text generation. I will explore combining neighborhood structural context, descriptive textual information, and LLMs' internal knowledge to improve output quality. Finally, this thesis will outline future research directions.
These geospatial files are the essential components of the Geologic Map of the Stibnite Mining Area in Valley County, Idaho, which was published by the Idaho Geological Survey in 2022. Three main file categories are in this dataset: geographic, geologic, and mining. Geographic files are map extent, lidar base, topographic contours, labels for contours, waterways, and roads. Geologic files are geologic map units, faults, structural lines (axial traces), structural points (such as bedding strike and dip locations), cross-section lines, and drill core sample locations. Lastly, mining files are disturbed-ground features, including open pit polygons or outlines, and general mining features such as the location of an adit. File formats are shapefile, layer file, or raster. Of the 14 shapefiles, 7 have layer files that provide pre-set symbolization for use in ESRI ArcMap, matching the Geologic Map of the Stibnite Mining Area in Valley County, Idaho. The lidar data have two similar, but distinct, raster format types (ESRI GRID and TIFF) intended to increase end-user accessibility. This dataset is a compilation of both legacy data (from Smitherman's 1985 master's thesis published in 1988, Midas Gold Corporation employees, the Geologic Map of the Stibnite Quadrangle (Stewart and others, 2016), and Reed S. Lewis of the Idaho Geological Survey) and new data from 2013, 2015, and 2016 field work by Niki E. Wintzer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Key indicators.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These criteria (file 1) were drawn up empirically, based on the practical challenges faced during the thesis research and on tests carried out with various datasets applied to process mining tools. The criteria were prepared with the aim of creating a ranking of the selected and published datasets (https://doi.org/10.6084/m9.figshare.25514884.v3), in order to classify them according to their score. The criteria are divided into informative (In), importance (I), difficulty (D) and ease (F) of handling (file 2). The datasets were selected (file 3) and, for the ranking, calculations were made (file 5) to normalize the values for standardization (file 4). This data is part of a study on the application of process mining techniques to Brazilian public service data, available on the open data portal dados.gov.
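The actual calculations are defined in files 4 and 5; purely as an illustration of normalizing criterion scores and ranking datasets by a total score, a generic min-max sketch (file and column names are hypothetical):

```python
# Illustration only: the actual scoring and normalization are in files 4 and 5.
# Min-max normalize per-criterion scores and rank the datasets by the normalized total.
import pandas as pd

scores = pd.read_csv("criteria_scores.csv", index_col="dataset")  # placeholder file/column
normalized = (scores - scores.min()) / (scores.max() - scores.min())
ranking = normalized.sum(axis=1).sort_values(ascending=False)
print(ranking)
```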
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- input.jpg - https://universe.roboflow.com/labelingvehicles/yolov7_hieu/ = 5
- E644 - https://universe.roboflow.com/avcwaqar/e6_rashakai/ = 10
- input - https://universe.roboflow.com/labelingvehicles/yolov7_rev02/ = 10
- suhat - https://universe.roboflow.com/thesis-jauyq/road-segmentation-thesis/ = 16
- 169 - https://universe.roboflow.com/bku-11kfx/master-thesis-ranfb/ = 49
- https://universe.roboflow.com/wicak/road-fyb4f/ = 100
- -xx - https://universe.roboflow.com/kitazonoken/hakusenninnshiki/ = 20
- vlcsnap - https://universe.roboflow.com/galuh-dataset/unstructured-road/ = 20
- output - https://universe.roboflow.com/slietrover1/road-turns/ = 20
- data14meipng - https://universe.roboflow.com/datamining-7hstn/datamining/ = 40
- 00 - https://universe.roboflow.com/anemone-rgurc/road_data-vxfl5/ = 120
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Collection of environmental datasets recorded with Tinkerforge sensors and used in the development of a bachelor thesis on the topic of frequent pattern mining. The data was collected in several locations in the city of Graz, Austria, as well as an additional dataset recorded in Santander, Spain. The following bricklets were used:
Graz datasets (i12, library_at, mensa_nt, muenzgrabenstrasse, neutorgasse, studienzentrum, vguh, kaiserfeldgasse):
- Barometer Bricklet
- Moisture Bricklet
- Sound Intensity Bricklet
- Ambient Light Bricklet
- Humidity Bricklet
- Temperature Bricklet
- CO2 Bricklet
- Motion Detector Bricklet
- Barometer Bricklet
Santander dataset:
- Motion Detector Bricklet
- Ambient Light Bricklet 2.0
- Sound Intensity Bricklet
- Temperature Bricklet
- Humidity Bricklet
- CO2 Bricklet
- Accelerometer Bricklet
- Barometer Bricklet (also recording altitude)
Additionally, the datasets contain the voltage and chip temperature readings of the Master Brick.
It should be noted that Tinkerforge bricklets occasionally do not manage to write their recorded values in the time window between two recording frames, and they can also suffer from other disruptions. This produces a considerable number of instances that do not include data from all sensors (incomplete instances), as well as some readings flagged as erroneous; both should be taken into account when working with the datasets.
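A minimal sketch of one way to handle the incomplete instances and error-flagged readings mentioned above (file and column names are hypothetical; the datasets' actual layout may differ):

```python
# Sketch: filter out incomplete instances and error-flagged readings before mining.
# The file name and the "timestamp"/"error_flag" columns are hypothetical.
import pandas as pd

df = pd.read_csv("graz_i12.csv")  # placeholder for one of the recorded datasets

sensor_cols = [c for c in df.columns if c not in ("timestamp", "error_flag")]
complete = df.dropna(subset=sensor_cols)              # drop instances missing any sensor value
if "error_flag" in df.columns:
    complete = complete[complete["error_flag"] == 0]  # drop readings flagged as erroneous

print(f"kept {len(complete)} of {len(df)} instances")
```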
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance of the algorithm.