Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data contain aggregated survey responses assessing the quality and completeness of metadata for datasets deposited in public repositories and for the same datasets after professional curation. Responses were provided by 10 professional editors representing life, social and physical sciences. Each was randomly assigned four datasets to assess, half (20) of which had been curated according to the standards of Springer Nature's Research Data Support service and half (20) of which had not. Curated datasets were shared privately with research participants. The versions that did not receive curation via Springer Nature's Research Data Support are openly accessible. Single-blind testing was employed; the researchers were not made aware which datasets had been curated and which had not, and it was ensured that no participant assessed the same dataset before and after curation. Responses were collected via an online survey. The relevant question and scoring are provided below:
Rate the overall quality and completeness of the metadata for the dataset (with regards to finding and accessing and citing the data, not reusing the data)
1 = not complete, 5 = very complete
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains an open and curated scholarly graph we built as a training and test set for data discovery, data connection, author disambiguation, and link prediction tasks. This graph represents the European Marine Science community included in the OpenAIRE Graph. The nodes of the graph we release represent publications, datasets, software, and authors; edges interconnecting research products always have the publication as source and the dataset/software as target. In addition, edges are labeled with semantics that indicate whether the publication is referencing, citing, documenting, or supplementing the related outcome. To curate and enrich node metadata and edge semantics, we relied on the information extracted from the PDFs of the publications and from the dataset/software webpages, respectively. We curated the authors so as to remove duplicated nodes representing the same person.
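The node and edge constraints described above can be illustrated with a minimal Python sketch. The class names, type strings, and semantic labels below are hypothetical and chosen only for illustration; the released files may encode them differently.

```python
from dataclasses import dataclass

# Assumed label spellings, derived from the prose description only.
EDGE_SEMANTICS = {"references", "cites", "documents", "supplements"}
SOURCE_TYPES = {"publication"}
TARGET_TYPES = {"dataset", "software"}


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: str  # "publication", "dataset", "software", or "author"


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    semantics: str


def is_valid_edge(edge: Edge) -> bool:
    """Check that an edge goes from a publication to a dataset/software
    node and carries one of the assumed semantic labels."""
    return (
        edge.source.node_type in SOURCE_TYPES
        and edge.target.node_type in TARGET_TYPES
        and edge.semantics in EDGE_SEMANTICS
    )


# Hypothetical identifiers, not taken from the released graph.
pub = Node("pub-1", "publication")
data = Node("data-1", "dataset")
good_edge = Edge(pub, data, "cites")
reversed_edge = Edge(data, pub, "cites")
```

A check of this kind could be run over an edge list before training a link prediction model, to confirm that every edge follows the publication-to-product direction.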
The resource we release counts 4,047 publications, 5,488 datasets, 22 software products, 21,561 authors, and 9,692 edges connecting publications to datasets/software. This graph is in the curated_MES folder. We provide this resource as:
We provide two additional scholarly graphs:
This is the supplementary material accompanying the manuscript "Daily life in the Open Biologist’s second job, as a Data Curator", published in Wellcome Open Research. It contains:
- Python_scripts.zip: Python scripts used for data cleaning and organization:
  - add_headers.py: adds specified headers automatically to a list of csv files, creating new output files with a "_with_headers" suffix.
  - count_NaN_values.py: counts the total number of rows containing null values in a csv file and prints the location of null values in (row, column) format.
  - remove_rowsNaN_file.py: removes rows containing null values in a single csv file and saves the modified file with a "_dropNaN" suffix.
  - remove_rowsNaN_list.py: removes rows containing null values in a list of csv files and saves the modified files with a "_dropNaN" suffix.
- README_template.txt: a template for a README file to be used to describe and accompany a dataset.
- template_for_source_data_information.xlsx: a spreadsheet to help manuscript authors keep track of the data used for each figure (e.g., information about data location and links to dataset descriptions).
- Supplementary_Figure_1.tif: Example of a dataset shared by us on Zenodo. The elements that make the dataset FAIR are indicated by the respective letters. Findability (F) is achieved by the dataset's unique and persistent identifier (DOI), as well as by the related identifiers for the publication and dataset on GitHub. Additionally, the dataset is described with rich metadata (e.g., keywords). Accessibility (A) is achieved by the ease of visualization and downloading using a standardised communications protocol (https). Also, the metadata are publicly accessible and licensed under the public domain. Interoperability (I) is achieved by the open formats used (CSV; R), and metadata are harvestable using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a low-barrier mechanism for repository interoperability. Reusability (R) is achieved by the complete description of the data with metadata in README files and links to the related publication (which contains more detailed information, as well as links to protocols on protocols.io). The dataset has a clear and accessible data usage license (CC-BY 4.0).
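The behaviour described for count_NaN_values.py and remove_rowsNaN_file.py can be approximated with the short sketch below. It is not the authors' code: it uses only the Python standard library, the function names are hypothetical, and the set of strings treated as null is an assumption.

```python
import csv
import io

# Assumption: these spellings (case-insensitive) count as null values.
NULL_TOKENS = {"", "nan", "na", "null"}


def is_null(value: str) -> bool:
    return value.strip().lower() in NULL_TOKENS


def find_null_cells(rows):
    """Return 0-based (row, column) positions of null cells (header excluded)."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
        if is_null(value)
    ]


def drop_null_rows(rows):
    """Keep only the rows that contain no null cell."""
    return [row for row in rows if not any(is_null(v) for v in row)]


# In-memory sample so the sketch runs without touching the file system.
sample = "id,value\n1,3.5\n2,\n3,NaN\n4,7.1\n"
header, *rows = list(csv.reader(io.StringIO(sample)))

null_cells = find_null_cells(rows)
rows_with_nulls = len({r for r, _ in null_cells})
clean_rows = drop_null_rows(rows)
```

To mirror the "_dropNaN" naming convention, the cleaned rows could then be written with csv.writer to a file whose name carries that suffix.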
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a standard template for representing the metadata of rock specimens (e.g., core, microanalysis, hand grab) in the CSIRO Mineral Resources Discovery program. The template includes core properties of samples such as their name, identifier, type, and location, as well as associated metadata such as project, drilling contexts, hazard declaration and physical storage. The template will be used to catalogue legacy specimens and specimens systematically collected through mineral exploration projects. It has been developed iteratively, revised, and improved based on feedback from researchers and lab technicians. This standardized template can prevent duplicate sample metadata entry and lower metadata redundancy, thereby improving the program's physical sample curation and discovery.
Lineage: The template includes a readme section summarising all the metadata fields, including their requirements and definitions. The template incorporates several established controlled terms representing, e.g., sample type, rock type, drill type, EPSG and hazard information to ensure consistency in metadata entry.
If you use this software, please cite it using the metadata from this file.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The aim of the Data Rescue & Curation Best Practices Guide is to provide an accessible and hands-on approach to handling data rescue and digital curation of at-risk data for use in secondary research. We provide a set of examples and workflows for addressing common challenges with social science survey data that can be applied to other social and behavioural research data. The goal of this guide and set of workflows presented is to improve librarians’ and data curators’ skills in providing access to high-quality, well-documented, and reusable research data. The aspects of data curation that are addressed throughout this guide are adopted from long-standing data library and archiving practices, including: documenting data using standard metadata, file and data organization; using open and software-agnostic formats; and curating research data for reuse.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor "Data-driven curation process for describing the blood glucose management in the intensive care unit". Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
The COVID-19 CDCS represents a metadata repository that provides a catalog of COVID-19 related research literature and data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a standard template for representing the metadata of mineral spectral reference specimens in the CSIRO Mineral Resources Discovery program. The template includes core properties of samples such as their name, identifier, type, and location, as well as associated metadata such as project, hazard declaration and physical storage. The template will be used to catalogue reference samples used for mineral spectral analysis (NVCL). It has been developed iteratively, revised, and improved based on feedback from researchers and lab technicians. This standardized template can prevent duplicate sample metadata entry and lower metadata redundancy, thereby improving the program's physical sample curation and discovery.
Lineage: This template was built on the CMR rock metadata template (https://doi.org/10.25919/2prf-dk88). The template includes a readme section summarising all the metadata fields, including their requirements and definitions. The template incorporates several established controlled terms representing, e.g., sample type, mineral type, EPSG and hazard information to ensure consistency in metadata entry. The template also contains a few metadata fields that are specific to mineral spectra samples, such as the different analyses conducted on the samples (XRD, whole-rock geochemical analysis, etc.).
Why DMPS are helpful to our researchers. Visit https://dataone.org/datasets/sha256%3A20f68cc9df1e285bd047214421264aaa881de9d21bcde4a6fceb1d48867076da for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplemental materials for Web3D paper - Levels of Representation and Data Infrastructures in Entomo-3D: An applied research approach to addressing metadata curation issues to support preservation and access of 3D.
One 3D metadata schema and three dataset pipeline figures
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A curated dataset of XRefs extracted from agri-food ontologies and curated using OMHT (Ontology Mapping Harvester Tool), a script written in Java that automatically extracts and semi-automatically curates declared mappings from ontologies and reifies them into specific objects with metadata and provenance information.
Identifiers of many kinds are the key to creating unambiguous and persistent connections between research objects and other items in the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but many existing resources submitted in the past are missing these identifiers, thus missing the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. The Dryad Data Repository has existed since 2008 and has successfully re-curated the repository metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity. Metrics are described and applied to the entire repository here. Identifiers for papers (DOIs) connected...
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data.
2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data.
Each dataset includes four types of metadata: identifiers, funders, keywords, and related works, each in separate comma-delimited (.csv) or tab-delimited (.tsv) files. There are also Microsoft Excel files (.xlsx) for the identifier metadata and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files with definitions, counts, unique counts, most frequent values, and completeness.
These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people.
# Data For: Sustainable Connectivity in a Community Repository
This readme.txt file was generated on 20231110 by Ted Habermann
Data For: Sustainable Connectivity in a Community Repository
Principal Investigator Contact Information
Name: Ted Habermann (0000-0003-3585-6733)
Institution: Metadata Game Changers
Email:
ORCID: 0000-0003-3585-6733
November 10, 2023
May and June 2023
National Science Foundation (Crossref Funder ID: 100000001) Award 2134956.
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Title: YouTube Video Curation (Metadata and URLs)😇
Subtitle: Analyzing YouTube Content: From Video Descriptions to Viewer Engagement Metrics
Introduction
The YouTube Video Metadata Explorer dataset is a comprehensive collection of metadata related to YouTube videos, encompassing a wide range of information including video IDs, content details, statistical data, descriptions, and associated URLs. This rich dataset provides a unique opportunity to explore, analyze, and understand the digital media landscape on one of the world's largest video-sharing platforms.
Content
The dataset consists of 307,623 entries and six main attributes, detailed as follows:
- ID: Unique identifier for each video.
- Snippet: Contains detailed information, including:
  - Category ID: YouTube video category identifier
  - Channel ID: Unique identifier for the channel hosting the video
  - Channel Title: Name of the channel hosting the video
  - Default Audio Language: The default audio language of the video
  - Default Language: The default language of the video
  - Live Broadcast Content: Indicator for live broadcast content
  - Localized: Information related to localization
  - Title: Title of the video
  - Published At: Publication date and time
  - Tags: Associated tags for the video
  - Thumbnails: Different resolution thumbnails, including:
    - Default: 90x120 pixels
    - High: 360x480 pixels
    - Maxres: 720x1280 pixels
    - Medium: 180x320 pixels
    - Standard: 480x640 pixels
- Content Details: Includes information about the video's technical specifications and features:
  - Caption: Indicates whether captions are available (true or false).
  - Content Rating: YouTube content rating (e.g., 'ytRating': None).
  - Definition: Video definition quality (e.g., 'hd' for high definition).
  - Dimension: Video dimension (e.g., '2d' for 2-dimensional).
  - Duration: Duration of the video (e.g., 'PT16M34S' for 16 minutes and 34 seconds).
  - Licensed Content: Indicates whether the content is licensed (true or false).
  - Projection: Type of video projection (e.g., 'rectangular').
  - Region Restriction: Any region restrictions applied to the video
- Statistics: Features video engagement metrics:
  - Comment Count: Number of comments on the video
  - Favorite Count: Number of times the video has been marked as a favorite (e.g., '0').
  - Like Count: Number of likes on the video (e.g., '29942').
  - View Count: Number of views for the video (e.g., '704710').
- Description: A brief description or summary of the video content
- URLs: Links associated with the video's description
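The Duration values shown above (e.g., 'PT16M34S') follow the ISO 8601 duration notation. A minimal sketch for converting such strings to seconds is given below; the function name is hypothetical, and the pattern covers only days, hours, minutes, and whole seconds, which is an assumption about the values found in this dataset.

```python
import re

# Covers forms such as 'PT16M34S', 'PT1H', or 'P1DT2H'; weeks, months,
# years, and fractional seconds are deliberately not handled.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(value: str) -> int:
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION_RE.match(value)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {name: int(num) if num else 0 for name, num in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


example_seconds = duration_to_seconds("PT16M34S")  # 16 * 60 + 34 = 994
```

Converting durations to a numeric column in this way makes it easier to relate video length to the engagement counts listed under Statistics.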
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset comprises survey data collected from graduate students at Simmons University's School of Library and Information Science (LIS), focusing on their Research Data Management (RDM) awareness, experience, preparedness and need for professional development. The files in this dataset include the raw survey responses (in CSV format).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor "Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development". Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor "A dataset describing data discovery and reuse practices in research". Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor "A DICOM dataset for evaluation of medical image de-identification". Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
This object has been created as a part of the web harvesting project of the Eötvös Loránd University Department of Digital Humanities (ELTE DH). Learn more about the workflow HERE and about the software used HERE. The aim of the project is to make online news articles and their metadata suitable for research purposes. The archiving workflow is designed to prevent modification or manipulation of the downloaded content. The current version of the curated content, with normalized formatting in standard TEI XML format and Schema.org encoded metadata, is available HERE. The detailed description of the raw content is the following: The portal's archived content (from 2017-01-29 to 2021-05-21) in WARC format is available HERE (crawled: 2021-05-21T09:51:12.531750 - 2021-05-21T18:38:24.961226). Please fill in the following form before requesting access to this dataset: ACCESS FORM
References: https://doi.org/10.5281/zenodo.3755323
This dataset contains key characteristics about the data described in the Data Descriptor "A global database of Holocene paleotemperature records". Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format