Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine-readable metadata available from dataset landing pages facilitates data citation by enabling easy integration with reference managers and other tools used in a data citation workflow. Embedding these metadata using the schema.org standard serialized as JSON-LD is emerging as the community standard. This dataset is a listing of data repositories that have implemented this approach or are in the process of doing so.
This is the first version of this dataset and was generated via community consultation. We expect to update this dataset, as an increasing number of data repositories adopt this approach, and we hope to see this information added to registries of data repositories such as re3data and FAIRsharing.
In addition to the listing of data repositories, we provide information on the schema.org properties supported by these data repositories, focusing on the required and recommended properties from the "Data Citation Roadmap for Scholarly Data Repositories".
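To make this concrete, the following sketch shows the kind of schema.org Dataset description a repository might embed in a dataset landing page inside a <script type="application/ld+json"> element. All names, identifiers, and values below are hypothetical placeholders, not taken from the listed repositories:

```python
import json

# Minimal schema.org Dataset description, serialized as JSON-LD.
# Every value here is a hypothetical placeholder.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example Ocean Temperature Observations",
    "description": "Hypothetical example of a landing-page description.",
    "identifier": "https://doi.org/10.5072/example",  # DOI as resolvable URL
    "creator": {"@type": "Person", "name": "Jane Researcher"},
    "publisher": {"@type": "Organization", "name": "Example Data Repository"},
    "datePublished": "2024-01-15",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# This JSON would be placed in a <script type="application/ld+json">
# element in the HTML of the dataset's landing page.
jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

Reference managers and harvesters can then parse this block directly from the page HTML, without needing a repository-specific API.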
This resource contains slides for the AGU Fall Meeting 2023 presentation (#IN23A-07), given in San Francisco on Dec. 12 in session IN23A: Advancing Open Science: Emerging Techniques in Knowledge Management and Discovery II (Oral).
Effective response to global crises relies on universal access to scientific data and models, understanding their attributes, and representing their interconnectivity to facilitate collaborative research and decision-making. In the age of distributed data, geospatial researchers frequently invest significant time searching for, accessing, and working to understand scientific data. This often leads to the recreation of existing datasets, as well as challenges in determining methods for accessing, using, and ultimately establishing connections between resources. In recent years, following the FAIR and CARE principles, an emerging practice is to leverage structured and robust metadata to accelerate the discovery of web-based scientific resources and products. This practice assists users not only in discovery, but also in understanding the context, quality, and provenance of data, as well as the rights and responsibilities of data owners and consumers. It also empowers organizations to leverage their data more effectively and derive meaningful insights from them. Doing so, however, can be difficult, especially when diverse resources needed for scientific applications may be spread across multiple repositories or locations. We present a solution for leveraging the Schema.org vocabulary along with various web encodings, such as the Resource Description Framework (RDF) with JSON-LD, to create an actionable, curated catalog of scientific resources ranging from spatio-temporal data to software source code. We explore how resources of various types and common scientific formats, such as multidimensional data, software containers, source code, and spatial features, which are stored across various repositories and distributed cloud storage, can be described and cataloged.
Recognizing the impracticality of manually cataloging metadata, we have developed generic capabilities to automatically extract metadata for such resources, while empowering scientists to provide additional context. By incorporating comprehensive metadata, the exploration of diverse data relationships can be realized to gain insight into gaps and opportunities to improve the connectivity between science communities.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Combines all individual instances, models (shapes) and metadata from RKD schema.org datasets into one unified dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file contains crosswalks among different metadata schemas that can be used to describe data cubes in the areas of Marine Science, Earth Sciences and Climate Research. These data cubes commonly contain observations of variables in some feature of interest, taken by Earth Observation systems (e.g., satellites) or as in-situ observations.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Earth.Org.UK (EOU) Product, Service and Event Review metadata for schema.org as key/value pairs in plain-text files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This netCDF dataset contains simulation output from the Utah Energy Balance (UEB) model. It includes simulated snow water equivalent for the TWDEF site in Utah over the period Oct. 2009 to June 2010.
The document contains a mapping of metadata elements of the CESSDA Data Catalogue to the CESSDA Metadata Model (CMM), OpenAIRE, B2FIND, schema.org and Dublin Core. It also provides definitions, information on requirements, and notes for every metadata element.
The NIST Extensible Resource Data Model (NERDm) is a set of schemas for encoding, in JSON format, metadata that describe digital resources. The variety of digital resources it can describe includes not only digital data sets and collections, but also software, digital services, web sites and portals, and digital twins. It was created to serve as the internal metadata format used by the NIST Public Data Repository and Science Portal to drive rich presentations on the web and to enable discovery; however, it was also designed to enable programmatic access to resources and their metadata by external users. Interoperability was also a key design aim: the schemas are defined using the JSON Schema standard, metadata are encoded as JSON-LD, and their semantics are tied to community ontologies, with an emphasis on DCAT and the US federal Project Open Data (POD) models. Finally, extensibility is central to its design: the schemas are composed of a central core schema and various extension schemas, and new extensions to support richer metadata concepts can be added over time without breaking existing applications.
Validation is central to NERDm's extensibility model. Consuming applications should be able to choose which metadata extensions they care to support and ignore terms and extensions they don't support. Furthermore, they should not fail when a NERDm document leverages extensions they don't recognize, even when on-the-fly validation is required. To support this flexibility, the NERDm framework allows documents to declare which extensions are being used and where.
We have developed an optional extension to standard JSON Schema validation (see ejsonschema below) to support flexible validation: while a standard JSON Schema validator can validate a NERDm document against the NERDm core schema, our extension will validate a NERDm document against any recognized extensions and ignore those that are not recognized.
The NERDm data model is based around the concept of a resource, semantically equivalent to a schema.org Resource, and, as in schema.org, there can be different types of resources, such as data sets and software. A NERDm document indicates which types the resource qualifies as via the JSON-LD "@type" property. All NERDm Resources are described by metadata terms from the core NERDm schema; however, different resource types can be described by additional metadata properties (often drawing on particular NERDm extension schemas). A Resource contains Components of various types (including DCAT-defined Distributions) that are considered part of the Resource; specifically, these can include downloadable data files, hierarchical data collections, links to web sites (like software repositories), software tools, or other NERDm Resources. Through the NERDm extension system, domain-specific metadata can be included at either the resource or component level. The direct semantic and syntactic connections to the DCAT, POD, and schema.org schemas are intended to ensure unambiguous conversion of NERDm documents into those schemas.
As of this writing, the core NERDm schema and its framework stand at version 0.7 and are compatible with the "draft-04" version of JSON Schema. Version 1.0 is projected to be released in 2025. In that release, the NERDm schemas will be updated to the "draft2020" version of JSON Schema. Other improvements will include stronger support for RDF and the Linked Data Platform through its support of JSON-LD.
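The validate-what-you-recognize behavior described above can be sketched in plain Python. This is not the actual ejsonschema API: the "_extensionSchemas" property name, the helper function, and the validator callables below are illustrative assumptions only.

```python
# A plain-Python sketch of flexible validation: a consumer validates a
# document against the declared extensions it recognizes and silently
# skips those it does not. Hypothetical names throughout; the real
# implementation is the ejsonschema extension to JSON Schema validation.

def validate_flexibly(doc, validators):
    """Run every recognized extension validator over `doc`;
    unrecognized extensions are ignored rather than treated as errors."""
    errors = []
    for uri in doc.get("_extensionSchemas", []):  # assumed property name
        check = validators.get(uri)
        if check is None:
            continue  # unrecognized extension: skip, do not fail
        errors.extend(check(doc))
    return errors

# A toy validator standing in for a core schema: require a title.
CORE = "https://example.nist.gov/nerdm/core"  # hypothetical schema URI
validators = {CORE: lambda d: [] if "title" in d else ["missing title"]}

doc = {
    "_extensionSchemas": [CORE, "https://example.org/unknown-ext"],
    "title": "An example resource",
}
print(validate_flexibly(doc, validators))  # prints []
```

The unknown extension URI produces no error, which is the key design point: consuming applications keep working as new extension schemas appear.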
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource provides crosswalks among the most commonly used metadata schemes and guidelines to describe digital objects in Open Science, including:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains two tables. One table contains metadata for "citable" datasets (datasets that have either a DOI or a compact identifier). The other table contains the relationships between each pair of datasets in the first table. We generated this corpus of dataset metadata by crawling the Web to find pages with schema.org or DCAT metadata indicating that the page contains a dataset. The metadata for datasets includes information such as the dataset's name, description, provider, creation date, Digital Object Identifiers (DOI), and more. Out of the 46 million dataset pages that have schema.org, we publish this subset of 4.3 million dataset-metadata entries that are citable. We also include an additional table on relationships between these datasets. Please contact dataset-search@googlegroups.com if you have any questions or requests to remove a dataset that you own from this collection.
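The crawling step described above hinges on detecting embedded schema.org metadata in page HTML. A minimal Python sketch of that detection, using only the standard library (class and variable names are our own, and the sample page is hypothetical):

```python
import json
from html.parser import HTMLParser

# Collect the contents of <script type="application/ld+json"> blocks
# and keep only objects whose @type is Dataset.
class JSONLDDatasetExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.datasets = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                obj = json.loads("".join(self._buf))
            except ValueError:
                return  # not valid JSON: ignore this block
            if obj.get("@type") == "Dataset":
                self.datasets.append(obj)

# Hypothetical landing-page snippet with embedded schema.org metadata.
page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example dataset", "identifier": "https://doi.org/10.5072/example"}
</script></head><body>...</body></html>"""

extractor = JSONLDDatasetExtractor()
extractor.feed(page)
print([d["name"] for d in extractor.datasets])  # prints ['Example dataset']
```

A production crawler would additionally handle @graph containers, arrays of top-level objects, and DCAT serializations, which this sketch omits.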
Metadata and Annotations (structured data) of tirol.at
This resource contains several different file types to help identify appropriate properties for each file type when designing the metadata schema based on Schema.org.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains several different file types to help identify appropriate properties for each file type when designing the metadata schema based on Schema.org.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This fileset contains a preprint version of the conference paper (.pdf), presentation slides (.pptx), and the dataset(s) and validation schema(s) for the IDCC 2019 (Melbourne) conference paper: The Red Queen in the Repository: metadata quality in an ever-changing environment. Datasets and schemas are in .xml, .xsd, Excel (.xlsx) and .csv formats (two .csv files representing two different sheets in the .xlsx file). The validationSchemas.zip holds the additional validation schemas (.xsd) that were not found in the schemaLocations of the metadata xml-files to be validated. The schemas must all be placed in the same folder, and are to be used for validating the Dataverse dcterms records (with metadataDCT.xsd) and the Zenodo oai_datacite feeds respectively (schema.datacite.org_oai_oai-1.0_oai.xsd). In the latter case, a simpler approach might be to replace the incorrect URL "http://schema.datacite.org/oai/oai-1.0/ oai_datacite.xsd" in the schemaLocation of these xml-files with the correct schemaLocation="http://schema.datacite.org/oai/oai-1.0/ http://schema.datacite.org/oai/oai-1.0/oai.xsd", as has been done already in the sample files here. The sample file folders testDVNcoll.zip (Dataverse), testFigColl.zip (Figshare) and testZenColl.zip (Zenodo) contain all the metadata files tested and validated that are registered in the spreadsheet with objectIDs.
In the case of Zenodo, one original file feed, zen2018oai_datacite3orig-https%20_zenodo.org_oai2d%20verb=ListRecords%26metadataPrefix=oai_datacite%26from=2018-11-29%26until=2018-11-30.xml, is also supplied to show what was necessary to change in order to perform validation as indicated in the paper.
For Dataverse, a corrected version of a file, dvn2014ddi-27595Corr_https%20_dataverse.harvard.edu_api_datasets_export%20exporter=ddi%26persistentId=doi%253A10.7910_DVN_27595Corr.xml, is also supplied to show the changes it would take to make the file validate without error.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data associated with "Developing a standardized but extendable framework to increase the findability of infectious disease datasets"
Includes:
The open access movement and scientific reproducibility concerns have led the biomedical research community to embrace efforts to make scientific datasets openly accessible. While many datasets are now available, there are still challenges in ensuring that they are Findable, Accessible, Interoperable, and Reusable (FAIR). To improve the FAIRness of datasets, we evaluated dataset repositories for compliance with Schema.org standards – a collection of standards developed to increase metadata searchability across the internet. Adoption of the Schema.org Dataset standard was highly variable in biomedical research datasets, and the standard omitted many desirable metadata fields. We customized the Schema.org Dataset standard to catalog datasets collected across a Systems Biology research consortium consisting of 15 Centers. We developed a reusable process for creating a schema which is interoperable with other standards, but still extendable and customizable to a particular context. Here, we describe our process along with the associated gains in FAIRness, and discuss ongoing challenges with dataset discoverability – the first step to ensure that the vast amount of open data published by the research community is reused to its maximum value.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dryad is a general-purpose curated repository for data underlying scholarly publications. Dryad's metadata framework is supported by a Dublin Core Application Profile (DCAP, hereafter referred to as application profile). This paper examines the evolution of Dryad's application profile, which has been revised over time, in an operational system, serving day-to-day needs of stakeholders. We model the relationships between data packages and data files over time, from its initial implementation in 2007 to its current practice, version 3.2, and present a crosswalk analysis. Results covering versions 1.0 to 3.0 show an increase in the number of metadata elements used to describe Dryad's data objects. Results also confirm that Version 3.0, which envisioned separate metadata element sets for data package, data files, and publication metadata, was never fully realized due to constraints in Dryad system architecture. Version 3.1 subsequently reduced the number of metadata elements captured by recombining the publication and data package element sets. This paper documents a real-world application profile implemented in an operational system, noting practical system and infrastructure constraints. Finally, the analysis presented informs an ongoing effort to update the application profile to support Dryad's diverse and expanding community of stakeholders.
This Excel workbook is a compilation of the major metadata schemas for life cycle assessment.
https://www.gesis.org/en/institute/data-usage-terms
ClaimsKG is a knowledge graph of metadata for 59580 fact-checked claims scraped from 13 fact-checking sites. In addition to providing a single dataset of claims and associated metadata, truth ratings are harmonised and additional information is provided for each claim, e.g., about mentioned entities. Please see https://data.gesis.org/claimskg/ for further details about the data model and statistics.
The dataset facilitates structured queries about claims, their truth values, involved entities, authors, dates, and other kinds of metadata. ClaimsKG is generated through a (semi-)automated pipeline, which harvests claim-related data from popular fact-checking web sites, annotates them with related entities from DBpedia/Wikipedia, and lifts all data to RDF using established vocabularies (such as schema.org).
The latest release of ClaimsKG covers 59580 claims. The data was scraped through August 2022 and contains claims published between 1996 and 2022 on 13 fact-checking websites; the claim-review (fact-checking) dates likewise range from 1996 to 2022. The entity-fishing Python client (https://github.com/hirmeos/entity-fishing-client-python) was used for entity linking and disambiguation in this release. The dataset contains a total of 1371271 entities detected and referenced with DBpedia. More information, such as detailed statistics, query examples and a user-friendly interface to explore the knowledge graph, is available at https://data.gesis.org/claimskg/.
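As an illustration of the structured queries the graph enables, here is a hypothetical SPARQL sketch assuming a schema.org ClaimReview/Claim modeling; the actual ClaimsKG predicates and graph layout may differ, so consult the data-model documentation at the link above before querying:

```sparql
# Hypothetical sketch: claims with their harmonised truth ratings and
# mentioned entities, assuming schema.org-based modeling as described
# above. Predicate names are illustrative, not confirmed.
PREFIX schema: <http://schema.org/>

SELECT ?claim ?ratingName ?entity
WHERE {
  ?review a schema:ClaimReview ;
          schema:itemReviewed ?claim ;
          schema:reviewRating ?rating .
  ?rating schema:alternateName ?ratingName .
  ?claim  schema:mentions ?entity .
}
LIMIT 10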
The first two releases of ClaimsKG are hosted at Zenodo (https://doi.org/10.5281/zenodo.3518960), ClaimsKGV1.0 (published on 04.04.2019), ClaimsKGV2.0 (published on 01.09.2019). This latest release of ClaimsKG supersedes the previous versions as it contains all the claims from the previous versions together with additional claims as well as improved entity annotations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains metadata for Zenodo's published open-access records and communities, including entries that were marked as spam by Zenodo staff and deleted.
The datasets are gzip-compressed JSON Lines files, where each line is a JSON object representing a Zenodo record or community.
Records dataset
Filename: zenodo_open_metadata_{ date of export }.jsonl.gz
Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date
which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.
In addition, some terms have been altered:
The term files contains a list of dictionaries containing filetype, size, and filename only.
The term license contains a short Zenodo ID of the license (e.g. "cc-by").
Communities dataset
Filename: zenodo_community_metadata_{ date of export }.jsonl.gz
Each object contains the terms: id, title, description, curation_policy, page
which correspond to the fields with the same name available in Zenodo's community creation form.
Notes for all datasets
For each object, the term spam contains a boolean value indicating whether the given record/community was marked as spam content by Zenodo staff.
Top-level terms whose values were missing in the original metadata may contain a null value.
A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
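Given the JSON Lines format described above, a dump can be streamed record by record without loading the whole file into memory. A minimal Python sketch (the function names are our own, and the filename follows the pattern given above with a hypothetical export date):

```python
import gzip
import json

# Stream a gzip-compressed JSON Lines dump one record at a time.
def iter_records(path):
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            yield json.loads(line)

# Example use: count records flagged as spam by Zenodo staff.
def count_spam(path):
    return sum(1 for rec in iter_records(path) if rec.get("spam"))

# e.g. count_spam("zenodo_open_metadata_2023-01-01.jsonl.gz")
```

The smaller uncompressed sample files can be read the same way by swapping gzip.open for the built-in open.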
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Metadata for the Sound & Vision catalogue items have been transformed to RDF and mapped to schema.org. Only descriptive metadata not subject to copyright law is included. All available metadata are loaded into one S&V knowledge graph, which is accessible via a SPARQL endpoint.