100+ datasets found
  1. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the meta-data, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The R code is available online at https://github.com/warrenjl/SpGPCW.

    Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary):

    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
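
    As a concrete illustration of the weekly median/IQR standardization described above, here is a minimal Python sketch; the array name, dimensions, and simulated values are illustrative assumptions, not part of the dataset:

    import numpy as np

    def standardize_exposures(z_raw):
        # Subtract each week's median and divide by that week's
        # interquartile range (IQR), as described in the Permissions note.
        med = np.median(z_raw, axis=0)
        q75, q25 = np.percentile(z_raw, [75, 25], axis=0)
        return (z_raw - med) / (q75 - q25)

    # Example: 100 simulated individuals, 40 weeks of pregnancy exposures
    rng = np.random.default_rng(0)
    z = standardize_exposures(rng.gamma(2.0, 1.0, size=(100, 40)))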

  2. The Open Metadata Exchange: A decentralized network supporting FAIR...

    • qubeshub.org
    Updated Nov 5, 2025
    Cite
    Anoop Aryal; Christian Clauss; Sam Donovan; Drew LaMar; Anjani Nambiar; Lisa Petrides (2025). The Open Metadata Exchange: A decentralized network supporting FAIR (meta)data practices and persistence for science gateways [Dataset]. http://doi.org/10.25334/JCTY-3G78
    Dataset updated
    Nov 5, 2025
    Dataset provided by
    QUBES
    Authors
    Anoop Aryal; Christian Clauss; Sam Donovan; Drew LaMar; Anjani Nambiar; Lisa Petrides
    Description

    The integration of FAIR (Findability, Accessibility, Interoperability, and Reusability) standards in both open science and open education holds transformative potential for advancing STEM education and broadening participation in scientific research. Despite the well-documented benefits of the open practices associated with FAIR standards, adoption remains limited. Science gateways are a particularly attractive area for FAIR implementation, as they form a distributed network of user-friendly web portals supporting access for domain researchers to high-performance computing, scientific data, software, and open educational resources. This paper introduces the Open Metadata Exchange (OME), a decentralized network of repositories enabling the sharing of content metadata and resources. The OME network provides a novel solution to FAIRification challenges associated with the distributed nature of science gateways by directly addressing resource persistence in the context of sustainability and decommissioning.

  3. Meta-Data

    • figshare.com
    txt
    Updated Aug 23, 2022
    Cite
    Haley Chatelaine (2022). Meta-Data [Dataset]. http://doi.org/10.6084/m9.figshare.19233555.v1
    Available download formats: txt
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Haley Chatelaine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mouse ID, diet group, colon region, study cohort, analytical batch, and sample weight data

  4. OpenCitations Meta RDF dataset of page numbers metadata and its provenance...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    Updated Apr 6, 2024
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of page numbers metadata and its provenance information [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10936231
    Dataset updated
    Apr 6, 2024
    Dataset authored and provided by
    OpenCitations (https://opencitations.net/)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to page numbers of bibliographic resources, known as manifestations (http://purl.org/spar/fabio/Manifestation). It contains all the bibliographic metadata and its provenance information, structured specifically around manifestations (page numbers), in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /br/06250/10000/1000/1000.zip, while the corresponding provenance is in /br/06250/10000/1000/prov/se.zip.
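
    A small Python helper makes this naming scheme concrete. This is a sketch under the assumptions stated above (data files bucketed by 10,000 and then 1,000 entities, with the zip file carrying the subfolder's number; the provenance file name varies between dumps), and the function name is illustrative:

    import math

    def data_zip_path(entity_type, supplier_prefix, n):
        # Folder covering 10,000 entities (e.g., entity 42 falls in "10000"),
        # then a subfolder covering 1,000 entities (e.g., "1000"), whose zip
        # file carries the same number as the subfolder.
        bucket = math.ceil(n / 10000) * 10000
        sub = math.ceil(n / 1000) * 1000
        return f"/{entity_type}/{supplier_prefix}/{bucket}/{sub}/{sub}.zip"

    # data_zip_path("br", "06250", 42) -> "/br/06250/10000/1000/1000.zip"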

    Additional information about OpenCitations Meta at the official webpage.

  5. OpenCitations Meta RDF dataset of agent roles metadata and its provenance...

    • data.niaid.nih.gov
    • nde-dev.biothings.io
    Updated Apr 6, 2024
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of agent roles metadata and its provenance information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10936245
    Dataset updated
    Apr 6, 2024
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to agent roles of bibliographic resources (http://purl.org/spar/pro/RoleInTime). These agents can be authors, editors, or publishers. It contains all the metadata and its provenance information, structured specifically around agent roles, in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /ar/06250/10000/1000/1000.zip, while the corresponding provenance is in /ar/06250/10000/1000/prov/se.zip.

    Additional information about OpenCitations Meta at the official webpage.

  6. Meta Kaggle Code : Metadata ( CSV )

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    AYUSH KHAIRE ( Previously 😊 ) (2025). Meta Kaggle Code : Metadata ( CSV ) [Dataset]. https://www.kaggle.com/datasets/ayushkhaire/meta-kaggle-codemetadata-csv
    Available download formats: zip (836224705 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    AYUSH KHAIRE ( Previously 😊 )
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Background

    This is a dataset about all the notebooks in the Meta Kaggle Code dataset. The original dataset is owned by the Kaggle team; I am just extracting metadata about Meta Kaggle Code. My dataset contains the following columns, with a description given for each. If you have feedback, you can use the Discussions tab or create a new topic. I hope you like the dataset and will use it for the Meta Kaggle Hackathon.

    Cheers, ayush

  7. Machine Learning YouTube Meta Data

    • kaggle.com
    zip
    Updated Aug 20, 2020
    Cite
    Durgesh Samariya (2020). Machine Learning YouTube Meta Data [Dataset]. https://www.kaggle.com/themlphdstudent/machine-learning-youtube-meta-data
    Available download formats: zip (327521 bytes)
    Dataset updated
    Aug 20, 2020
    Authors
    Durgesh Samariya
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Context

    Metadata of Machine Learning videos on YouTube.

    Content

    This dataset contains metadata for 500 machine learning videos: simply the first 500 results returned when you search for "machine learning" on YouTube.

    Acknowledgements

    Data scraped from https://wiki.digitalmethods.net/Dmi/ToolDatabase. Cover photo by Rachit Tank on Unsplash.

    Motivation: Dataset by Gabriel Preda

    Inspiration

    Using this dataset, analyse the popularity of machine learning videos and channels based on their like and dislike counts.

  8. OpenCitations Meta RDF dataset of identifiers metadata and its provenance...

    • data.niaid.nih.gov
    Updated Apr 6, 2024
    + more versions
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of identifiers metadata and its provenance information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10936285
    Dataset updated
    Apr 6, 2024
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to identifiers (http://purl.org/spar/datacite/Identifier) of bibliographic resources. It contains all the metadata and its provenance information, structured specifically around identifiers, in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /id/06250/10000/1000/1000.zip, while the corresponding provenance is in /id/06250/10000/1000/prov/se.zip.

    Additional information about OpenCitations Meta at the official webpage.

  9. Extracted Schemas from the Life Sciences Linked Open Data Cloud

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Maulik Kamdar (2023). Extracted Schemas from the Life Sciences Linked Open Data Cloud [Dataset]. http://doi.org/10.6084/m9.figshare.12402425.v2
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maulik Kamdar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web," published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows: Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y

    We have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:

    • The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.

    • Refined sets of extracted classes, object properties, data properties, and datatypes, shared across the Linked Data Graphs on the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, it is explicitly indicated.

    • The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided, along with broad-level characteristics of the modeled content.

    The LSLOD Schema Graph is saved as a JSON Pickle file. To read the JSON object in this Pickle file, use the following Python code:

    import pickle

    with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
        x = pickle.load(infile, encoding='iso-8859-1')

    Check the referenced link for more details on this research, raw data files, and code references.

  10. Passive Metadata

    • catalog.caida.org
    Updated Jan 15, 2019
    Cite
    CAIDA (2019). Passive Metadata [Dataset]. https://catalog.caida.org/dataset/passive_metadata
    Dataset updated
    Jan 15, 2019
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/public_aua/

    Time period covered
    Mar 2008 - Jan 2019
    Description

    Metadata for all passive monthly traces, including the Chicago and San Jose monitors. This includes the files used to generate the public trace statistics.

  11. OpenCitations Meta RDF dataset of all bibliographic metadata and its...

    • figshare.com
    bin
    Updated Feb 2, 2025
    Cite
    OpenCitations (2025). OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information [Dataset]. http://doi.org/10.6084/m9.figshare.21747536.v8
    Available download formats: bin
    Dataset updated
    Feb 2, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Compared to the previous version, this release includes metadata related to citing and cited bibliographic resources added in the November 2024 version of Crossref, as well as the November 2024 dump of JaLC (Japan Link Center). In this version, we have focused on correcting a specific type of error, namely the erroneous duplication of resources with the same identifier. We have successfully merged:

    • 100% of duplicated identifiers (datacite:Identifier)
    • 100% of duplicated responsible agents (foaf:Agent)
    • 70% of duplicated bibliographic resources (fabio:Expression)

    This dataset contains all the bibliographic metadata and its provenance information (in JSON-LD format) included in OpenCitations Meta. The data and the provenance are organized through a complex structure of folders and subfolders, allowing you to quickly find any entity from its URI. The first level consists of the following folders, provided compressed and separately:

    • folder "ar": contains the data and provenance of the agent role entities (http://purl.org/spar/pro/RoleInTime);
    • folder "br": contains the data and provenance of the bibliographic resource entities (http://purl.org/spar/fabio/Expression);
    • folder "id": contains the data and provenance of the identifier entities (http://purl.org/spar/datacite/Identifier);
    • folder "ra": contains the data and provenance of the responsible agent entities (http://xmlns.com/foaf/0.1/Agent);
    • folder "re": contains the data and provenance of the resource embodiment entities (http://purl.org/spar/fabio/Manifestation).

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /br/06250/10000/1000/1000.zip, while the corresponding provenance is in /br/06250/10000/1000/prov/1000.zip.

    This version of the dataset contains:

    • 121,302,680 bibliographic entities
    • 368,061,399 authors, 2,718,222 editors, and 101,612,475 publishers (counted by their roles, without disambiguating individual entities)
    • 698,995 publication venues

    The compressed archives total 47 GB, using the tar.gz compression algorithm, and expand to 145 GB when decompressed. The JSON-LD files inside the archives are further compressed using the zip algorithm. It is recommended to process these inner files as compressed, without extracting them, to manage the data more efficiently.

    Additional information about OpenCitations Meta is available at the official webpage.
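
    Given the recommendation to process the inner files without extracting them, here is a minimal Python sketch; the archive path follows the example above, while the inner member names are assumptions about the zip's contents:

    import json
    import zipfile

    # Read every JSON-LD file inside one data archive, entirely in memory.
    with zipfile.ZipFile("br/06250/10000/1000/1000.zip") as zf:
        for member in zf.namelist():
            with zf.open(member) as f:
                graph = json.load(f)   # JSON-LD parses as ordinary JSON
                print(member, len(graph))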

  12. Open Data Portal Catalogue Metadata

    • ukpowernetworks.opendatasoft.com
    csv, excel, json
    Updated Dec 2, 2025
    Cite
    (2025). Open Data Portal Catalogue Metadata [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/domain-dataset0/
    Available download formats: json, excel, csv
    Dataset updated
    Dec 2, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    A special dataset that contains metadata for all the published datasets. Dataset profile fields conform to the Dublin Core standard.

    Other

    You can download metadata for individual datasets via the links provided in the descriptions.

    Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/

  13. Meta-data for data.gov.uk datasets

    • data.europa.eu
    • data.wu.ac.at
    csv, html, json +2
    Updated Oct 11, 2021
    + more versions
    Cite
    Government Digital Service (2021). Meta-data for data.gov.uk datasets [Dataset]. https://data.europa.eu/data/datasets/data_gov_uk-datasets
    Available download formats: json, html, csv, unknown, xml
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    Government Digital Service
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    A dataset of all the meta-data for all of the datasets available through the data.gov.uk service. This is provided as a zipped CSV or JSON file. It is published nightly.

    Updates: 27 Sep 2017: we've moved all the previous dumps to an S3 bucket at https://dgu-ckan-metadata-dumps.s3-eu-west-1.amazonaws.com/ - This link is now listed here as a data file.

    From 13/10/16 we added a .v2.jsonl dump, which is set to replace the .json dump (to be discontinued after a 3-month transition). This is produced using 'ckanapi dump'. It provides an enhanced version of each dataset ('validated', or what you get from package_show in CKAN API v3; the old JSON was the unvalidated version). This now includes full details of the organization the dataset is in, rather than just the owner_id, plus the results of the archival & QA for each dataset and resource, showing whether the link is broken, the detected format, and the stars of openness. It also benefits from being in JSON Lines (http://jsonlines.org/) format, so you don't need to load the whole file into memory to parse the JSON; you can process it a line at a time.
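
    For illustration, a minimal Python sketch of this line-at-a-time processing follows; the file name and the 'name' field are assumptions, not taken from the dump's documented schema:

    import json

    # Stream the JSON Lines dump one record at a time, without
    # loading the whole file into memory.
    with open("data.gov.uk.v2.jsonl", encoding="utf-8") as f:
        for line in f:
            dataset = json.loads(line)   # one complete dataset per line
            print(dataset.get("name"))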

    On 12/1/2015 the organization of the CSV was changed:

    • Before this date, each dataset was one line, with resources added as numbered columns. Since a dataset may have up to 300 resources, the file ends up with 1025 columns, which is wider than many versions of Excel and LibreOffice will open, and the uncompressed size of 170 MB is more than most will deal with too. It is suggested you load it into a database, handle it with a Python or Ruby script, or use tools such as Refine or Google Fusion Tables.

    • After this date, the datasets are provided in one CSV and resources in another. On occasions when you want to join them, you can do so using the (dataset) "Name" column, as sketched below. These files are now manageable in spreadsheet software.
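
    A minimal sketch of such a join using pandas; the file names are illustrative, and the assumption is that both CSVs expose the shared dataset "Name" column mentioned above:

    import pandas as pd

    datasets = pd.read_csv("datasets.csv")     # assumed file name
    resources = pd.read_csv("resources.csv")   # assumed file name
    # Attach each resource row to its parent dataset via the "Name" column.
    merged = resources.merge(datasets, on="Name", how="left")
    print(merged.head())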

    You can also use the standard CKAN API if you want to search or get a small section of the data. Please respect the traffic limits in the API: http://data.gov.uk/terms-and-conditions

  14. cante2midi Metadata

    • live.european-language-grid.eu
    Updated Oct 19, 2015
    Cite
    (2015). cante2midi Metadata [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1004
    Dataset updated
    Oct 19, 2015
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The cante2midi dataset contains 20 tracks taken from the corpus and includes a large variety of styles and complexity with respect to melodic ornamentation. We provide note-level transcriptions of the singing voice melody in a MIDI-like format, where each note is defined by onset time, duration and a quantized MIDI pitch. In addition, we provide a number of low-level descriptors and the fundamental frequency corresponding to the predominant melody for each track. The meta-information includes editorial meta-data and the musicBrainz IDs.

    Content:

    README (5KB): Text file containing detailed descriptions of manual and automatic annotations.

    meta-data (10KB): XML file containing meta-information: Source (anthology name, CD no. and track no.) and editorial meta-data (artist name, title, style and musicBrainzID).

    manual transcriptions (82KB): MIDI (.mid) and text files (.notes) containing manual note-level transcriptions of the singing voice.

    automatic transcriptions (75KB): Text files (.notes) and MIDI files (.mid) containing automatic note-level transcriptions of the singing voice.

    Bark band energies (39.9MB): Text files (.csv) containing the frame-wise extracted bark band energies.

    predominant melody (6.2MB): Text files (.csv) containing the frame-wise extracted predominant melody.

    low-level descriptors (7.9MB): Text files (.csv) containing a set of frame-wise extracted low-level features.

    MFCCs (17.8MB): Text files (.csv) containing the frame-wise extracted mel-frequency cepstral coefficients (MFCCs).

    Magnitude spectrum (709.1MB, optional): Text files (.csv) containing the frame-wise extracted magnitudes of the discrete Fourier transform (DFT).

    Publications

    This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.

    N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the Computational study of Flamenco Music. arXiv:1510.04029 [cs.SD cs.IR].

    https://doi.org/10.1145/2875428

    Conditions of use

    The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.

    © COFLA 2015. All rights reserved.

  15. OpenCitations Meta RDF dataset of all bibliographic metadata and its...

    • zenodo.org
    bin
    Updated Oct 31, 2025
    Cite
    Arcangelo Massari (2025). OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information [Dataset]. http://doi.org/10.5281/zenodo.17483301
    Available download formats: bin
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Arcangelo Massari
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Dec 25, 2022
    Description
    Released on 2025-06-06, this version, compared to the previous one, includes metadata related to citing and cited bibliographic resources added in the April 2025 version of Crossref, as well as the December 2024 dump of JaLC (Japan Link Center).

    This dataset contains all the bibliographic metadata and its provenance information (in JSON-LD format) included in OpenCitations Meta. The data and the provenance are organized through a complex structure of folders and subfolders, allowing you to quickly find any entity from its URI. The first level consists of the following folders, provided compressed and separately:

    • folder "ar": data and provenance of the agent role entities (http://purl.org/spar/pro/RoleInTime);
    • folder "br": data and provenance of the bibliographic resource entities (http://purl.org/spar/fabio/Expression);
    • folder "id": data and provenance of the identifier entities (http://purl.org/spar/datacite/Identifier);
    • folder "ra": data and provenance of the responsible agent entities (http://xmlns.com/foaf/0.1/Agent);
    • folder "re": data and provenance of the resource embodiment entities (http://purl.org/spar/fabio/Manifestation).

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /br/06250/10000/1000/1000.zip, while the corresponding provenance is in /br/06250/10000/1000/prov/1000.zip.

    This version of the dataset contains:

    • 124,526,660 bibliographic entities
    • 376,295,095 authors, 2,765,927 editors, and 103,928,927 publishers (counted by their roles, without disambiguating individual entities)
    • 1,019,563 publication venues

    The compressed archives total 46.5 GB, using the 7-zip compression algorithm, and expand to 66 GB when decompressed. The JSON-LD files inside the archives are further compressed using the zip algorithm. It is recommended to process these inner files as compressed without extracting them, to manage data more efficiently.

    Additional information about OpenCitations Meta is available at the official webpage: https://download.opencitations.net/#meta

  16. Metadata Catalog of the GDI-BSH

    • data.europa.eu
    unknown
    Cite
    Metadata Catalog of the GDI-BSH [Dataset]. https://data.europa.eu/data/datasets/68a7cc6f-0065-47f1-bb02-2b5cb8abdb58
    Available download formats: unknown
    Description

    This metadata set describes the CSW interface of the metadata catalogue of the spatial data infrastructure of the Federal Maritime and Hydrographic Agency (GDI-BSH).

  17. corpusCOFLA Metadata

    • live.european-language-grid.eu
    • zenodo.org
    Updated Apr 17, 2024
    Cite
    (2024). corpusCOFLA Metadata [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/947
    Dataset updated
    Apr 17, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The corpusCOFLA is a collection of more than 1500 flamenco recordings which are representative of what is considered classical flamenco. All contained tracks are taken from 12 commercially available flamenco anthologies in order to minimize a possible bias towards geographic location, singer or record label. We provide the editorial meta-information together with the musicBrainz IDs for all tracks as well as the anthologies as XML documents.

    Content:

    corpus meta data (619KB): XML file containing editorial meta-information for all tracks: source (anthology, CD number, track number), artist, title, style and musicBrainzID.

    anthology meta data (3KB): XML file containing editorial meta-information for all anthologies comprising the corpus: name, record label, year edition, year re-edition, number of CDs

    Version 1 (released Nov 23rd, 2017):

    • The anthology “Antología del Cante Flamenco. Flamencología.” is no longer commercially available and has been removed from the corpus.

    • In the corpus meta-data, a field “style_annotated” has been added, which contains unified style annotations.

    • Singer names have been assigned unique identifiers.

    Publications

    This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.

    N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the Computational study of Flamenco Music. arXiv:1510.04029 [cs.SD cs.IR].

    https://doi.org/10.1145/2875428

    Conditions of use

    The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.

    © COFLA 2015. All rights reserved.

  18. Data from: “Enabling FAIR data in Earth and environmental science with...

    • knb.ecoinformatics.org
    • osti.gov
    Updated May 4, 2023
    Cite
    Robert Crystal-Ornelas; Charuleka Varadharajan; Kathleen Beilsmith; Ben Bond-Lamberty; Kristin Boye; Madison Burrus; Shreyas Cholia; Danielle S. Christianson; Michael Crow; Joan Damerow; Kim S. Ely; Amy E. Goldman; Susan Heinz; Valerie C. Hendrix; Zarine Kakalia; Kayla Mathes; Fianna O'Brien; Dylan O'Ryan; Stephanie C. Pennington; Emily Robles; Alistair Rogers; Maegen Simmonds; Terri Velliquette; Pamela Weisenhorn; Jessica Nicole Welch; Karen Whitenack; Deb Agarwal (2023). Data from: “Enabling FAIR data in Earth and environmental science with community-centric (meta)data reporting formats” [Dataset]. http://doi.org/10.15485/1866606
    Dataset updated
    May 4, 2023
    Dataset provided by
    ESS-DIVE
    Authors
    Robert Crystal-Ornelas; Charuleka Varadharajan; Kathleen Beilsmith; Ben Bond-Lamberty; Kristin Boye; Madison Burrus; Shreyas Cholia; Danielle S. Christianson; Michael Crow; Joan Damerow; Kim S. Ely; Amy E. Goldman; Susan Heinz; Valerie C. Hendrix; Zarine Kakalia; Kayla Mathes; Fianna O'Brien; Dylan O'Ryan; Stephanie C. Pennington; Emily Robles; Alistair Rogers; Maegen Simmonds; Terri Velliquette; Pamela Weisenhorn; Jessica Nicole Welch; Karen Whitenack; Deb Agarwal
    Time period covered
    Jan 1, 2017
    Description

    This dataset contains supplementary information for a manuscript describing the ESS-DIVE (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem) data repository's community data and metadata reporting formats. The purpose of creating the ESS-DIVE reporting formats was to provide guidelines for formatting some of the diverse data types that can be found in the ESS-DIVE repository. The 6 teams of community partners who developed the reporting formats included scientists and engineers from across the Department of Energy National Lab network. Additionally, during the development process, 247 individuals representing 128 institutions provided input on the formats. The primary files in this dataset are 10 data and metadata crosswalks for ESS-DIVE's reporting formats (all files ending in _crosswalk.csv). The crosswalks compare elements used in each of the reporting formats to other related standards and data resources (e.g., repositories, datasets, data systems). This dataset also contains additional files recommended by ESS-DIVE's file-level metadata reporting format. Each data file has an associated dictionary (files ending in _dd.csv) which provides a brief description of each standard or data resource consulted in the data reporting format development process. The flmd.csv file describes each file contained within the dataset.

  19. cante100 Metadata

    • live.european-language-grid.eu
    • zenodo.org
    Updated Apr 11, 2024
    Cite
    (2024). cante100 Metadata [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1005
    Dataset updated
    Apr 11, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The cante100 dataset contains 100 tracks taken from the corpus. We defined 10 style families, of which 10 tracks each are included. Apart from the style family, we manually annotated the sections of the track in which the vocals are present. In addition, we provide a number of low-level descriptors and the fundamental frequency corresponding to the predominant melody for each track. The meta-information includes editorial meta-data and the musicBrainz ID.

    Content:

    README (5KB): Text file containing detailed descriptions of manual and automatic annotations.

    meta-data (59KB): XML file containing meta-information: Source (anthology name, CD no. and track no.), editorial meta-data (artist name, title, style, musicBrainzID) and the manually annotated style family.

    vocal sections (8.9MB): Text file (.csv) containing frame-wise vocal section annotations.

    automatic transcriptions (375KB): Text files (.notes) and MIDI files (.mid) containing automatic note-level transcriptions of the singing voice.

    Bark band energies (216.6MB): Text files (.csv) containing the frame-wise extracted bark band energies.

    predominant melody (33.5MB): Text files (.csv) containing the frame-wise extracted predominant melody.

    low-level descriptors (42.9MB): Text files (.csv) containing a set of frame-wise extracted low-level features.

    MFCCs (97.1MB): Text files (.csv) containing the frame-wise extracted mel-frequency cepstral coefficients (MFCCs).

    Magnitude spectrum (3.85GB): Text files (.csv) containing the frame-wise extracted magnitudes of the discrete Fourier transform (DFT).

    Publications

    This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.

    N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the Computational study of Flamenco Music. arXiv:1510.04029 [cs.SD cs.IR].

    https://doi.org/10.1145/2875428

    Conditions of use

    The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.

    © COFLA 2015. All rights reserved.

  20. CMAQ v5.2 and WRF v3.8.1 model data, meta data and figures

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). CMAQ v5.2 and WRF v3.8.1 model data, meta data and figures [Dataset]. https://catalog.data.gov/dataset/cmaq-v5-2-and-wrf-v3-8-1-model-data-meta-data-and-figures
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The data are described in detail in the uploaded file "Science hub metadata.docx". This dataset is associated with the following publication: Zhang, Y., J. Bash, S. Roselle, A. Shatas, A. Repinsky, R. Mathur, C. Hogrefe, J. Piziali, T. Jacobs, and A. Gilliland. Unexpected air quality impacts from implementation of green infrastructure in urban environments: a Kansas City Case Study. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 744(20): 140960, (2020).
