100+ datasets found
  1. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the meta-data, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. The R code is available online at https://github.com/warrenjl/SpGPCW.

    Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary):

    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
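
    As a concrete illustration of the weekly median/IQR standardization described above, here is a minimal Python sketch; the array name, dimensions, and simulated values are illustrative assumptions, not part of the dataset:

    import numpy as np

    def standardize_exposures(z_raw):
        # Subtract each week's median and divide by that week's
        # interquartile range (IQR), as described in the Permissions note.
        med = np.median(z_raw, axis=0)
        q75, q25 = np.percentile(z_raw, [75, 25], axis=0)
        return (z_raw - med) / (q75 - q25)

    # Example: 100 simulated individuals, 40 weeks of pregnancy exposures
    rng = np.random.default_rng(0)
    z = standardize_exposures(rng.gamma(2.0, 1.0, size=(100, 40)))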

  2. The Open Metadata Exchange: A decentralized network supporting FAIR...

    • qubeshub.org
    Updated Nov 5, 2025
    Cite
    Anoop Aryal; Christian Clauss; Sam Donovan; Drew LaMar; Anjani Nambiar; Lisa Petrides (2025). The Open Metadata Exchange: A decentralized network supporting FAIR (meta)data practices and persistence for science gateways [Dataset]. http://doi.org/10.25334/JCTY-3G78
    Dataset updated
    Nov 5, 2025
    Dataset provided by
    QUBES
    Authors
    Anoop Aryal; Christian Clauss; Sam Donovan; Drew LaMar; Anjani Nambiar; Lisa Petrides
    Description

    The integration of FAIR (Findability, Accessibility, Interoperability, and Reusability) standards in both open science and open education holds transformative potential for advancing STEM education and broadening participation in scientific research. Despite the well-documented benefits of the open practices associated with FAIR standards, adoption remains limited. Science gateways are a particularly attractive area for FAIR implementation, as they form a distributed network of user-friendly web portals supporting access for domain researchers to high-performance computing, scientific data, software, and open educational resources. This paper introduces the Open Metadata Exchange (OME), a decentralized network of repositories enabling the sharing of content metadata and resources. The OME network provides a novel solution to FAIRification challenges associated with the distributed nature of science gateways by directly addressing resource persistence in the context of sustainability and decommissioning.

  3. Meta-Data

    • figshare.com
    txt
    Updated Aug 23, 2022
    Cite
    Haley Chatelaine (2022). Meta-Data [Dataset]. http://doi.org/10.6084/m9.figshare.19233555.v1
    Available download formats: txt
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Haley Chatelaine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mouse ID, diet group, colon region, study cohort, analytical batch, and sample weight data

  4. OpenCitations Meta RDF dataset of page numbers metadata and its provenance...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    Updated Apr 6, 2024
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of page numbers metadata and its provenance information [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10936231
    Dataset updated
    Apr 6, 2024
    Dataset authored and provided by
    OpenCitations (https://opencitations.net/)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to page numbers of bibliographic resources, known as manifestations (http://purl.org/spar/fabio/Manifestation). It contains all the bibliographic metadata and its provenance information, structured specifically around manifestations (page numbers), in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /br/06250/10000/1000/1000.zip, while the corresponding provenance is in /br/06250/10000/1000/prov/se.zip.
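
    A small Python helper makes this naming scheme concrete. This is a sketch under the assumptions stated above (data files bucketed by 10,000 and then 1,000 entities, with the zip file carrying the subfolder's number; the provenance file name varies between dumps), and the function name is illustrative:

    import math

    def data_zip_path(entity_type, supplier_prefix, n):
        # Folder covering 10,000 entities (e.g., entity 42 falls in "10000"),
        # then a subfolder covering 1,000 entities (e.g., "1000"), whose zip
        # file carries the same number as the subfolder.
        bucket = math.ceil(n / 10000) * 10000
        sub = math.ceil(n / 1000) * 1000
        return f"/{entity_type}/{supplier_prefix}/{bucket}/{sub}/{sub}.zip"

    # data_zip_path("br", "06250", 42) -> "/br/06250/10000/1000/1000.zip"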

    Additional information about OpenCitations Meta at the official webpage.

  5. OpenCitations Meta RDF dataset of agent roles metadata and its provenance...

    • data.niaid.nih.gov
    • nde-dev.biothings.io
    Updated Apr 6, 2024
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of agent roles metadata and its provenance information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10936245
    Dataset updated
    Apr 6, 2024
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to agent roles of bibliographic resources (http://purl.org/spar/pro/RoleInTime). These agents can be authors, editors, or publishers. It contains all the metadata and its provenance information, structured specifically around agent roles, in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /ar/06250/10000/1000/1000.zip, while the corresponding provenance is in /ar/06250/10000/1000/prov/se.zip.

    Additional information about OpenCitations Meta at the official webpage.

  6. Meta Kaggle Code : Metadata ( CSV )

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    AYUSH KHAIRE ( Previously 😊 ) (2025). Meta Kaggle Code : Metadata ( CSV ) [Dataset]. https://www.kaggle.com/datasets/ayushkhaire/meta-kaggle-codemetadata-csv
    Available download formats: zip (836224705 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    AYUSH KHAIRE ( Previously 😊 )
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Background

    This is a dataset about all the notebooks in the Meta Kaggle Code dataset. The original dataset is owned by the Kaggle team; I am just extracting metadata about Meta Kaggle Code. My dataset contains the following columns, with a description given for each. If you have feedback, you can use the Discussions tab or create a new topic. I hope you like the dataset and will use it for the Meta Kaggle Hackathon.

    Cheers, ayush

  7. Machine Learning YouTube Meta Data

    • kaggle.com
    zip
    Updated Aug 20, 2020
    Cite
    Durgesh Samariya (2020). Machine Learning YouTube Meta Data [Dataset]. https://www.kaggle.com/themlphdstudent/machine-learning-youtube-meta-data
    Available download formats: zip (327521 bytes)
    Dataset updated
    Aug 20, 2020
    Authors
    Durgesh Samariya
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Context

    Metadata of Machine Learning videos on YouTube.

    Content

    This dataset contains metadata for 500 machine learning videos: simply the first 500 results returned when you search for "machine learning" on YouTube.

    Acknowledgements

    Data scraped from https://wiki.digitalmethods.net/Dmi/ToolDatabase. Cover photo by Rachit Tank on Unsplash.

    Motivation: Dataset by Gabriel Preda

    Inspiration

    Using this dataset, analyse the popularity of machine learning videos and channels based on their like and dislike counts.

  8. OpenCitations Meta RDF dataset of identifiers metadata and its provenance...

    • data.niaid.nih.gov
    Updated Apr 6, 2024
    + more versions
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of identifiers metadata and its provenance information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10936285
    Dataset updated
    Apr 6, 2024
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to identifiers (http://purl.org/spar/datacite/Identifier) of bibliographic resources. It contains all the metadata and its provenance information, structured specifically around identifiers, in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /id/06250/10000/1000/1000.zip, while the corresponding provenance is in /id/06250/10000/1000/prov/se.zip.

    Additional information about OpenCitations Meta at the official webpage.

  9. Extracted Schemas from the Life Sciences Linked Open Data Cloud

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Maulik Kamdar (2023). Extracted Schemas from the Life Sciences Linked Open Data Cloud [Dataset]. http://doi.org/10.6084/m9.figshare.12402425.v2
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maulik Kamdar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web," published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows: Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y

    We have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:

    • The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.

    • Refined sets of extracted classes, object properties, data properties, and datatypes, shared across the Linked Data Graphs on the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, it is explicitly indicated.

    • The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided, along with broad-level characteristics of the modeled content.

    The LSLOD Schema Graph is saved as a JSON Pickle file. To read the JSON object in this Pickle file, use the following Python code:

    import pickle

    with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
        x = pickle.load(infile, encoding='iso-8859-1')

    Check the referenced link for more details on this research, raw data files, and code references.

  10. Passive Metadata

    • catalog.caida.org
    Updated Jan 15, 2019
    Cite
    CAIDA (2019). Passive Metadata [Dataset]. https://catalog.caida.org/dataset/passive_metadata
    Dataset updated
    Jan 15, 2019
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/public_aua/

    Time period covered
    Mar 2008 - Jan 2019
    Description

    Metadata for all passive monthly traces, including the Chicago and San Jose monitors. This includes the files used to generate the public trace statistics.

  11. OpenCitations Meta RDF dataset of all bibliographic metadata and its...

    • figshare.com
    bin
    Updated Feb 2, 2025
    Cite
    OpenCitations (2025). OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information [Dataset]. http://doi.org/10.6084/m9.figshare.21747536.v8
    Available download formats: bin
    Dataset updated
    Feb 2, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Compared to the previous version, this release includes metadata related to citing and cited bibliographic resources added in the November 2024 version of Crossref, as well as the November 2024 dump of JaLC (Japan Link Center). In this version, we have focused on correcting a specific type of error, namely the erroneous duplication of resources with the same identifier. We have successfully merged:

    • 100% of duplicated identifiers (datacite:Identifier)
    • 100% of duplicated responsible agents (foaf:Agent)
    • 70% of duplicated bibliographic resources (fabio:Expression)

    This dataset contains all the bibliographic metadata and its provenance information (in JSON-LD format) included in OpenCitations Meta. The data and the provenance are organized through a complex structure of folders and subfolders, allowing you to quickly find any entity from its URI. The first level consists of the following folders, provided compressed and separately:

    • folder "ar": contains the data and provenance of the agent role entities (http://purl.org/spar/pro/RoleInTime);
    • folder "br": contains the data and provenance of the bibliographic resource entities (http://purl.org/spar/fabio/Expression);
    • folder "id": contains the data and provenance of the identifier entities (http://purl.org/spar/datacite/Identifier);
    • folder "ra": contains the data and provenance of the responsible agent entities (http://xmlns.com/foaf/0.1/Agent);
    • folder "re": contains the data and provenance of the resource embodiment entities (http://purl.org/spar/fabio/Manifestation).

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /br/06250/10000/1000/1000.zip, while the corresponding provenance is in /br/06250/10000/1000/prov/1000.zip.

    This version of the dataset contains:

    • 121,302,680 bibliographic entities
    • 368,061,399 authors, 2,718,222 editors, and 101,612,475 publishers (counted by their roles, without disambiguating individual entities)
    • 698,995 publication venues

    The compressed archives total 47 GB, using the tar.gz compression algorithm, and expand to 145 GB when decompressed. The JSON-LD files inside the archives are further compressed using the zip algorithm. It is recommended to process these inner files as compressed, without extracting them, to manage the data more efficiently.

    Additional information about OpenCitations Meta is available at the official webpage.
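
    Given the recommendation to process the inner files without extracting them, here is a minimal Python sketch; the archive path follows the example above, while the inner member names are assumptions about the zip's contents:

    import json
    import zipfile

    # Read every JSON-LD file inside one data archive, entirely in memory.
    with zipfile.ZipFile("br/06250/10000/1000/1000.zip") as zf:
        for member in zf.namelist():
            with zf.open(member) as f:
                graph = json.load(f)   # JSON-LD parses as ordinary JSON
                print(member, len(graph))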

  12. Open Data Portal Catalogue Metadata

    • ukpowernetworks.opendatasoft.com
    csv, excel, json
    Updated Dec 2, 2025
    Cite
    (2025). Open Data Portal Catalogue Metadata [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/domain-dataset0/
    Available download formats: json, excel, csv
    Dataset updated
    Dec 2, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    A special dataset that contains metadata for all the published datasets. Dataset profile fields conform to the Dublin Core standard.

    Other

    You can download metadata for individual datasets via the links provided in the descriptions.

    Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/

  13. Meta-data for data.gov.uk datasets

    • data.europa.eu
    • data.wu.ac.at
    csv, html, json +2
    Updated Oct 11, 2021
    + more versions
    Cite
    Government Digital Service (2021). Meta-data for data.gov.uk datasets [Dataset]. https://data.europa.eu/data/datasets/data_gov_uk-datasets
    Available download formats: json, html, csv, unknown, xml
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    Government Digital Service
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    A dataset of all the meta-data for all of the datasets available through the data.gov.uk service. This is provided as a zipped CSV or JSON file. It is published nightly.

    Updates: 27 Sep 2017: we've moved all the previous dumps to an S3 bucket at https://dgu-ckan-metadata-dumps.s3-eu-west-1.amazonaws.com/ - This link is now listed here as a data file.

    From 13/10/16 we added a .v2.jsonl dump, which is set to replace the .json dump (to be discontinued after a 3-month transition). This is produced using 'ckanapi dump'. It provides an enhanced version of each dataset ('validated', or what you get from package_show in CKAN API v3; the old JSON was the unvalidated version). This now includes full details of the organization the dataset is in, rather than just the owner_id, plus the results of the archival & QA for each dataset and resource, showing whether the link is broken, the detected format, and the stars of openness. It also benefits from being in JSON Lines (http://jsonlines.org/) format, so you don't need to load the whole file into memory to parse the JSON; you can process it a line at a time.
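
    For illustration, a minimal Python sketch of this line-at-a-time processing follows; the file name and the 'name' field are assumptions, not taken from the dump's documented schema:

    import json

    # Stream the JSON Lines dump one record at a time, without
    # loading the whole file into memory.
    with open("data.gov.uk.v2.jsonl", encoding="utf-8") as f:
        for line in f:
            dataset = json.loads(line)   # one complete dataset per line
            print(dataset.get("name"))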

    On 12/1/2015 the organization of the CSV was changed:

    • Before this date, each dataset was one line, with resources added as numbered columns. Since a dataset may have up to 300 resources, the file ends up with 1025 columns, which is wider than many versions of Excel and LibreOffice will open, and the uncompressed size of 170 MB is more than most will deal with too. It is suggested you load it into a database, handle it with a Python or Ruby script, or use tools such as Refine or Google Fusion Tables.

    • After this date, the datasets are provided in one CSV and resources in another. On occasions when you want to join them, you can do so using the (dataset) "Name" column, as sketched below. These files are now manageable in spreadsheet software.
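
    A minimal sketch of such a join using pandas; the file names are illustrative, and the assumption is that both CSVs expose the shared dataset "Name" column mentioned above:

    import pandas as pd

    datasets = pd.read_csv("datasets.csv")     # assumed file name
    resources = pd.read_csv("resources.csv")   # assumed file name
    # Attach each resource row to its parent dataset via the "Name" column.
    merged = resources.merge(datasets, on="Name", how="left")
    print(merged.head())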

    You can also use the standard CKAN API if you want to search or get a small section of the data. Please respect the traffic limits in the API: http://data.gov.uk/terms-and-conditions

  14. cante2midi Metadata

    • live.european-language-grid.eu
    Updated Oct 19, 2015
    Cite
    (2015). cante2midi Metadata [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1004
    Dataset updated
    Oct 19, 2015
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The cante2midi dataset contains 20 tracks taken from the corpus and includes a large variety of styles and complexity with respect to melodic ornamentation. We provide note-level transcriptions of the singing voice melody in a MIDI-like format, where each note is defined by onset time, duration and a quantized MIDI pitch. In addition, we provide a number of low-level descriptors and the fundamental frequency corresponding to the predominant melody for each track. The meta-information includes editorial meta-data and the musicBrainz IDs.

    Content:

    README (5KB): Text file containing detailed descriptions of manual and automatic annotations.

    meta-data (10KB): XML file containing meta-information: Source (anthology name, CD no. and track no.) and editorial meta-data (artist name, title, style and musicBrainzID).

    manual transcriptions (82KB): MIDI (.mid) and text files (.notes) containing manual note-level transcriptions of the singing voice.

    automatic transcriptions (75KB): Text files (.notes) and MIDI files (.mid) containing automatic note-level transcriptions of the singing voice.

    Bark band energies (39.9MB): Text files (.csv) containing the frame-wise extracted bark band energies.

    predominant melody (6.2MB): Text files (.csv) containing the frame-wise extracted predominant melody.

    low-level descriptors (7.9MB): Text files (.csv) containing a set of frame-wise extracted low-level features.

    MFCCs (17.8MB): Text files (.csv) containing the frame-wise extracted mel-frequency cepstral coefficients (MFCCs).

    Magnitude spectrum (709.1MB, optional): Text files (.csv) containing the frame-wise extracted magnitudes of the discrete Fourier transform (DFT).

    Publications

    This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.

    N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the Computational study of Flamenco Music. arXiv:1510.04029 [cs.SD cs.IR].

    https://doi.org/10.1145/2875428

    Conditions of use

    The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.

    © COFLA 2015. All rights reserved.

  15. OpenCitations Meta RDF dataset of all bibliographic metadata and its...

    • zenodo.org
    bin
    Updated Oct 31, 2025
    Cite
    Arcangelo Massari (2025). OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information [Dataset]. http://doi.org/10.5281/zenodo.17483301
    Available download formats: bin
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Arcangelo Massari
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Dec 25, 2022
    Description
    Released on 2025-06-06, this version, compared to the previous one, includes metadata related to citing and cited bibliographic resources added in the April 2025 version of Crossref, as well as the December 2024 dump of JaLC (Japan Link Center).

    This dataset contains all the bibliographic metadata and its provenance information (in JSON-LD format) included in OpenCitations Meta. The data and the provenance are organized through a complex structure of folders and subfolders, allowing you to quickly find any entity from its URI. The first level consists of the following folders, provided compressed and separately:

    • folder "ar": data and provenance of the agent role entities (http://purl.org/spar/pro/RoleInTime);
    • folder "br": data and provenance of the bibliographic resource entities (http://purl.org/spar/fabio/Expression);
    • folder "id": data and provenance of the identifier entities (http://purl.org/spar/datacite/Identifier);
    • folder "ra": data and provenance of the responsible agent entities (http://xmlns.com/foaf/0.1/Agent);
    • folder "re": data and provenance of the resource embodiment entities (http://purl.org/spar/fabio/Manifestation).

    The inner folders are named after the supplier prefix of the contained entities; this prefix identifies the index to which the entities belong (e.g., OpenCitations Meta corresponds to 06*0).

    Below that, the folders have numeric names referring to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named by the same criteria, so the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /br/06250/10000/1000/1000.zip, while the corresponding provenance is in /br/06250/10000/1000/prov/1000.zip.

    This version of the dataset contains:

    • 124,526,660 bibliographic entities
    • 376,295,095 authors, 2,765,927 editors, and 103,928,927 publishers (counted by their roles, without disambiguating individual entities)
    • 1,019,563 publication venues

    The compressed archives total 46.5 GB, using the 7-zip compression algorithm, and expand to 66 GB when decompressed. The JSON-LD files inside the archives are further compressed using the zip algorithm. It is recommended to process these inner files as compressed without extracting them, to manage data more efficiently.

    Additional information about OpenCitations Meta is available at the official webpage: https://download.opencitations.net/#meta

  16. Metadata Catalog of the GDI-BSH

    • data.europa.eu
    unknown
    Cite
    Metadata Catalog of the GDI-BSH [Dataset]. https://data.europa.eu/data/datasets/68a7cc6f-0065-47f1-bb02-2b5cb8abdb58
    Available download formats: unknown
    Description

    This metadata set describes the CSW interface of the metadata catalogue of the spatial data infrastructure of the Federal Maritime and Hydrographic Agency (GDI-BSH).

  17. corpusCOFLA Metadata

    • live.european-language-grid.eu
    • zenodo.org
    Updated Apr 17, 2024
    Cite
    (2024). corpusCOFLA Metadata [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/947
    Dataset updated
    Apr 17, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The corpusCOFLA is a collection of more than 1500 flamenco recordings which are representative of what is considered classical flamenco. All contained tracks are taken from 12 commercially available flamenco anthologies in order to minimize a possible bias towards geographic location, singer or record label. We provide the editorial meta-information together with the musicBrainz IDs for all tracks as well as the anthologies as XML documents.

    Content:

    corpus meta data (619KB): XML file containing editorial meta-information for all tracks: source (anthology, CD number, track number), artist, title, style and musicBrainzID.

    anthology meta data (3KB): XML file containing editorial meta-information for all anthologies comprising the corpus: name, record label, year edition, year re-edition, number of CDs

    Version 1 (released Nov 23rd, 2017):

    • The anthology “Antología del Cante Flamenco. Flamencología.” is no longer commercially available and has been removed from the corpus.

    • In the corpus meta-data, a field “style_annotated” has been added, which contains unified style annotations.

    • Singer names have been assigned unique identifiers.

    Publications

    This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.

    N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the Computational study of Flamenco Music. arXiv:1510.04029 [cs.SD cs.IR].

    https://doi.org/10.1145/2875428

    Conditions of use

    The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.

    © COFLA 2015. All rights reserved.

  18. Data from: “Enabling FAIR data in Earth and environmental science with...

    • knb.ecoinformatics.org
    • osti.gov
    Updated May 4, 2023
    Cite
    Robert Crystal-Ornelas; Charuleka Varadharajan; Kathleen Beilsmith; Ben Bond-Lamberty; Kristin Boye; Madison Burrus; Shreyas Cholia; Danielle S. Christianson; Michael Crow; Joan Damerow; Kim S. Ely; Amy E. Goldman; Susan Heinz; Valerie C. Hendrix; Zarine Kakalia; Kayla Mathes; Fianna O'Brien; Dylan O'Ryan; Stephanie C. Pennington; Emily Robles; Alistair Rogers; Maegen Simmonds; Terri Velliquette; Pamela Weisenhorn; Jessica Nicole Welch; Karen Whitenack; Deb Agarwal (2023). Data from: “Enabling FAIR data in Earth and environmental science with community-centric (meta)data reporting formats” [Dataset]. http://doi.org/10.15485/1866606
    Dataset updated
    May 4, 2023
    Dataset provided by
    ESS-DIVE
    Authors
    Robert Crystal-Ornelas; Charuleka Varadharajan; Kathleen Beilsmith; Ben Bond-Lamberty; Kristin Boye; Madison Burrus; Shreyas Cholia; Danielle S. Christianson; Michael Crow; Joan Damerow; Kim S. Ely; Amy E. Goldman; Susan Heinz; Valerie C. Hendrix; Zarine Kakalia; Kayla Mathes; Fianna O'Brien; Dylan O'Ryan; Stephanie C. Pennington; Emily Robles; Alistair Rogers; Maegen Simmonds; Terri Velliquette; Pamela Weisenhorn; Jessica Nicole Welch; Karen Whitenack; Deb Agarwal
    Time period covered
    Jan 1, 2017
    Description

    This dataset contains supplementary information for a manuscript describing the ESS-DIVE (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem) data repository's community data and metadata reporting formats. The purpose of creating the ESS-DIVE reporting formats was to provide guidelines for formatting some of the diverse data types that can be found in the ESS-DIVE repository. The 6 teams of community partners who developed the reporting formats included scientists and engineers from across the Department of Energy National Lab network. Additionally, during the development process, 247 individuals representing 128 institutions provided input on the formats. The primary files in this dataset are 10 data and metadata crosswalks for ESS-DIVE's reporting formats (all files ending in _crosswalk.csv). The crosswalks compare elements used in each of the reporting formats to other related standards and data resources (e.g., repositories, datasets, data systems). This dataset also contains additional files recommended by ESS-DIVE's file-level metadata reporting format. Each data file has an associated dictionary (files ending in _dd.csv) which provides a brief description of each standard or data resource consulted in the data reporting format development process. The flmd.csv file describes each file contained within the dataset.

  19. cante100 Metadata

    • live.european-language-grid.eu
    • zenodo.org
    Updated Apr 11, 2024
    Cite
    (2024). cante100 Metadata [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1005
    Dataset updated
    Apr 11, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The cante100 dataset contains 100 tracks taken from the corpus. We defined 10 style families, of which 10 tracks each are included. Apart from the style family, we manually annotated the sections of the track in which the vocals are present. In addition, we provide a number of low-level descriptors and the fundamental frequency corresponding to the predominant melody for each track. The meta-information includes editorial meta-data and the musicBrainz ID.

    Content:

    README (5KB): Text file containing detailed descriptions of manual and automatic annotations.

    meta-data (59KB): XML file containing meta-information: Source (anthology name, CD no. and track no.), editorial meta-data (artist name, title, style, musicBrainzID) and the manually annotated style family.

    vocal sections (8.9MB): Text file (.csv) containing frame-wise vocal section annotations.

    automatic transcriptions (375KB): Text files (.notes) and MIDI files (.mid) containing automatic note-level transcriptions of the singing voice.

    Bark band energies (216.6MB): Text files (.csv) containing the frame-wise extracted bark band energies.

    predominant melody (33.5MB): Text files (.csv) containing the frame-wise extracted predominant melody.

    low-level descriptors (42.9MB): Text files (.csv) containing a set of frame-wise extracted low-level features.

    MFCCs (97.1MB): Text files (.csv) containing the frame-wise extracted mel-frequency cepstral coefficients (MFCCs).

    Magnitude spectrum (3.85GB): Text files (.csv) containing the frame-wise extracted magnitudes of the discrete Fourier transform (DFT).

    Publications

    This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.

    N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the Computational study of Flamenco Music. arXiv:1510.04029 [cs.SD cs.IR].

    https://doi.org/10.1145/2875428

    Conditions of use

    The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.

    © COFLA 2015. All rights reserved.

  20. CMAQ v5.2 and WRF v3.8.1 model data, meta data and figures

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). CMAQ v5.2 and WRF v3.8.1 model data, meta data and figures [Dataset]. https://catalog.data.gov/dataset/cmaq-v5-2-and-wrf-v3-8-1-model-data-meta-data-and-figures
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The data are described in detail in the uploaded file "Science hub metadata.docx". This dataset is associated with the following publication: Zhang, Y., J. Bash, S. Roselle, A. Shatas, A. Repinsky, R. Mathur, C. Hogrefe, J. Piziali, T. Jacobs, and A. Gilliland. Unexpected air quality impacts from implementation of green infrastructure in urban environments: a Kansas City Case Study. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 744(20): 140960, (2020).
