82 datasets found
  1. Open Data Portal Catalogue Metadata

    • ukpowernetworks.opendatasoft.com
    csv, excel, json
    Updated Mar 26, 2025
    Cite
    (2025). Open Data Portal Catalogue Metadata [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/domain-dataset0/
    Explore at:
    Available download formats: json, excel, csv
    Dataset updated
    Mar 26, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    A special dataset that contains metadata for all the published datasets. Dataset profile fields conform to the Dublin Core standard.

    Other

    You can download metadata for individual datasets, via the links provided in descriptions.

    Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/

  2. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (133186454988 bytes)
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
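
    The folder layout above maps a kernel version id to its directory mechanically. A minimal sketch in Python (`kernel_version_dir` is a hypothetical helper name; whether folder names are zero-padded is not stated, so plain decimal names are assumed):

```python
def kernel_version_dir(version_id: int) -> str:
    """Return the 'top/sub' folder holding a given kernel version id."""
    top = version_id // 1_000_000        # top-level folder: up to 1 million files
    sub = (version_id // 1_000) % 1_000  # sub-folder: up to 1 thousand files
    return f"{top}/{sub}"
```

    For example, version 123,456,789 would live under 123/456, matching the ranges described above.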

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  3. Collections database

    • data.wu.ac.at
    • cloud.csiss.gmu.edu
    • +1 more
    csv, html
    Updated Feb 10, 2016
    + more versions
    Cite
    Tate (2016). Collections database [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/YWU3Mzk5MzktMmFhZC00MjdiLWIzYjItNjZhMmJmMjk1NGE4
    Explore at:
    Available download formats: csv, html
    Dataset updated
    Feb 10, 2016
    Dataset provided by
    Tate
    Description

    The Tate Collection

    Here we present the metadata for around 70,000 artworks that Tate owns or jointly owns with the National Galleries of Scotland as part of ARTIST ROOMS. Metadata for around 3,500 associated artists is also included.

    The metadata here is released under the Creative Commons Public Domain CC0 licence. Please see the enclosed LICENCE file for more detail.

    Images are not included and are not part of the dataset. Use of Tate images is covered on the Copyright and permissions page. You may also license images for commercial use.

    Please review the full usage guidelines.

    Repository Contents

    We offer two data formats:

    A richer dataset is provided in the JSON format, which is organised by the directory structure of the Git repository. JSON supports more hierarchical or nested information such as subjects.

    We also provide CSVs of flattened data, which is less comprehensive but perhaps easier to grok. The CSVs provide a good introduction to overall contents of the Tate metadata and create opportunities for artistic pivot tables.

    JSON

    Artists

    Each artist has his or her own JSON file. They are found in the artists folder, then filed away by first letter of the artist’s surname.

    Artworks

    Artworks are found in the artworks folder. They are filed away by accession number. This is the unique identifier given to artworks when they come into the Tate collection. In many cases, the format has significance. For example, the ar accession number prefix indicates that the artwork is part of ARTIST ROOMS collection. The n prefix indicates works that once were part of the National Gallery collection.
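
    The accession-number prefixes described above lend themselves to a simple check. A hedged sketch (`collection_origin` is a hypothetical helper; only the ar and n prefixes come from the text):

```python
def collection_origin(accession_number: str) -> str:
    """Classify a Tate accession number by its documented prefix."""
    acc = accession_number.lower()
    if acc.startswith("ar"):
        return "ARTIST ROOMS"          # e.g. AR-prefixed accession numbers
    if acc.startswith("n"):
        return "former National Gallery collection"
    return "Tate collection (other prefix)"
```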

    CSV

    There is one CSV file for artists (artist_data.csv) and one (very large) for artworks (artwork_data.csv), which we may one day break up into more manageable chunks. The CSV headings should be helpful. Let us know if not. Entrepreneurial hackers could use the CSVs as an index to the JSON collections if they wanted richer data.

    Usage guidelines for open data

    These usage guidelines are based on goodwill. They are not a legal contract but Tate requests that you follow these guidelines if you use Metadata from our Collection dataset.

    The Metadata published by Tate is available free of restrictions under the Creative Commons Zero Public Domain Dedication.

    This means that you can use it for any purpose without having to give attribution. However, Tate requests that you actively acknowledge and give attribution to Tate wherever possible. Attribution supports future efforts to release other data. It also reduces the amount of ‘orphaned data’, helping retain links to authoritative sources.

    Give attribution to Tate

    Make sure that others are aware of the rights status of Tate and are aware of these guidelines by keeping intact links to the Creative Commons Zero Public Domain Dedication.

    If for technical or other reasons you cannot include all the links to all sources of the Metadata and rights information directly with the Metadata, you should consider including them separately, for example in a separate document that is distributed with the Metadata or dataset.

    If for technical or other reasons you cannot include all the links to all sources of the Metadata and rights information, you may consider linking only to the Metadata source on Tate’s website, where all available sources and rights information can be found, including in machine readable formats.

    Metadata is dynamic

    When working with Metadata obtained from Tate, please be aware that this Metadata is not static. It sometimes changes daily. Tate continuously updates its Metadata in order to correct mistakes and include new and additional information. Museum collections are under constant study and research, and new information is frequently added to objects in the collection.

    Mention your modifications of the Metadata and contribute your modified Metadata back

    Whenever you transform, translate or otherwise modify the Metadata, make it clear that the resulting Metadata has been modified by you. If you enrich or otherwise modify Metadata, consider publishing the derived Metadata without reuse restrictions, preferably via the Creative Commons Zero Public Domain Dedication.

    Be responsible

    Ensure that you do not use the Metadata in a way that suggests any official status or that Tate endorses you or your use of the Metadata, unless you have prior permission to do so.

    Ensure that you do not mislead others or misrepresent the Metadata or its sources.

    Ensure that your use of the Metadata does not breach any national legislation based thereon, notably concerning (but not limited to) data protection, defamation or copyright. Please note that you use the Metadata at your own risk. Tate offers the Metadata as-is and makes no representations or warranties of any kind concerning any Metadata published by Tate.

    The writers of these guidelines are deeply indebted to the Smithsonian Cooper-Hewitt, National Design Museum; and Europeana.

  4. Collections database | gimi9.com

    • gimi9.com
    Updated Nov 1, 2013
    Cite
    (2013). Collections database | gimi9.com [Dataset]. https://gimi9.com/dataset/uk_collections-database
    Explore at:
    Dataset updated
    Nov 1, 2013
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We offer two data formats:

    A richer dataset is provided in the JSON format, which is organised by the directory structure of the Git repository. JSON supports more hierarchical or nested information such as subjects.

    We also provide CSVs of flattened data, which is less comprehensive but perhaps easier to grok. The CSVs provide a good introduction to overall contents of the Tate metadata and create opportunities for artistic pivot tables.

    JSON

    Artists

    Each artist has his or her own JSON file. They are found in the artists folder, then filed away by first letter of the artist’s surname.

    Artworks

    Artworks are found in the artworks folder. They are filed away by accession number. This is the unique identifier given to artworks when they come into the Tate collection. In many cases, the format has significance. For example, the ar accession number prefix indicates that the artwork is part of ARTIST ROOMS collection. The n prefix indicates works that once were part of the National Gallery collection.

    CSV

    There is one CSV file for artists (artist_data.csv) and one (very large) for artworks (artwork_data.csv), which we may one day break up into more manageable chunks. The CSV headings should be helpful. Let us know if not. Entrepreneurial hackers could use the CSVs as an index to the JSON collections if they wanted richer data.

    Usage guidelines for open data

    These usage guidelines are based on goodwill. They are not a legal contract but Tate requests that you follow these guidelines if you use Metadata from our Collection dataset.

    The Metadata published by Tate is available free of restrictions under the Creative Commons Zero Public Domain Dedication.

    This means that you can use it for any purpose without having to give attribution. However, Tate requests that you actively acknowledge and give attribution to Tate wherever possible. Attribution supports future efforts to release other data. It also reduces the amount of ‘orphaned data’, helping retain links to authoritative sources.

    Give attribution to Tate

    Make sure that others are aware of the rights status of Tate and are aware of these guidelines by keeping intact links to the Creative Commons Zero Public Domain Dedication.

    If for technical or other reasons you cannot include all the links to all sources of the Metadata and rights information directly with the Metadata, you should consider including them separately, for example in a separate document that is distributed with the Metadata or dataset.

    If for technical or other reasons you cannot include all the links to all sources of the Metadata and rights information, you may consider linking only to the Metadata source on Tate’s website, where all available sources and rights information can be found, including in machine readable formats.

    Metadata is dynamic

    When working with Metadata obtained from Tate, please be aware that this Metadata is not static. It sometimes changes daily. Tate continuously updates its Metadata in order to correct mistakes and include new and additional information. Museum collections are under constant study and research, and new information is frequently added to objects in the collection.

    Mention your modifications of the Metadata and contribute your modified Metadata back

    Whenever you transform, translate or otherwise modify the Metadata, make it clear that the resulting Metadata has been modified by you. If you enrich or otherwise modify Metadata, consider publishing the derived Metadata without reuse restrictions, preferably via the Creative Commons Zero Public Domain Dedication.

    Be responsible

    Ensure that you do not use the Metadata in a way that suggests any official status or that Tate endorses you or your use of the Metadata, unless you have prior permission to do so.

    Ensure that you do not mislead others or misrepresent the Metadata or its sources.

    Ensure that your use of the Metadata does not breach any national legislation based thereon, notably concerning (but not limited to) data protection, defamation or copyright. Please note that you use the Metadata at your own risk. Tate offers the Metadata as-is and makes no representations or warranties of any kind concerning any Metadata published by Tate.

    The writers of these guidelines are deeply indebted to the Smithsonian Cooper-Hewitt, National Design Museum; and Europeana.

  5. Data from: Hall-of-Apps: The Top Android Apps Metadata Archive

    • zenodo.org
    bz2, zip
    Updated Mar 20, 2020
    Cite
    Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez (2020). Hall-of-Apps: The Top Android Apps Metadata Archive [Dataset]. http://doi.org/10.5281/zenodo.3712249
    Explore at:
    Available download formats: zip, bz2
    Dataset updated
    Mar 20, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laura Bello-Jiménez; Camilo Escobar-Velásquez; Anamaria Mojica-Hanke; Santiago Cortés-Fernandéz; Mario Linares-Vásquez
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The number of Android apps available for download is constantly increasing, exerting continuous pressure on developers to publish outstanding apps. Google Play (GP) is the default distribution channel for Android apps, and it provides mobile app users with metrics to identify and report app quality, such as rating, number of downloads, previous users' comments, etc. In addition to those metrics, GP presents a set of top charts that highlight the outstanding apps in different categories. Both the metrics and the top app charts help developers identify whether their development decisions are well valued by the community. Therefore, app presence in these top charts is valuable information when understanding the features of top apps. In this paper we present Hall-of-Apps, a dataset containing top charts' app metadata extracted (weekly) from GP, for 4 different countries, during 30 weeks. The data is presented as (i) raw HTML files, (ii) a MongoDB database with all the information contained in the apps' HTML files (e.g., app description, category, general rating, etc.), and (iii) data visualizations built with the D3.js framework. A first characterization of the data, along with the URLs to retrieve it, can be found in our online appendix: https://thesoftwaredesignlab.github.io/hall-of-apps-tools/

  6. NCDOT Mitigation Site Points Shapefile

    • hub.arcgis.com
    Updated Jul 18, 2012
    Cite
    North Carolina Department of Transportation (2012). NCDOT Mitigation Site Points Shapefile [Dataset]. https://hub.arcgis.com/datasets/c12d48f901fc4fddb13d90572d114433
    Explore at:
    Dataset updated
    Jul 18, 2012
    Dataset authored and provided by
    North Carolina Department of Transportation
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Please read all of the information below before downloading and using this file.

    Overview: The purpose of this file is to allow NCDOT employees to track and locate areas that need to be preserved and/or maintained for mitigation credit as part of various permits. They include projects built both off- and onsite throughout the state, as well as projects done as full delivery from consultants and projects partially built or managed by other agencies (e.g. the NC Ecosystem Enhancement Program, or EEP). The sites in this service are only a portion of the known sites in the state, as the database they were pulled from is a work in progress. These files should not be used or cited in official documents. Feel free to contact us regarding specific sites as we may have more particular information available. We also ask that you share with us any information you may have on sites that are missing data or are omitted, so we can improve our database. You can access monitoring reports and permits for some sites and projects by clicking the link in the “NCDOT Permits and Monitoring Reports” field (“hypWeblink” from the query results view) and navigating to the appropriate page. Full metadata should be included with a download of this file. If not, please contact ddjohnson[at]ncdot.gov and a copy will be provided. You may also download a PDF of the metadata here. We ask that this file not be distributed without metadata. You can find a map containing these data here.

    Known Issues:

    Site Boundaries – The majority of these sites do have a corresponding boundary, and its source is denoted in the “Boundary Source” field (“BoundSrc” in the query results view). Due to data collection and conversion limitations, we cannot guarantee the accuracy of site boundaries. To assist with gauging the degree of accuracy, the “Boundary Source” field can tell you where the boundary originated. However, it should be noted that even boundaries taken from surveys can misrepresent the site if the boundary shifted during the conversion from CAD formats. We are in the process of reviewing the information we have and making further documentation of available parcel and conservation easement data to cut down on uncertainty where possible. Some locations have been taken from files provided by the Ecosystem Enhancement Program (as noted in the “Boundary Source” field). EEP quality control/quality assurance is ongoing. Please contact EEP for the most recent information about specific project areas. Sites whose property documents have been collected and published by EEP will have a link in the “EEP Property Documents” field (“EEP_Folio” from the query results).

    Status attribute – This refers to the last-known status of a project, which may only be current as of the date the project was entered in the database. A project having a boundary does not mean that it has been completed or that it will be, so be sure to find the current status of any project before making any decisions regarding the area.

    River Basin – The river basin names and 8-digit hydrologic unit codes (HUCs; called CU or catalog units in the attributes) in these files may differ from what some organizations are using. These are from a boundary file released by CGIA in 2008.

    Contact Dave Johnson with any questions related to the contents of this service: ddjohnson[at]ncdot.gov, 919-707-6130.

  7. Dataset metadata of known Dataverse installations

    • search.dataone.org
    • dataverse.harvard.edu
    • +1 more
    Updated Nov 22, 2023
    + more versions
    Cite
    Gautier, Julian (2023). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/DVN/DCDKZQ
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Gautier, Julian
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation).csv
    │   ├── basic.csv
    │   ├── contributor(citation).csv
    │   ├── ...
    │   └── topic_classification(citation).csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2022.10.02_17.11.19.zip
    │   │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
    │   │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
    │   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
    │   │   │   └── ...
    │   │   └── metadatablocks_v5.6
    │   │       ├── astrophysics_v5.6.json
    │   │       ├── biomedical_v5.6.json
    │   │       ├── citation_v5.6.json
    │   │       ├── ...
    │   │       └── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
    │   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
    │   ├── Arca_Dados_2022.10.02_17.44.35.zip
    │   ├── ...
    │   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
    ├── dataset_pids_from_most_known_dataverse_installations.csv
    ├── licenses_used_by_dataverse_installations.csv
    └── metadatablocks_from_most_known_dataverse_installations.csv

    This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories. The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

    The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files.

    The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ...

    Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.
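
    The two-column CSV the download script expects can be written with the standard library. A minimal sketch (the filename and row values below are placeholders; only the "hostname" and "apikey" column names come from the description above):

```python
import csv

# Placeholder rows; only the column names are taken from the description.
rows = [
    {"hostname": "https://dataverse.example.edu", "apikey": "xxxxxxxx-xxxx"},
]

# Write the two-column CSV that the metadata download script reads for
# installations requiring an API token.
with open("installation_api_tokens.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["hostname", "apikey"])
    writer.writeheader()
    writer.writerows(rows)
```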

  8. Data from: ESS-DIVE Reporting Format for File-level Metadata

    • search.dataone.org
    • data.ess-dive.lbl.gov
    • +2 more
    Updated Oct 8, 2021
    Cite
    Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas (2021). ESS-DIVE Reporting Format for File-level Metadata [Dataset]. https://search.dataone.org/view/ess-dive-a95fac98da3b481-20210928T175904096
    Explore at:
    Dataset updated
    Oct 8, 2021
    Dataset provided by
    ESS-DIVE
    Authors
    Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas
    Time period covered
    Jan 1, 2020 - Sep 30, 2021
    Description

    The ESS-DIVE reporting format for file-level metadata (FLMD) provides granular information at the data file level to describe the contents, scope, and structure of the data file to enable comparison of data files within a data package. The FLMD are fully consistent with and augment the metadata collected at the data package level. We developed the FLMD template based on a review of a small number of existing FLMD in use at other agencies and repositories, with valuable input from the Environmental Systems Science (ESS) Community. Also included is a template for a CSV Data Dictionary where users can provide file-level information about the contents of a CSV data file (e.g., define column names, provide units). Files are in .csv, .xlsx, and .md. Templates are in both .csv and .xlsx (open with e.g. Microsoft Excel, LibreOffice, or Google Sheets). Open the .md files by downloading and using a text editor (e.g. Notepad or TextEdit). Though we provide Excel templates for the file-level metadata reporting format, our instructions encourage users to 'Save the FLMD template as a CSV following the CSV Reporting Format guidance'. In addition, we developed the ESS-DIVE File Level Metadata Extractor, which is a lightweight Python script that can extract some FLMD fields following the recommended FLMD format and structure.

  9. Meta-data for data.gov.uk datasets

    • data.wu.ac.at
    api, csv, html, json +1
    Updated May 31, 2018
    + more versions
    Cite
    Government Digital Service (2018). Meta-data for data.gov.uk datasets [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/YjVlNGJlN2UtNmMzNi00MWI2LTlkNDgtY2FlMTk1YzMyZTM0
    Explore at:
    Available download formats: html, json, api, csv, xml
    Dataset updated
    May 31, 2018
    Dataset provided by
    Government Digital Service
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A dataset of all the meta-data for all of the datasets available through the data.gov.uk service. This is provided as a zipped CSV or JSON file. It is published nightly.

    Updates: 27 Sep 2017: we've moved all the previous dumps to an S3 bucket at https://dgu-ckan-metadata-dumps.s3-eu-west-1.amazonaws.com/ - This link is now listed here as a data file.

    From 13/10/16 we added a .v2.jsonl dump, which is set to replace the .json dump (the latter will be discontinued after a 3-month transition). This is produced using 'ckanapi dump'. It provides an enhanced version of each dataset ('validated', or what you get from package_show in CKAN API v3 - the old json was the unvalidated version). This now includes full details of the organization the dataset is in, rather than just the owner_id. Plus it includes the results of the archival & QA for each dataset and resource, showing whether the link is broken, the detected format and the stars of openness. It also benefits from being in JSON Lines (http://jsonlines.org/) format, so you don't need to load the whole thing into memory to parse the JSON - just a line at a time.
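
    Reading a JSON Lines dump one record at a time looks like this in Python (a hedged sketch; `iter_datasets` is a hypothetical helper name and no assumptions are made about the fields inside each record):

```python
import json

def iter_datasets(jsonl_path):
    """Yield one dataset dict per line from a .jsonl dump,
    without loading the whole file into memory."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```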

    On 12/1/2015 the organization of the CSV was changed:

    • Before this date, each dataset was one line, with resources added as numbered columns. Since a dataset may have up to 300 resources, the file ends up with 1025 columns, which is wider than many versions of Excel and LibreOffice will open, and the uncompressed size of 170 MB is more than most will deal with too. It is suggested you load it into a database, handle it with a Python or Ruby script, or use tools such as Refine or Google Fusion Tables.

    • After this date, the datasets are provided in one CSV and the resources in another. When you want to join them, use the (dataset) "Name" column. These files are now manageable in spreadsheet software.
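
The join on the "Name" column can be done in a few lines with pandas. Only the "Name" key comes from the source; the other column names and values below are hypothetical, for illustration:

```python
import pandas as pd

# Toy stand-ins for the two CSVs; apart from "Name", the column names
# and values are made up, not the real data.gov.uk schema.
datasets = pd.DataFrame({
    "Name": ["spend-data", "road-safety"],
    "Title": ["Spend Data", "Road Safety"],
})
resources = pd.DataFrame({
    "Name": ["spend-data", "spend-data", "road-safety"],
    "Resource URL": [
        "http://example.org/a.csv",
        "http://example.org/b.csv",
        "http://example.org/c.csv",
    ],
})

# One row per resource, carrying its dataset's metadata alongside.
joined = datasets.merge(resources, on="Name", how="left")
```

With real data you would load each CSV via `pd.read_csv` first; the merge itself is identical.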

    You can also use the standard CKAN API if you want to search or get a small section of the data. Please respect the traffic limits in the API: http://data.gov.uk/terms-and-conditions

  10. Dutch Shopping List OCR Image Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Dutch Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/dutch-shopping-list-ocr-image-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Dutch Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Dutch language.

    Dataset Content & Diversity:

    Containing more than 2000 images, this Dutch OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text, including sentences and individual words such as item names, quantities, and comments on shopping lists. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.

    To ensure diversity and robustness in training your OCR model, we allow a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Dutch text.

    The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.

    All these shopping lists were written and images were captured by native Dutch people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.

    This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Dutch text recognition models.

    Update & Custom Collection:

    We are committed to continually expanding this dataset by adding more images with the help of our native Dutch crowd community.

    If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.

    Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.

    License:

    This image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Dutch language. Your journey to improved language understanding and processing begins here.

  11. Data from: Data Catalog Project - A Browsable, Searchable, Metadata System

    • osti.gov
    Updated Apr 9, 2022
    Cite
    Data Catalog Project - A Browsable, Searchable, Metadata System [Dataset]. https://www.osti.gov/biblio/1887857
    Explore at:
    Dataset updated
    Apr 9, 2022
    Dataset provided by
    United States Department of Energy: http://energy.gov/
    Office of Science: http://www.er.doe.gov/
    Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
    Description

    Modern experiments are typically conducted by large, extended teams, where researchers rely on other team members to produce much of the data they use. These experiments record very large numbers of measurements, which can be difficult for users to find, access, and understand. We are developing a system for users to annotate their data products with structured metadata, providing data consumers with a discoverable, browsable data index. Machine-understandable metadata captures the underlying semantics of the recorded data, which can then be consumed both by programs and interactively by users. Collaborators can use these metadata to select and understand recorded measurements.

  12. OpenCitations Meta RDF dataset of identifiers metadata and its provenance...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 6, 2024
    + more versions
    Cite
    OpenCitations (2024). OpenCitations Meta RDF dataset of identifiers metadata and its provenance information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10936285
    Explore at:
    Dataset updated
    Apr 6, 2024
    Dataset authored and provided by
    OpenCitations
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is a specialized subset of the OpenCitations Meta RDF data, focusing exclusively on data related to identifiers (http://purl.org/spar/datacite/Identifier) of bibliographic resources. It contains all the metadata and its provenance information, structured specifically around identifiers, in JSON-LD format.

    The inner folders are named after the supplier prefix of the contained entities: a prefix that identifies the index an entity belongs to (e.g., OpenCitations Meta corresponds to 06*0).

    After that, the folders have numeric names, which refer to the range of contained entities. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

    At the same level, additional folders containing the provenance are named using the same criteria; for example, the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

    For example, data related to an entity may be located in the folder /id/06250/10000/1000/1000.zip, while its provenance information is in /id/06250/10000/1000/prov/se.zip
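
A small helper can map an entity number to the paths implied by the layout above. This is only a guess at the scheme, inferred from the single example given; the folder ranges and file names are assumptions, not a documented API:

```python
import math

def data_path(supplier_prefix: str, entity_number: int) -> str:
    """Guess the zipped RDF data path for an entity, per the layout above."""
    outer = math.ceil(entity_number / 10000) * 10000  # 10000-entity range folder
    inner = math.ceil(entity_number / 1000) * 1000    # 1000-entity range folder
    return f"/id/{supplier_prefix}/{outer}/{inner}/{inner}.zip"

def prov_path(supplier_prefix: str, entity_number: int) -> str:
    """Guess the provenance path: same folders, plus a prov/se.zip file."""
    outer = math.ceil(entity_number / 10000) * 10000
    inner = math.ceil(entity_number / 1000) * 1000
    return f"/id/{supplier_prefix}/{outer}/{inner}/prov/se.zip"
```

For instance, `data_path("06250", 250)` reproduces the example path `/id/06250/10000/1000/1000.zip`.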

    Additional information about OpenCitations Meta at the official webpage.

  13. Location Identifiers, Metadata, and Map for Field Measurements at the East...

    • knb.ecoinformatics.org
    Updated Oct 27, 2022
    + more versions
    Cite
    Charuleka Varadharajan; Zarine Kakalia; Madison Burrus; Dylan O'Ryan; Erek Alper; Jillian Banfield; Max Berkelhammer; Curtis Beutler; Eoin Brodie; Wendy Brown; Mariah S. Carbone; Rosemary Carroll; Danielle Christianson; Chunwei Chou; Robert Crystal-Ornelas; K. Dana Chadwick; John Christensen; Baptiste Dafflon; Hesham Elbashandy; Brian J. Enquist; Patricia Fox; David Gochis; Matthew Henderson; Douglas Johnson; Lara Kueppers; Paula Matheus Carnevali; Alexander Newman; Thomas Powell; Kamini Singha; Patrick Sorensen; Matthias Sprenger; Tetsu Tokunaga; Roelof Versteeg; Mike Wilkins; Kenneth Williams; Marshall Worsham; Catherine Wong; Yuxin Wu; Deborah Agarwal (2022). Location Identifiers, Metadata, and Map for Field Measurements at the East River Watershed, Colorado, USA [Dataset]. https://knb.ecoinformatics.org/view/ess-dive-a9de0ff71198b5e-20220613T224226863
    Explore at:
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    ESS-DIVE
    Authors
    Charuleka Varadharajan; Zarine Kakalia; Madison Burrus; Dylan O'Ryan; Erek Alper; Jillian Banfield; Max Berkelhammer; Curtis Beutler; Eoin Brodie; Wendy Brown; Mariah S. Carbone; Rosemary Carroll; Danielle Christianson; Chunwei Chou; Robert Crystal-Ornelas; K. Dana Chadwick; John Christensen; Baptiste Dafflon; Hesham Elbashandy; Brian J. Enquist; Patricia Fox; David Gochis; Matthew Henderson; Douglas Johnson; Lara Kueppers; Paula Matheus Carnevali; Alexander Newman; Thomas Powell; Kamini Singha; Patrick Sorensen; Matthias Sprenger; Tetsu Tokunaga; Roelof Versteeg; Mike Wilkins; Kenneth Williams; Marshall Worsham; Catherine Wong; Yuxin Wu; Deborah Agarwal
    Time period covered
    Sep 14, 2015 - Jun 13, 2022
    Area covered
    Description

    This dataset contains identifiers, metadata, and a map of the locations where field measurements have been conducted at the East River Community Observatory located in the Upper Colorado River Basin, United States. This is version 3.0 of the dataset and replaces the prior version 2.0, which should no longer be used (see below for details on changes between the versions).

    Dataset description: The East River is the primary field site of the Watershed Function Scientific Focus Area (WFSFA) and the Rocky Mountain Biological Laboratory. Researchers from several institutions generate highly diverse hydrological, biogeochemical, climate, vegetation, geological, remote sensing, and model data at the East River in collaboration with the WFSFA. Thus, the purpose of this dataset is to maintain an inventory of the field locations and instrumentation, to provide information on the field activities in the East River, and to coordinate data collected across different locations, researchers, and institutions. The dataset contains (1) a README file with information on the various files, (2) three csv files describing the metadata collected for each surface point location, plot, and region registered with the WFSFA, (3) csv files with metadata and contact information for each surface point location registered with the WFSFA, (4) a csv file with metadata and contact information for plots, (5) a csv file with metadata for geographic regions and sub-regions within the watershed, (6) a compiled xlsx file with all the data and metadata, which can be opened in Microsoft Excel, (7) a kml map of the locations plotted in the watershed, which can be opened in Google Earth, (8) a jpeg image of the kml map, which can be viewed in any photo viewer, and (9) a zipped file with the registration templates used by the SFA team to collect location metadata. The zipped template file contains two csv files with the blank templates (point and plot), two csv files with instructions for filling out the location templates, and one compiled xlsx file with the instructions and blank templates together. Additionally, the templates in the xlsx include drop-down validation for any controlled metadata fields. Persistent location identifiers (Location_ID) are determined by the WFSFA data management team and are used to track data and samples across locations.

    Dataset uses: This location metadata is used to update the Watershed SFA's publicly accessible Field Information Portal (an interactive field sampling metadata exploration tool; https://wfsfa-data.lbl.gov/watershed/), the kml map file included in this dataset, and other data management tools internal to the Watershed SFA team.

    Version information: The latest version of this dataset publication is version 3.0. It contains a breaking change to the location map (EastRiverCommunityObservatory_Map_v3_0_20220613.kml): if you downloaded the map file prior to version 3.0, it will no longer work; use the updated map included in this version of the dataset. This version also contains a total of 51 new point locations, 8 new plot locations, and 1 new geographic region. Additionally, it corrects inconsistencies in existing metadata. Refer to the methods for further details on the version history. This dataset will be updated on a periodic basis with new measurement location information. Researchers interested in having their East River measurement locations added to this list should reach out to the WFSFA data management team at wfsfa-data@googlegroups.com.

    Acknowledgements: Please cite this dataset if using any of the location metadata in other publications or derived products. If using the location metadata for the NEON hyperspectral campaign, additionally cite Chadwick et al. (2020), doi:10.15485/1618130.

  14. Data from: Australian Marine Environmental Data: Descriptions and Metadata

    • data.gov.au
    • datadiscoverystudio.org
    • +2 more
    pdf
    Updated Jun 24, 2017
    + more versions
    Cite
    Geoscience Australia (2017). Australian Marine Environmental Data: Descriptions and Metadata [Dataset]. https://data.gov.au/data/dataset/australian-marine-environmental-data-descriptions-and-metadata
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 24, 2017
    Dataset provided by
    Geoscience Australia: http://ga.gov.au/
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    This report provides detailed descriptions (metadata) of 45 Australian marine environmental datasets that have been generated and collated by the Marine Biodiversity Hub as part of Theme 3 - National Ecosystems Knowledge, Project 1 - Shelf and Canyon Ecosystems Functions and Processes. The report also includes a map for each dataset to illustrate coverage and general spatial structure. The datasets contain both marine environmental and biological variables from diverse data sources and include both new and updated information. Among them, the national bathymetry grid and derived products, seabed sediment grids, seabed exposure (GEOMACS) parameters, water quality data, the national canyon dataset and connectivity layers were produced by Geoscience Australia. Other environmental and biological datasets are the outputs of oceanographic models and collections of various governmental and research organisations.

    These datasets are important for the success of marine biodiversity research in Theme 3 Project 1 in that they describe key aspects of Australian marine physical, geochemical and biological environments. The physical and geochemical datasets not only characterise the static seabed features but also capture the temporal variation and three-dimensional interactions within marine ecosystems. The biological datasets represent a unique collection of fish and megafauna data available at the national scale. Together, these marine environmental datasets enhance our understanding of large-scale ecological processes driving marine biodiversity patterns. However, we should be aware of the uncertainties and potential errors that exist in these datasets due to limitations of the data collection and processing methods. Data quality issues of individual datasets have been documented in this report where possible.

    You can also purchase hard copies of Geoscience Australia data and other products at http://www.ga.gov.au/products-services/how-to-order-products/sales-centre.html

  15. Data from: Knowledge graphs for seismic data and metadata

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Sep 19, 2023
    + more versions
    Cite
    William Davis; Cassandra Hunt (2023). Knowledge graphs for seismic data and metadata [Dataset]. http://doi.org/10.6078/D1P430
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 19, 2023
    Dataset provided by
    University of California, San Diego
    Relational AI
    Authors
    William Davis; Cassandra Hunt
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The increasing scale and diversity of seismic data, and the growing role of big data in seismology, has raised interest in methods to make data exploration more accessible. This paper presents the use of knowledge graphs (KGs) for representing seismic data and metadata to improve data exploration and analysis, focusing on usability, flexibility, and extensibility. Using constraints derived from domain knowledge in seismology, we define semantic models of seismic station and event information used to construct the KGs. Our approach utilizes the capability of KGs to integrate data across many sources and diverse schema formats. We use schema-diverse, real-world seismic data to construct KGs with millions of nodes, and illustrate potential applications with three big-data examples. Our findings demonstrate the potential of KGs to enhance the efficiency and efficacy of seismological workflows in research and beyond, indicating a promising interdisciplinary future for this technology.

    Methods

    The data here consists of, and was collected from:

    • Station metadata, in StationXML format, acquired from IRIS DMC using the fdsnws-station webservice (https://service.iris.edu/fdsnws/station/1/).
    • Earthquake event data, in NDK format, acquired from the Global Centroid-Moment Tensor (GCMT) catalog webservice (https://www.globalcmt.org) [1,2].
    • Earthquake event data, in CSV format, acquired from the USGS earthquake catalog webservice (https://doi.org/10.5066/F7MS3QZH) [3].

    The format of the data is described in the README. In addition, a complete description of the StationXML, NDK, and USGS file formats can be found at https://www.fdsn.org/xml/station/, https://www.ldeo.columbia.edu/~gcmt/projects/CMT/catalog/allorder.ndk_explained, and https://earthquake.usgs.gov/data/comcat/#event-terms, respectively. Also provided are conversions from the NDK and StationXML file formats into JSON format.

    References:

    [1] Dziewonski, A. M., Chou, T. A., & Woodhouse, J. H. (1981). Determination of earthquake source parameters from waveform data for studies of global and regional seismicity. Journal of Geophysical Research: Solid Earth, 86(B4), 2825-2852.
    [2] Ekström, G., Nettles, M., & Dziewoński, A. M. (2012). The global CMT project 2004–2010: Centroid-moment tensors for 13,017 earthquakes. Physics of the Earth and Planetary Interiors, 200, 1-9.
    [3] U.S. Geological Survey, Earthquake Hazards Program, 2017, Advanced National Seismic System (ANSS) Comprehensive Catalog of Earthquake Events and Products: Various, https://doi.org/10.5066/F7MS3QZH.

  16. EDI Data Portal

    • catalog.newmexicowaterdata.org
    html
    Updated Oct 23, 2023
    Cite
    Environmental Data Initiative (2023). EDI Data Portal [Dataset]. https://catalog.newmexicowaterdata.org/dataset/edi-data-portal
    Explore at:
    Available download formats: html
    Dataset updated
    Oct 23, 2023
    Dataset provided by
    Environmental Data Initiative
    Description

    The EDI Data Portal contains environmental and ecological data packages contributed by a number of participating organizations. Data providers make every effort to release data in a timely fashion and with attention to accurate, well-designed and well-documented data. To understand data fully, please read the associated metadata and contact data providers if you have any questions. Data may be used in a manner conforming with the license information found in the “Intellectual Rights” section of the data package metadata or defaults to the EDI Data Policy. The Environmental Data Initiative shall not be liable for any damages resulting from misinterpretation or misuse of the data or metadata.

  17. Italian Product Image OCR Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Italian Product Image OCR Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/italian-product-image-ocr-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Introducing the Italian Product Image Dataset - a diverse and comprehensive collection of images meticulously curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Italian language.

    Dataset Content & Diversity:

    Containing a total of 2000 images, this Italian OCR dataset offers a diverse distribution across different types of product front images. In this dataset, you'll find a variety of text, including product names, taglines, logos, company names, addresses, product content, etc. Images in this dataset showcase distinct fonts, writing formats, colors, designs, and layouts.

    To ensure the diversity of the dataset and to build a robust text recognition model, we allow a limited number (fewer than five) of unique images from a single resource. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Italian text.

    Images have been captured under varying lighting conditions – both day and night – along with different capture angles and backgrounds, to build a balanced OCR dataset. The collection features images in portrait and landscape modes.

    All these images were captured by native Italian people to ensure text quality and to avoid toxic content and PII text. We used the latest iOS and Android mobile devices with cameras above 5MP to capture these images and maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.

    Metadata:

    Along with the image data, you will also receive detailed structured metadata in CSV format. For each image, it includes metadata such as image orientation, country, language, and device information. Each image is named to correspond with the metadata.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Italian text recognition models.

    Update & Custom Collection:

    We're committed to expanding this dataset by continuously adding more images with the assistance of our native Italian crowd community.

    If you require a custom product image OCR dataset tailored to your guidelines or specific device distribution, feel free to contact us. We're equipped to curate specialized data to meet your unique needs.

    Furthermore, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your specific project requirements using our crowd community.

    License:

    This Image dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Leverage the power of this product image OCR dataset to elevate the training and performance of text recognition, text detection, and optical character recognition models within the realm of the Italian language. Your journey to enhanced language understanding and processing starts here.

  18. OpenAIRE Graph Dump

    • zenodo.org
    tar
    Updated Aug 17, 2023
    + more versions
    Cite
    Paolo Manghi; Claudio Atzori; Alessia Bardi; Miriam Baglioni; Jochen Schirrwagen; Harry Dimitropoulos; Sandro La Bruzzo; Ioannis Foufoulas; Andrea Mannocci; Marek Horst; Andreas Czerniak; Katerina Iatropoulou; Argiro Kokogiannaki; Michele De Bonis; Michele Artini; Antonis Lempesis; Alexandros Ioannidis; Natalia Manola; Pedro Principe; Thanasis Vergoulis; Serafeim Chatzopoulos; Dimitris Pierrakos (2023). OpenAIRE Graph Dump [Dataset]. http://doi.org/10.5281/zenodo.7488618
    Explore at:
    Available download formats: tar
    Dataset updated
    Aug 17, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Paolo Manghi; Claudio Atzori; Alessia Bardi; Miriam Baglioni; Jochen Schirrwagen; Harry Dimitropoulos; Sandro La Bruzzo; Ioannis Foufoulas; Andrea Mannocci; Marek Horst; Andreas Czerniak; Katerina Iatropoulou; Argiro Kokogiannaki; Michele De Bonis; Michele Artini; Antonis Lempesis; Alexandros Ioannidis; Natalia Manola; Pedro Principe; Thanasis Vergoulis; Serafeim Chatzopoulos; Dimitris Pierrakos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The OpenAIRE Graph is exported as several dumps, so you can download only the parts you are interested in.

    publication_[part].tar: metadata records about research literature (includes types of publications listed here)
    dataset_[part].tar: metadata records about research data (includes the subtypes listed here)
    software.tar: metadata records about research software (includes the subtypes listed here)
    otherresearchproduct_[part].tar: metadata records about research products that cannot be classified as research literature, data or software (includes types of products listed here)
    organization.tar: metadata records about organizations involved in the research life-cycle, such as universities, research organizations, funders.
    datasource.tar: metadata records about data sources whose content is available in the OpenAIRE Graph. They include institutional and thematic repositories, journals, aggregators, funders' databases.
    project.tar: metadata records about project grants.
    relation_[part].tar: metadata records about relations between entities in the graph.
    communities_infrastructures.tar: metadata records about research communities and research infrastructures.

    Each file is a tar archive containing gz files, each with one JSON record per line. Each JSON record is compliant with the schema available at http://doi.org/10.5281/zenodo.7492151. The documentation for the data model is available at https://graph.openaire.eu/docs/data-model/
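
A dump part can therefore be streamed without unpacking it to disk: open the tar, gunzip each member on the fly, and parse one JSON line at a time. A minimal sketch, demonstrated on a tiny synthetic archive; the member name and record fields are illustrative, not the real OpenAIRE schema:

```python
import gzip
import io
import json
import tarfile

def iter_records(tar_fileobj):
    """Stream JSON records from a dump part: a tar archive whose members
    are gzip files, each holding one JSON object per line."""
    with tarfile.open(fileobj=tar_fileobj) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            with gzip.open(tar.extractfile(member), mode="rt", encoding="utf-8") as gz:
                for line in gz:
                    if line.strip():
                        yield json.loads(line)

# Tiny in-memory archive standing in for e.g. publication_0.tar
# (member name and record fields are hypothetical).
payload = gzip.compress(b'{"id": "rec-1"}\n{"id": "rec-2"}\n')
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("part_00000.gz")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)
ids = [record["id"] for record in iter_records(buf)]
```

With a real dump you would pass an opened `.tar` file instead of the in-memory buffer; nothing else changes.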

    Learn more about the OpenAIRE Graph at https://graph.openaire.eu.

    Discover the graph's content on OpenAIRE EXPLORE, or use our API for developers.

  19. Elements for DataCite Metadata Schema

    • plos.figshare.com
    xls
    Updated Apr 25, 2024
    Cite
    Lisa R. Johnston; Alicia Hofelich Mohr; Joel Herndon; Shawna Taylor; Jake R. Carlson; Lizhao Ge; Jennifer Moore; Jonathan Petters; Wendy Kozlowski; Cynthia Hudson Vitale (2024). Elements for DataCite Metadata Schema. [Dataset]. http://doi.org/10.1371/journal.pone.0302426.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 25, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Lisa R. Johnston; Alicia Hofelich Mohr; Joel Herndon; Shawna Taylor; Jake R. Carlson; Lizhao Ge; Jennifer Moore; Jonathan Petters; Wendy Kozlowski; Cynthia Hudson Vitale
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Research data sharing has become an expected component of scientific research and scholarly publishing practice over the last few decades, due in part to requirements for federally funded research. As part of a larger effort to better understand the workflows and costs of public access to research data, this project conducted a high-level analysis of where academic research data is most frequently shared. To do this, we leveraged the DataCite and Crossref application programming interfaces (APIs) in search of Publisher field elements demonstrating which data repositories were utilized by researchers from six academic research institutions between 2012–2022. In addition, we also ran a preliminary analysis of the quality of the metadata associated with these published datasets, comparing the extent to which information was missing from metadata fields deemed important for public access to research data. Results show that the top 10 publishers accounted for 89.0% to 99.8% of the datasets connected with the institutions in our study. Known data repositories, including institutional data repositories hosted by those institutions, were initially lacking from our sample due to varying metadata standards and practices. We conclude that the metadata quality landscape for published research datasets is uneven; key information, such as author affiliation, is often incomplete or missing from source data repositories and aggregators. To enhance the findability, interoperability, accessibility, and reusability (FAIRness) of research data, we provide a set of concrete recommendations that repositories and data authors can take to improve scholarly metadata associated with shared datasets.

  20. Metadata for FIA P3 data on lichen

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Metadata for FIA P3 data on lichen [Dataset]. https://catalog.data.gov/dataset/metadata-for-fia-p3-data-on-lichen
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    These data describe the abundance of individual lichen species across the U.S. as recorded in the Forest Health and Monitoring dataset of the Forest Inventory and Analysis program (i.e., Phase 3 plots). This dataset is not publicly accessible because these data are already housed on the USFS Forest Inventory and Analysis site (see below). It can be accessed through the following means: the lichen data for this product are from the USDA Forest Service (USFS) Forest Inventory and Analysis (FIA) Phase 3 (P3) dataset - Forest Health and Monitoring. The metadata and database description for FIA-P3 are here (https://www.fia.fs.fed.us/library/database-documentation/). The data itself is located at the USFS Data Mart (https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html) in two files: "LICHEN_PLOT_SUMMARY.zip" and "LICHEN_VISIT.zip". Point of contact: Linda Geiser, lgeiser@fs.fed.us. Format: The data are in .csv format.
