100+ datasets found
  1. Meta-Dataset Dataset

    • paperswithcode.com
    + more versions
    Cite
    Eleni Triantafillou; Tyler Zhu; Vincent Dumoulin; Pascal Lamblin; Utku Evci; Kelvin Xu; Ross Goroshin; Carles Gelada; Kevin Swersky; Pierre-Antoine Manzagol; Hugo Larochelle, Meta-Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/meta-dataset
    Explore at:
    Authors
    Eleni Triantafillou; Tyler Zhu; Vincent Dumoulin; Pascal Lamblin; Utku Evci; Kelvin Xu; Ross Goroshin; Carles Gelada; Kevin Swersky; Pierre-Antoine Manzagol; Hugo Larochelle
    Description

    Meta-Dataset is a large few-shot learning benchmark consisting of multiple datasets with different data distributions. It does not restrict few-shot tasks to fixed ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:

    • ILSVRC-2012 (the ImageNet dataset; natural images with 1000 categories)
    • Omniglot (hand-written characters, 1623 classes)
    • Aircraft (aircraft images, 100 classes)
    • CUB-200-2011 (birds, 200 classes)
    • Describable Textures (texture images with 43 categories)
    • Quick Draw (black-and-white sketches of 345 different categories)
    • Fungi (a large dataset of mushrooms with 1500 categories)
    • VGG Flower (flower images with 102 categories)
    • Traffic Signs (German traffic sign images with 43 classes)
    • MSCOCO (images collected from Flickr, 80 classes)

    All datasets except Traffic Signs and MSCOCO have training, validation, and test splits (proportioned roughly 70%/15%/15%). Traffic Signs and MSCOCO are reserved for testing only.
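
    Since ways and shots vary from episode to episode, a Meta-Dataset-style task sampler draws a random number of classes and a random number of support examples per class. The Python sketch below illustrates the idea only; it is not the benchmark's actual sampling algorithm, and the episode-size caps are arbitrary.

        import random
        from collections import defaultdict

        def sample_episode(labels, max_ways=50, max_support=10, max_query=10):
            """Draw one few-shot episode with a random number of ways and shots."""
            by_class = defaultdict(list)
            for idx, y in enumerate(labels):
                by_class[y].append(idx)
            # A class needs at least one support and one query example.
            eligible = [c for c, idxs in by_class.items() if len(idxs) >= 2]
            n_way = random.randint(2, min(max_ways, len(eligible)))
            support, query = [], []
            for c in random.sample(eligible, n_way):
                idxs = random.sample(by_class[c], len(by_class[c]))  # shuffled copy
                n_shot = random.randint(1, min(max_support, len(idxs) - 1))
                support += [(i, c) for i in idxs[:n_shot]]
                query += [(i, c) for i in idxs[n_shot:n_shot + max_query]]
            return support, query

        # Example: 100 examples spread evenly over 10 classes.
        support, query = sample_episode([i % 10 for i in range(100)])
        print(len(support), len(query))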

  2. metadata

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). metadata [Dataset]. https://catalog.data.gov/dataset/metadata-f2500
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The dataset consists of public domain acute and chronic toxicity and chemistry data for algal species. Data are accessible at https://envirotoxdatabase.org/. Data include algal species, chemical identification, and the concentrations that do and do not affect algal growth.

  3. Common Metadata Elements for Cataloging Biomedical Datasets

    • figshare.com
    xlsx
    Updated Jan 20, 2016
    Cite
    Kevin Read (2016). Common Metadata Elements for Cataloging Biomedical Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1496573.v1
    Explore at:
    xlsx
    Dataset updated
    Jan 20, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    Kevin Read
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to existing data-specific metadata standards from the multidisciplinary data repositories DataCite and Dryad, and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
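
    As a rough illustration of how such a core-element schema might be applied, the Python sketch below validates a hypothetical record against a required-element set. The field names echo DataCite-style elements and are illustrative only; they are not the elements actually proposed in the spreadsheet.

        # Hypothetical minimal metadata record; field names are illustrative,
        # not the actual elements proposed in this dataset's spreadsheet.
        record = {
            "title": "Example NIH-funded dataset",
            "creator": "Doe, Jane",
            "identifier": "doi:10.0000/example",
            "publicationYear": 2016,
            "description": "Brief abstract of the dataset.",
            "subject": ["biomedical research"],
        }

        # A catalog could check that every required core element is present.
        REQUIRED = {"title", "creator", "identifier", "description"}
        missing = REQUIRED - record.keys()
        assert not missing, f"missing core elements: {missing}"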

  4. Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection)

    • crawlfeeds.com
    csv, zip
    Updated Jun 22, 2025
    Cite
    Crawl Feeds (2025). Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection) [Dataset]. https://crawlfeeds.com/datasets/movies-tv-shows-metadata-dataset-190k-records-horror-heavy-collection
    Explore at:
    zip, csv
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.

    Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.

    Primary Genre Focus: Horror

    Use Cases:

    • Build movie recommendation systems or genre classifiers

    • Train NLP models on movie descriptions

    • Analyze Horror content trends over time

    • Explore box office vs. rating correlations

    • Enrich entertainment datasets with directorial and cast metadata
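
    As a sketch of the first use case, the snippet below trains a simple genre classifier on the CSV export. The filename and column names ("description", "genre") are assumptions, since the schema is not listed here.

        import pandas as pd
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline

        # Hypothetical filename and columns; adjust to the dataset's actual schema.
        df = pd.read_csv("movies_tv_shows.csv").dropna(subset=["description", "genre"])
        X_train, X_test, y_train, y_test = train_test_split(
            df["description"], df["genre"], test_size=0.2, random_state=0
        )
        clf = make_pipeline(
            TfidfVectorizer(max_features=50_000),
            LogisticRegression(max_iter=1000),
        )
        clf.fit(X_train, y_train)
        print("held-out accuracy:", clf.score(X_test, y_test))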

  5. Dataset relating a study on Geospatial Open Data usage and metadata quality

    • zenodo.org
    csv
    Updated Jun 19, 2023
    + more versions
    Cite
    Alfonso Quarati; Alfonso Quarati (2023). Dataset relating a study on Geospatial Open Data usage and metadata quality [Dataset]. http://doi.org/10.5281/zenodo.4584542
    Explore at:
    csv
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Alfonso Quarati; Alfonso Quarati
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Thanks to the presence of thousands of geo-referenced datasets containing spatial information, Open Government Data (OGD) portals are of great interest for any analysis or process relating to the territory. For this potential to be realized, users must be able to access these datasets and reuse them. An element often considered to hinder the full dissemination of OGD is the quality of its metadata. Starting from an experimental investigation conducted on over 160,000 geospatial datasets belonging to six national and international OGD portals, this work has as its first objective to provide an overview of the usage of these portals, measured in terms of dataset views and downloads. Furthermore, to assess the possible influence of metadata quality on the use of geospatial datasets, an assessment of the metadata for each dataset was carried out, and the correlation between these two variables was measured. The results showed a significant underutilization of geospatial datasets and a generally poor quality of their metadata. Moreover, only a weak correlation was found between use and metadata quality, not strong enough to assert with certainty that the latter is a determining factor of the former.

    The dataset consists of six zipped CSV files, containing the collected datasets' usage data, full metadata, and computed quality values, for about 160,000 geospatial datasets belonging to the three national and three international portals considered in the study, i.e. US (catalog.data.gov), Colombia (datos.gov.co), Ireland (data.gov.ie), HDX (data.humdata.org), EUODP (data.europa.eu), and NASA (data.nasa.gov).

    Data collection occurred in the period: 2019-12-19 -- 2019-12-23.

    The header for each CSV file is:

    [ ,portalid,id,downloaddate,metadata,overallq,qvalues,assessdate,dviews,downloads,engine,admindomain]

    where for each row (a portal's dataset) the following fields are defined as follows:

    • portalid: portal identifier
    • id: dataset identifier
    • downloaddate: date of data collection
    • overallq: overall quality values computed by applying the methodology presented in [1]
    • qvalues: json object containing the quality values computed for the 17 metrics presented in [1]
    • assessdate: date of quality assessment
    • dviews: number of total views for the dataset
    • downloads: number of total downloads for the dataset (made available only by the Colombia, HDX, and NASA portals)
    • engine: identifier of the supporting portal platform: 1(CKAN), 2 (Socrata)
    • admindomain: 1 (national), 3 (international)
    • metadata: the overall dataset's metadata downloaded via API from the portal according to the supporting platform schema

    [1] Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals.J. Data and Information Quality2016,8, 2:1–2:29. doi:10.1145/2964909
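
    A minimal Python sketch for working with one of the unzipped per-portal CSV files follows; the filename "us.csv" is an assumption, and the qvalues column is assumed to hold JSON text as described above.

        import json

        import pandas as pd

        # Filename is hypothetical; the first (unnamed) column is the row index.
        df = pd.read_csv("us.csv", index_col=0)
        df["qvalues"] = df["qvalues"].map(json.loads)  # assumes JSON-encoded cells
        print(df[["portalid", "id", "overallq", "dviews"]].head())

        # Rank correlation between metadata quality and usage, echoing the study.
        print(df["overallq"].corr(df["dviews"], method="spearman"))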

  6. metadata

    • kaggle.com
    Updated Nov 14, 2022
    Cite
    limentian (2022). metadata [Dataset]. https://www.kaggle.com/datasets/limentian/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    limentian
    Description

    Dataset

    This dataset was created by limentian


  7. The Visual Genome Dataset v1.0 Metadata

    • academictorrents.com
    bittorrent
    Updated Jun 30, 2016
    Cite
    The Visual Genome Dataset v1.0 Metadata [Dataset]. https://academictorrents.com/details/ca98efc75a80278b795ce056fd4229c1bc6f229f
    Explore at:
    bittorrent (263326070)
    Dataset updated
    Jun 30, 2016
    Dataset authored and provided by
    Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David Ayman Shamma, Michael Bernstein, Li Fei-Fei
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David Ayman Shamma, Michael Bernstein, Li Fei-Fei.

    Files:

    • image meta data (16.92 MB)
    • region descriptions (988.18 MB)
    • question answers (201.09 MB)
    • objects (99.14 MB)
    • attributes (174.97 MB)
    • relationships (406.70 MB)

  8. Metadata Catalogue

    • spenergynetworks.opendatasoft.com
    csv, excel, json
    Updated Jul 1, 2025
    Cite
    (2025). Metadata Catalogue [Dataset]. https://spenergynetworks.opendatasoft.com/explore/dataset/metadata-catalogue/
    Explore at:
    csv, json, excel
    Dataset updated
    Jul 1, 2025
    Description

    A dataset containing the metadata for all openly published datasets on the SP Energy Networks Open Data Portal. All metadata conforms to the Dublin Core metadata standard, a set of 15 'core' elements.
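
    Opendatasoft portals such as this one typically expose an Explore API; a hedged Python sketch for pulling a few catalogue records is below. The endpoint version and response shape are assumptions to verify against the portal's API documentation.

        import requests

        BASE = "https://spenergynetworks.opendatasoft.com/api/explore/v2.1"
        resp = requests.get(
            f"{BASE}/catalog/datasets/metadata-catalogue/records",
            params={"limit": 5},
            timeout=30,
        )
        resp.raise_for_status()
        for rec in resp.json().get("results", []):  # response shape assumed
            print(rec)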

  9. metadata

    • kaggle.com
    Updated Jul 6, 2024
    Cite
    Naoures Abidi (2024). metadata [Dataset]. https://www.kaggle.com/datasets/abidinawres/metadata/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Naoures Abidi
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Naoures Abidi

    Released under Apache 2.0


  10. Data from: A metadata framework for electronic phenotypes

    • data.niaid.nih.gov
    • dataone.org
    • +1more
    zip
    Updated May 1, 2023
    Cite
    Matthew Spotnitz; Nripendra Acharya; James J. Cimino; Shawn Murphy; Bahram Namjou-Khales; Nancy Crimmins; Theresa Walunas; Cong Liu; David Crosslin; Barbara Benoit; Elisabeth Rosenthal; Jennifer Pacheco; Anna Ostropolets; Harry Reyes Nieva; Jason Patterson; Lauren Richter; Tiffany Callahan; Ahmed Elhussein; Chao Pang; Krzysztof Kiryluk; Jordan Nestor; Atlas Khan; Sumit Mohan; Evan Minty; Wendy Chung; Wei-Qi Wei; Karthik Natarajan; Chunhua Weng (2023). A metadata framework for electronic phenotypes [Dataset]. http://doi.org/10.5061/dryad.rn8pk0ph3
    Explore at:
    zip
    Dataset updated
    May 1, 2023
    Dataset provided by
    Mass General Brigham (http://www.partners.org/)
    University of Washington
    University of Alabama at Birmingham
    Columbia University Irving Medical Center
    Cincinnati Children's Hospital Medical Center
    University of Calgary
    Vanderbilt University Medical Center
    University of Northwestern
    Tulane University
    Northwestern University
    Authors
    Matthew Spotnitz; Nripendra Acharya; James J. Cimino; Shawn Murphy; Bahram Namjou-Khales; Nancy Crimmins; Theresa Walunas; Cong Liu; David Crosslin; Barbara Benoit; Elisabeth Rosenthal; Jennifer Pacheco; Anna Ostropolets; Harry Reyes Nieva; Jason Patterson; Lauren Richter; Tiffany Callahan; Ahmed Elhussein; Chao Pang; Krzysztof Kiryluk; Jordan Nestor; Atlas Khan; Sumit Mohan; Evan Minty; Wendy Chung; Wei-Qi Wei; Karthik Natarajan; Chunhua Weng
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    As many phenotyping algorithms are being created to support precision medicine or observational studies using electronic patient data, it is getting increasingly difficult to identify the right algorithm for the right task. A metadata framework promises to help curate phenotyping algorithms to facilitate more efficient and accurate retrieval. We recruited 20 researchers from two phenotyping communities, eMERGE and OHDSI, and used a mixed-methods approach to develop the metadata framework. Once we achieved a consensus of 39 metadata elements, we surveyed 47 new researchers from these communities to evaluate the utility of the metadata framework. Two researchers were also asked to use it to annotate eight type 2 diabetes mellitus phenotypes. The survey consisted of a series of multiple-choice questions, which allowed rating of the utility of each element on a scale of 1-5, and open-ended questions, which allowed for narrative responses. More than 90% of respondents rated metadata elements concerning phenotype definition and validation methods and metrics with a score of 4 or 5. Our thematic analysis of the respondents' feedback indicates that the strengths of the metadata framework were its ability to capture rich descriptions, explicitness, compliance with data standards, comprehensiveness in validation metrics, and ability to enable cross-phenotype searches. Limitations were its complexity for data collection and the costs it entailed.

    Methods

    We used online third-party software (Qualtrics, Provo, UT) to collect the dataset. We performed statistical analyses using R, version 4.1.1.

  11. arxiv-metadata-dataset

    • huggingface.co
    Updated Mar 31, 2022
    + more versions
    Cite
    Sumuk Shashidhar (2022). arxiv-metadata-dataset [Dataset]. https://huggingface.co/datasets/sumuks/arxiv-metadata-dataset
    Explore at:
    Dataset updated
    Mar 31, 2022
    Authors
    Sumuk Shashidhar
    Description

    sumuks/arxiv-metadata-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
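
    A minimal sketch for loading it with the Hugging Face datasets library; the available splits and features are whatever the repository defines.

        from datasets import load_dataset  # pip install datasets

        ds = load_dataset("sumuks/arxiv-metadata-dataset")  # downloads on first call
        print(ds)  # shows the splits and features the repository defines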

  12. Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation

    • data.niaid.nih.gov
    Updated Jul 8, 2024
    Cite
    Backe, Christian (2024). Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10373153
    Explore at:
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Backe, Christian
    Bande, Miguel
    Cesar, Diego
    Wehbe, Bilal
    Pribbernow, Max
    Shah, Nimish
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation

    Introduction

    This is a set of metadata describing a large dataset of synchronized sonar and stereo camera recordings that were captured between August 2021 and September 2023 during the project DeeperSense (https://robotik.dfki-bremen.de/en/research/projects/deepersense/), as training data for Sonar-to-RGB image translation. Parts of the sensor data have been published (https://zenodo.org/records/7728089, https://zenodo.org/records/10220989). Due to the size of the sensor data corpus, it is currently impractical to make the entire corpus accessible online. Instead, this metadatabase serves as a relatively compact representation, allowing interested researchers to inspect the data and select relevant portions for their particular use case, which will be made available on demand. This is an effort to comply with the FAIR principle A2 (https://www.go-fair.org/fair-principles/): metadata shall be accessible, even when the base data is not immediately accessible.

    Locations and sensors

    The sensor data was captured at four different locations, including one laboratory (Maritime Exploration Hall at DFKI RIC Bremen) and three field locations (Chalk Lake Hemmoor, Tank Wash Basin Neu-Ulm, Lake Starnberg). At all locations, a ZED camera and a Blueprint Oculus M1200d sonar were used. Additionally, a SeaVision camera was used at the Maritime Exploration Hall at DFKI RIC Bremen and at the Chalk Lake Hemmoor. The examples/ directory holds a typical output image for each sensor at each available location.

    Data volume per session

    Six data collection sessions were conducted. The table below presents an overview of the amount of data captured in each session:

    Session dates           | Location                                     | Number of datasets | Total duration of datasets [h] | Total logfile size [GB] | Number of images | Total image size [GB]
    2021-08-09 - 2021-08-12 | Maritime Exploration Hall at DFKI RIC Bremen | 52                 | 10.8                           | 28.8                    | 389’047          | 88.1
    2022-02-07 - 2022-02-08 | Maritime Exploration Hall at DFKI RIC Bremen | 35                 | 4.4                            | 54.1                    | 629’626          | 62.3
    2022-04-26 - 2022-04-28 | Chalk Lake Hemmoor                           | 52                 | 8.1                            | 133.6                   | 1’114’281        | 97.8
    2022-06-28 - 2022-06-29 | Tank Wash Basin Neu-Ulm                      | 42                 | 6.7                            | 144.2                   | 824’969          | 26.9
    2023-04-26 - 2023-04-27 | Maritime Exploration Hall at DFKI RIC Bremen | 55                 | 7.4                            | 141.9                   | 739’613          | 9.6
    2023-09-01 - 2023-09-02 | Lake Starnberg                               | 19                 | 2.9                            | 40.1                    | 217’385          | 2.3
    Total                   |                                              | 255                | 40.3                           | 542.7                   | 3’914’921        | 287.0

    Data and metadata structure

    Sensor data corpus

    The sensor data corpus comprises two processing stages:

    raw data streams stored in ROS bagfiles (aka logfiles),

    camera and sonar images (aka datafiles) extracted from the logfiles.

    The files are stored in a file tree hierarchy which groups them by session, dataset, and modality:

    ${session_key}/
        ${dataset_key}/
            ${logfile_name}
            ${modality_key}/
                ${datafile_name}

    A typical logfile path has this form:

    2023-09_starnberg_lake/
        2023-09-02-15-06_hydraulic_drill/
            stereo_camera-zed-2023-09-02-15-06-07.bag

    A typical datafile path has this form:

    2023-09_starnberg_lake/
        2023-09-02-15-06_hydraulic_drill/
            zed_right/
                1693660038_368077993.jpg

    All directory and file names, and their particles, are designed to serve as identifiers in the metadatabase. Their formatting, as well as the definitions of all terms, are documented in the file entities.json.

    Metadatabase

    The metadatabase is provided in two equivalent forms:

    as a standalone SQLite (https://www.sqlite.org/index.html) database file metadata.sqlite for users familiar with SQLite,

    as a collection of CSV files in the csv/ directory for users who prefer other tools.

    The database file has been generated from the CSV files, so each database table holds the same information as the corresponding CSV file. In addition, the metadatabase contains a series of convenience views that facilitate access to certain aggregate information.

    An entity relationship diagram of the metadatabase tables is stored in the file entity_relationship_diagram.png. Each entity, its attributes, and relations are documented in detail in the file entities.json.

    Some general design remarks:

    For convenience, timestamps are always given in both a human-readable form (ISO 8601 formatted datetime strings with explicit local time zone), and as seconds since the UNIX epoch.

    In practice, each logfile always contains a single stream, and each stream is always stored in a single logfile. Per the database schema, however, the entities stream and logfile are modeled separately, with a "many-streams-to-one-logfile" relationship. This design was chosen to be compatible with, and open for, data collections where a single logfile contains multiple streams.

    A modality is not an attribute of a sensor alone, but of a datafile: a sensor is an attribute of a stream, and a single stream may be the source of multiple modalities (e.g. RGB vs. grayscale images from the same camera, or cartesian vs. polar projection of the same sonar output). Conversely, the same modality may originate from different sensors.

    As a usage example, the data volume per session, tabulated at the top of this document, can be extracted from the metadatabase with the following SQL query:

    SELECT
        PRINTF('%s - %s',
               SUBSTR(session_start, 1, 10),
               SUBSTR(session_end, 1, 10))  AS 'Session dates',
        location_name_english               AS Location,
        number_of_datasets                  AS 'Number of datasets',
        total_duration_of_datasets_h        AS 'Total duration of datasets [h]',
        total_logfile_size_gb               AS 'Total logfile size [GB]',
        number_of_images                    AS 'Number of images',
        total_image_size_gb                 AS 'Total image size [GB]'
    FROM location
    JOIN session USING (location_id)
    JOIN (
        SELECT
            session_id,
            COUNT(dataset_id)                        AS number_of_datasets,
            ROUND(SUM(dataset_duration) / 3600, 1)   AS total_duration_of_datasets_h,
            ROUND(SUM(total_logfile_size) / 10e9, 1) AS total_logfile_size_gb
        FROM location
        JOIN session USING (location_id)
        JOIN dataset USING (session_id)
        JOIN view_dataset_total_logfile_size USING (dataset_id)
        GROUP BY session_id
    ) USING (session_id)
    JOIN (
        SELECT
            session_id,
            COUNT(datafile_id)                  AS number_of_images,
            ROUND(SUM(datafile_size) / 10e9, 1) AS total_image_size_gb
        FROM session
        JOIN dataset USING (session_id)
        JOIN stream USING (dataset_id)
        JOIN datafile USING (stream_id)
        GROUP BY session_id
    ) USING (session_id)
    ORDER BY session_id;
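
    Before running the query, the metadatabase can be inspected with Python's built-in sqlite3 module; this short sketch (assuming metadata.sqlite sits in the working directory) lists its tables and convenience views.

        import sqlite3

        con = sqlite3.connect("metadata.sqlite")  # path assumed
        rows = con.execute(
            "SELECT type, name FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY type, name"
        )
        for type_, name in rows:
            print(type_, name)
        con.close()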

  13. Version values for DataCite dataset records

    • databank.illinois.edu
    + more versions
    Cite
    Elizabeth Wickes, Version values for DataCite dataset records [Dataset]. http://doi.org/10.13012/B2IDB-4803136_V1
    Explore at:
    Authors
    Elizabeth Wickes
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset was extracted from a set of metadata files harvested from the DataCite metadata store (https://search.datacite.org/ui) during December 2015. Metadata records for items with a resourceType of dataset were collected, 1,647,949 records in total. This dataset contains three files:

    • readme.txt: a readme file.
    • version-results.csv: a CSV file containing three columns: DOI, DOI prefix, and version text contents.
    • version-counts.csv: a CSV file containing counts for unique version text content values.
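
    A short pandas sketch for tallying the version strings; the column layout (DOI, DOI prefix, version text) is taken from the description above, and the presence of a header row is an assumption.

        import pandas as pd

        df = pd.read_csv("version-results.csv")  # assumes a header row
        version_col = df.columns[-1]  # third column: version text contents
        print(df[version_col].value_counts().head(20))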

  14. Enterprise Metadata Repository (EMR)

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Jul 4, 2025
    + more versions
    Cite
    Social Security Administration (2025). Enterprise Metadata Repository (EMR) [Dataset]. https://catalog.data.gov/dataset/enterprise-metadata-repository-emr
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    Stores physical and logical information about relational databases and record structures to assist in data identification and management.

  15. Dataset metadata of known Dataverse installations

    • search.datacite.org
    • dataverse.harvard.edu
    • +1more
    Updated 2019
    + more versions
    Cite
    Julian Gautier (2019). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/dvn/dcdkzq
    Explore at:
    Dataset updated
    2019
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Harvard Dataverse
    Authors
    Julian Gautier
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation).csv
    │   ├── basic.csv
    │   ├── contributor(citation).csv
    │   ├── ...
    │   └── topic_classification(citation).csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2022.10.02_17.11.19.zip
    │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
    │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
    │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
    │   ├── ...
    │   ├── metadatablocks_v5.6
    │   ├── astrophysics_v5.6.json
    │   ├── biomedical_v5.6.json
    │   ├── citation_v5.6.json
    │   ├── ...
    │   ├── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
    │   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
    │   ├── Arca_Dados_2022.10.02_17.44.35.zip
    │   ├── ...
    │   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
    ├── dataset_pids_from_most_known_dataverse_installations.csv
    ├── licenses_used_by_dataverse_installations.csv
    └── metadatablocks_from_most_known_dataverse_installations.csv

    This dataset contains two directories and three CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories:

    • The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in.
    • One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema.
    • The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

    The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files.

    The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected this data, 36 installations were running versions of the Dataverse software that allow depositors to choose a license or data use agreement from a dropdown menu in the dataset deposit form. For more information, see https://guides.dataverse.org/en/5.11.1/user/dataset-management.html#choosing-a-license.

    The metadatablocks_from_most_known_dataverse_installations.csv file contains the metadata block names, field names and child field names (if the field is a compound field) of the 77 Dataverse installations' metadata blocks. It is useful for comparing each installation's dataset metadata model (the metadata fields and the metadata blocks that each installation uses). The CSV file was created using a Python script at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_csv_file_with_metadata_block_fields_of_all_installations.py, which takes as inputs the directories and files created by the get_dataset_metadata_of_all_installations.py script.

    Known errors

    The metadata of two datasets from one of the known installations could not be downloaded because the datasets' pages and metadata could not be accessed with the Dataverse APIs.

    About metadata blocks

    Read about the Dataverse software's metadata blocks system at http://guides.dataverse.org/en/latest/admin/metadatacustomization.html
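
    A hedged Python sketch of the collection step the description outlines: read a two-column CSV of installation hostnames and API tokens, then query each installation's Search API for dataset records. The file "hostnames.csv" mirrors the CSV the author describes and is not part of this dataset; hostnames are assumed to be bare domains.

        import csv

        import requests

        with open("hostnames.csv", newline="") as f:  # hypothetical local file
            installations = list(csv.DictReader(f))

        for inst in installations:
            headers = {}
            if inst.get("apikey"):
                headers["X-Dataverse-key"] = inst["apikey"]
            resp = requests.get(
                f"https://{inst['hostname']}/api/search",
                params={"q": "*", "type": "dataset", "per_page": 10},
                headers=headers,
                timeout=60,
            )
            resp.raise_for_status()
            for item in resp.json()["data"]["items"]:
                print(inst["hostname"], item.get("global_id"))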

  16. Dataset Metadata Creation: Automatically generates CKAN dataset metadata based on ArrayExpress data, reducing manual data entry and ensuring consistency. (inferred functionality)

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). Dataset Metadata Creation: Automatically generates CKAN dataset metadata based on ArrayExpress data, reducing manual data entry and ensuring consistency. (inferred functionality) [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-arrayexpress
    Explore at:
    Dataset updated
    Jun 4, 2025
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The arrayexpress extension for CKAN facilitates the import of data from the ArrayExpress database into a CKAN instance. This extension is designed to streamline the process of integrating ArrayExpress experiment data, a valuable resource for genomics and transcriptomics research, directly into a CKAN-based data portal. Due to limited documentation, specific functionalities are inferred to enhance data accessibility and promote efficient management of ArrayExpress datasets within CKAN.

    Key Features:

    • ArrayExpress Data Import: Enables the import of experiment data from the ArrayExpress database into CKAN, providing access to valuable genomics and transcriptomics datasets.
    • Dataset Metadata Creation: Automatically generates CKAN dataset metadata based on ArrayExpress data, reducing manual data entry and ensuring consistency. (inferred functionality)
    • Streamlined Data Integration: Simplifies the integration process of ArrayExpress resources into CKAN, improving access to experiment-related information. (inferred functionality)

    Use Cases:

    • Genomics Data Portals: Organizations managing data portals for genomics or transcriptomics research can use this extension to incorporate ArrayExpress data, increasing the breadth of available data and improving user access.
    • Research Institutions: Research institutions can simplify data imports to share their ArrayExpress datasets with collaborators, ensuring data consistency and adherence to metadata standards.

    Technical Integration: The ArrayExpress extension integrates with CKAN by adding functionality to import and handle ArrayExpress data. While the exact integration points (plugins, API endpoints) aren't detailed in the provided documentation, the extension would likely use CKAN's plugin architecture to add data import capabilities, and the metadata schema may need to be adapted for compatibility (inferred integration).

    Benefits & Impact: By using the arrayexpress extension, organizations can improve the accessibility of ArrayExpress data within CKAN. It reduces the manual effort required to integrate experiment data and helps in maintaining a consistent and comprehensive data catalog for genomics and transcriptomics research (inferred).
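
    Since the extension's own API is undocumented, the sketch below shows the equivalent metadata-creation step using the generic ckanapi client. The portal URL, API key, and field mapping from an ArrayExpress accession are all illustrative.

        from ckanapi import RemoteCKAN  # pip install ckanapi

        ckan = RemoteCKAN("https://demo.ckan.org", apikey="MY-API-KEY")  # illustrative
        ckan.action.package_create(
            name="e-mtab-0001-example",  # hypothetical accession-derived slug
            title="E-MTAB-0001 (ArrayExpress import, example)",
            notes="Metadata generated from an ArrayExpress experiment record.",
        )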

  17. US Restaurant POI dataset with metadata

    • datarade.ai
    .csv
    Updated Jul 30, 2022
    Cite
    Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
    Explore at:
    .csv
    Dataset updated
    Jul 30, 2022
    Dataset authored and provided by
    Geolytica
    Area covered
    United States of America
    Description

    Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.

    This is our process flow:

    Our machine learning systems continuously crawl for new POI data
    Our geoparsing and geocoding systems calculate their geolocations
    Our categorization systems clean up and standardize the datasets
    Our data pipeline API publishes the datasets on our data store
    

    A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

    POI data is in constant flux. Every minute, worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94% of all businesses have a public online presence of some kind that reflects such changes. When a business changes, its website and social media presence change too. We then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

    We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data-as-a-Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.

    Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously refreshed, so subscription plans are recommended for those who need the most up-to-date data. The main differentiators between us and the competition are our flexible licensing terms and our data freshness.

    Data samples may be downloaded at https://store.poidata.xyz/us

  18. Metadata access dataset

    • redivis.com
    Updated Aug 19, 2024
    + more versions
    Cite
    (2024). Metadata access dataset [Dataset]. https://redivis.com/workflows/xe7m-278rbcqnv
    Explore at:
    Dataset updated
    Aug 19, 2024
    Description

    This dataset was created on Wed, 28 Jul 2021 20:29:32 GMT.

  19. data.gov.au Dataset Ontology

    • data.gov.au
    • cloud.csiss.gmu.edu
    • +1more
    ttl
    Updated May 4, 2017
    + more versions
    Cite
    Commonwealth Scientific and Industrial Research Organisation (CSIRO) (2017). data.gov.au Dataset Ontology [Dataset]. https://data.gov.au/data/dataset/activity/data-gov-au-dataset-ontology
    Explore at:
    ttl
    Dataset updated
    May 4, 2017
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Commonwealth Scientific and Industrial Research Organisation (CSIRO)
    License

    Attribution 2.5 (CC BY 2.5), https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Area covered
    Australia
    Description

    The data.gov.au Dataset Ontology is an OWL ontology designed to describe the characteristics of datasets published on data.gov.au.

    The ontology contains elements which describe the publication, update, origin, governance, spatial and temporal coverage and other contextual information about the dataset. The ontology also covers aspects of organisational custodianship and governance.

    By using this ontology to describe datasets on data.gov.au, publishers increase discoverability and enable the consumption of this information in other applications/systems as Linked Data. It further enables decentralised publishing of catalogs and facilitates federated dataset search across sites, e.g. in datasets that are published by the States.

    Other publishers of Linked Data may make assertions about data published using this ontology, e.g. they may publish information about the use of the dataset in other applications.
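
    A minimal Python sketch for exploring the ontology's Turtle serialization with rdflib; the local filename is an assumption.

        from rdflib import Graph
        from rdflib.namespace import OWL, RDF, RDFS

        g = Graph()
        g.parse("data-gov-au-dataset-ontology.ttl", format="turtle")  # filename assumed
        for cls in g.subjects(RDF.type, OWL.Class):
            print(cls, g.value(cls, RDFS.label))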

  20. movie-metadata

    • huggingface.co
    + more versions
    Cite
    datadruids, movie-metadata [Dataset]. https://huggingface.co/datasets/ada-datadruids/movie-metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset authored and provided by
    datadruids
    Description

    ada-datadruids/movie-metadata dataset hosted on Hugging Face and contributed by the HF Datasets community
