100+ datasets found
  1. Common Metadata Elements for Cataloging Biomedical Datasets

    • figshare.com
    xlsx
    Updated Jan 20, 2016
    Cite
    Kevin Read (2016). Common Metadata Elements for Cataloging Biomedical Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1496573.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 20, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Read
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to the existing data-specific metadata standards of two multidisciplinary data repositories, DataCite and Dryad, and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.

  2. Dataset relating a study on Geospatial Open Data usage and metadata quality

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 19, 2023
    + more versions
    Cite
    Alfonso Quarati; Monica De Martino (2023). Dataset relating a study on Geospatial Open Data usage and metadata quality [Dataset]. http://doi.org/10.5281/zenodo.4280594
    Explore at:
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alfonso Quarati; Monica De Martino
    Description

    Open Government Data (OGD) portals host thousands of geo-referenced datasets containing spatial information, which makes them extremely valuable for any analysis or process relating to the territory. For that value to be realized, users must be able to access these datasets and reuse them. The quality of their metadata is often considered an obstacle to the full dissemination of OGD. Starting from an experimental investigation of over 160,000 geospatial datasets belonging to six national and international OGD portals, the first objective of this work is to provide an overview of the usage of these portals, measured in terms of dataset views and downloads. Furthermore, to assess the possible influence of metadata quality on the use of geospatial datasets, the metadata of each dataset was assessed and the correlation between these two variables was measured. The results show a significant underutilization of geospatial datasets and a generally poor quality of their metadata. Moreover, only a weak correlation was found between usage and metadata quality, not strong enough to assert with certainty that the latter is a determining factor of the former.

    The dataset consists of six zipped CSV files, containing the collected datasets' usage data, full metadata, and computed quality values, for about 160,000 geospatial datasets belonging to the three national and three international portals considered in the study, i.e. US (catalog.data.gov), Colombia (datos.gov.co), Ireland (data.gov.ie), HDX (data.humdata.org), EUODP (data.europa.eu), and NASA (data.nasa.gov).

    Data collection occurred in the period: 2019-12-19 -- 2019-12-23.

    The header for each CSV file is:

    [ ,portalid,id,downloaddate,metadata,overallq,qvalues,assessdate,dviews,downloads,engine,admindomain]

    where for each row (a portal's dataset) the following fields are defined as follows:

    • portalid: portal identifier
    • id: dataset identifier
    • downloaddate: date of data collection
    • metadata: the overall dataset's metadata downloaded via API from the portal according to the supporting platform schema
    • overallq: overall quality values computed by applying the methodology presented in [1]
    • qvalues: json object containing the quality values computed for the 17 metrics presented in [1]
    • assessdate: date of quality assessment
    • dviews: number of total views for the dataset
    • downloads: number of total downloads for the dataset (made available only by the Colombia, HDX, and NASA portals)
    • engine: identifier of the supporting portal platform: 1(CKAN), 2 (Socrata)
    • admindomain: 1 (national), 2 (international)

    [1] Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals. J. Data and Information Quality 2016, 8, 2:1–2:29. doi:10.1145/2964909
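
    As an illustration of how these files can be analysed (not part of the published dataset), the following Python sketch loads one extracted CSV and correlates metadata quality with dataset views; the file name is hypothetical, while the column names follow the header above.

      import pandas as pd

      # Hypothetical extracted file name; replace with the actual CSV from one of the six ZIP archives.
      df = pd.read_csv("catalog_data_gov.csv")

      # Keep the usage and quality columns described above.
      usage_quality = df[["portalid", "id", "overallq", "dviews", "downloads"]]

      # Spearman rank correlation between overall metadata quality and dataset views.
      correlation = usage_quality["overallq"].corr(usage_quality["dviews"], method="spearman")
      print(f"Spearman correlation (quality vs. views): {correlation:.3f}")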

  3. Metadata Dictionary Describing Data to Model a Chemical's Conditions of Use

    • catalog.data.gov
    Updated Apr 17, 2025
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). Metadata Dictionary Describing Data to Model a Chemical's Conditions of Use [Dataset]. https://catalog.data.gov/dataset/metadata-dictionary-describing-data-to-model-a-chemicals-conditions-of-use
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This data dictionary describes relevant fields from secondary data sources that can assist with modeling the conditions of use for a chemical when performing a chemical assessment. Information on how to access the secondary data sources is included. This dataset is associated with the following publication: Chea, J.D., D.E. Meyer, R.L. Smith, S. Takkellapati, and G.J. Ruiz-Mercado. Exploring automated tracking of chemicals through their conditions of use to support life cycle chemical assessment. JOURNAL OF INDUSTRIAL ECOLOGY. Berkeley Electronic Press, Berkeley, CA, USA, 29(2): 413-616, (2025).

  4. Data for: Sustainable connectivity in a community repository

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +3more
    Updated Dec 7, 2023
    Cite
    Ted Habermann (2023). Data for: Sustainable connectivity in a community repository [Dataset]. http://doi.org/10.5061/dryad.nzs7h44xr
    Explore at:
    Dataset updated
    Dec 7, 2023
    Authors
    Ted Habermann
    Description

    Data For: Sustainable Connectivity in a Community Repository

    ## GENERAL INFORMATION

    This readme.txt file was generated on 20231110 by Ted Habermann.

    ### Title of Dataset

    Data For: Sustainable Connectivity in a Community Repository

    ### Author Information

    Principal Investigator Contact Information
    Name: Ted Habermann (0000-0003-3585-6733)
    Institution: Metadata Game Changers ()
    Email:
    ORCID: 0000-0003-3585-6733

    ### Date published or finalized for release:

    November 10, 2023

    ## Date of data collection (single date, range, approximate date)

    May and June 2023

    ### Information about funding sources that supported the collection of the data:

    National Science Foundation (Crossref Funder ID: 100000001) Award 2134956.

    ### Overview of the data (abstract):

    These data are Dryad metadata retrieved from and translated into csv files. There are two datasets:

    1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data.
    2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data.

    Each dataset includes four types of metadata: identifiers, funders, keywords, and related works, each in a separate comma- (.csv) or tab- (.tsv) delimited file. There are also Microsoft Excel files (.xlsx) for the identifier metadata and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files with definitions, counts, unique counts, most frequent values, and completeness. These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people.

    | Size | FileName |
    | --------: | :---------------------------------------------------- |
    | 90541505 | DryadJournalDataset_Identifiers_20230520_12.csv |
    | 9017051 | DryadJournalDataset_funders_20230520_12.tsv |
    | 29108477 | DryadJournalDataset_keywords_20230520_12.tsv |
    | 8833842 | DryadJournalDataset_relatedWorks_20230520_12.tsv |
    | | |
    | 18260935 | DryadOrganizationDataset_funders_20230601_12.tsv |
    | 240128730 | DryadOrganizationDataset_identifiers_20230601_12.tsv |
    | 39600659 | DryadOrganizationDataset_keywords_20230601_12.tsv |
    | 11520475 | DryadOrganizationDataset_relatedWorks_20230601_12.tsv |
    | | |
    | 40726143 | DryadJournalDataset_identifiers_20230520_12.xlsx |
    | 81894301 | DryadOrganizationDataset_identifiers_20230601_12.xlsx |
    | | |
    | 842827 | DryadJournalDataset_ConnectivitySummary.html |
    | 387551 | DryadOrganizationDataset_ConnectivitySummary.html |

    ### Field Definitions

    ## SHARING/ACCESS INFORMATION

    ### Licenses/restrictions placed on the data:

    Creative Commons Public Domain License (CC0)

    ### Links to publications that cite or use the data:

    TBD

    ### Was data derived from another source?

    No

    ## DATA & FILE OVERVIEW

    ### File List

    A. *Dataset_identifiers_YYYYMMDD_HH.*sv: Short description: Identifier metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
    B. *Dataset_funders_YYYYMMDD_HH.*sv: Short description: Funder metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
    C. *Dataset_keywords_YYYYMMDD_HH.*sv: Short description: Keyword metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
    D. *Dataset_relatedWorks_YYYYMMDD_HH.*sv: Short description: Related work metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
    E. *Dataset_identifiers_YYYYMMDD_HH.xlsx: Short description: Excel spreadsheet with identifier metadata from Dryad for Dataset collected at YYYYMMDD_HH using the Dryad API.
    F. *Dataset_ConnectivitySummary.html: Short description: Connectivity summary for Dataset.
    G. summarizeConnectivity.ipynb: Short description: Python notebook with code for creating connectivity summaries and plots.

    ### Relationship between files:

    All files with the same dataset name make up a dataset. The .*sv files are the original metadata extracted from Dryad.

    ## METHODOLOGICAL INFORMATION

    ### Description of methods used for collection/generation of data:

    Most of the analysis is simply extracting and comparing counts of various metadata elements.

    ## DATA-SPECIFIC INFORMATION

    See the connectivity summaries (*ConnectivitySummary.html) for a list of parameters in each file and summaries of their values.

    ### Identifier Metadata

    The identifier metadata datasets include the following fields:

    | Field | Definition |
    | :---- | :--------- |
    | DOI | Digital object identifier for the dataset |
    | title | Title for the dataset |
    | datePublished | Date dataset published |
    | relatedPublicationISSN | International Standard Serial Number for journal with related publication |
    | primary_article | Digital object identifier for pr...
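
    As an illustration (not part of the Dryad deposit), the delimited files listed above can be loaded with pandas; the file names are taken from the table, and the delimiters follow the readme (.csv comma-delimited, .tsv tab-delimited).

      import pandas as pd

      # File names as listed in the table above; .tsv files are tab-delimited, .csv files comma-delimited.
      funders = pd.read_csv("DryadJournalDataset_funders_20230520_12.tsv", sep="\t")
      identifiers = pd.read_csv("DryadJournalDataset_Identifiers_20230520_12.csv")

      # Simple completeness check in the spirit of the connectivity summaries:
      # the fraction of non-null values per column.
      print(funders.notna().mean().sort_values(ascending=False))
      print("identifier rows:", len(identifiers))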

  5. text-descriptives-metadata

    • huggingface.co
    Updated Oct 15, 2013
    + more versions
    Cite
    Argilla (2013). text-descriptives-metadata [Dataset]. https://huggingface.co/datasets/argilla/text-descriptives-metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Oct 15, 2013
    Dataset authored and provided by
    Argilla
    Description

    Dataset Card for text-descriptives-metadata

    This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

      Dataset Summary
    

    This dataset contains:

    A dataset configuration file conforming to the Argilla dataset format named argilla.yaml. This configuration file will be used to configure the dataset when using the… See the full description on the dataset page: https://huggingface.co/datasets/argilla/text-descriptives-metadata.
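
    As an illustration, the dataset can be loaded directly with the Hugging Face datasets library, as the card describes; the split name used below is an assumption.

      from datasets import load_dataset

      # Load directly with the Hugging Face datasets library, as described in the dataset card.
      # The split name "train" is an assumption; inspect the returned DatasetDict if it differs.
      ds = load_dataset("argilla/text-descriptives-metadata")
      print(ds)
      print(ds["train"][0])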

  6. A Dataset of Metadata of Articles Citing Retracted Articles

    • zenodo.org
    csv
    Updated Aug 31, 2024
    Cite
    Yagmur Ozturk (2024). A Dataset of Metadata of Articles Citing Retracted Articles [Dataset]. http://doi.org/10.5281/zenodo.13621503
    Explore at:
    Available download formats: csv
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yagmur Ozturk
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises metadata of articles citing retracted publications. We originally obtained the DOIs from the Feet of Clay Detector (FoCD) of the Problematic Paper Screener (PPS), which flags publications that cite retracted articles. Additional columns that were not provided in PPS were added using the Crossref & Retraction Watch Database (CRxRW) and the Dimensions API services.

    By querying the Dimensions API with the DOIs of the FoC articles, we acquired information such as more detailed document types (editorial, review article, research article), open access status (we only kept open access FoC articles in the dataset since we want to access the full texts in the future), and research fields, classified according to the Australian and New Zealand Standard Research Classification (ANZSRC) Fields of Research (FoR), which comprises 23 main fields such as biological sciences and education.

    To get further information about the cited retracted articles in the dataset, we used the joint release of CRxRW. Using this dataset, we added the retraction reasons and retraction years.

    The original dataset was obtained from the PPS FoCD in December 2023. At that time there were 22558 articles flagged in FoCD in total. Using the data filtering feature in PPS, we made a preliminary selection before downloading the first version of the dataset. We applied a filter to obtain:

    • non-retracted citing articles at the time of data curation*
    • open-access citing articles since we need the whole text to go forward with natural language processing tasks
    • cited retracted articles with at least one scientific content related reason of retraction
    • only articles (not monographs, chapters) to retain a unified text type

    More information about the usage of this dataset will be added in future updates.

    *Current retraction status of the citing articles can be different since this is a static dataset and scientific literature is dynamic.

  7. Dataset metadata of known Dataverse installations

    • search.datacite.org
    • dataverse.harvard.edu
    • +1more
    Updated 2019
    + more versions
    Cite
    Julian Gautier (2019). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/dvn/dcdkzq
    Explore at:
    Dataset updated
    2019
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Harvard Dataverse
    Authors
    Julian Gautier
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │ ├── author(citation).csv
    │ ├── basic.csv
    │ ├── contributor(citation).csv
    │ ├── ...
    │ └── topic_classification(citation).csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │ ├── Abacus_2022.10.02_17.11.19.zip
    │ ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
    │ ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
    │ ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
    │ ├── ...
    │ ├── metadatablocks_v5.6
    │ ├── astrophysics_v5.6.json
    │ ├── biomedical_v5.6.json
    │ ├── citation_v5.6.json
    │ ├── ...
    │ ├── socialscience_v5.6.json
    │ ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
    │ ├── ADA_Dataverse_2022.10.02_17.26.57.zip
    │ ├── Arca_Dados_2022.10.02_17.44.35.zip
    │ ├── ...
    │ └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
    └── dataset_pids_from_most_known_dataverse_installations.csv
    └── licenses_used_by_dataverse_installations.csv
    └── metadatablocks_from_most_known_dataverse_installations.csv

    This dataset contains two directories and three CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories:

    • The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in.
    • One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema.
    • The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

    The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files.

    The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected this data, 36 installations were running versions of the Dataverse software that allow depositors to choose a license or data use agreement from a dropdown menu in the dataset deposit form. For more information, see https://guides.dataverse.org/en/5.11.1/user/dataset-management.html#choosing-a-license.

    The metadatablocks_from_most_known_dataverse_installations.csv file contains the metadata block names, field names and child field names (if the field is a compound field) of the 77 Dataverse installations' metadata blocks. It is useful for comparing each installation's dataset metadata model (the metadata fields and the metadata blocks that each installation uses). The CSV file was created using a Python script at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_csv_file_with_metadata_block_fields_of_all_installations.py, which takes as inputs the directories and files created by the get_dataset_metadata_of_all_installations.py script.

    Known errors

    The metadata of two datasets from one of the known installations could not be downloaded because the datasets' pages and metadata could not be accessed with the Dataverse APIs.

    About metadata blocks

    Read about the Dataverse software's metadata blocks system at http://guides.dataverse.org/en/latest/admin/metadatacustomization.html
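
    The author's script linked above is the authoritative implementation; purely as an illustration of the workflow it describes (a "hostname"/"apikey" CSV plus Dataverse APIs), the following minimal Python sketch lists dataset PIDs via the standard Dataverse Search API. The CSV file name and pagination are assumptions.

      import csv
      import requests

      # Hypothetical CSV with the two columns described above: "hostname" and "apikey".
      with open("installations.csv", newline="") as f:
          installations = list(csv.DictReader(f))

      for inst in installations:
          # The Dataverse Search API lists datasets; an API token is sent via the
          # X-Dataverse-key header for installations that require one.
          resp = requests.get(
              f"{inst['hostname'].rstrip('/')}/api/search",
              params={"q": "*", "type": "dataset", "per_page": 10},
              headers={"X-Dataverse-key": inst["apikey"]} if inst.get("apikey") else {},
              timeout=30,
          )
          resp.raise_for_status()
          for item in resp.json()["data"]["items"]:
              print(inst["hostname"], item.get("global_id"))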

  8. metadata

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). metadata [Dataset]. https://catalog.data.gov/dataset/metadata-f2500
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The dataset consists of public domain acute and chronic toxicity and chemistry data for algal species. Data are accessible at: https://envirotoxdatabase.org/ Data include algal species, chemical identification, and the concentrations that do and do not affect algal growth.

  9. Dataset Metadata Creation: Automatically generates CKAN dataset metadata...

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). Dataset Metadata Creation: Automatically generates CKAN dataset metadata based on ArrayExpress data, reducing manual data entry and ensuring consistency. (inferred functionality) [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-arrayexpress
    Explore at:
    Dataset updated
    Jun 4, 2025
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The arrayexpress extension for CKAN facilitates the import of data from the ArrayExpress database into a CKAN instance. This extension is designed to streamline the process of integrating ArrayExpress experiment data, a valuable resource for genomics and transcriptomics research, directly into a CKAN-based data portal. Due to limited documentation, specific functionalities are inferred to enhance data accessibility and promote efficient management of ArrayExpress datasets within CKAN.

    Key Features:

    • ArrayExpress Data Import: Enables the import of experiment data from the ArrayExpress database into CKAN, providing access to valuable genomics and transcriptomics datasets.
    • Dataset Metadata Creation: Automatically generates CKAN dataset metadata based on ArrayExpress data, reducing manual data entry and ensuring consistency. (inferred functionality)
    • Streamlined Data Integration: Simplifies the integration process of ArrayExpress resources into CKAN, improving access to experiment-related information. (inferred functionality)

    Use Cases:

    • Genomics Data Portals: Organizations managing data portals for genomics or transcriptomics research can use this extension to incorporate ArrayExpress data, increasing the breadth of available data and improving user access.
    • Research Institutions: Research institutions can simplify data imports to share their ArrayExpress datasets with collaborators, ensuring data consistency and adherence to metadata standards.

    Technical Integration: The ArrayExpress extension integrates with CKAN by adding functionality to import and handle ArrayExpress data. While the exact integration points (plugins, API endpoints) aren't detailed in the provided documentation, the extension would likely use CKAN's plugin architecture to add data import capabilities, and the metadata schema may need to be adapted for compatibility (inferred integration).

    Benefits & Impact: By using the arrayexpress extension, organizations can improve the accessibility of ArrayExpress data within CKAN. It reduces the manual effort required to integrate experiment data and helps in maintaining a consistent and comprehensive data catalog for genomics and transcriptomics research (inferred integration).
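
    The extension's actual code is not documented here; purely as an illustration of the CKAN plugin architecture mentioned above, the hypothetical sketch below shows how a custom action that creates dataset metadata could be registered. All names and fields are illustrative, not taken from ckanext-arrayexpress.

      import ckan.plugins as plugins
      import ckan.plugins.toolkit as toolkit


      class ArrayExpressPlugin(plugins.SingletonPlugin):
          """Registers a hypothetical import action that creates CKAN datasets."""
          plugins.implements(plugins.IActions)

          def get_actions(self):
              # Expose a custom API action, e.g. POST /api/3/action/arrayexpress_import
              return {"arrayexpress_import": arrayexpress_import}


      def arrayexpress_import(context, data_dict):
          """Create a CKAN dataset from already-fetched ArrayExpress experiment metadata."""
          experiment = data_dict.get("experiment", {})
          package = {
              "name": experiment.get("accession", "").lower(),
              "title": experiment.get("title", ""),
              "notes": experiment.get("description", ""),
          }
          # Reuse CKAN's standard package_create action to store the generated metadata.
          return toolkit.get_action("package_create")(context, package)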

  10. Metadata of a Large Sonar and Stereo Camera Dataset Suitable for...

    • data.niaid.nih.gov
    Updated Jul 8, 2024
    Cite
    Cesar, Diego (2024). Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10373153
    Explore at:
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Shah, Nimish
    Pribbernow, Max
    Bande, Miguel
    Wehbe, Bilal
    Backe, Christian
    Cesar, Diego
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation

    Introduction

    This is a set of metadata describing a large dataset of synchronized sonar and stereo camera recordings that were captured between August 2021 and September 2023 during the project DeeperSense (https://robotik.dfki-bremen.de/en/research/projects/deepersense/), as training data for Sonar-to-RGB image translation. Parts of the sensor data have been published (https://zenodo.org/records/7728089, https://zenodo.org/records/10220989). Due to the size of the sensor data corpus, it is currently impractical to make the entire corpus accessible online. Instead, this metadatabase serves as a relatively compact representation, allowing interested researchers to inspect the data and select relevant portions for their particular use case, which will be made available on demand. This is an effort to comply with the FAIR principle A2 (https://www.go-fair.org/fair-principles/): metadata shall be accessible even when the base data is not immediately available.

    Locations and sensors

    The sensor data was captured at four different locations, including one laboratory (Maritime Exploration Hall at DFKI RIC Bremen) and three field locations (Chalk Lake Hemmoor, Tank Wash Basin Neu-Ulm, Lake Starnberg). At all locations, a ZED camera and a Blueprint Oculus M1200d sonar were used. Additionally, a SeaVision camera was used at the Maritime Exploration Hall at DFKI RIC Bremen and at the Chalk Lake Hemmoor. The examples/ directory holds a typical output image for each sensor at each available location.

    Data volume per session

    Six data collection sessions were conducted. The table below presents an overview of the amount of data captured in each session:

    Session dates | Location | Number of datasets | Total duration of datasets [h] | Total logfile size [GB] | Number of images | Total image size [GB]
    2021-08-09 - 2021-08-12 | Maritime Exploration Hall at DFKI RIC Bremen | 52 | 10.8 | 28.8 | 389’047 | 88.1
    2022-02-07 - 2022-02-08 | Maritime Exploration Hall at DFKI RIC Bremen | 35 | 4.4 | 54.1 | 629’626 | 62.3
    2022-04-26 - 2022-04-28 | Chalk Lake Hemmoor | 52 | 8.1 | 133.6 | 1’114’281 | 97.8
    2022-06-28 - 2022-06-29 | Tank Wash Basin Neu-Ulm | 42 | 6.7 | 144.2 | 824’969 | 26.9
    2023-04-26 - 2023-04-27 | Maritime Exploration Hall at DFKI RIC Bremen | 55 | 7.4 | 141.9 | 739’613 | 9.6
    2023-09-01 - 2023-09-02 | Lake Starnberg | 19 | 2.9 | 40.1 | 217’385 | 2.3
    Total | | 255 | 40.3 | 542.7 | 3’914’921 | 287.0

    Data and metadata structure

    Sensor data corpus

    The sensor data corpus comprises two processing stages:

    raw data streams stored in ROS bagfiles (aka logfiles),

    camera and sonar images (aka datafiles) extracted from the logfiles.

    The files are stored in a file tree hierarchy which groups them by session, dataset, and modality:

    ${session_key}/
      ${dataset_key}/
        ${logfile_name}
        ${modality_key}/
          ${datafile_name}

    A typical logfile path has this form:

    2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/stereo_camera-zed-2023-09-02-15-06-07.bag

    A typical datafile path has this form:

    2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/zed_right/1693660038_368077993.jpg

    All directory and file names, and their particles, are designed to serve as identifiers in the metadatabase. Their formatting, as well as the definitions of all terms, are documented in the file entities.json.

    Metadatabase

    The metadatabase is provided in two equivalent forms:

    as a standalone SQLite (https://www.sqlite.org/index.html) database file metadata.sqlite for users familiar with SQLite,

    as a collection of CSV files in the csv/ directory for users who prefer other tools.

    The database file has been generated from the CSV files, so each database table holds the same information as the corresponding CSV file. In addition, the metadatabase contains a series of convenience views that facilitate access to certain aggregate information.
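
    As an illustration (not part of the published dataset), the metadatabase can be opened with Python's built-in sqlite3 module; the sketch below lists the tables and convenience views mentioned above, using the file name stated earlier.

      import sqlite3

      # Open the metadatabase file named above and list its tables and convenience views.
      con = sqlite3.connect("metadata.sqlite")
      rows = con.execute(
          "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY type, name"
      ).fetchall()
      for kind, name in rows:
          print(kind, name)
      con.close()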

    An entity relationship diagram of the metadatabase tables is stored in the file entity_relationship_diagram.png. Each entity, its attributes, and relations are documented in detail in the file entities.json

    Some general design remarks:

    For convenience, timestamps are always given in both a human-readable form (ISO 8601 formatted datetime strings with explicit local time zone), and as seconds since the UNIX epoch.

    In practice, each logfile always contains a single stream, and each stream is stored always in a single logfile. Per database schema however, the entities stream and logfile are modeled separately, with a “many-streams-to-one-logfile” relationship. This design was chosen to be compatible with, and open for, data collections where a single logfile contains multiple streams.

    A modality is not an attribute of a sensor alone, but of a datafile: Because a sensor is an attribute of a stream, and a single stream may be the source of multiple modalities (e.g. RGB vs. grayscale images from the same camera, or cartesian vs. polar projection of the same sonar output). Conversely, the same modality may originate from different sensors.

    As a usage example, the data volume per session which is tabulated at the top of this document, can be extracted from the metadatabase with the following SQL query:

    SELECT
        PRINTF('%s - %s', SUBSTR(session_start, 1, 10), SUBSTR(session_end, 1, 10)) AS 'Session dates',
        location_name_english AS Location,
        number_of_datasets AS 'Number of datasets',
        total_duration_of_datasets_h AS 'Total duration of datasets [h]',
        total_logfile_size_gb AS 'Total logfile size [GB]',
        number_of_images AS 'Number of images',
        total_image_size_gb AS 'Total image size [GB]'
    FROM location
    JOIN session USING (location_id)
    JOIN (
        SELECT
            session_id,
            COUNT(dataset_id) AS number_of_datasets,
            ROUND(SUM(dataset_duration) / 3600, 1) AS total_duration_of_datasets_h,
            ROUND(SUM(total_logfile_size) / 10e9, 1) AS total_logfile_size_gb
        FROM location
        JOIN session USING (location_id)
        JOIN dataset USING (session_id)
        JOIN view_dataset_total_logfile_size USING (dataset_id)
        GROUP BY session_id
    ) USING (session_id)
    JOIN (
        SELECT
            session_id,
            COUNT(datafile_id) AS number_of_images,
            ROUND(SUM(datafile_size) / 10e9, 1) AS total_image_size_gb
        FROM session
        JOIN dataset USING (session_id)
        JOIN stream USING (dataset_id)
        JOIN datafile USING (stream_id)
        GROUP BY session_id
    ) USING (session_id)
    ORDER BY session_id;

  11. RSNA ATD 2023 DICOM Metadata

    • kaggle.com
    Updated Oct 4, 2023
    Cite
    Emmanuel Katchy (2023). RSNA ATD 2023 DICOM Metadata [Dataset]. https://www.kaggle.com/datasets/tobetek/rsna-atd-2023-dicom-metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Oct 4, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Emmanuel Katchy
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is DICOM

    DICOM (Digital Imaging and Communications in Medicine) is a standard format used to store and transmit medical images and related information in healthcare settings. It's a widely used format for various types of medical images, including X-rays, MRIs, CT scans, ultrasounds, and more. DICOM files typically contain a wealth of information beyond just the image pixels. This extra data would be wonderful for feature engineering. Here's an overview of the data possibly stored in a DICOM image format (the original RSNA ATD dataset has most likely been purged of PII, and the majority of these fields are not present):

    1. Patient Information (Patient's name, Patient's ID, Patient's date of birth etc.)

    2. Study Information (Study description, Study date and time, Study ID etc.)

    3. Series Information:

      • Series description
      • Modality (e.g., CT, MRI, X-ray, ultrasound)
      • Series instance UID (a unique identifier for the series)
      • Number of images in the series
      • Image orientation and position information
    4. Image Information:

      • Image type (e.g., original, derived, etc.)
      • Photometric interpretation (how pixel values represent image information, e.g., grayscale, RGB)
      • Rows and columns (image dimensions)
      • Pixel spacing (physical size of each pixel)
      • Bits allocated and bits stored (bit depth of pixel values)
      • High bit (the most significant bit)
      • Windowing and leveling settings for image display
      • Rescale intercept and slope (used for converting pixel values to physical units)
      • Image orientation (patient positioning)
    5. Image Acquisition Details:

      • Exposure parameters (e.g., radiation dose in radiography, MRI sequence parameters)
      • Image acquisition date and time
      • Equipment information (e.g., machine make and model)
      • Image acquisition technique (e.g., pulse sequence in MRI)
      • Image Annotations and Markings:
    6. Image Pixel Data: The actual image pixel values, which can be 2D or 3D depending on the image type Encoded in a format such as raw pixel data or compressed image data (e.g., JPEG, JPEG2000)

    How can this Dataset be used?

    1. Feature Engineering
    2. 3D Visualization of Scan series
    3. Anomaly Detection
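
    As a small illustration of the 3D-visualization use case above, the following pandas sketch orders the slices of one scan series; the CSV file name and the exact column spellings are assumptions and may differ in the actual files.

      import pandas as pd

      # Hypothetical file name; column names follow the field list below and may be spelled
      # differently (e.g. without spaces) in the actual CSV.
      meta = pd.read_csv("train_dicom_metadata.csv")

      # Pick one series and order its slices for stacking into a 3D volume.
      series_id = meta["Series Instance UID"].iloc[0]
      series = meta[meta["Series Instance UID"] == series_id].sort_values("Instance Number")
      print(series[["SOP Instance UID", "Instance Number", "Slice Thickness"]].head())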

    Columns in the Dataset

    Here's an explanation of each of the fields in the dataset:

    1. SOP Instance UID (Unique Identifier):

      • A globally unique identifier assigned to each instance (e.g., an individual image or a series) within a DICOM study. It helps identify and distinguish different instances.
    2. Content Date:

      • The date when the image or data was created or acquired. It's typically in the format YYYYMMDD (year, month, day).
    3. Content Time:

      • The time when the image or data was created or acquired. It's typically in the format HHMMSS.FFFFFF (hour, minute, second, fraction of a second).
    4. Patient ID:

      • A unique identifier for the patient, often used to link different studies and images to the same patient.
    5. Slice Thickness:

      • The thickness of an image slice in millimeters, relevant in three-dimensional imaging modalities like CT scans.
    6. KVP (Kilovolt Peak):

      • The peak voltage of the X-ray machine used to acquire the image. It affects the quality and contrast of the image.
    7. Patient Position:

      • The position of the patient during image acquisition, such as supine, prone, standing, etc.
    8. Study Instance UID:

      • A unique identifier assigned to each study, which may consist of multiple series and images related to a specific medical examination or procedure.
    9. Series Instance UID:

      • A unique identifier assigned to each series within a study. A series contains a group of related images.
    10. Series Number:

      • An integer identifier that indicates the position of the series within the study.
    11. Instance Number:

      • An integer identifier that indicates the position of the image or data instance within a series.
    12. Image Position (Patient):

      • The position of the image slice within the patient's anatomy, typically defined by three coordinates (x, y, z) in millimeters.
    13. Image Orientation (Patient):

      • The orientation of the image with respect to the patient's anatomy, typically defined by six parameters that describe the direction cosines of the rows and columns.
    14. Frame of Reference UID:

      • An identifier that establishes a coordinate system for images within a study, enabling proper alignment and orientation of images in multi-modality studies.
    15. Samples per Pixel:

      • The number of data samples (e.g., pixels) per image pixel.
    16. Photometric Interpretation:

      • Describes how pixel data is interpreted for display, such as grayscale, RGB color, o...
  12. Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection)

    • crawlfeeds.com
    csv, zip
    Updated Jun 22, 2025
    Cite
    Crawl Feeds (2025). Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection) [Dataset]. https://crawlfeeds.com/datasets/movies-tv-shows-metadata-dataset-190k-records-horror-heavy-collection
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.

    Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.

    Primary Genre Focus: Horror

    Use Cases:

    • Build movie recommendation systems or genre classifiers

    • Train NLP models on movie descriptions

    • Analyze Horror content trends over time

    • Explore box office vs. rating correlations

    • Enrich entertainment datasets with directorial and cast metadata

  13. data.gov.au Dataset Ontology

    • data.gov.au
    • data.wu.ac.at
    ttl
    Updated May 4, 2017
    + more versions
    Cite
    Commonwealth Scientific and Industrial Research Organisation (CSIRO) (2017). data.gov.au Dataset Ontology [Dataset]. https://data.gov.au/data/dataset/activity/data-gov-au-dataset-ontology
    Explore at:
    Available download formats: ttl
    Dataset updated
    May 4, 2017
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Commonwealth Scientific and Industrial Research Organisation (CSIRO)
    License

    Attribution 2.5 (CC BY 2.5), https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Area covered
    Australia
    Description

    The data.gov.au Dataset Ontology is an OWL ontology designed to describe the characteristics of datasets published on data.gov.au.

    The ontology contains elements which describe the publication, update, origin, governance, spatial and temporal coverage and other contextual information about the dataset. The ontology also covers aspects of organisational custodianship and governance.

    By using this ontology to describe datasets on data.gov.au, publishers increase discoverability and enable the consumption of this information in other applications/systems as Linked Data. It further enables decentralised publishing of catalogs and facilitates federated dataset search across sites, e.g. in datasets that are published by the States.

    Other publishers of Linked Data may make assertions about data published using this ontology, e.g. they may publish information about the use of the dataset in other applications.
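
    As an illustration of such reuse (not part of the dataset itself), the downloaded Turtle file can be inspected with rdflib; the local file name below is hypothetical.

      from rdflib import Graph
      from rdflib.namespace import RDF, OWL

      # Parse a locally downloaded copy of the ontology's Turtle file (hypothetical file name).
      g = Graph()
      g.parse("data-gov-au-dataset-ontology.ttl", format="turtle")

      # List the OWL classes the ontology defines.
      for cls in g.subjects(RDF.type, OWL.Class):
          print(cls)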

  14. arxiv-metadata-dataset

    • huggingface.co
    Updated Jun 30, 2015
    + more versions
    Cite
    Sumuk Shashidhar (2015). arxiv-metadata-dataset [Dataset]. https://huggingface.co/datasets/sumuks/arxiv-metadata-dataset
    Explore at:
    Dataset updated
    Jun 30, 2015
    Authors
    Sumuk Shashidhar
    Description

    sumuks/arxiv-metadata-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. US Restaurant POI dataset with metadata

    • datarade.ai
    .csv
    Updated Jul 30, 2022
    Cite
    Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
    Explore at:
    .csvAvailable download formats
    Dataset updated
    Jul 30, 2022
    Dataset authored and provided by
    Geolytica
    Area covered
    United States of America
    Description

    Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.

    This is our process flow:

    Our machine learning systems continuously crawl for new POI data
    Our geoparsing and geocoding calculate their geo locations
    Our categorization systems clean up and standardize the datasets
    Our data pipeline API publishes the datasets on our data store
    

    A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, or a store, etc. In today's interconnected world its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

    POI Data is in constant flux. Every minute worldwide over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist. Over 94% of all businesses have a public online presence of some kind, and we track such changes: when a business changes, its website and social media presence will change too. We'll then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

    We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via our data update pipeline.

    Customers requiring regularly updated datasets may subscribe to our Annual subscription plans. Our data is continuously being refreshed, therefore subscription plans are recommended for those who need the most up to date data. The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.

    Data samples may be downloaded at https://store.poidata.xyz/us

  16. Data warehouse and metadata holdings relevant to Australia's North West Shelf...

    • gimi9.com
    Updated Sep 4, 2024
    + more versions
    Cite
    (2024). Data warehouse and metadata holdings relevant to Australias North West Shelf | gimi9.com [Dataset]. https://gimi9.com/dataset/au_data-warehouse-and-metadata-holdings-relevant-to-australias-north-west-shelf/
    Explore at:
    Dataset updated
    Sep 4, 2024
    Description

    From the earliest stages of planning the North West Shelf Joint Environmental Management Study it was evident that good management of the scientific data to be used in the research would be important for the success of the Study. A comprehensive review of data sets and other information relevant to the marine ecosystems, the geology, infrastructure and industries of the North West Shelf area had been completed (Heyward et al. 2006). The Data Management Project was established to source and prepare existing data sets for use, requiring the development and use of a range of tools: metadata systems, data visualisation and data delivery applications. These were made available to collaborators to allow easy access to data obtained and generated by the Study. The CMAR MarLIN metadata system was used to document the 285 data sets, those which were identified as potentially useful for the Study and the software and information products generated by and for the Study. This report represents a hard copy atlas of all NWSJEMS data products and the existing data sets identified for potential use as inputs to the Study. It comprises summary metadata elements describing the data sets, their custodianship and how the data sets might be obtained. The identifiers of each data set can be used to refer to the full metadata records in the on-line MarLIN system.

  17. Metadata Files - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Jul 18, 2021
    + more versions
    Cite
    (2021). Metadata Files - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/ce82ab38-7f12-531f-8028-a74132a46b2a
    Explore at:
    Dataset updated
    Jul 18, 2021
    Description

    The Metadata files contain metadata and behavioural data. The variables are:

    • acquisition_rate is a scalar describing the acquisition rate in Hz.
    • Pixel_size is a scalar describing the size of each pixel in microns.
    • Numb_patches is a scalar describing the number of patches in the experiment.
    • Patch_coordinates is a structure containing coordinate information about each patch. Patch_coordinates.data is a matrix in which each row represents a patch, and columns 5, 6, and 7 represent the X, Y, and Z positions (respectively) of that patch.
    • SpeedDataMatrix and SpeedTimeMatrix are vectors containing the wheel speed time series and times from the wheel encoder.
    • dlc_whisk_angle and dlc_whisk_time are vectors containing the whisking angle time series and times as determined via DeepLabCut.
    • wheel_MI is a matrix whose second column contains the wheel motion index time series as determined from the wheel cameras and whose second column contains the corresponding times.

    Note that this file may also contain variables extracted by now obsolete methods which were not included by the analysis in the paper (e.g., Whiskers_angle_0 for old whisker position detection, Axon_dFF for old grouping procedure). You can ignore these.

  18. Movie Metadata and Reviews

    • kaggle.com
    Updated Jul 6, 2024
    Cite
    Valentina Acevedo Lopez (2024). Movie Metadata and Reviews [Dataset]. https://www.kaggle.com/datasets/valentinaacevedo/movie-metadata-and-reviews
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Valentina Acevedo Lopez
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview

    This dataset contains detailed metadata and user reviews for movies. It includes information such as movie titles, genres, user scores, certifications, metascores, directors, top cast members, plot summaries, and user reviews. The data was scraped from IMDb and may contain some inconsistencies and missing values, making it a great resource for practicing data cleaning and preprocessing.

    Columns Description

    • Name: The title of the movie.
    • Year: The release year of the movie.
    • Genres: The genres associated with the movie (e.g., Action, Adventure, Sci-Fi).
    • Users-Score: Average user score.
    • Certification: Movie certification rating (e.g., PG-13, R).
    • Metascore: Metacritic score.
    • Director: The director of the movie.
    • Top-Cast: Main cast members.
    • Plot-Summary: A brief summary of the movie's plot.
    • Users-Reviews: User-submitted reviews.

    Data Cleaning and Preprocessing

    The dataset may include the following issues:

    • Missing Values: Some columns have missing values.
    • Inconsistent Delimiters: Certain rows may have inconsistent delimiters.
    • Duplicate Entries: There might be duplicate records.
    • Formatting Issues: Some columns may contain improperly formatted data.

    Steps for Data Cleaning:

    • Identify and handle missing values.
    • Correct delimiter issues using text processing techniques.
    • Remove duplicate records to ensure data integrity.
    • Standardize formats for categorical variables.
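
    A minimal pandas sketch of these cleaning steps is shown below; the CSV file name is an assumption, while the column names are taken from this card.

      import pandas as pd

      # Hypothetical file name for the downloaded CSV.
      movies = pd.read_csv("movie_metadata_and_reviews.csv")

      # Identify and handle missing values.
      print(movies.isna().sum())
      movies["Metascore"] = pd.to_numeric(movies["Metascore"], errors="coerce")

      # Remove duplicate records to ensure data integrity.
      movies = movies.drop_duplicates(subset=["Name", "Year"])

      # Standardize formats for categorical variables, e.g. certification labels.
      movies["Certification"] = movies["Certification"].str.strip().str.upper()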

    Potential Use Cases

    • Movie Recommendation Systems: Use the metadata to build recommendation algorithms.
    • Sentiment Analysis: Analyze user reviews to gauge audience sentiment.
    • Trend Analysis: Explore trends in movie genres, ratings, and user reviews.

    License

    This dataset is shared under the MIT License. If you use this data, please attribute IMDb as the source.

  19. Zenodo Open Metadata snapshot. Training dataset for records classifier...

    • zenodo.org
    json, zip
    Updated Dec 14, 2022
    Cite
    Krzysztof Nowak (2022). Zenodo Open Metadata snapshot. Training dataset for records classifier building [Dataset]. http://doi.org/10.5281/zenodo.800494
    Explore at:
    Available download formats: json, zip
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Krzysztof Nowak
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the metadata of Zenodo's published open access records as of 6 March 2017.

    It's composed of:

    • A ZIP archive zenodo_open_metadata_06_03_2017.zip containing the full dataset:
      • zenodo_open_metadata_06_03_2017.json (425MB, MD5: 22b30564e94d85373fa87fbfb77b57d3)
    • A JSON file zenodo_open_metadata_06_03_2017_sample.json containing a small sample of the full dataset.

    Full dataset contains:

    • Metadata of 171674 Open Access Zenodo records.

    • Metadata of 5067 previously Open Access but since removed records which were classified as SPAM records by Zenodo staff.

    • Dataset contains only already publicly available metadata of all of the records.

    • In two cases, the metadata has been altered:
      • One title from a SPAM-labelled record has been altered as it contained an e-mail address.
      • One SPAM-labelled record has been removed from the full dataset

    Data format description:

    Dataset is a JSON file, containing a single list of 176741 key-value dictionaries.

    Each dictionary contains the terms:
    part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

    which are corresponding to the fields with the same name available in Zenodo's record jsonschema v1.0.0: https://github.com/zenodo/zenodo/blob/master/zenodo/modules/records/jsonschemas/records/record-v1.0.0.json

    In addition, some terms have been altered:

    The term files contains a list of dictionaries containing filetype, size and filename only.
    The term license contains a short Zenodo ID of the license (e.g. "cc-by").
    The term spam contains a boolean value, determining whether a given record was marked as a SPAM record by Zenodo staff.

    Where a top-level term was missing in a record's metadata, its value may be null.
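
    As an illustration, the sample file can be read with Python's json module and split by the spam flag described above.

      import json

      # The sample file is listed above; the full file inside the ZIP archive has the same structure.
      with open("zenodo_open_metadata_06_03_2017_sample.json") as f:
          records = json.load(f)  # a single list of key-value dictionaries

      spam = [r for r in records if r.get("spam")]
      ham = [r for r in records if not r.get("spam")]
      print(len(records), "records:", len(spam), "spam,", len(ham), "open access")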

  20. sample

    • data.mendeley.com
    Updated Feb 5, 2024
    Cite
    kaavya kaavya (2024). sample [Dataset]. http://doi.org/10.17632/ft7ctmb7yh.1
    Explore at:
    Dataset updated
    Feb 5, 2024
    Authors
    kaavya kaavya
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Describe your research hypothesis, what your data shows, any notable findings and how the data can be interpreted. Please add sufficient description to enable others to understand what the data is, how it was gathered and how to interpret and use it.
