bs-modeling-metadata/c4-en-html-with-metadata dataset hosted on Hugging Face and contributed by the HF Datasets community
https://webtechsurvey.com/terms
A complete list of live websites using the Social Page Metadata technology, compiled through global website indexing conducted by WebTechSurvey.
This data dictionary describes relevant fields from secondary data sources that can assist with modeling the conditions of use for a chemical when performing a chemical assessment. Information on how to access the secondary data sources is included. This dataset is associated with the following publication: Chea, J.D., D.E. Meyer, R.L. Smith, S. Takkellapati, and G.J. Ruiz-Mercado. Exploring automated tracking of chemicals through their conditions of use to support life cycle chemical assessment. JOURNAL OF INDUSTRIAL ECOLOGY. Berkeley Electronic Press, Berkeley, CA, USA, 29(2): 413-616, (2025).
Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
This dataset is an extension of the rag-mini-bioasq dataset. It differs in the text-corpus part of that set, where metadata has been added for each passage. The metadata comprises six separate categories, each in a dedicated column (a minimal loading sketch follows the list):
Year of the publication (publish_year)
Type of the publication (publish_type)
Country of the publication, often correlated with the home country of the authors (country)
Number of pages (no_pages)
Authors (authors)
Keywords (keywords)
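The corpus can be inspected with the Hugging Face datasets library. The snippet below is a minimal sketch; the repository id is a placeholder for this extended dataset, and the configuration name follows the original rag-mini-bioasq layout, so both may need adjusting.

from datasets import load_dataset

# Placeholder repository id; substitute the actual Hugging Face repo of this
# extended rag-mini-bioasq dataset.
corpus = load_dataset("your-org/rag-mini-bioasq-with-metadata", "text-corpus")

# Prints the available splits and columns, which should include the metadata
# fields publish_year, publish_type, country, no_pages, authors, and keywords.
print(corpus)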
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description. The name "Dublin" is due to its origin at a 1995 invitational workshop in Dublin, Ohio; "core" because its elements are broad and generic, usable for describing a wide range of resources.
The fifteen-element "Dublin Core" described in this standard is part of a larger set of metadata vocabularies and technical specifications maintained by the Dublin Core Metadata Initiative (DCMI). The full set of vocabularies, DCMI Metadata Terms, also includes sets of resource classes (including the DCMI Type Vocabulary), vocabulary encoding schemes, and syntax encoding schemes. The terms in DCMI vocabularies are intended to be used in combination with terms from other, compatible vocabularies in the context of application profiles and on the basis of the DCMI Abstract Model.
All changes made to terms of the Dublin Core Metadata Element Set since 2001 have been reviewed by a DCMI Usage Board in the context of a DCMI Namespace Policy. The namespace policy describes how DCMI terms are assigned Uniform Resource Identifiers (URIs) and sets limits on the range of editorial changes that may allowably be made to the labels, definitions, and usage comments associated with existing DCMI terms.
This document, an excerpt from the more comprehensive document DCMI Metadata Terms, provides an abbreviated reference version of the fifteen element descriptions that have been formally endorsed in standards including ISO 15836, ANSI/NISO Z39.85, and IETF RFC 5013.
Since 1998, when these fifteen elements entered into a standardization track, notions of best practice in the Semantic Web have evolved to include the assignment of formal domains and ranges in addition to definitions in natural language. Domains and ranges specify what kind of described resources and value resources are associated with a given property. Domains and ranges express the meanings implicit in natural-language definitions in an explicit form that is usable for the automatic processing of logical inferences. When a given property is encountered, an inferencing application may use information about the domains and ranges assigned to a property in order to make inferences about the resources described thereby.
Since January 2008, therefore, DCMI includes formal domains and ranges in the definitions of its properties. So as not to affect the conformance of existing implementations of "simple Dublin Core" in RDF, domains and ranges have not been specified for the fifteen properties of the dce: namespace (http://purl.org/dc/elements/1.1/). Rather, fifteen new properties with "names" identical to those of the Dublin Core Metadata Element Set Version 1.1 have been created in the dct: namespace (http://purl.org/dc/terms/). These fifteen new properties have been defined as sub-properties of the corresponding properties of DCMES Version 1.1 and assigned domains and ranges as specified in the more comprehensive document DCMI Metadata Terms.
Implementers may freely choose to use these fifteen properties either in their legacy dce: variant (e.g., http://purl.org/dc/elements/1.1/creator) or in the dct: variant (e.g., http://purl.org/dc/terms/creator) depending on application requirements. The RDF schemas of the DCMI namespaces describe the subproperty relation of dct:creator to dce:creator for use by Semantic Web-aware applications. Over time, however, implementers are encouraged to use the semantically more precise dct: properties, as they more fully follow emerging notions of best practice for machine-processable metadata.
Homepage: https://www.dublincore.org/specifications/dublin-core/dces/
Namespace: http://purl.org/dc/elements/1.1/
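To make the namespace distinction concrete, here is a small sketch using the Python rdflib library (the library choice and the example resource are illustrative, not part of the DCMI specification); it describes one resource with the legacy dce:creator property and with dct:creator.

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC, DCTERMS  # dce: and dct: namespaces

g = Graph()
doc = URIRef("http://example.org/document/1")  # hypothetical resource

# Legacy "simple Dublin Core" property from http://purl.org/dc/elements/1.1/
g.add((doc, DC.creator, Literal("Jane Smith")))

# dct:creator (http://purl.org/dc/terms/) is defined as a sub-property of
# dce:creator and carries a formal range, so a resource is preferred here.
g.add((doc, DCTERMS.creator, URIRef("http://example.org/person/jane-smith")))

print(g.serialize(format="turtle"))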
The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents, including undergraduate and postgraduate theses, research and professional articles, and other academic document types. The data was collected as part of the establishment of the Slovenian Open-Access Infrastructure, which defined a unified document collection and cataloguing process for Slovenian universities within the infrastructure repositories. The data was gathered from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields representing attributes that describe documents, including document titles, keywords, abstracts, typologies, authors, issue years, and other identifiers such as URL and UDC. The dataset is particularly suited to text mining and text classification tasks and can also be used for developing or benchmarking content-based recommender systems on real-world data.
https://crawlfeeds.com/privacy_policy
This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.
Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.
Primary Genre Focus: Horror
Build movie recommendation systems or genre classifiers
Train NLP models on movie descriptions
Analyze Horror content trends over time
Explore box office vs. rating correlations
Enrich entertainment datasets with directorial and cast metadata
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a collection of human-written and AI-generated texts, along with metadata such as text length, word count, and source type. The content ranges from case studies, essays, reflections, and personal narratives to AI reviews and feedback. It is useful for tasks such as text classification, authorship attribution, NLP benchmarking, and AI vs. human text analysis.
Dataset Structure
Each record is stored in JSON format with the following fields (an illustrative record follows the list):
text (string) – The full text content (essay, case study, review, or reflection).
source (string) – Indicates whether the text was written by a Human or generated by an AI/Assistant.
prompt_id (integer) – Identifier linking the text to a given prompt or task.
text_length (integer) – The number of characters in the text.
word_count (integer) – The number of words in the text.
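A hypothetical example record, built in Python so the length fields stay consistent with the text; all values are invented for illustration.

import json

text = "This reflection examines how metadata improves dataset discoverability."
record = {
    "text": text,
    "source": "Human",                # or "AI" / "Assistant"
    "prompt_id": 42,                  # hypothetical prompt identifier
    "text_length": len(text),         # number of characters in the text
    "word_count": len(text.split()),  # number of words in the text
}
print(json.dumps(record, indent=2))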
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Automated classification of research data metadata by field of study can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. To evaluate different machine learning approaches, data from the DataCite index were downloaded in May 2019 with a GeRDI harvester (filtering out any metadata without a qualified subject, i.e. a subject with either a subjectName or a subjectURI). This is the resulting raw data set.
This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

How the metadata was downloaded
The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

How the files are organized
├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation).csv
│   ├── basic.csv
│   ├── contributor(citation).csv
│   ├── ...
│   └── topic_classification(citation).csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2022.10.02_17.11.19.zip
│   │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
│   │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
│   │   │   ├── ...
│   │   └── metadatablocks_v5.6
│   │       ├── astrophysics_v5.6.json
│   │       ├── biomedical_v5.6.json
│   │       ├── citation_v5.6.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
│   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
│   ├── Arca_Dados_2022.10.02_17.44.35.zip
│   ├── ...
│   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
├── dataset_pids_from_most_known_dataverse_installations.csv
├── licenses_used_by_dataverse_installations.csv
└── metadatablocks_from_most_known_dataverse_installations.csv

This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories: The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset.
For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files.

The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected this data, 36 installations were running versions of the Dataverse software that allow depositors to choose a license or data use agreement from a dropdown menu in the dataset deposit form. For more information, see https://guides.dataverse.org/en/5.11.1/user/dataset-management.html#choosing-a-license.

The metadatablocks_from_most_known_dataverse_installations.csv file contains the metadata block names, field names and child field names (if the field is a compound field) of the 77 Dataverse installations' metadata blocks. It is useful for comparing each installation's dataset metadata model (the metadata fields and the metadata blocks that each installation uses). The CSV file was created using a Python script at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_csv_file_with_metadata_block_fields_of_all_installations.py, which takes as inputs the directories and files created by the get_dataset_metadata_of_all_installations.py script.

Known errors
The metadata of two datasets from one of the known installations could not be downloaded because the datasets' pages and metadata could not be accessed with the Dataverse APIs.

About metadata blocks
Read about the Dataverse software's metadata blocks system at http://guides.dataverse.org/en/latest/admin/metadatacustomization.html
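As a usage sketch, the union CSV of dataset PIDs can be summarized with pandas. The column names used below ("installation" and "metadata_downloaded") are assumptions for illustration; check the actual CSV header and adjust.

import pandas as pd

pids = pd.read_csv("dataset_pids_from_most_known_dataverse_installations.csv")
print(pids.columns.tolist())  # inspect the real column names first

# Assumed columns: "installation" (installation name or hostname) and
# "metadata_downloaded" (True/False flag written by the download script).
summary = (
    pids.groupby("installation")["metadata_downloaded"]
        .agg(datasets="count", downloaded="sum")
        .sort_values("datasets", ascending=False)
)
print(summary.head())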
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global enterprise metadata management market size is USD 7.85 billion in 2024 and will expand at a compound annual growth rate (CAGR) of 24.1% from 2024 to 2031.
Market Dynamics of Enterprise Metadata Management Market
Key Drivers for Enterprise Metadata Management Market
Rapidly expanding data sets: Market growth is fueled by the rapid expansion of enterprise data. Enterprises need to manage and understand their massive and varied datasets as the amount of data they generate continues to grow exponentially. Managing structured and unstructured data becomes more complicated as organizations gather large volumes of data from many sources. Enterprise metadata management is crucial for understanding data context, relationships, and usage, as it offers a framework for organizing, describing, and controlling data through metadata. Well-managed metadata also yields improved data quality, easier data integration, and system-wide consistency. Firms that adopt enterprise metadata management can achieve better decision-making and operational efficiency because it increases data discoverability, streamlines data processes, and supports advanced analytics.
Demand for enterprise metadata management is also being driven by the growing popularity of big data and advanced analytics tools.
Key Restraints for Enterprise Metadata Management Market
The enterprise metadata management market is restrained by high implementation costs.
The implementation and maintenance of enterprise metadata management solutions can be impeded by a lack of trained specialists in this industry.
Introduction of the Enterprise Metadata Management Market
Enterprise metadata management is the process of managing an organization's metadata, the information about other data that gives it organization, meaning, and context. It supports better data management, regulatory compliance, and decision-making by ensuring that data is correctly defined and easy to find. The global enterprise metadata management market is driven primarily by the need for improved data governance and strict adherence to regulations. Demand is also propelled by the increasingly digital landscape and the widespread use of advanced analytics. In addition, blockchain technology is gaining traction across many industries because it helps manage and secure the metadata created and stored, opening up significant opportunities for enterprise metadata management and pointing to strong growth in the industry. Issues with data consistency across numerous channels remain a challenge for both business users and IT departments in the enterprise metadata management market.
The ESS-DIVE sample identifiers and metadata reporting format primarily follows the System for Earth Sample Registration (SESAR) Global Sample Number (IGSN) guide and template, with modifications to address Environmental Systems Science (ESS) sample needs and practicalities (IGSN-ESS). IGSNs are associated with standardized metadata to characterize a variety of different sample types (e.g. object type, material) and describe sample collection details (e.g. latitude, longitude, environmental context, date, collection method). Globally unique sample identifiers, particularly IGSNs, facilitate sample discovery, tracking, and reuse; they are especially useful when sample data is shared with collaborators, sent to different laboratories or user facilities for analyses, or distributed in different data files, datasets, and/or publications. To develop recommendations for multidisciplinary ecosystem and environmental sciences, we first conducted research on related sample standards and templates. We provide a comparison of existing sample reporting conventions, which includes mapping metadata elements across existing standards and Environment Ontology (ENVO) terms for sample object types and environmental materials. We worked with eight U.S. Department of Energy (DOE) funded projects, including those from Terrestrial Ecosystem Science and Subsurface Biogeochemical Research Scientific Focus Areas. Project scientists tested the process of registering samples for IGSNs and associated metadata in workflows for multidisciplinary ecosystem sciences. We provide modified IGSN metadata guidelines to account for needs of a variety of related biological and environmental samples. While generally following the IGSN core descriptive metadata schema, we provide recommendations for extending sample type terms, and connecting to related templates geared towards biodiversity (Darwin Core) and genomic (Minimum Information about any Sequence, MIxS) samples and specimens. ESS-DIVE recommends registering samples for IGSNs through SESAR, and we include instructions for registration using the IGSN-ESS guidelines. Our resulting sample reporting guidelines, template (IGSN-ESS), and identifier approach can be used by any researcher with sample data for ecosystem sciences.
World Imagery provides one meter or better satellite and aerial imagery in many parts of the world and lower resolution satellite imagery worldwide. The map includes 15m TerraColor imagery at small and mid-scales (~1:591M down to ~1:72k) and 2.5m SPOT Imagery (~1:288k to ~1:72k) for the world. The map features 0.5m resolution imagery in the continental United States and parts of Western Europe from Vantor. Additional Vantor sub-meter imagery is featured in many parts of the world. In the United States, 1 meter or better resolution NAIP imagery is available in some areas. In other parts of the world, imagery at different resolutions has been contributed by the GIS User Community. In select communities, very high resolution imagery (down to 0.03m) is available down to ~1:280 scale. You can contribute your imagery to this map and have it served by Esri via the Community Maps Program. View the list of Contributors for the World Imagery Map. See World Imagery for more information on this map. Metadata: Point and click on the map to see the resolution, collection date, and source of the imagery. Values of "99999" mean that metadata is not available for that field. The metadata applies only to the best available imagery at that location. You may need to zoom in to view the best available imagery. Feedback: Have you ever seen a problem in the Esri World Imagery Map that you wanted to see fixed? You can use the Imagery Map Feedback web map to provide feedback on issues or errors that you see. The feedback will be reviewed by the ArcGIS Online team and considered for one of our updates. Need Newer Imagery?: If you need to access more recent or higher resolution imagery, you can find and order that in the Content Store for ArcGIS app.
The dataset consists of public domain acute and chronic toxicity and chemistry data for algal species. Data are accessible at: https://envirotoxdatabase.org/ Data include algal species, chemical identification, and the concentrations that do and do not affect algal growth.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to existing data-specific metadata standards from the multidisciplinary data repositories DataCite and Dryad, and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
Sraghvi/subset-0-with-metadata dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation
Introduction
This is a set of metadata describing a large dataset of synchronized sonar and stereo camera recordings that were captured between August 2021 and September 2023 during the project DeeperSense (https://robotik.dfki-bremen.de/en/research/projects/deepersense/) as training data for Sonar-to-RGB image translation. Parts of the sensor data have been published (https://zenodo.org/records/7728089, https://zenodo.org/records/10220989). Due to the size of the sensor data corpus, it is currently impractical to make the entire corpus accessible online. Instead, this metadatabase serves as a relatively compact representation, allowing interested researchers to inspect the data and select relevant portions for their particular use case, which will be made available on demand. This is an effort to comply with the FAIR principle A2 (https://www.go-fair.org/fair-principles/) that metadata shall be accessible even when the base data is not immediately available.
Locations and sensors
The sensor data was captured at four different locations, including one laboratory (Maritime Exploration Hall at DFKI RIC Bremen) and three field locations (Chalk Lake Hemmoor, Tank Wash Basin Neu-Ulm, Lake Starnberg). At all locations, a ZED camera and a Blueprint Oculus M1200d sonar were used. Additionally, a SeaVision camera was used at the Maritime Exploration Hall at DFKI RIC Bremen and at the Chalk Lake Hemmoor. The examples/ directory holds a typical output image for each sensor at each available location.
Data volume per session
Six data collection sessions were conducted. The table below presents an overview of the amount of data captured in each session:
Session dates Location Number of datasets Total duration of datasets [h] Total logfile size [GB] Number of images Total image size [GB]
2021-08-09 - 2021-08-12 Maritime Exploration Hall at DFKI RIC Bremen 52 10.8 28.8 389’047 88.1
2022-02-07 - 2022-02-08 Maritime Exploration Hall at DFKI RIC Bremen 35 4.4 54.1 629’626 62.3
2022-04-26 - 2022-04-28 Chalk Lake Hemmoor 52 8.1 133.6 1’114’281 97.8
2022-06-28 - 2022-06-29 Tank Wash Basin Neu-Ulm 42 6.7 144.2 824’969 26.9
2023-04-26 - 2023-04-27 Maritime Exploration Hall at DFKI RIC Bremen 55 7.4 141.9 739’613 9.6
2023-09-01 - 2023-09-02 Lake Starnberg 19 2.9 40.1 217’385 2.3
Total 255 40.3 542.7 3’914’921 287.0
Data and metadata structure
Sensor data corpus
The sensor data corpus comprises two processing stages:
raw data streams stored in ROS bagfiles (aka logfiles),
camera and sonar images (aka datafiles) extracted from the logfiles.
The files are stored in a file tree hierarchy which groups them by session, dataset, and modality:
${session_key}/
    ${dataset_key}/
        ${logfile_name}
        ${modality_key}/
            ${datafile_name}
A typical logfile path has this form:
2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/stereo_camera-zed-2023-09-02-15-06-07.bag
A typical datafile path has this form:
2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/zed_right/1693660038_368077993.jpg
All directory and file names, and their component parts, are designed to serve as identifiers in the metadatabase. Their formatting, as well as the definitions of all terms, are documented in the file entities.json.
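For illustration, a datafile path can be split back into the identifiers used by the metadatabase. This is a minimal sketch; treating the file stem as the seconds and sub-second parts of a UNIX timestamp is an assumption based on the example path above, not something documented here.

from pathlib import PurePosixPath

path = PurePosixPath(
    "2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/zed_right/1693660038_368077993.jpg"
)
session_key, dataset_key, modality_key, datafile_name = path.parts

# Assumption: the datafile stem encodes seconds since the UNIX epoch plus a
# sub-second remainder, separated by an underscore.
seconds, fraction = path.stem.split("_")
print(session_key, dataset_key, modality_key, datafile_name)
print(seconds, fraction)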
Metadatabase
The metadatabase is provided in two equivalent forms:
as a standalone SQLite (https://www.sqlite.org/index.html) database file metadata.sqlite for users familiar with SQLite,
as a collection of CSV files in the csv/ directory for users who prefer other tools.
The database file has been generated from the CSV files, so each database table holds the same information as the corresponding CSV file. In addition, the metadatabase contains a series of convenience views that facilitate access to certain aggregate information.
An entity relationship diagram of the metadatabase tables is stored in the file entity_relationship_diagram.png. Each entity, its attributes, and relations are documented in detail in the file entities.json.
Some general design remarks:
For convenience, timestamps are always given in both a human-readable form (ISO 8601 formatted datetime strings with explicit local time zone), and as seconds since the UNIX epoch.
In practice, each logfile always contains a single stream, and each stream is always stored in a single logfile. Per the database schema, however, the entities stream and logfile are modeled separately, with a “many-streams-to-one-logfile” relationship. This design was chosen to be compatible with, and open for, data collections where a single logfile contains multiple streams.
A modality is an attribute of a datafile, not of a sensor alone: a sensor is an attribute of a stream, and a single stream may be the source of multiple modalities (e.g. RGB vs. grayscale images from the same camera, or Cartesian vs. polar projections of the same sonar output). Conversely, the same modality may originate from different sensors.
As a usage example, the data volume per session, which is tabulated at the top of this document, can be extracted from the metadatabase with the following SQL query (a Python sketch for running queries against metadata.sqlite follows the query):
SELECT
    PRINTF(
        '%s - %s',
        SUBSTR(session_start, 1, 10),
        SUBSTR(session_end, 1, 10)) AS 'Session dates',
    location_name_english AS Location,
    number_of_datasets AS 'Number of datasets',
    total_duration_of_datasets_h AS 'Total duration of datasets [h]',
    total_logfile_size_gb AS 'Total logfile size [GB]',
    number_of_images AS 'Number of images',
    total_image_size_gb AS 'Total image size [GB]'
FROM location
JOIN session USING (location_id)
JOIN (
    SELECT
        session_id,
        COUNT(dataset_id) AS number_of_datasets,
        ROUND(SUM(dataset_duration) / 3600, 1) AS total_duration_of_datasets_h,
        ROUND(SUM(total_logfile_size) / 10e9, 1) AS total_logfile_size_gb
    FROM location
    JOIN session USING (location_id)
    JOIN dataset USING (session_id)
    JOIN view_dataset_total_logfile_size USING (dataset_id)
    GROUP BY session_id
) USING (session_id)
JOIN (
    SELECT
        session_id,
        COUNT(datafile_id) AS number_of_images,
        ROUND(SUM(datafile_size) / 10e9, 1) AS total_image_size_gb
    FROM session
    JOIN dataset USING (session_id)
    JOIN stream USING (dataset_id)
    JOIN datafile USING (stream_id)
    GROUP BY session_id
) USING (session_id)
ORDER BY session_id;
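A minimal Python sketch using the standard sqlite3 module to open metadata.sqlite and run queries. The table and column names come from the SQL above; the per-location dataset count is only an illustrative aggregate, not one of the shipped convenience views.

import sqlite3

con = sqlite3.connect("metadata.sqlite")
con.row_factory = sqlite3.Row

# List the tables and convenience views contained in the metadatabase.
for obj in con.execute(
    "SELECT name, type FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
):
    print(obj["type"], obj["name"])

# Illustrative aggregate: number of datasets per location.
rows = con.execute(
    "SELECT location_name_english, COUNT(dataset_id) AS number_of_datasets "
    "FROM location JOIN session USING (location_id) JOIN dataset USING (session_id) "
    "GROUP BY location_id ORDER BY location_id"
).fetchall()
for row in rows:
    print(row["location_name_english"], row["number_of_datasets"])

con.close()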
Metadata extracted from the Rotten Tomatoes website using web scraping techniques. All the code used to do that can be seen in https://github.com/rafaelstjf/Tomato_Brusher
I wanted to use machine learning approaches to see if it was possible to predict the user score of a movie using only features like genre, rating, critic score, cast, and crew.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Title: IMDB & TMDB Movie Metadata Big Dataset (>1M)
Subtitle: A Comprehensive Dataset Featuring Detailed Metadata of Movies (IMDB, TMDB). Over 1M Rows & 42 Features: Metadata, Ratings, Genres, Cast, Crew, Sentiment Analysis and many more...
Detailed Description:
Overview: This comprehensive dataset merges the extensive film data available from both IMDB and TMDB, offering a rich resource for movie enthusiasts, data scientists, and researchers. With over 1 million rows and 42 detailed features, this dataset provides in-depth information about a wide variety of movies, spanning different genres, periods, and production backgrounds.
File Information:
1. File Size: ≈ 1 GB
2. Format: CSV (Comma-Separated Values)
Column Descriptors/Key Features (an exploratory sketch follows this list):
1. ID: Unique identifier for each movie.
2. Title: The official title of the movie.
3. Vote Average: Average rating received by the movie.
4. Vote Count: Number of votes the movie has received.
5. Status: Current status of the movie (e.g., Released, Post-Production).
6. Release Date: Official release date of the movie.
7. Revenue: Box office revenue generated by the movie.
8. Runtime: Duration of the movie in minutes.
9. Adult: Indicates if the movie is for adults.
10. Genres: List of genres the movie belongs to.
11. Overview Sentiment: Sentiment analysis of the movie's overview text.
12. Cast: List of main actors in the movie.
13. Crew: List of key crew members, including directors, producers, and writers.
14. Genres List: Detailed genres in list format.
15. Keywords: List of relevant keywords associated with the movie.
16. Director of Photography: Name of the cinematographer.
17. Producers: Names of the producers.
18. Music Composer: Name of the music composer.
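A minimal exploration sketch with pandas, assuming a hypothetical CSV file name and snake_case column labels corresponding to the descriptors above (e.g. title, vote_average, vote_count, release_date); adjust both to the actual header.

import pandas as pd

df = pd.read_csv("imdb_tmdb_movie_metadata.csv")  # hypothetical file name
print(df.shape)  # expected on the order of 1,000,000+ rows and 42 columns

# Highest-rated titles with a minimum number of votes (column names assumed).
top = (
    df[df["vote_count"] >= 1000]
      .sort_values("vote_average", ascending=False)
      .loc[:, ["title", "vote_average", "vote_count", "release_date"]]
      .head(10)
)
print(top)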
Additional Features:
Potential Use Cases:
- Sentiment Analysis: Analyze audience sentiment towards movies based on reviews and ratings.
- Recommendation Systems: Build models to recommend movies based on user preferences and viewing history.
- Market Analysis: Study trends in the movie industry, including genre popularity and revenue patterns.
- Content Analysis: Investigate the thematic content and diversity of movies over time.
- Data Visualization: Create visual representations of movie data to uncover hidden insights.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GSC/BRC Metadata Standards Project Setup: Example CSV file for setting up project registration and update events for GSC/BRC metadata standards. Users can load the setup file using the CLI interface or set up a project using the metadata setup GUI. (CSV 14 kb)