100+ datasets found
  1. Meta Kaggle

    • kaggle.com
    zip
    Updated Feb 1, 2026
    Cite
    Kaggle (2026). Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle
    Explore at:
    Available download formats: zip (10313419305 bytes)
    Dataset updated
    Feb 1, 2026
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Meta Kaggle

    Explore our public data on competitions, datasets, kernels (code / notebooks) and more

    Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

    Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.

    [Image: Kaggle Leaderboard Performance (https://imgur.com/2Egeb8R.png)]

    This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

    Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.
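
    For a quick start, here is a minimal sketch of exploring one of these tables with pandas inside a Kaggle kernel. The file name Competitions.csv and the EnabledDate column are assumptions based on the tables described above; verify them against the current dataset listing.

    # Minimal sketch: load a Meta Kaggle table in a Kaggle kernel.
    # File and column names are assumptions; check the dataset listing.
    import pandas as pd

    competitions = pd.read_csv("/kaggle/input/meta-kaggle/Competitions.csv")

    # Example: count competitions launched per year.
    launched = pd.to_datetime(competitions["EnabledDate"], errors="coerce")
    print(launched.dt.year.value_counts().sort_index().tail())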

    August 2023 update

    In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here: https://www.kaggle.com/datasets/kaggle/meta-kaggle-code

    We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.

  2. the-stack-metadata

    • huggingface.co
    Updated Nov 20, 2022
    Cite
    BigCode (2022). the-stack-metadata [Dataset]. https://huggingface.co/datasets/bigcode/the-stack-metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 20, 2022
    Dataset authored and provided by
    BigCode
    License

    Other (https://choosealicense.com/licenses/other/)

    Description

    Dataset Card for The Stack Metadata

      Changelog
    

    Release | Description
    v1.1 | First release of the metadata, matching The Stack v1.1
    v1.2 | Metadata dataset matching The Stack v1.2

      Dataset Summary
    

    This is a set of additional information for the repositories used for The Stack. It contains file paths, detected licenses, and some other information for the repositories.

      Supported Tasks and Leaderboards
    

    The main task is to recreate… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack-metadata.
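
    As a hedged sketch, the metadata can presumably be loaded with the Hugging Face datasets library; the split name is an assumption, and access may require accepting the dataset's terms and authenticating with a token. Check the dataset page.

    # Hedged sketch: stream a few records without a full download.
    # The split name "train" is an assumption; verify on the dataset page.
    from datasets import load_dataset

    ds = load_dataset("bigcode/the-stack-metadata", split="train", streaming=True)
    for record in ds.take(3):  # peek at a few records
        print(record)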

  3. Amazon Berkeley Objects : Complete Metadata(small)

    • kaggle.com
    zip
    Updated May 2, 2025
    Cite
    pyxis2025 (2025). Amazon Berkeley Objects : Complete Metadata(small) [Dataset]. https://www.kaggle.com/datasets/pyxis2025/amazon-berkeley-objects-complete-metadatasmall
    Explore at:
    Available download formats: zip (112921101 bytes)
    Dataset updated
    May 2, 2025
    Authors
    pyxis2025
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Amazon ABO Multimodal Metadata Dataset (Mini-VQA Ready)

    This dataset is a curated subset of the Amazon Berkeley Objects (ABO) dataset, tailored specifically for multimodal applications like Visual Question Answering (VQA). It merges product metadata and image identifiers into a unified format, enabling rapid development and prototyping of multimodal AI models.

    🔍 Dataset Overview

    Each entry in the dataset corresponds to a unique product listing and includes structured information suitable for downstream tasks like:

    • Multilingual product description understanding
    • Image-grounded question generation
    • Metadata-aware classification and retrieval

    📦 Features

    Field Name | Description
    brand | Brand name of the product (e.g., AmazonBasics, Solimo)
    bullet_point | Short description points highlighting features
    color | Product color (e.g., White Powder Coat, Multicolor)
    item_id | Unique identifier for the product
    item_keywords | A collection of search-relevant tags
    item_name | Official product title
    main_image_id | Main image identifier (used for image retrieval)
    other_image_id | Additional image identifiers
    product_type | Broad category like CELLULAR_PHONE_CASE, SHOES, etc.
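
    A minimal sketch of turning these fields into VQA-ready records, assuming the metadata ships as JSON lines (a common ABO distribution format); the file name is hypothetical, and the field names follow the table above.

    # Minimal sketch: read ABO-style JSON-lines metadata and keep the fields
    # from the table above. The file name is hypothetical.
    import json

    records = []
    with open("abo_listings.json", encoding="utf-8") as f:  # hypothetical name
        for line in f:
            item = json.loads(line)
            records.append({
                "item_id": item.get("item_id"),
                "item_name": item.get("item_name"),
                "brand": item.get("brand"),
                "main_image_id": item.get("main_image_id"),  # for image retrieval
            })

    print(records[:2])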
  4. IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage)

    • crawlfeeds.com
    csv, zip
    Updated Nov 9, 2025
    Cite
    Crawl Feeds (2025). IMDb Movies Metadata Dataset – 4.5M Records (Global Coverage) [Dataset]. https://crawlfeeds.com/datasets/imdb-movies-metadata-dataset-4-5m-records-global-coverage
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Nov 9, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.

    This dataset offers a vast collection of global movie metadata, with details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.

    Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.

    What’s Included:

    • Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more

    • Delivery: Direct download

    Use Cases:

    • Train LLMs or chatbots on cinematic language and metadata

    • Build or enrich movie recommendation engines

    • Run cross-lingual or multi-region film analytics

    • Benchmark genre popularity across time periods

    • Power academic studies or entertainment dashboards

    • Feed into knowledge graphs, search engines, or NLP pipelines
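
    One of the use cases above, benchmarking genre popularity across time periods, might look like this minimal pandas sketch; the file name and the column names ("genre", "year") are assumptions to check against the CSV header.

    # Minimal sketch of the genre-popularity use case. Column and file names
    # are assumptions; inspect the real header first.
    import pandas as pd

    movies = pd.read_csv("imdb_movies.csv")  # hypothetical file name

    # Split multi-genre strings like "Drama, Comedy" into one row per genre.
    exploded = movies.assign(genre=movies["genre"].str.split(",")).explode("genre")
    exploded["genre"] = exploded["genre"].str.strip()

    # Count titles per genre per decade.
    exploded["decade"] = (exploded["year"] // 10) * 10
    print(exploded.groupby(["decade", "genre"]).size().unstack(fill_value=0).tail())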

  5. metadata

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). metadata [Dataset]. https://catalog.data.gov/dataset/metadata-f2500
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The dataset consists of public domain acute and chronic toxicity and chemistry data for algal species. Data are accessible at https://envirotoxdatabase.org/. Data include algal species, chemical identification, and the concentrations that do and do not affect algal growth.

  6. Metadata dataset

    • kaggle.com
    zip
    Updated Apr 26, 2021
    + more versions
    Cite
    Nicole Wong98 (2021). Metadata dataset [Dataset]. https://www.kaggle.com/datasets/nicolewong98/metadata-dataset
    Explore at:
    Available download formats: zip (1896422671 bytes)
    Dataset updated
    Apr 26, 2021
    Authors
    Nicole Wong98
    Description

    Dataset

    This dataset was created by Nicole Wong98


  7. Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection)

    • crawlfeeds.com
    csv, zip
    Updated Aug 23, 2025
    Cite
    Crawl Feeds (2025). Movies & TV Shows Metadata Dataset (190K+ Records, Horror-Heavy Collection) [Dataset]. https://crawlfeeds.com/datasets/movies-tv-shows-metadata-dataset-190k-records-horror-heavy-collection
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.

    Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.

    Primary Genre Focus: Horror

    Use Cases:

    • Build movie recommendation systems or genre classifiers

    • Train NLP models on movie descriptions

    • Analyze Horror content trends over time

    • Explore box office vs. rating correlations

    • Enrich entertainment datasets with directorial and cast metadata

  8. Amazon Books Dataset (20K Books + 727K Reviews)

    • kaggle.com
    zip
    Updated Oct 21, 2025
    Cite
    Hadi Fariborzi (2025). Amazon Books Dataset (20K Books + 727K Reviews) [Dataset]. https://www.kaggle.com/datasets/hadifariborzi/amazon-books-dataset-20k-books-727k-reviews
    Explore at:
    Available download formats: zip (233373889 bytes)
    Dataset updated
    Oct 21, 2025
    Authors
    Hadi Fariborzi
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    A comprehensive Amazon books dataset featuring 20,000 books and 727,876 reviews spanning 26 years (1997-2023), paired with a complete step-by-step data science tutorial. Perfect for learning data analytics from scratch or conducting advanced book market analysis.

    What's Included:

    • Raw Data: 20K book metadata (titles, authors, prices, ratings, descriptions) + 727K detailed reviews
    • Complete Tutorial Series: 4 progressive Python scripts covering data loading, cleaning, exploratory analysis, and visualization
    • Ready-to-Run Code: Fully documented scripts with practice exercises
    • Educational Focus: Designed for ENTR 3901 coursework but suitable for all skill levels

    Key Features:

    • Real-world e-commerce data (pre-filtered for quality: 200+ reviews, $5+ price)
    • Comprehensive documentation and setup instructions
    • Generates 6+ professional visualizations
    • Includes bonus analysis challenges (sentiment analysis, price optimization, time patterns)
    • Perfect for business analytics, market research, and data science education

    Use Cases:

    • Learning data analytics fundamentals
    • Book market analysis and trends
    • Customer behavior insights
    • Price optimization studies
    • Review sentiment analysis
    • Academic coursework and projects

    This dataset bridges the gap between raw data and practical learning, making it ideal for both beginners and experienced analysts looking to explore e-commerce patterns in the publishing industry.
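
    As one minimal sketch of the kind of analysis the tutorial scripts cover, the books and reviews tables could be joined like this; the file names (books.csv, reviews.csv), the join key, and the column names are all assumptions, since the real dataset documents its own layout.

    # Minimal sketch: join book metadata with reviews. All names here are
    # hypothetical; consult the dataset's own documentation.
    import pandas as pd

    books = pd.read_csv("books.csv")      # titles, authors, prices, ratings, ...
    reviews = pd.read_csv("reviews.csv")  # per-review rows

    merged = reviews.merge(books, on="book_id", how="inner")  # assumed key

    # Example: average review rating per price band.
    merged["price_band"] = pd.cut(merged["price"], bins=[5, 10, 20, 50, 1000])
    print(merged.groupby("price_band", observed=True)["review_rating"].mean())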

  9. HHS Metadata Standard

    • healthdata.gov
    • data.virginia.gov
    • +1more
    csv, xlsx, xml
    Updated Jul 18, 2025
    Cite
    (2025). HHS Metadata Standard [Dataset]. https://healthdata.gov/HHS/HHS-Metadata-Standard/9g3v-hy22
    Explore at:
    Available download formats: xml, csv, xlsx
    Dataset updated
    Jul 18, 2025
    License

    https://www.usa.gov/government-works

    Description

    HHS Metadata Standard: Version 1.0, published in July 2025, serves as the authoritative framework for defining HHS metadata—data about data—fields and attributes. Aligned with the Evidence Act and HealthData.gov, this standard establishes clear guidelines for metadata collection and public sharing across all data assets created, collected, managed, or maintained by HHS. It outlines required metadata fields for HHS datasets, ensuring consistency, interoperability, and discoverability in HHS data governance.

  10. UNIFESP X-ray Body Part - DICOM METADATA CSV

    • kaggle.com
    zip
    Updated Apr 5, 2022
    Cite
    Icaro Bombonato (2022). UNIFESP X-ray Body Part - DICOM METADATA CSV [Dataset]. https://www.kaggle.com/datasets/ibombonato/unifesp-xray-body-part-dicom-metadata-csv
    Explore at:
    Available download formats: zip (282385 bytes)
    Dataset updated
    Apr 5, 2022
    Authors
    Icaro Bombonato
    Description

    This is the metadata from the DICOM files for the UNIFESP X-ray Body Part Competition, in CSV format.

    Competition and original dataset:

    https://www.kaggle.com/competitions/unifesp-x-ray-body-part-classifier/

    Acknowledgements

    We thank Sarah Lustosa Haiek, Julia Tagliaferri, Lucas Diniz, and Rogerio Jadjiski for annotating this dataset. We thank the PI Nitamar Abdala, MD, PhD, for supporting this work. We thank Ernandez, our PACS admin, and Jefferson, our IT manager. We thank MD.ai for providing the annotation platform.

  11. Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation

    • data.niaid.nih.gov
    Updated Jul 8, 2024
    Cite
    Backe, Christian; Wehbe, Bilal; Bande, Miguel; Shah, Nimish; Cesar, Diego; Pribbernow, Max (2024). Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10373153
    Explore at:
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Kraken Robotik GmbH
    German Research Center for Artificial Intelligence (DFKI)
    Authors
    Backe, Christian; Wehbe, Bilal; Bande, Miguel; Shah, Nimish; Cesar, Diego; Pribbernow, Max
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation

    Introduction

    This is a set of metadata describing a large dataset of synchronized sonar and stereo camera recordings that were captured between August 2021 and September 2023 during the project DeeperSense (https://robotik.dfki-bremen.de/en/research/projects/deepersense/), as training data for Sonar-to-RGB image translation. Parts of the sensor data have been published (https://zenodo.org/records/7728089, https://zenodo.org/records/10220989). Due to the size of the sensor data corpus, it is currently impractical to make the entire corpus accessible online. Instead, this metadatabase serves as a relatively compact representation, allowing interested researchers to inspect the data and select relevant portions for their particular use case, which will be made available on demand. This is an effort to comply with the FAIR principle A2 (https://www.go-fair.org/fair-principles/): metadata shall be accessible even when the base data is not immediately available.

    Locations and sensors

    The sensor data was captured at four different locations, including one laboratory (Maritime Exploration Hall at DFKI RIC Bremen) and three field locations (Chalk Lake Hemmoor, Tank Wash Basin Neu-Ulm, Lake Starnberg). At all locations, a ZED camera and a Blueprint Oculus M1200d sonar were used. Additionally, a SeaVision camera was used at the Maritime Exploration Hall at DFKI RIC Bremen and at the Chalk Lake Hemmoor. The examples/ directory holds a typical output image for each sensor at each available location.

    Data volume per session

    Six data collection sessions were conducted. The table below presents an overview of the amount of data captured in each session:

    Session dates           | Location                                     | Number of datasets | Total duration of datasets [h] | Total logfile size [GB] | Number of images | Total image size [GB]
    2021-08-09 - 2021-08-12 | Maritime Exploration Hall at DFKI RIC Bremen | 52                 | 10.8                           | 28.8                    | 389'047          | 88.1
    2022-02-07 - 2022-02-08 | Maritime Exploration Hall at DFKI RIC Bremen | 35                 | 4.4                            | 54.1                    | 629'626          | 62.3
    2022-04-26 - 2022-04-28 | Chalk Lake Hemmoor                           | 52                 | 8.1                            | 133.6                   | 1'114'281        | 97.8
    2022-06-28 - 2022-06-29 | Tank Wash Basin Neu-Ulm                      | 42                 | 6.7                            | 144.2                   | 824'969          | 26.9
    2023-04-26 - 2023-04-27 | Maritime Exploration Hall at DFKI RIC Bremen | 55                 | 7.4                            | 141.9                   | 739'613          | 9.6
    2023-09-01 - 2023-09-02 | Lake Starnberg                               | 19                 | 2.9                            | 40.1                    | 217'385          | 2.3
    Total                   |                                              | 255                | 40.3                           | 542.7                   | 3'914'921        | 287.0

    Data and metadata structure

    Sensor data corpus

    The sensor data corpus comprises two processing stages:

    raw data streams stored in ROS bagfiles (aka logfiles),

    camera and sonar images (aka datafiles) extracted from the logfiles.

    The files are stored in a file tree hierarchy which groups them by session, dataset, and modality:

    ${session_key}/
        ${dataset_key}/
            ${logfile_name}
            ${modality_key}/
                ${datafile_name}

    A typical logfile path has this form:

    2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/stereo_camera-zed-2023-09-02-15-06-07.bag

    A typical datafile path has this form:

    2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/zed_right/1693660038_368077993.jpg

    All directory and file names, and their particles, are designed to serve as identifiers in the metadatabase. Their formatting, as well as the definitions of all terms, are documented in the file entities.json.
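
    As an illustration, a datafile path can be split back into its identifier particles. Interpreting the file stem as <seconds>_<nanoseconds> since the UNIX epoch is an assumption based on the example path above; entities.json remains the authoritative reference.

    # Minimal sketch: parse a datafile path into session, dataset, and
    # modality keys. The seconds_nanoseconds reading of the stem is an
    # assumption from the example path; entities.json is authoritative.
    from datetime import datetime, timezone
    from pathlib import Path

    path = Path("2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/"
                "zed_right/1693660038_368077993.jpg")
    session_key, dataset_key, modality_key = path.parts[:3]
    seconds, _nanoseconds = path.stem.split("_")

    capture_time = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    print(session_key, dataset_key, modality_key, capture_time.isoformat())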

    Metadatabase

    The metadatabase is provided in two equivalent forms:

    as a standalone SQLite (https://www.sqlite.org/index.html) database file metadata.sqlite for users familiar with SQLite,

    as a collection of CSV files in the csv/ directory for users who prefer other tools.

    The database file has been generated from the CSV files, so each database table holds the same information as the corresponding CSV file. In addition, the metadatabase contains a series of convenience views that facilitate access to certain aggregate information.

    An entity relationship diagram of the metadatabase tables is stored in the file entity_relationship_diagram.png. Each entity, its attributes, and relations are documented in detail in the file entities.json.

    Some general design remarks:

    For convenience, timestamps are always given in both a human-readable form (ISO 8601 formatted datetime strings with explicit local time zone), and as seconds since the UNIX epoch.

    In practice, each logfile always contains a single stream, and each stream is always stored in a single logfile. Per the database schema, however, the entities stream and logfile are modeled separately, with a "many-streams-to-one-logfile" relationship. This design was chosen to be compatible with, and open for, data collections where a single logfile contains multiple streams.

    A modality is not an attribute of a sensor alone, but of a datafile, because a sensor is an attribute of a stream, and a single stream may be the source of multiple modalities (e.g. RGB vs. grayscale images from the same camera, or cartesian vs. polar projections of the same sonar output). Conversely, the same modality may originate from different sensors.

    As a usage example, the data volume per session, which is tabulated at the top of this document, can be extracted from the metadatabase with the following SQL query:

    SELECT
        PRINTF('%s - %s', SUBSTR(session_start, 1, 10), SUBSTR(session_end, 1, 10)) AS 'Session dates',
        location_name_english AS Location,
        number_of_datasets AS 'Number of datasets',
        total_duration_of_datasets_h AS 'Total duration of datasets [h]',
        total_logfile_size_gb AS 'Total logfile size [GB]',
        number_of_images AS 'Number of images',
        total_image_size_gb AS 'Total image size [GB]'
    FROM location
    JOIN session USING (location_id)
    JOIN (
        SELECT
            session_id,
            COUNT(dataset_id) AS number_of_datasets,
            ROUND(SUM(dataset_duration) / 3600, 1) AS total_duration_of_datasets_h,
            ROUND(SUM(total_logfile_size) / 10e9, 1) AS total_logfile_size_gb
        FROM location
        JOIN session USING (location_id)
        JOIN dataset USING (session_id)
        JOIN view_dataset_total_logfile_size USING (dataset_id)
        GROUP BY session_id
    ) USING (session_id)
    JOIN (
        SELECT
            session_id,
            COUNT(datafile_id) AS number_of_images,
            ROUND(SUM(datafile_size) / 10e9, 1) AS total_image_size_gb
        FROM session
        JOIN dataset USING (session_id)
        JOIN stream USING (dataset_id)
        JOIN datafile USING (stream_id)
        GROUP BY session_id
    ) USING (session_id)
    ORDER BY session_id;
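
    The same database can be opened programmatically. A minimal sketch with Python's built-in sqlite3 module, listing the tables and convenience views before running queries like the one above:

    # Minimal sketch: open the standalone metadata.sqlite file and list its
    # tables and convenience views with Python's built-in sqlite3 module.
    import sqlite3

    conn = sqlite3.connect("metadata.sqlite")
    for name, kind in conn.execute(
            "SELECT name, type FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY name"):
        print(kind, name)
    conn.close()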

  12. Gede Heritage Spatial Documentation Metadata Dataset

    • zivahub.uct.ac.za
    jpeg
    Updated Jan 12, 2026
    Cite
    Heinz Rüther; Admin ZivaHub; Ralph Schröder; Stephen Wessels; Bruce McDonald (2026). Gede Heritage Spatial Documentation Metadata Dataset [Dataset]. http://doi.org/10.25375/uct.11854452.v2
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jan 12, 2026
    Dataset provided by
    University of Cape Town
    Authors
    Heinz Rüther; Admin ZivaHub; Ralph Schröder; Stephen Wessels; Bruce McDonald
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This master metadata spreadsheet documents all of the Gede ruins heritage items published by the Zamani Project. The information in this site description is provided for contextual purposes only and should not be regarded as a primary source.

    Gede is a Swahili archaeological site comprising coral stone structures, including mosques, houses, and tombs arranged within a walled town layout. Architectural features such as mihrabs, water cisterns, and decorative niches reflect Islamic influence and urban planning. Excavations have revealed trade goods and domestic artifacts, indicating participation in Indian Ocean commerce. Gede provides insights into Swahili cultural identity, religious practice, and economic networks. Gede is listed as a UNESCO World Heritage Site, 'The Historic Town and Archaeological Site of Gedi'.

    The Zamani Project seeks to increase awareness and knowledge of tangible cultural heritage in Africa and internationally by creating metrically accurate digital representations of historical sites. Digital spatial data of cultural heritage sites can be used for research and education, for restoration and conservation, and as a record for future generations. The Zamani Project operates as a non-profit organisation within the University of Cape Town.

    Special thanks to the Saville Foundation and the Andrew W. Mellon Foundation, among others, for their contributions to the digital documentation of this heritage site. If you believe any information in this description is incorrect, please contact the repository administrators.

  13. A Dataset of Metadata of Articles Citing Retracted Articles

    • zenodo.org
    csv
    Updated Aug 31, 2024
    Cite
    Yagmur Ozturk (2024). A Dataset of Metadata of Articles Citing Retracted Articles [Dataset]. http://doi.org/10.5281/zenodo.13621503
    Explore at:
    Available download formats: csv
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yagmur Ozturk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises metadata of articles citing retracted publications. We originally obtained the DOIs from the Feet of Clay Detector of the Problematic Paper Screener (PPS-FoCD), which flags publications that cite retracted articles. Additional columns that were not provided in PPS were added using the Crossref & Retraction Watch Database (CRxRW) and Dimensions API services.

    By querying the Dimensions API with the DOIs of the FoC articles, we acquired information such as more detailed document types (editorial, review article, research article), open access status (we kept only open-access FoC articles in the dataset since we want to access the full texts in the future), and research fields (classified according to the Australian and New Zealand Standard Research Classification (ANZSRC) Fields of Research (FoR), comprising 23 main fields such as biological sciences and education).

    To get further information about the cited retracted articles in the dataset, we used the joint release of CRxRW. Using this dataset, we added the retraction reasons and retraction years.

    The original dataset was obtained from the PPS FoCD in December 2023. At this time there were 22558 total articles flagged in FoCD. Using the data filtering feature in PPS, we had a preliminary selection before downloading the first version of the dataset. We applied a filter to obtain:

    • non-retracted citing articles at the time of data curation*
    • open-access citing articles since we need the whole text to go forward with natural language processing tasks
    • cited retracted articles with at least one scientific-content-related reason for retraction
    • only articles (not monographs, chapters) to retain a unified text type

    More information about the usage of this dataset will be added.

    *Current retraction status of the citing articles can be different since this is a static dataset and scientific literature is dynamic.
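
    A minimal sketch of re-applying selection criteria like those above with pandas; the file name and every column name here are hypothetical and should be checked against the CSV header.

    # Minimal sketch: filter the metadata CSV along the criteria above.
    # All names are hypothetical; inspect the real header first.
    import pandas as pd

    df = pd.read_csv("focd_metadata.csv")  # hypothetical file name

    selected = df[
        ~df["citing_article_retracted"]            # non-retracted citing articles
        & df["open_access"]                        # open-access citing articles
        & (df["document_type"] == "research article")
    ]
    print(len(selected), "articles kept")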

  14. OpenScience Slovenia document metadata dataset

    • narcis.nl
    • data.mendeley.com
    Updated Mar 9, 2021
    Cite
    Borovič, M (via Mendeley Data) (2021). OpenScience Slovenia document metadata dataset [Dataset]. http://doi.org/10.17632/7wh9xvvmgk.3
    Explore at:
    Dataset updated
    Mar 9, 2021
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Borovič, M (via Mendeley Data)
    Area covered
    Slovenia
    Description

    The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents which include undergraduate and postgraduate theses, research and professional articles, along with other academic document types. The data within the dataset was collected as a part of the establishment of the Slovenian Open-Access Infrastructure which defined a unified document collection process and cataloguing for universities in Slovenia within the infrastructure repositories. The data was collected from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields, representing attributes that describe documents. These attributes include document titles, keywords, abstracts, typologies, authors, issue years and other identifiers such as URL and UDC. The potential of this dataset lies especially in text mining and text classification tasks and can also be used in development or benchmarking of content-based recommender systems on real-world data.
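
    To illustrate the text-classification potential noted above, a minimal TF-IDF baseline over titles might look like this; the file name and the column names (title, typology) are assumptions for illustration only.

    # Minimal sketch: TF-IDF baseline classifying documents by typology from
    # titles. File and column names are assumptions.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("openscience_slovenia.csv")  # hypothetical file name

    X_train, X_test, y_train, y_test = train_test_split(
        df["title"].fillna(""), df["typology"], test_size=0.2, random_state=0)

    vec = TfidfVectorizer(max_features=20000)
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_train), y_train)
    print("held-out accuracy:", clf.score(vec.transform(X_test), y_test))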

  15. fscbac-book-metadata-dataset

    • huggingface.co
    Cite
    FSCBAC Open Standard, fscbac-book-metadata-dataset [Dataset]. http://doi.org/10.57967/hf/7195
    Explore at:
    Authors
    FSCBAC Open Standard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    FSCBAC Book Metadata Dataset

    This dataset contains structured metadata for children's books aligned with the FSCBAC Standard 3.1.0. It provides machine-readable entries describing book structure, linguistic load, emotional intensity, visual load, developmental purpose, and recommended usage. The dataset does not modify or extend the FSCBAC Standard. It functions as a dataset-only layer and references the standard externally.

      Files Included
    

    books.json — full dataset of… See the full description on the dataset page: https://huggingface.co/datasets/fscbac-standard/fscbac-book-metadata-dataset.

  16. US Restaurant POI dataset with metadata

    • datarade.ai
    .csv
    Updated Jul 30, 2022
    Cite
    Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jul 30, 2022
    Dataset authored and provided by
    Geolytica
    Area covered
    United States of America
    Description

    Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.

    This is our process flow:

    Our machine learning systems continuously crawl for new POI data
    Our geoparsing and geocoding calculates their geo locations
    Our categorization systems cleanup and standardize the datasets
    Our data pipeline API publishes the datasets on our data store
    

    A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

    POI data is in constant flux. Every minute, worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94% of all businesses have a public online presence of some kind that reflects such changes: when a business changes, its website and social media presence change too. We then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

    We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.

    Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously being refreshed, so subscription plans are recommended for those who need the most up-to-date data. The main differentiators between us and the competition are our flexible licensing terms and our data freshness.

    Data samples may be downloaded at https://store.poidata.xyz/us

  17. Dataset metadata for data.gov.uk

    • data.wu.ac.at
    html
    Updated Oct 3, 2016
    Cite
    Surrey County Council (2016). Dataset metadata for data.gov.uk [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/YmMyMjEzNDEtYWNiYy00YTcwLWE3NGEtNzUwNmVlZjAxMWY0
    Explore at:
    Available download formats: html
    Dataset updated
    Oct 3, 2016
    Dataset provided by
    Surrey County Council
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This dataset contains additional metadata on datasets, useful for automatically registering datasets on the data.gov.uk system.

  18. Dataset metadata of known Dataverse installations, August 2025

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Sep 29, 2025
    + more versions
    Cite
    Julian Gautier (2025). Dataset metadata of known Dataverse installations, August 2025 [Dataset]. http://doi.org/10.7910/DVN/RMAGSH
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Julian Gautier
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains the metadata of the datasets published in 118 Dataverse installations, information about the metadata blocks of 118 installations, and the lists of pre-defined licenses or dataset terms that depositors can apply to datasets in the 100 installations that were running versions of the Dataverse software that include the "multiple-license" feature. The data is useful for improving understanding of how certain Dataverse features and metadata fields are used, and for learning about the quality of dataset and file-level metadata within and across Dataverse installations.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation between August 25 and September 2, 2025, using a Python script that uses the Dataverse API.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation)_2025.08.25-2025.09.02.csv
    │   ├── contributor(citation)_2025.08.25-2025.09.02.csv
    │   ├── data_source(citation)_2025.08.25-2025.09.02.csv
    │   ├── ...
    │   └── topic_classification(citation)_2025.08.25-2025.09.02.csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2025.08.26_07.14.00.zip
    │   │   ├── dataset_pids_Abacus_2025.08.26_07.14.00.csv
    │   │   ├── Dataverse_JSON_metadata_2025.08.26_07.14.00
    │   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
    │   │   │   └── ...
    │   │   └── metadatablocks_v5.9
    │   │       ├── astrophysics_v5.9.json
    │   │       ├── biomedical_v5.9.json
    │   │       ├── citation_v5.9.json
    │   │       ├── ...
    │   │       └── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2025.08.25_15.45.25.zip
    │   ├── ...
    │   └── Yale_Dataverse_2025.08.25_11.51.29.zip
    ├── dataverse_installations_summary_2025.09.02.csv
    ├── dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv
    ├── license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv
    └── metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv

    This dataset contains two directories and four CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the "Citation" metadata block and "Geospatial" metadata block of datasets in the 118 Dataverse installations. For example, author(citation)_2025.08.25-2025.09.02.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 118 installations, with a column for each of the four child fields: author name, affiliation, identifier type, and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 118 zip files, one for each of the 118 Dataverse installations whose sites were functioning when I attempted to collect their metadata and that have at least one published dataset. Each zip file contains:

    • A CSV file listing information about the datasets published in the installation, including a column to indicate whether the Python script was able to download the Dataverse JSON metadata for each dataset.
    • A directory with JSON files that have information about the installation's metadata fields, such as the field names and how they're organized.
    • A directory of JSON files that contain the metadata of the installation's published, non-deaccessioned dataset versions in the Dataverse JSON metadata schema.

    The dataverse_installations_summary_2025.09.02.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata included and not included in this dataset.

    The dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv file contains the dataset PIDs of published datasets in the 118 Dataverse installations, with a column to indicate whether the Python script was able to download the dataset's metadata. It is a union of all "dataset_pids_....csv" files in each of the 118 zip files in the dataverse_json_metadata_from_each_known_dataverse_installation directory.

    The license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv file contains information about the licenses and data use agreements that some installations let depositors choose when creating datasets. When I collected this data, 100 of the available 118 installations were running versions of the Dataverse software that allow depositors to choose a "predefined license or data use agreement" from a dropdown menu in the dataset deposit form. For more information about this Dataverse feature, see https://guides.dataverse.org/en/6.7/user/dataset-management.html#choosing-a-license.

    The metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv file contains the metadata block names, field names, child field names (if the field is a compound field), display names, descriptions/tooltip text, watermarks, and controlled vocabulary values of fields in the 118 Dataverse installations' metadata blocks. This file is useful for learning...
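
    As a minimal sketch, the per-installation zip files described above can be inventoried with Python's standard library; exact archive names vary per installation, so only the directory name from the listing is assumed.

    # Minimal sketch: count the Dataverse JSON metadata files inside each
    # per-installation zip archive. Archive names vary per installation.
    import zipfile
    from pathlib import Path

    root = Path("dataverse_json_metadata_from_each_known_dataverse_installation")
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            n_json = sum(1 for name in zf.namelist() if name.endswith(".json"))
            print(archive.name, n_json, "JSON metadata files")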

  19. Garner Valley DAS Metadata

    • catalog.data.gov
    • gdr.openei.org
    • +3more
    Updated Jan 20, 2025
    Cite
    University of Wisconsin (2025). Garner Valley DAS Metadata [Dataset]. https://catalog.data.gov/dataset/garner-valley-das-metadata-b4222
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    University of Wisconsin
    Description

    Metadata for the data collected at the NEES@UCSB Garner Valley Downhole Array field site on September 10-12, 2013 as part of the larger PoroTomo project.

  20. metadata

    • huggingface.co
    Updated Jan 15, 2026
    + more versions
    Cite
    mrfakename (2026). metadata [Dataset]. https://huggingface.co/datasets/mrfakename/metadata
    Explore at:
    Dataset updated
    Jan 15, 2026
    Authors
    mrfakename
    Description

    mrfakename/metadata dataset hosted on Hugging Face and contributed by the HF Datasets community
