100+ datasets found
  1. CMMMU

    • huggingface.co
    Updated Jan 23, 2024
    Cite
    Multimodal Art Projection (2024). CMMMU [Dataset]. https://huggingface.co/datasets/m-a-p/CMMMU
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 23, 2024
    Dataset authored and provided by
    Multimodal Art Projection
    Description

    CMMMU


    Introduction

    CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, like its companion, MMMU. These questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music… See the full description on the dataset page: https://huggingface.co/datasets/m-a-p/CMMMU.

  2. Metro Regional Parcel Dataset - (Updated Quarterly)

    • gisdata.mn.gov
    ags_mapserver, fgdb +4
    Updated Apr 19, 2025
    Cite
    MetroGIS (2025). Metro Regional Parcel Dataset - (Updated Quarterly) [Dataset]. https://gisdata.mn.gov/dataset/us-mn-state-metrogis-plan-regional-parcels
    Explore at:
    Available download formats: fgdb, gpkg, html, shp, jpeg, ags_mapserver
    Dataset updated
    Apr 19, 2025
    Dataset provided by
    MetroGIS
    Description

    This dataset includes all 7 metro counties that have made their parcel data freely available without a license or fees.

    This dataset is a compilation of tax parcel polygon and point layers assembled into a common coordinate system from Twin Cities, Minnesota metropolitan area counties. No attempt has been made to edgematch or rubbersheet between counties. A standard set of attribute fields is included for each county. The attributes are the same for the polygon and points layers. Not all attributes are populated for all counties.

    NOTICE: The standard set of attributes changed to the MN Parcel Data Transfer Standard on 1/1/2019.
    https://www.mngeo.state.mn.us/committee/standards/parcel_attrib/parcel_attrib.html

    See section 5 of the metadata for an attribute summary.

    Detailed information about the attributes can be found in the Metro Regional Parcel Attributes document.

    The polygon layer contains one record for each real estate/tax parcel polygon within each county's parcel dataset. Some counties have polygons for each individual condominium, and others do not. (See Completeness in Section 2 of the metadata for more information.) The points layer includes the same attribute fields as the polygon dataset. The points are intended to provide information in situations where multiple tax parcels are represented by a single polygon. One primary example of this is the condominium, though some counties stacked polygons for condos. Condominiums, by definition, are legally owned as individual, taxed real estate units. Records for condominiums may not show up in the polygon dataset. The points for the point dataset often will be randomly placed or stacked within the parcel polygon with which they are associated.

    The polygon layer is broken into individual county shape files. The points layer is provided as both individual county files and as one file for the entire metro area.

    In many places a one-to-one relationship does not exist between these parcel polygons or points and the actual buildings or occupancy units that lie within them. There may be many buildings on one parcel and there may be many occupancy units (e.g. apartments, stores or offices) within each building. Additionally, no information exists within this dataset about residents of parcels. Parcel owner and taxpayer information exists for many, but not all counties.

    This is a MetroGIS Regionally Endorsed dataset.

    Additional information may be available from each county at the links listed below. Also, any questions or comments about suspected errors or omissions in this dataset can be addressed to the contact person at each individual county.

    Anoka = http://www.anokacounty.us/315/GIS
    Carver = http://www.co.carver.mn.us/GIS
    Dakota = http://www.co.dakota.mn.us/homeproperty/propertymaps/pages/default.aspx
    Hennepin = https://gis-hennepin.hub.arcgis.com/pages/open-data
    Ramsey = https://www.ramseycounty.us/your-government/open-government/research-data
    Scott = http://opendata.gis.co.scott.mn.us/
    Washington = http://www.co.washington.mn.us/index.aspx?NID=1606

  3. AS-Core

    • huggingface.co
    Updated Apr 20, 2025
    Cite
    OpenGVLab (2025). AS-Core [Dataset]. https://huggingface.co/datasets/OpenGVLab/AS-Core
    Explore at:
    Croissant
    Dataset updated
    Apr 20, 2025
    Dataset authored and provided by
    OpenGVLab
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    AS-Core

    AS-Core is the human-verified subset of AS-1B.

    semantic_tag_1m.json: the human-verified annotations for semantic tags.
    region_vqa_1m.jsonl: the human-verified annotations for region VQA.
    region_caption_400k.jsonl: the region captions generated based on paraphrasing the region question-answer pairs.

    NOTE: The bbox format is x1y1x2y2.
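
    The note above names the box layout but not how to consume it. The following is a minimal sketch, assuming that x1y1x2y2 means two opposite corners with x1 <= x2 and y1 <= y2; the helper name xyxy_to_xywh and the sample box are hypothetical and are not taken from the dataset files.

```python
from typing import NamedTuple, Sequence


class BoxXYWH(NamedTuple):
    """Box described by its first corner plus width and height."""
    x: float
    y: float
    width: float
    height: float


def xyxy_to_xywh(bbox: Sequence[float]) -> BoxXYWH:
    """Convert [x1, y1, x2, y2] corner coordinates to (x, y, width, height).

    Assumes (x1, y1) and (x2, y2) are opposite corners with x1 <= x2
    and y1 <= y2; this ordering is an assumption, not stated in the source.
    """
    x1, y1, x2, y2 = bbox
    if x2 < x1 or y2 < y1:
        raise ValueError(f"expected x1 <= x2 and y1 <= y2, got {bbox!r}")
    return BoxXYWH(x1, y1, x2 - x1, y2 - y1)


# Hypothetical box for illustration only.
example = xyxy_to_xywh([10, 20, 110, 70])
print(example)
```

    Tools that expect width and height rather than a second corner would need a conversion of this kind before the annotations can be used.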

    Introduction

    We present the All-Seeing Project with: All-Seeing 1B (AS-1B) dataset: we propose a new large-scale dataset… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/AS-Core.

  4. Dataset

    • huggingface.co
    Updated Feb 7, 2025
    Cite
    Erin La (2025). Dataset [Dataset]. https://huggingface.co/datasets/erinla/Dataset
    Dataset updated
    Feb 7, 2025
    Authors
    Erin La
    Description

    erinla/Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. ER-dataset

    • huggingface.co
    Updated Aug 29, 2022
    Cite
    RUC-DataLab (2022). ER-dataset [Dataset]. https://huggingface.co/datasets/RUC-DataLab/ER-dataset
    Explore at:
    Croissant
    Dataset updated
    Aug 29, 2022
    Dataset authored and provided by
    RUC-DataLab
    Description

    dataset-list

    The datasets in this repository come from the public datasets DeepMatcher, Magellan, and WDC, which cover a variety of domains, such as product, citation, and restaurant. Each dataset contains entities from two relational tables with multiple attributes, and a set of labeled matching/non-matching entity pairs.

    dataset_name     domain
    abt_buy          Product
    amazon_google    Product
    anime            Anime
    beer             Product
    books2           Book
    books4           Book
    cameras          WDC-Product

    computers… See the full description on the dataset page: https://huggingface.co/datasets/RUC-DataLab/ER-dataset.
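
    The layout described above (two relational tables plus labeled entity pairs) can be pictured with a small sketch. Everything here is an assumption made for illustration: the record IDs, the attribute names, the LabeledPair type, and the resolve_pairs helper are hypothetical and do not reflect the repository's actual file format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

# A record is a flat mapping from attribute name to value (assumed layout).
Record = Dict[str, str]


@dataclass(frozen=True)
class LabeledPair:
    """One labeled candidate pair: an ID from each table plus a match flag."""
    left_id: str
    right_id: str
    is_match: bool


def resolve_pairs(
    left: Dict[str, Record],
    right: Dict[str, Record],
    pairs: Iterable[LabeledPair],
) -> Iterator[Tuple[Record, Record, bool]]:
    """Join each labeled pair back to the two full records it refers to."""
    for pair in pairs:
        yield left[pair.left_id], right[pair.right_id], pair.is_match


# Hypothetical toy tables and labels, not real rows from any listed dataset.
table_a = {"a1": {"name": "Example Camera X100", "price": "299"}}
table_b = {
    "b1": {"name": "Example X100 Digital Camera", "price": "295"},
    "b2": {"name": "Example Tripod T5", "price": "49"},
}
labels = [LabeledPair("a1", "b1", True), LabeledPair("a1", "b2", False)]

matches = [
    (l["name"], r["name"])
    for l, r, is_match in resolve_pairs(table_a, table_b, labels)
    if is_match
]
print(matches)
```

    Keeping the labels as ID pairs rather than copied rows lets the same records appear in both matching and non-matching examples without duplication.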

  6. dataset

    • huggingface.co
    Updated May 29, 2024
    Cite
    Miri (2024). dataset [Dataset]. https://huggingface.co/datasets/Rhma/dataset
    Explore at:
    Croissant
    Dataset updated
    May 29, 2024
    Authors
    Miri
    Description

    Rhma/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. geo-img-dataset

    • huggingface.co
    Updated Mar 19, 2025
    Cite
    latt slatt (2025). geo-img-dataset [Dataset]. https://huggingface.co/datasets/latterworks/geo-img-dataset
    Dataset updated
    Mar 19, 2025
    Authors
    latt slatt
    Description

    latterworks/geo-img-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. dataset

    • huggingface.co
    Updated Mar 10, 2024
    Cite
    Bac (2024). dataset [Dataset]. https://huggingface.co/datasets/Orenbac/dataset
    Dataset updated
    Mar 10, 2024
    Authors
    Bac
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Orenbac/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. MapPool

    • huggingface.co
    Updated May 22, 2024
    Cite
    Raimund Schnürer (2024). MapPool [Dataset]. https://huggingface.co/datasets/sraimund/MapPool
    Explore at:
    Croissant
    Dataset updated
    May 22, 2024
    Authors
    Raimund Schnürer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MapPool - Bubbling up an extremely large corpus of maps for AI

    MapPool is a dataset of 75 million potential maps and textual captions. It has been derived from CommonPool, a dataset consisting of 12 billion text-image pairs from the Internet. The images have been encoded by a vision transformer and classified into maps and non-maps by a support vector machine. This approach outperforms previous models and yields a validation accuracy of 98.5%. The MapPool dataset may help to train… See the full description on the dataset page: https://huggingface.co/datasets/sraimund/MapPool.

  10. classification-dataset

    • huggingface.co
    Updated Aug 25, 2024
    Cite
    Thomas Jardinet (2024). classification-dataset [Dataset]. https://huggingface.co/datasets/thomas-jardinet/classification-dataset
    Explore at:
    Croissant
    Dataset updated
    Aug 25, 2024
    Authors
    Thomas Jardinet
    Description

    thomas-jardinet/classification-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. dataset

    • huggingface.co
    Updated Apr 19, 2023
    Cite
    Will Roberts (2023). dataset [Dataset]. https://huggingface.co/datasets/robertsw/dataset
    Explore at:
    Croissant
    Dataset updated
    Apr 19, 2023
    Authors
    Will Roberts
    Description

    robertsw/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. multi-destination-trip-dataset

    • huggingface.co
    Cite
    Booking.com, multi-destination-trip-dataset [Dataset]. https://huggingface.co/datasets/Booking-com/multi-destination-trip-dataset
    Explore at:
    Croissant
    Dataset provided by
    Booking Holdings Inc. (http://bookingholdings.com/)
    Booking.com (http://booking.com/)
    Description

    Intro

    Booking.com provides a unique dataset based on millions of real anonymized bookings to encourage research on sequential recommendation problems. Many travelers go on trips that include more than one destination. Our mission at Booking.com is to make it easier for everyone to experience the world, and we can help to do that by providing real-time recommendations for what their next in-trip destination will be. By making accurate predictions, we help deliver a frictionless… See the full description on the dataset page: https://huggingface.co/datasets/Booking-com/multi-destination-trip-dataset.

  13. VLA-OS-Dataset

    • huggingface.co
    Updated Jun 24, 2025
    Cite
    LinS Lab (2025). VLA-OS-Dataset [Dataset]. https://huggingface.co/datasets/Linslab/VLA-OS-Dataset
    Dataset updated
    Jun 24, 2025
    Authors
    LinS Lab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card

    This is the training dataset used in the paper VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models.

    Source

    Project Page: https://nus-lins-lab.github.io/vlaos/
    Paper: https://arxiv.org/abs/2506.17561
    Code: https://github.com/HeegerGao/VLA-OS
    Model: https://huggingface.co/Linslab/VLA-OS

    Usage

    Ensure you have installed git lfs: curl -s… See the full description on the dataset page: https://huggingface.co/datasets/Linslab/VLA-OS-Dataset.

  14. SpeakerVid-5M-Dataset

    • huggingface.co
    Updated Jul 24, 2025
    Cite
    wang (2025). SpeakerVid-5M-Dataset [Dataset]. https://huggingface.co/datasets/dorni/SpeakerVid-5M-Dataset
    Dataset updated
    Jul 24, 2025
    Authors
    wang
    Description

    Data Usage (download from hugging face)

    We provide separate list files for all data and SFT data. The all_data_list.json file contains the YouTube video IDs and the names of several clips obtained from the video segmentation (these names serve as unique identifiers and can be used to locate the corresponding annotations in the annotation folder). Every YouTube video ID is specific to a single video on youtube.com; for example, you can access 8Hg_-5aUOYo through Link… See the full description on the dataset page: https://huggingface.co/datasets/dorni/SpeakerVid-5M-Dataset.

  15. united-states-license-plate-dataset

    • huggingface.co
    Updated Jul 1, 2025
    Cite
    Unidata (2025). united-states-license-plate-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset
    Dataset updated
    Jul 1, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Dataset of license plate recognition

    The dataset offers 89,986 images of vehicles featuring license plates from the USA, making it an excellent resource for tasks involving OCR (Optical Character Recognition), license plate identification, and vehicle registration data extraction. Each image is accompanied by a CSV file that provides the corresponding plate text and country code, ideal for developing and testing text recognition systems. With this dataset, researchers and developers can… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset.

  16. dataset

    • huggingface.co
    Updated Jan 8, 2024
    Cite
    0G3NS3C (2024). dataset [Dataset]. https://huggingface.co/datasets/0G3NS3C/dataset
    Dataset updated
    Jan 8, 2024
    Authors
    0G3NS3C
    Description

    0G3NS3C/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. Data from: dataset-1

    • huggingface.co
    Updated Dec 6, 2024
    Cite
    Luke Reisch (2024). dataset-1 [Dataset]. https://huggingface.co/datasets/luker42/dataset-1
    Dataset updated
    Dec 6, 2024
    Authors
    Luke Reisch
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    luker42/dataset-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. calculus-dataset

    • huggingface.co
    Updated Jan 23, 2025
    Cite
    Di Zhang (2025). calculus-dataset [Dataset]. https://huggingface.co/datasets/di-zhang-fdu/calculus-dataset
    Explore at:
    Croissant
    Dataset updated
    Jan 23, 2025
    Authors
    Di Zhang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    di-zhang-fdu/calculus-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. colorized-dataset

    • huggingface.co
    Updated Dec 12, 2023
    Cite
    ZIJING XU (2023). colorized-dataset [Dataset]. https://huggingface.co/datasets/annyorange/colorized-dataset
    Explore at:
    Croissant
    Dataset updated
    Dec 12, 2023
    Authors
    ZIJING XU
    Description

    annyorange/colorized-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. dataset

    • huggingface.co
    Updated Aug 11, 2024
    Cite
    haloword (2024). dataset [Dataset]. https://huggingface.co/datasets/userprofile/dataset
    Dataset updated
    Aug 11, 2024
    Authors
    haloword
    Description

    userprofile/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
