100+ datasets found

CMMMU
huggingface.co
Updated Jan 23, 2024
Multimodal Art Projection (2024). CMMMU [Dataset]. https://huggingface.co/datasets/m-a-p/CMMMU
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 23, 2024
Dataset authored and provided by
Multimodal Art Projection
Description
CMMMU

Introduction

CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, like its companion, MMMU. These questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music… See the full description on the dataset page: https://huggingface.co/datasets/m-a-p/CMMMU.

Metro Regional Parcel Dataset - (Updated Quarterly)
gisdata.mn.gov
Updated Apr 19, 2025
MetroGIS (2025). Metro Regional Parcel Dataset - (Updated Quarterly) [Dataset]. https://gisdata.mn.gov/dataset/us-mn-state-metrogis-plan-regional-parcels
Explore at:
Available download formats: fgdb, gpkg, html, shp, jpeg, ags_mapserver
Dataset updated
Apr 19, 2025
Dataset provided by
MetroGIS
Description
This dataset includes all 7 metro counties that have made their parcel data freely available without a license or fees.

This dataset is a compilation of tax parcel polygon and point layers assembled into a common coordinate system from Twin Cities, Minnesota metropolitan area counties. No attempt has been made to edgematch or rubbersheet between counties. A standard set of attribute fields is included for each county. The attributes are the same for the polygon and points layers. Not all attributes are populated for all counties.

NOTICE: The standard set of attributes changed to the MN Parcel Data Transfer Standard on 1/1/2019.
https://www.mngeo.state.mn.us/committee/standards/parcel_attrib/parcel_attrib.html

See section 5 of the metadata for an attribute summary.

Detailed information about the attributes can be found in the Metro Regional Parcel Attributes document.

The polygon layer contains one record for each real estate/tax parcel polygon within each county's parcel dataset. Some counties have polygons for each individual condominium, and others do not. (See Completeness in Section 2 of the metadata for more information.) The points layer includes the same attribute fields as the polygon dataset. The points are intended to provide information in situations where multiple tax parcels are represented by a single polygon. One primary example of this is the condominium, though some counties stacked polygons for condos. Condominiums, by definition, are legally owned as individual, taxed real estate units. Records for condominiums may not show up in the polygon dataset. The points for the point dataset often will be randomly placed or stacked within the parcel polygon with which they are associated.

The polygon layer is broken into individual county shape files. The points layer is provided as both individual county files and as one file for the entire metro area.

In many places a one-to-one relationship does not exist between these parcel polygons or points and the actual buildings or occupancy units that lie within them. There may be many buildings on one parcel and there may be many occupancy units (e.g. apartments, stores or offices) within each building. Additionally, no information exists within this dataset about residents of parcels. Parcel owner and taxpayer information exists for many, but not all counties.

This is a MetroGIS Regionally Endorsed dataset.

Additional information may be available from each county at the links listed below. Also, any questions or comments about suspected errors or omissions in this dataset can be addressed to the contact person at each individual county.

Anoka = http://www.anokacounty.us/315/GIS
Carver = http://www.co.carver.mn.us/GIS
Dakota = http://www.co.dakota.mn.us/homeproperty/propertymaps/pages/default.aspx
Hennepin = https://gis-hennepin.hub.arcgis.com/pages/open-data
Ramsey = https://www.ramseycounty.us/your-government/open-government/research-data
Scott = http://opendata.gis.co.scott.mn.us/
Washington = http://www.co.washington.mn.us/index.aspx?NID=1606

AS-Core
huggingface.co
Updated Apr 20, 2025
OpenGVLab (2025). AS-Core [Dataset]. https://huggingface.co/datasets/OpenGVLab/AS-Core
Explore at:
Croissant
Dataset updated
Apr 20, 2025
Dataset authored and provided by
OpenGVLab
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
AS-Core

AS-Core is the human-verified subset of AS-1B.

semantic_tag_1m.json: the human-verified annotations for semantic tags.
region_vqa_1m.jsonl: the human-verified annotations for region VQA.
region_caption_400k.jsonl: the region captions generated by paraphrasing the region question-answer pairs.

NOTE: The bbox format is x1y1x2y2.
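
A minimal sketch of working with boxes in this layout, assuming "x1y1x2y2" means the two opposite corners (x1, y1) and (x2, y2) with x2 >= x1 and y2 >= y1. The function names and the sample box are illustrative only and do not come from the AS-Core repository.

```python
from typing import Tuple

# Assumed layout: (x1, y1, x2, y2), i.e. two opposite corners of the box.
Box = Tuple[float, float, float, float]


def xyxy_to_xywh(box: Box) -> Box:
    """Convert a corner-format box to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    if x2 < x1 or y2 < y1:
        raise ValueError(f"corners out of order: {box}")
    return (x1, y1, x2 - x1, y2 - y1)


def box_area(box: Box) -> float:
    """Area of a corner-format box."""
    _, _, width, height = xyxy_to_xywh(box)
    return width * height


# Hypothetical sample box, not taken from the dataset.
sample = (10, 20, 50, 80)
converted = xyxy_to_xywh(sample)
area = box_area(sample)
```

Many detection toolkits expect width/height boxes rather than corner boxes, so a conversion like this is often the first step when loading annotations; check the conventions of whichever toolkit you use.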

Introduction

We present the All-Seeing Project with: All-Seeing 1B (AS-1B) dataset: we propose a new large-scale dataset… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/AS-Core.

Dataset
huggingface.co
Updated Feb 7, 2025
Erin La (2025). Dataset [Dataset]. https://huggingface.co/datasets/erinla/Dataset
Dataset updated
Feb 7, 2025
Authors
Erin La
Description
erinla/Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

ER-dataset
huggingface.co
Updated Aug 29, 2022
RUC-DataLab (2022). ER-dataset [Dataset]. https://huggingface.co/datasets/RUC-DataLab/ER-dataset
Explore at:
Croissant
Dataset updated
Aug 29, 2022
Dataset authored and provided by
RUC-DataLab
Description
dataset-list

The datasets in this repository come from the public DeepMatcher, Magellan, and WDC datasets, which cover a variety of domains such as products, citations, and restaurants. Each dataset contains entities from two relational tables with multiple attributes, and a set of labeled matching/non-matching entity pairs.

dataset_name     domain
abt_buy          Product
amazon_google    Product
anime            Anime
beer             Product
books2           Book
books4           Book
cameras          WDC-Product
computers…

See the full description on the dataset page: https://huggingface.co/datasets/RUC-DataLab/ER-dataset.
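
As a rough illustration of how labeled matching/non-matching pairs are typically used, the sketch below scores a naive token-overlap matcher against a few labeled pairs. The table contents, the `title` field, the integer IDs, and the 0.5 threshold are all invented for this example; the repository's actual file layout is not shown here.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
LabeledPair = Tuple[int, int, int]  # (id in table A, id in table B, 1 = match)


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def evaluate(
    pairs: List[LabeledPair],
    table_a: Dict[int, Record],
    table_b: Dict[int, Record],
    threshold: float = 0.5,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of a threshold matcher on labeled pairs."""
    tp = fp = fn = 0
    for left_id, right_id, label in pairs:
        score = jaccard(table_a[left_id]["title"], table_b[right_id]["title"])
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy tables and labels, for illustration only.
table_a = {1: {"title": "sony dsc w80 digital camera"}, 2: {"title": "canon eos rebel kit"}}
table_b = {10: {"title": "sony dsc w80 camera silver"}, 11: {"title": "nikon coolpix s50"}}
pairs = [(1, 10, 1), (2, 11, 0), (2, 10, 0)]

precision, recall, f1 = evaluate(pairs, table_a, table_b)
```

A token-overlap baseline like this is only a sanity check; learned matchers compare multiple attributes rather than a single string field.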

dataset
huggingface.co
Updated May 29, 2024
Miri (2024). dataset [Dataset]. https://huggingface.co/datasets/Rhma/dataset
Explore at:
Croissant
Dataset updated
May 29, 2024
Authors
Miri
Description
Rhma/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

geo-img-dataset
huggingface.co
Updated Mar 19, 2025
latt slatt (2025). geo-img-dataset [Dataset]. https://huggingface.co/datasets/latterworks/geo-img-dataset
Dataset updated
Mar 19, 2025
Authors
latt slatt
Description
latterworks/geo-img-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

dataset
huggingface.co
Updated Mar 10, 2024
Bac (2024). dataset [Dataset]. https://huggingface.co/datasets/Orenbac/dataset
Dataset updated
Mar 10, 2024
Authors
Bac
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Orenbac/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

MapPool
huggingface.co
Updated May 22, 2024
Raimund Schnürer (2024). MapPool [Dataset]. https://huggingface.co/datasets/sraimund/MapPool
Explore at:
Croissant
Dataset updated
May 22, 2024
Authors
Raimund Schnürer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MapPool - Bubbling up an extremely large corpus of maps for AI

MapPool is a dataset of 75 million potential maps and textual captions. It has been derived from CommonPool, a dataset consisting of 12 billion text-image pairs from the Internet. The images have been encoded by a vision transformer and classified into maps and non-maps by a support vector machine. This approach outperforms previous models and yields a validation accuracy of 98.5%. The MapPool dataset may help to train… See the full description on the dataset page: https://huggingface.co/datasets/sraimund/MapPool.

classification-dataset
huggingface.co
Updated Aug 25, 2024
Thomas Jardinet (2024). classification-dataset [Dataset]. https://huggingface.co/datasets/thomas-jardinet/classification-dataset
Explore at:
Croissant
Dataset updated
Aug 25, 2024
Authors
Thomas Jardinet
Description
thomas-jardinet/classification-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

dataset
huggingface.co
Updated Apr 19, 2023
Will Roberts (2023). dataset [Dataset]. https://huggingface.co/datasets/robertsw/dataset
Explore at:
Croissant
Dataset updated
Apr 19, 2023
Authors
Will Roberts
Description
robertsw/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
multi-destination-trip-dataset
huggingface.co
Booking.com, multi-destination-trip-dataset [Dataset]. https://huggingface.co/datasets/Booking-com/multi-destination-trip-dataset
Explore at:
Croissant
Dataset provided by
Booking Holdings Inc. (http://bookingholdings.com/)
Booking.com (http://booking.com/)
Description
Intro

Booking.com provides a unique dataset based on millions of real anonymized bookings to encourage the research on sequential recommendation problems. Many travelers go on trips which include more than one destination. Our mission at Booking.com is to make it easier for everyone to experience the world, and we can help to do that by providing real-time recommendations for what their next in-trip destination will be. By making accurate predictions, we help deliver a frictionless… See the full description on the dataset page: https://huggingface.co/datasets/Booking-com/multi-destination-trip-dataset.

VLA-OS-Dataset
huggingface.co
Updated Jun 24, 2025
LinS Lab (2025). VLA-OS-Dataset [Dataset]. https://huggingface.co/datasets/Linslab/VLA-OS-Dataset
Dataset updated
Jun 24, 2025
Authors
LinS Lab
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card

This is the training dataset used in the paper VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models.

Source

Project Page: https://nus-lins-lab.github.io/vlaos/
Paper: https://arxiv.org/abs/2506.17561
Code: https://github.com/HeegerGao/VLA-OS
Model: https://huggingface.co/Linslab/VLA-OS

Usage

Ensure you have installed git lfs: curl -s… See the full description on the dataset page: https://huggingface.co/datasets/Linslab/VLA-OS-Dataset.

SpeakerVid-5M-Dataset
huggingface.co
Updated Jul 24, 2025
wang (2025). SpeakerVid-5M-Dataset [Dataset]. https://huggingface.co/datasets/dorni/SpeakerVid-5M-Dataset
Dataset updated
Jul 24, 2025
Authors
wang
Description
Data Usage (download from hugging face)

We provide separate list files for all data and SFT data. The all_data_list.json file contains the YouTube video IDs and the names of several clips obtained from the video segmentation (these names serve as unique identifiers and can be used to locate the corresponding annotations in the annotation folder). Each YouTube video ID is specific to a single video on youtube.com; for example, you can access 8Hg_-5aUOYo through Link… See the full description on the dataset page: https://huggingface.co/datasets/dorni/SpeakerVid-5M-Dataset.

united-states-license-plate-dataset
huggingface.co
Updated Jul 1, 2025
Unidata (2025). united-states-license-plate-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset
Dataset updated
Jul 1, 2025
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Area covered
United States
Description
Dataset of license plate recognition

The dataset offers 89,986 images of vehicles featuring license plates from the USA, making it a resource for tasks involving OCR (Optical Character Recognition), license plate identification, and vehicle registration data extraction. Each image is accompanied by a CSV file that provides the corresponding plate text and country code, which suits developing and testing text recognition systems. With this dataset, researchers and developers can… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset.

dataset
huggingface.co
Updated Jan 8, 2024
0G3NS3C (2024). dataset [Dataset]. https://huggingface.co/datasets/0G3NS3C/dataset
Dataset updated
Jan 8, 2024
Authors
0G3NS3C
Description
0G3NS3C/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

Data from: dataset-1
huggingface.co
Updated Dec 6, 2024
Luke Reisch (2024). dataset-1 [Dataset]. https://huggingface.co/datasets/luker42/dataset-1
Dataset updated
Dec 6, 2024
Authors
Luke Reisch
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
luker42/dataset-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

calculus-dataset
huggingface.co
Updated Jan 23, 2025
Di Zhang (2025). calculus-dataset [Dataset]. https://huggingface.co/datasets/di-zhang-fdu/calculus-dataset
Explore at:
Croissant
Dataset updated
Jan 23, 2025
Authors
Di Zhang
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
di-zhang-fdu/calculus-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

colorized-dataset
huggingface.co
Updated Dec 12, 2023
ZIJING XU (2023). colorized-dataset [Dataset]. https://huggingface.co/datasets/annyorange/colorized-dataset
Explore at:
Croissant
Dataset updated
Dec 12, 2023
Authors
ZIJING XU
Description
annyorange/colorized-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

dataset
huggingface.co
Updated Aug 11, 2024
haloword (2024). dataset [Dataset]. https://huggingface.co/datasets/userprofile/dataset
Dataset updated
Aug 11, 2024
Authors
haloword
Description
userprofile/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
