CMMMU
Introduction
CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, like its companion, MMMU. These questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music… See the full description on the dataset page: https://huggingface.co/datasets/m-a-p/CMMMU.
This dataset includes all 7 metro counties that have made their parcel data freely available without a license or fees.
This dataset is a compilation of tax parcel polygon and point layers assembled into a common coordinate system from Twin Cities, Minnesota metropolitan area counties. No attempt has been made to edgematch or rubbersheet between counties. A standard set of attribute fields is included for each county. The attributes are the same for the polygon and points layers. Not all attributes are populated for all counties.
NOTICE: The standard set of attributes changed to the MN Parcel Data Transfer Standard on 1/1/2019.
https://www.mngeo.state.mn.us/committee/standards/parcel_attrib/parcel_attrib.html
See section 5 of the metadata for an attribute summary.
Detailed information about the attributes can be found in the Metro Regional Parcel Attributes document.
The polygon layer contains one record for each real estate/tax parcel polygon within each county's parcel dataset. Some counties have polygons for each individual condominium, and others do not. (See Completeness in Section 2 of the metadata for more information.) The points layer includes the same attribute fields as the polygon dataset. The points are intended to provide information in situations where multiple tax parcels are represented by a single polygon. One primary example of this is the condominium, though some counties stacked polygons for condos. Condominiums, by definition, are legally owned as individual, taxed real estate units. Records for condominiums may not show up in the polygon dataset. The points for the point dataset often will be randomly placed or stacked within the parcel polygon with which they are associated.
The polygon layer is split into individual county shapefiles. The points layer is provided both as individual county files and as one file for the entire metro area.
In many places a one-to-one relationship does not exist between these parcel polygons or points and the actual buildings or occupancy units that lie within them. There may be many buildings on one parcel and there may be many occupancy units (e.g. apartments, stores or offices) within each building. Additionally, no information exists within this dataset about residents of parcels. Parcel owner and taxpayer information exists for many, but not all counties.
This is a MetroGIS Regionally Endorsed dataset.
Additional information may be available from each county at the links listed below. Also, any questions or comments about suspected errors or omissions in this dataset can be addressed to the contact person at each individual county.
Anoka = http://www.anokacounty.us/315/GIS
Carver = http://www.co.carver.mn.us/GIS
Dakota = http://www.co.dakota.mn.us/homeproperty/propertymaps/pages/default.aspx
Hennepin = https://gis-hennepin.hub.arcgis.com/pages/open-data
Ramsey = https://www.ramseycounty.us/your-government/open-government/research-data
Scott = http://opendata.gis.co.scott.mn.us/
Washington = http://www.co.washington.mn.us/index.aspx?NID=1606
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
AS-Core
AS-Core is the human-verified subset of AS-1B.
semantic_tag_1m.json: the human-verified annotations for semantic tags.
region_vqa_1m.jsonl: the human-verified annotations for region VQA.
region_caption_400k.jsonl: the region captions generated by paraphrasing the region question-answer pairs.
NOTE: The bbox format is x1y1x2y2.
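As a minimal sketch of what the x1y1x2y2 convention implies, the helper below converts such a box to an (x, y, width, height) tuple. It assumes (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner; whether the coordinates are absolute pixels or normalized is not stated here, and the function name `xyxy_to_xywh` is hypothetical, not part of the AS-Core tooling.

```python
from typing import Sequence, Tuple


def xyxy_to_xywh(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert an [x1, y1, x2, y2] box to (x, y, width, height).

    Assumption: (x1, y1) is the top-left corner and (x2, y2) the
    bottom-right corner, so x2 >= x1 and y2 >= y1.
    """
    x1, y1, x2, y2 = bbox
    if x2 < x1 or y2 < y1:
        raise ValueError(f"corners out of order: {list(bbox)}")
    return (x1, y1, x2 - x1, y2 - y1)


# Illustrative box, not taken from the dataset.
example_box = [10, 20, 110, 70]
converted = xyxy_to_xywh(example_box)
print(converted)
```

Checking the corner order up front makes a mixed-up format (for example xywh passed in by mistake) fail loudly rather than produce negative sizes.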
Introduction
We present the All-Seeing Project with: All-Seeing 1B (AS-1B) dataset: we propose a new large-scale dataset… See the full description on the dataset page: https://huggingface.co/datasets/OpenGVLab/AS-Core.
dataset-list
The datasets in this repository come from the public DeepMatcher, Magellan, and WDC collections, which cover a variety of domains, such as products, citations, and restaurants. Each dataset contains entities from two relational tables with multiple attributes, and a set of labeled matching/non-matching entity pairs.
dataset_name domain
abt_buy Product
amazon_google Product
anime Anime
beer Product
books2 Book
books4 Book
cameras WDC-Product
computers… See the full description on the dataset page: https://huggingface.co/datasets/RUC-DataLab/ER-dataset.
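The structure described above (two tables plus labeled pairs) can be pictured with a toy example. This is only a sketch: the record IDs, the `title` attribute, the sample rows, and the Jaccard-threshold matcher are all hypothetical and are not taken from the repository's files.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical records standing in for the two relational tables.
table_a = {
    "a1": {"title": "Sony Alpha A7 camera"},
    "a2": {"title": "Canon EOS R5 body"},
}
table_b = {
    "b1": {"title": "sony alpha a7 digital camera"},
    "b2": {"title": "Nikon Z6 kit"},
}

# Hypothetical labeled pairs: (left id, right id, 1 = match, 0 = non-match).
labeled_pairs = [("a1", "b1", 1), ("a1", "b2", 0), ("a2", "b1", 0), ("a2", "b2", 0)]


def predict_match(left: dict, right: dict, threshold: float = 0.5) -> int:
    """Naive baseline: call a pair a match if the titles overlap enough."""
    return int(jaccard(left["title"], right["title"]) >= threshold)


predictions = [predict_match(table_a[l], table_b[r]) for l, r, _ in labeled_pairs]
gold = [label for _, _, label in labeled_pairs]
accuracy = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
print(predictions, accuracy)
```

A token-overlap baseline like this is only a sanity check on the pair format; learned matchers are the usual target for such benchmarks.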
Rhma/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
latterworks/geo-img-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Orenbac/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MapPool - Bubbling up an extremely large corpus of maps for AI
MapPool is a dataset of 75 million potential maps and textual captions. It has been derived from CommonPool, a dataset consisting of 12 billion text-image pairs from the Internet. The images have been encoded by a vision transformer and classified into maps and non-maps by a support vector machine. This approach outperforms previous models and yields a validation accuracy of 98.5%. The MapPool dataset may help to train… See the full description on the dataset page: https://huggingface.co/datasets/sraimund/MapPool.
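The classification step described above (embeddings in, map or non-map out) can be illustrated with a toy linear support vector machine. This is a sketch under loud assumptions: the 2-D vectors below are hand-made stand-ins for vision-transformer embeddings, and the hinge-loss subgradient trainer is a generic textbook method, not necessarily the solver, kernel, or settings used to build MapPool.

```python
import random


def train_linear_svm(features, labels, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) by stochastic subgradient descent.

    labels must be +1 (map) or -1 (non-map).
    """
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    bias = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # The L2 penalty shrinks the weights a little on every step.
            weights = [w * (1 - lr * reg) for w in weights]
            if margin < 1:
                # Sample is inside the margin: move the boundary away from it.
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (map) or -1 (non-map) for one embedding."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical 2-D "embeddings"; real image embeddings have far more dimensions.
embeddings = [[2.0, 2.5], [2.5, 2.0], [3.0, 3.0],
              [-2.0, -2.5], [-2.5, -2.0], [-3.0, -3.0]]
is_map = [1, 1, 1, -1, -1, -1]

svm_weights, svm_bias = train_linear_svm(embeddings, is_map)
train_predictions = [predict(svm_weights, svm_bias, e) for e in embeddings]
print(train_predictions)
```

In practice a library implementation would replace the hand-written trainer; the point here is only the two-stage shape of the pipeline, with a fixed encoder followed by a lightweight classifier.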
thomas-jardinet/classification-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
robertsw/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Intro
Booking.com provides a unique dataset based on millions of real anonymized bookings to encourage research on sequential recommendation problems. Many travelers go on trips that include more than one destination. Our mission at Booking.com is to make it easier for everyone to experience the world, and we can help to do that by providing real-time recommendations for what their next in-trip destination will be. By making accurate predictions, we help deliver a frictionless… See the full description on the dataset page: https://huggingface.co/datasets/Booking-com/multi-destination-trip-dataset.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card
This is the training dataset used in the paper VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models.
Source
Project Page: https://nus-lins-lab.github.io/vlaos/ Paper: https://arxiv.org/abs/2506.17561 Code: https://github.com/HeegerGao/VLA-OS Model: https://huggingface.co/Linslab/VLA-OS
Usage
Ensure you have installed git lfs: curl -s… See the full description on the dataset page: https://huggingface.co/datasets/Linslab/VLA-OS-Dataset.
Data Usage (download from Hugging Face)
We provide separate list files for all data and SFT data. The all_data_list.json file contains the YouTube video IDs and the names of the clips obtained from video segmentation (these names serve as unique identifiers and can be used to locate the corresponding annotations in the annotation folder). Every YouTube video ID is specific to a single video on youtube.com; for example, you can access 8Hg_-5aUOYo through Link… See the full description on the dataset page: https://huggingface.co/datasets/dorni/SpeakerVid-5M-Dataset.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Dataset of license plate recognition
The dataset offers 89,986 images of vehicles featuring license plates from the USA, making it a useful resource for tasks involving OCR (Optical Character Recognition), license plate identification, and vehicle registration data extraction. Each image is accompanied by a CSV file that provides the corresponding plate text and country code, which suits developing and testing text recognition systems. With this dataset, researchers and developers can… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
luker42/dataset-1 dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
di-zhang-fdu/calculus-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
annyorange/colorized-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
userprofile/dataset dataset hosted on Hugging Face and contributed by the HF Datasets community