6 datasets found

Cambrian-10M
huggingface.co
Updated Jun 25, 2024
Cite
NYU VisionX (2024). Cambrian-10M [Dataset]. https://huggingface.co/datasets/nyu-visionx/Cambrian-10M
Dataset updated
Jun 25, 2024
Dataset authored and provided by
NYU VisionX
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Cambrian-10M Dataset

Please see paper & website for more information:

https://cambrian-mllm.github.io/ https://arxiv.org/abs/2406.16860

Overview

Cambrian-10M is a comprehensive dataset designed for instruction tuning, particularly in multimodal settings involving visual interaction data. The dataset is crafted to address the scarcity of high-quality multimodal instruction-tuning data and to maintain the language abilities of multimodal large language models (LLMs).… See the full description on the dataset page: https://huggingface.co/datasets/nyu-visionx/Cambrian-10M.
Cambrian-Alignment
huggingface.co
Updated Jun 25, 2024
Cite
Cambrian-Alignment [Dataset]. https://huggingface.co/datasets/nyu-visionx/Cambrian-Alignment
Dataset updated
Jun 25, 2024
Dataset authored and provided by
NYU VisionX
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Cambrian-Alignment Dataset

Please see paper & website for more information:

https://cambrian-mllm.github.io/ https://arxiv.org/abs/2406.16860

Overview

Cambrian-Alignment is a question-answering alignment dataset composed of alignment data from LLaVA, Mini-Gemini, ALLaVA, and ShareGPT4V.

Getting Started with Cambrian Alignment Data

Before you start, ensure you have sufficient storage space to download and process the data.

Download the Data Repository… See the full description on the dataset page: https://huggingface.co/datasets/nyu-visionx/Cambrian-Alignment.
CV-Bench
huggingface.co
Updated Jul 7, 2024
Cite
NYU VisionX (2024). CV-Bench [Dataset]. https://huggingface.co/datasets/nyu-visionx/CV-Bench
Dataset updated
Jul 7, 2024
Dataset authored and provided by
NYU VisionX
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Cambrian Vision-Centric Benchmark (CV-Bench)

This repository contains the Cambrian Vision-Centric Benchmark (CV-Bench), introduced in Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.

Files

The test*.parquet files contain the dataset annotations and images pre-loaded for processing with HF Datasets. These can be loaded in 3 different configurations using… See the full description on the dataset page: https://huggingface.co/datasets/nyu-visionx/CV-Bench.
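As a minimal sketch of the loading step described above (not the repository's own instructions, which are truncated here), the snippet below filters a file listing with the `test*.parquet` pattern and wraps a call to the Hugging Face `datasets` library. The helper names `select_parquet_files` and `load_cv_bench` are hypothetical, and the repository ID is taken from the citation above. The three configurations mentioned in the description are not reproduced because their names are elided.

```python
from fnmatch import fnmatch

REPO_ID = "nyu-visionx/CV-Bench"      # taken from the dataset citation above
PARQUET_PATTERN = "test*.parquet"     # file pattern named in the description


def select_parquet_files(filenames, pattern=PARQUET_PATTERN):
    """Return, in sorted order, the file names that match the parquet pattern."""
    return sorted(name for name in filenames if fnmatch(name, pattern))


def load_cv_bench():
    """Load the default configuration.

    Assumes network access and an installed `datasets` package; the import is
    done lazily so that the helper above works without that dependency.
    """
    from datasets import load_dataset
    return load_dataset(REPO_ID)
```

For example, `select_parquet_files(["README.md", "test.parquet", "train.parquet"])` keeps only `test.parquet` (these file names are made up for illustration). Calling `load_cv_bench()` downloads the data, so check the dataset page for the configuration names before selecting anything other than the default.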
Recharge across the Cambrian Limestone Aquifer
researchdata.edu.au
Updated Jul 28, 2020
+ more versions
Cite
Bioregional Assessment Program (2020). Recharge across the Cambrian Limestone Aquifer [Dataset]. https://researchdata.edu.au/recharge-cambrian-limestone-aquifer/2980282
Dataset updated
Jul 28, 2020
Dataset provided by
Data.gov (https://data.gov/)
Authors
Bioregional Assessment Program
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Abstract

The dataset was derived by the Geological and Bioregional Assessment Program from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement. This is an initial dataset that was published for peer review on 21/07/2020 and will be finalised when the journal paper is revised and accepted for publication. This dataset contains the inputs, outputs and code used to estimate recharge across the extent of the Cambrian Limestone Aquifer. The method is described in a journal paper:

Crosbie and Rachakonda (2020) Constraining probabilistic chloride mass balance recharge estimates using baseflow and remotely sensed evapotranspiration: The Cambrian Limestone Aquifer northern Australia. Submitted to Hydrogeology Journal.

A copy of this draft journal paper is included in the dataset.

Attribution

Geological and Bioregional Assessment Program

History

This dataset contains the inputs, outputs and code used to estimate recharge across the extent of the Cambrian Limestone Aquifer. The method is described in a journal paper:

Crosbie and Rachakonda (2020) Constraining probabilistic chloride mass balance recharge estimates using baseflow and remotely sensed evapotranspiration: The Cambrian Limestone Aquifer northern Australia. Submitted to Hydrogeology Journal.

A copy of this draft journal paper is included in the dataset.
Oasis
huggingface.co
Cite
Letian Zhang, Oasis [Dataset]. https://huggingface.co/datasets/Letian2003/Oasis
Authors
Letian Zhang
Description
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis

This dataset contains the Oasis-500k dataset. [Read the Paper] | [GitHub Repo]

All images come from Cambrian-10M. Instructions and responses are generated by MLLM.
FUSION-Finetune-12M
huggingface.co
Updated Apr 8, 2025
Cite
Zheng Liu (2025). FUSION-Finetune-12M [Dataset]. https://huggingface.co/datasets/starriver030515/FUSION-Finetune-12M
Dataset updated
Apr 8, 2025
Authors
Zheng Liu
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
FUSION-12M Dataset

Please see paper & website for more information:

https://arxiv.org/abs/2504.09925 https://github.com/starriver030515/FUSION

Overview

FUSION-12M is a large-scale, diverse multimodal instruction-tuning dataset used to train FUSION-3B and FUSION-8B models. It builds upon Cambrian-1 by significantly expanding both the quantity and variety of data, particularly in areas such as OCR, mathematical reasoning, and synthetic high-quality Q&A data. The goal is… See the full description on the dataset page: https://huggingface.co/datasets/starriver030515/FUSION-Finetune-12M.
Cambrian-10M (nyu-visionx/Cambrian-10M): 7 scholarly articles cite this dataset.
