cornuHGF/datacomp-medium-12m dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DataComp Medium Pool
This repository contains metadata files for the medium pool of DataComp. For details on how to use the metadata, please visit our website and our GitHub repository. We distribute the image URL-text samples and metadata under a standard Creative Commons CC-BY-4.0 license. The individual images are under their own copyrights.
Terms and Conditions
We have terms of service that are similar to those adopted by HuggingFace… See the full description on the dataset page: https://huggingface.co/datasets/mlfoundations/datacomp_medium.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Introduction
This repository contains the data for Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources.
Project page: https://victorwz.github.io/Open-Qwen2VL
Code: https://github.com/Victorwz/Open-Qwen2VL
Dataset
ccs_ebdataset: CC3M-CC12M-SBU filtered by CLIP; we directly download the webdataset based on the released curated subset of BLIP-1
datacomp_medium_dfn_webdataset: DataComp-Medium-128M filtered by DFN, we just…
See the full description on the dataset page: https://huggingface.co/datasets/weizhiwang/Open-Qwen2VL-Data.