5 datasets found
  1. YFCC15M

    • huggingface.co
    Updated Sep 25, 2024
    Cite
    Yang (2024). YFCC15M [Dataset]. https://huggingface.co/datasets/Kaichengalex/YFCC15M
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 25, 2024
    Authors
    Yang
    Description

    YFCC15M Recaption Dataset

    This YFCC15M dataset is filtered by DeCLIP and recaptioned using the diverse description generation framework proposed in RWKV-CLIP. The text is a list of 77 text tokens, encoded using the CLIP tokenizer. You can use from clip.simple_tokenizer import SimpleTokenizer as _Tokenizer to decode it back into the original text.
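A minimal sketch of the decoding step described above. It assumes the token lists follow the OpenAI CLIP convention: a start marker (id 49406), the caption tokens, an end marker (id 49407), then zero padding up to 77 entries. The dataset card does not spell this layout out, so treat it as an assumption and check a few rows first. The helper names `strip_special_tokens`, `decode_caption`, and `decode_with_clip` are hypothetical.

```python
from typing import Callable, List, Sequence

# Special-token ids of the OpenAI CLIP BPE vocabulary (49408 entries).
SOT_ID = 49406  # <|startoftext|>
EOT_ID = 49407  # <|endoftext|>


def strip_special_tokens(token_ids: Sequence[int]) -> List[int]:
    """Return only the caption ids: drop the start marker, the end marker,
    and everything after it (assumed to be padding)."""
    ids = [int(t) for t in token_ids]
    if ids and ids[0] == SOT_ID:
        ids = ids[1:]
    if EOT_ID in ids:
        ids = ids[: ids.index(EOT_ID)]
    return ids


def decode_caption(token_ids: Sequence[int],
                   decode: Callable[[List[int]], str]) -> str:
    """Decode one token list with any callable that maps ids to text."""
    return decode(strip_special_tokens(token_ids)).strip()


def decode_with_clip(token_ids: Sequence[int]) -> str:
    """Decode with the tokenizer named in the dataset description."""
    # Lazy import so the helpers above work without the package installed.
    from clip.simple_tokenizer import SimpleTokenizer as _Tokenizer

    return decode_caption(token_ids, _Tokenizer().decode)
```

Stripping the padding before decoding matters because the tokenizer would otherwise map each padding id to a regular vocabulary entry and append stray characters to the caption. A row without an end marker is passed through unchanged after the start marker is removed.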

    Using Dataset

    You can easily download and use this dataset with Hugging Face's datasets library.… See the full description on the dataset page: https://huggingface.co/datasets/Kaichengalex/YFCC15M.
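A hedged sketch of loading the data with the `datasets` library. The repository id comes from the URL above; the split name `"train"` and the use of streaming are assumptions, since the truncated description does not show the author's own loading snippet. `first_rows` and `stream_yfcc15m` are hypothetical helper names.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List

REPO_ID = "Kaichengalex/YFCC15M"  # repository id taken from the dataset URL


def first_rows(rows: Iterable[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Collect at most n rows from any iterable of row dicts."""
    return list(islice(rows, n))


def stream_yfcc15m(split: str = "train"):
    """Open the dataset in streaming mode so nothing is downloaded up front.

    The split name is an assumption; adjust it to what the dataset page lists.
    """
    # Lazy import: needs the `datasets` package and network access.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)
```

Calling `first_rows(stream_yfcc15m(), 3)` and printing the keys of each row is a quick way to find the actual column names before writing any training code against them.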

  2. YFCC15M-V1 - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). YFCC15M-V1 - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/yfcc15m-v1
    Explore at:
    Dataset updated
    Dec 2, 2024
    Description

    The dataset is used for Contrastive Language-Image Pretraining (CLIP) and its variants.

  3. Vietnamese-yfcc15m-OpenAICLIP

    • huggingface.co
    Updated Jan 27, 2025
    Cite
    Fifth Civil Defender - 5CD (2025). Vietnamese-yfcc15m-OpenAICLIP [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-yfcc15m-OpenAICLIP
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 27, 2025
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Description

    5CD-AI/Vietnamese-yfcc15m-OpenAICLIP dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. FLAME-ReCap-YFCC15M-MiniCPM-Llama3-V-2_5

    • huggingface.co
    Updated Mar 10, 2025
    Cite
    Anjia Cao (2025). FLAME-ReCap-YFCC15M-MiniCPM-Llama3-V-2_5 [Dataset]. https://huggingface.co/datasets/caj/FLAME-ReCap-YFCC15M-MiniCPM-Llama3-V-2_5
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 10, 2025
    Authors
    Anjia Cao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description

    Recaptioned YFCC15M by MiniCPM-Llama3-V-2_5.

    Uses

    See https://github.com/MIV-XJTU/FLAME.

    Citation

    @article{cao2024flame,
      title={FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training},
      author={Cao, Anjia and Wei, Xing and Ma, Zhiheng},
      journal={arXiv preprint arXiv:2411.11927},
      year={2024}
    }

    @article{yao2024minicpmv, title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone}, author={Yao, Yuan… See the full description on the dataset page: https://huggingface.co/datasets/caj/FLAME-ReCap-YFCC15M-MiniCPM-Llama3-V-2_5.

  5. cc15m_yfcc15m

    • huggingface.co
    Cite
    YX C, cc15m_yfcc15m [Dataset]. https://huggingface.co/datasets/yxchng/cc15m_yfcc15m
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    YX C
    Description

    yxchng/cc15m_yfcc15m dataset hosted on Hugging Face and contributed by the HF Datasets community


Cite
Yang (2024). YFCC15M [Dataset]. https://huggingface.co/datasets/Kaichengalex/YFCC15M

YFCC15M

Kaichengalex/YFCC15M

Explore at:
113 scholarly articles cite this dataset (View in Google Scholar)
