26 datasets found
  1. LAION-5B

    • opendatalab.com
    • aidatahub.io
    zip
    Updated Mar 31, 2022
    Cite
    LAION (2022). LAION-5B [Dataset]. https://opendatalab.com/OpenDataLab/LAION-5B
    Available download formats: zip
    Dataset updated: Mar 31, 2022
    Dataset provided by: LAION (https://laion.ai/)
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LAION-5B is a large-scale dataset for research purposes consisting of 5.85B CLIP-filtered image-text pairs. 2.3B pairs contain English text, 2.2B come from 100+ other languages, and 1B have texts that cannot be assigned to a particular language (e.g. names). Additionally, we provide several nearest-neighbor indices, an improved web interface for exploration and subset creation, and detection scores for watermarks and NSFW content. The OpenDataLab website provides processed parquet files, which researchers can download and then use to fetch the corresponding image files according to this metadata. We have also open-sourced the LAION-5B image download code at https://github.com/opendatalab/laion5b-downloader

  2. relaion2B-en-research-safe

    • huggingface.co
    Updated Oct 20, 2025
    Cite
    LAION eV (2025). relaion2B-en-research-safe [Dataset]. https://huggingface.co/datasets/laion/relaion2B-en-research-safe
    Dataset updated: Oct 20, 2025
    Dataset provided by: LAION (https://laion.ai/)
    Authors: LAION eV
    Description

    laion/relaion2B-en-research-safe dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. relaion2B-en-research

    • huggingface.co
    Updated Oct 20, 2025
    Cite
    LAION eV (2025). relaion2B-en-research [Dataset]. https://huggingface.co/datasets/laion/relaion2B-en-research
    Dataset updated: Oct 20, 2025
    Dataset provided by: LAION (https://laion.ai/)
    Authors: LAION eV
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    laion/relaion2B-en-research dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. laion-high-resolution-chinese

    • huggingface.co
    Updated Mar 15, 2023
    Cite
    wangjunjie (2023). laion-high-resolution-chinese [Dataset]. https://huggingface.co/datasets/wanng/laion-high-resolution-chinese
    Dataset updated: Mar 15, 2023
    Authors: wangjunjie
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    laion-high-resolution-chinese

      Brief Introduction

    A subset of the Chinese portion of Laion5B-high-resolution (a multilingual, multimodal dataset), around 2.66M image-text pairs in total.

      Dataset Information

    About 2.66M Chinese image-text pairs in total, occupying roughly 381 MB (text information such as URLs only; images are not included).

    Homepage: laion-5b
    Huggingface: laion/laion-high-resolution

      Download
    

    mkdir release && cd release for i in {00000..00015}; do wget… See the full description on the dataset page: https://huggingface.co/datasets/wanng/laion-high-resolution-chinese.

  5. LAION-5B Dataset for SDE

    • kaggle.com
    zip
    Updated Feb 18, 2026
    Cite
    Sri Ram K S (2026). LAION-5B Dataset for SDE [Dataset]. https://www.kaggle.com/datasets/sriram2005/laion-5b-dataset-for-sde
    Available download formats: zip (51816203 bytes)
    Dataset updated: Feb 18, 2026
    Authors: Sri Ram K S
    Description

    This dataset was created by Sri Ram K S.

  6. LAION-5B Dataset for CFG

    • kaggle.com
    zip
    Updated Feb 18, 2026
    Cite
    Sri Ram K S (2026). LAION-5B Dataset for CFG [Dataset]. https://www.kaggle.com/datasets/sriram2005/laion-5b-dataset-for-cfg
    Available download formats: zip (51816203 bytes)
    Dataset updated: Feb 18, 2026
    Authors: Sri Ram K S
    Description

    This dataset was created by Sri Ram K S.

  7. laion_2b_en_prompts_1

    • kaggle.com
    zip
    Updated Mar 29, 2023
    Cite
    Rajkumar Lakshmanamoorthy (2023). laion_2b_en_prompts_1 [Dataset]. https://www.kaggle.com/datasets/rajkumarl/laion-2b-en-prompts-1
    Available download formats: zip (2620921699 bytes)
    Dataset updated: Mar 29, 2023
    Authors: Rajkumar Lakshmanamoorthy
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Stable Diffusion 2 was trained on the LAION-2B-en dataset, which contains pairs of images and English-language prompts. The Stable Diffusion 2 model can generate images from prompts. This dataset contains the prompts used in Stable Diffusion 2 training. It is a subset of the LAION-5B dataset published in "LAION-5B: An open large-scale dataset for training next generation image-text models" (https://arxiv.org/abs/2210.08402) by Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev in 2022.

    The idea behind this Kaggle dataset is that one can generate the corresponding images using the available pretrained model.

  8. filtered-wit

    • huggingface.co
    Updated Feb 26, 2021
    Cite
    LAION eV (2021). filtered-wit [Dataset]. https://huggingface.co/datasets/laion/filtered-wit
    Dataset updated: Feb 26, 2021
    Dataset provided by: LAION (https://laion.ai/)
    Authors: LAION eV
    Description

    Filtered WIT, an Image-Text Dataset.

    A reliable dataset for running image-text models. The source is WIT, the Wikipedia Image Text Dataset; the data was taken from dalle-mini/wit.

      Author

    Aarush Katta

      Data Structure

    The data is stored as tars, containing 10,000 samples per tar. The parquets contain the metadata of each tar, which was created using a script. Each tar contains a .jpg, a .txt, and a .json. The image is stored in the .jpg, the caption in the .txt, and the metadata in… See the full description on the dataset page: https://huggingface.co/datasets/laion/filtered-wit.
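    As a rough sketch of how one such shard could be read, the Python below groups the members of a tar by sample key so that each sample's .jpg, .txt, and .json end up together. It assumes that the three files of a sample share a base name, which the description above does not state. The helper names `group_samples` and `make_demo_shard`, and the demo contents, are hypothetical.

```python
import io
import tarfile
from collections import defaultdict


def group_samples(tar_bytes: bytes) -> dict:
    """Group the files of a tar shard by base name: {key: {extension: payload}}."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Assumption: "0000.jpg", "0000.txt", "0000.json" belong to sample "0000".
            key, _, ext = member.name.rpartition(".")
            handle = tar.extractfile(member)
            if key and handle is not None:
                samples[key][ext] = handle.read()
    return dict(samples)


def make_demo_shard() -> bytes:
    """Build a tiny in-memory tar with made-up contents, for illustration only."""
    files = {
        "0000.jpg": b"fake-jpeg-bytes",
        "0000.txt": b"a caption",
        "0000.json": b'{"width": 1}',
        "0001.jpg": b"other-jpeg-bytes",
        "0001.txt": b"another caption",
        "0001.json": b'{"width": 2}',
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


samples = group_samples(make_demo_shard())
print(sorted(samples))  # ['0000', '0001']
```

    For a real shard, the bytes would come from the downloaded tar file rather than from the demo builder.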

  9. LAION 5B 6.5 Aesthetic 256x256 TFRecords

    • kaggle.com
    zip
    Updated Feb 14, 2023
    Cite
    CookieMonsterYum (2023). LAION 5B 6.5 Aesthetic 256x256 TFRecords [Dataset]. https://www.kaggle.com/datasets/cookiemonsteryum/laion-5b-65-aesthetic-256x256-tfrecords
    Available download formats: zip (6156580388 bytes)
    Dataset updated: Feb 14, 2023
    Authors: CookieMonsterYum
    Description

    This dataset was created by CookieMonsterYum.

  10. LAION-EO

    • huggingface.co
    Updated Sep 27, 2023
    Cite
    Mikolaj Czerkawski (2023). LAION-EO [Dataset]. https://huggingface.co/datasets/mikonvergence/LAION-EO
    Dataset updated: Sep 27, 2023
    Authors: Mikolaj Czerkawski
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for LAION-EO

      Dataset Summary
    

    This dataset contains a subset of LAION-5B containing images that are likely to be satellite images. The procedure of acquiring and filtering the dataset has been described in https://arxiv.org/abs/2309.15535.

    Version   Number of Samples
    0         24,933
    1         112,985

      Dataset Structure
    

    Each version of the dataset contains a .csv file with metadata with urls to images, which can be easily filtered. Note that the… See the full description on the dataset page: https://huggingface.co/datasets/mikonvergence/LAION-EO.

  11. laion-subset

    • huggingface.co
    Updated Sep 16, 2008
    Cite
    Mercity (2008). laion-subset [Dataset]. https://huggingface.co/datasets/Mercity/laion-subset
    Dataset updated: Sep 16, 2008
    Dataset authored and provided by: Mercity
    License: MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LAION Subset for LCM-LoRA Training

    Author: Juhi Singh | HuggingFace

      Dataset Description
    

    This dataset is a curated subset of LAION-5B used for training Latent Consistency Model (LCM) LoRA adapters on Stable Diffusion v1.5. It contains high-quality image-caption pairs optimized for distillation training.

      Dataset Details
    

    Total Examples: 24,840 image-caption pairs
    Dataset Size: ~4.16 GB
    Resolution: 512×512
    Format: Parquet files with image and caption columns… See the full description on the dataset page: https://huggingface.co/datasets/Mercity/laion-subset.

  12. laion2B-multi-chinese-subset

    • huggingface.co
    Updated Sep 7, 2022
    Cite
    Fengshenbang-LM (2022). laion2B-multi-chinese-subset [Dataset]. https://huggingface.co/datasets/IDEA-CCNL/laion2B-multi-chinese-subset
    Dataset updated: Sep 7, 2022
    Dataset authored and provided by: Fengshenbang-LM
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    laion2B-multi-chinese-subset

    Github: Fengshenbang-LM
    Docs: Fengshenbang-Docs

      Brief Introduction

    A subset of the Chinese portion of Laion2B (a multilingual, multimodal dataset), around 143M image-text pairs in total.

      Dataset Information

    About 143M Chinese image-text pairs in total, occupying roughly 19 GB (text information such as URLs only; images are not included).

    Homepage: laion-5b
    Huggingface: laion/laion2B-multi

      Download
    

    mkdir laion2b_chinese_release && cd laion2b_chinese_release for i in {00000..00012}; do… See the full description on the dataset page: https://huggingface.co/datasets/IDEA-CCNL/laion2B-multi-chinese-subset.

  13. LAION-5B For TCD Algorithm

    • kaggle.com
    zip
    Updated Feb 18, 2026
    Cite
    Sri Ram K S (2026). LAION-5B For TCD Algorithm [Dataset]. https://www.kaggle.com/datasets/sriram2005/laion-5b-for-tcd-algorithm
    Available download formats: zip (51816203 bytes)
    Dataset updated: Feb 18, 2026
    Authors: Sri Ram K S
    Description

    This dataset was created by Sri Ram K S.

  14. the-un-laion-temple

    • huggingface.co
    Updated Jul 4, 2024
    Cite
    DeSULT (Chiharu Edition) (2024). the-un-laion-temple [Dataset]. https://huggingface.co/datasets/DSULT-Chiharu/the-un-laion-temple
    Dataset updated: Jul 4, 2024
    Dataset authored and provided by: DeSULT (Chiharu Edition)
    License: https://choosealicense.com/licenses/wtfpl/

    Description

    All files uploaded. Enjoy!

      Dataset Card for The Unlaion Temple

      Dataset Details

      Dataset Description

    Laion-5B is still not public, so we decided to create our own dataset. The Unlaion Temple is a raw dataset of CommonCrawl images (estimated at 2 billion URLs in total). We have not verified whether the links in this dataset are functional. You are responsible for handling the data. We have made some improvements to the dataset based on user feedback:

    All… See the full description on the dataset page: https://huggingface.co/datasets/DSULT-Chiharu/the-un-laion-temple.

  15. LAION-5B for LSD

    • kaggle.com
    zip
    Updated Feb 19, 2026
    Cite
    Sri Ram K S (2026). LAION-5B for LSD [Dataset]. https://www.kaggle.com/datasets/sriram2005/laion-5b-for-lsd
    Available download formats: zip (51816203 bytes)
    Dataset updated: Feb 19, 2026
    Authors: Sri Ram K S
    Description

    This dataset was created by Sri Ram K S.

  16. VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Jan 25, 2024
    Cite
    Amato, Giuseppe; Bolettieri, Paolo; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from VBSLHE Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10013328
    Dataset updated: Jan 25, 2024
    Dataset provided by: CNR-ISTI
    Authors: Amato, Giuseppe; Bolettieri, Paolo; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a diverse set of features extracted from the VBSLHE dataset (laparoscopic gynecology). These features will be used in the VISIONE system [Amato et al. 2023, Amato et al. 2022] in the next editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

    We used a snapshot of the dataset provided by the Medical University of Vienna and Toronto that can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 75 video files. We divided each video into video shots with a maximum duration of 5 seconds.

    This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

    @inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

    This repository (v2) comprises the following files:

    msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

    extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original VBSLHE videos available.

    features-aladin.tar.gz† contains ALADIN [Messina et al. 2022] features extracted for the middle frames of all segments.

    features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for the middle frames of all segments.

    features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for the middle frames of all segments.

    features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments.

    objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on Open Images V4 [Kuznetsova et al. 2020]).

    objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].

    objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].

    *Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected

    *Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains the record of one video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Each record additionally holds three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

    "object_class_names": vector with the class name of each detected object.

    "object_scores": scores corresponding to each detected object.

    "object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

    †Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the VBSLHE dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
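    Returning to the object annotation records described above, a minimal Python sketch of reading one jsonl row could look like this. The field names "_id", "object_class_names", "object_scores", and "object_boxes_yxyx" come from the note on object annotations; the sample values, the helper name `parse_record`, the score threshold, and the conversion to (xmin, ymin, xmax, ymax) order are illustrative assumptions.

```python
import json


def parse_record(line: str, min_score: float = 0.0) -> list:
    """Turn one jsonl row into a list of per-object dicts, dropping low scores."""
    record = json.loads(line)
    detections = []
    for name, score, box in zip(
        record["object_class_names"],
        record["object_scores"],
        record["object_boxes_yxyx"],
    ):
        if score < min_score:
            continue
        # Boxes are stored as (ymin, xmin, ymax, xmax); reorder for xyxy consumers.
        ymin, xmin, ymax, xmax = box
        detections.append(
            {
                "segment_id": record["_id"],
                "class_name": name,
                "score": score,
                "box_xyxy": (xmin, ymin, xmax, ymax),
            }
        )
    return detections


# Made-up row, for illustration only.
sample_line = json.dumps(
    {
        "_id": "video01_segment003",
        "object_class_names": ["person", "table"],
        "object_scores": [0.9, 0.25],
        "object_boxes_yxyx": [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]],
    }
)

for detection in parse_record(sample_line, min_score=0.5):
    print(detection["class_name"], detection["box_xyxy"])
```

    Whether box coordinates are normalized or in pixels is not stated in the description, so the sketch leaves them untouched.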

    We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

    References:

    [Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

    [Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

    [Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

    [Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CV

  17. LAION-5B (Image + Text Dataset)

    • kaggle.com
    zip
    Updated Oct 3, 2025
    Cite
    Prince Rajak (2025). LAION-5B (Image + Text Dataset) [Dataset]. https://www.kaggle.com/datasets/prince7489/laion-5b-image-text-dataset
    Available download formats: zip (35728 bytes)
    Dataset updated: Oct 3, 2025
    Authors: Prince Rajak
    License: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    TrendVerse-2025 is a unified, multi-domain dataset combining images with captions, text entries, tabular data, and geospatial information. With 1,000 records, it is ideal for AI/ML training, data analysis, visualization, and educational projects, providing a ready-to-use resource that reflects 2025 trends across multiple domains.

  18. VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1more
    Updated Feb 12, 2024
    Cite
    Giuseppe Amato (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from V3C1+V3C2 Dataset [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_8188569
    Dataset updated: Feb 12, 2024
    Dataset provided by: Claudio Vairo; Lucia Vadicamo; Claudio Gennaro; Nicola Messina; Giuseppe Amato; Paolo Bolettieri; Fabrizio Falchi; Fabio Carrara
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a diverse set of features extracted from the V3C1+V3C2 dataset, sourced from the Vimeo Creative Commons Collection. These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

    The original V3C1+V3C2 dataset, provided by NIST, can be downloaded using the instructions provided at https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/.

    It comprises 7,235 video files, amounting to 2,300 h of video content and encompassing 2,508,113 predefined video segments.

    We subdivided the predefined video segments longer than 10 seconds into multiple segments, with each segment spanning no longer than 16 seconds. As a result, we obtained a total of 2,648,219 segments. For each segment, we extracted one frame, specifically the middle one, and computed several features, which are described in detail below.

    This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

    @inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

    This repository comprises the following files:

    msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

    extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original V3C videos available.

    features-aladin.tar.gz† contains ALADIN [Messina et al. 2022] features extracted for the middle frames of all segments.

    features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for the middle frames of all segments.

    features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for the middle frames of all segments.

    features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments. In particular: 1) we concatenate consecutive short segments so as to create segments at least 3 seconds long; 2) we downsample the resulting segments to 2.5 fps; 3) we feed the network the first min(36, n) frames, where n is the number of frames in the segment. Note that the minimum processed length is 7 frames, given that no segment is shorter than 3 s.

    objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on Open Images V4 [Kuznetsova et al. 2020]).

    objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].

    objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].
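
    The frame budget of the CLIP2Video step in the file list above follows from simple arithmetic, sketched here in Python. The 2.5 fps rate and the 36-frame cap come from the description; rounding the frame count down is an assumption, chosen because it reproduces the stated 7-frame minimum for a 3 s segment. `frames_fed` is a hypothetical helper name.

```python
import math

TARGET_FPS = 2.5   # downsampling rate stated in the description
MAX_FRAMES = 36    # cap on the number of frames fed to the network


def frames_fed(duration_s: float) -> int:
    """Number of frames passed to the network for a segment of the given length."""
    # Assumption: the downsampled frame count n is the floor of duration * fps.
    n = math.floor(duration_s * TARGET_FPS)
    return min(MAX_FRAMES, n)


print(frames_fed(3.0))   # 7  (3 s * 2.5 fps = 7.5, rounded down)
print(frames_fed(10.0))  # 25
print(frames_fed(16.0))  # 36 (40 frames available, capped at 36)
```

    Under this assumption the 36-frame cap only takes effect for segments of 14.4 s or longer.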

    *Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected

    *Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains the record of one video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Each record additionally holds three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

    "object_class_names": vector with the class name of each detected object.

    "object_scores": scores corresponding to each detected object.

    "object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

    †Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the V3C1+V3C2 dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used. Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
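
    Query-by-image search with the dot product, as mentioned in the note above, can be sketched in a few lines of Python. The feature vectors below are made-up toy values and `rank_by_dot_product` is a hypothetical helper; real features would be loaded from the archives listed above, whose on-disk format is not described here.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot_product(query, features, top_k=3):
    """Return the top_k (segment_id, score) pairs, highest dot product first."""
    scored = [(segment_id, dot(query, vector)) for segment_id, vector in features.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional features keyed by a made-up segment identifier.
toy_features = {
    "seg_a": [1.0, 0.0],
    "seg_b": [0.0, 1.0],
    "seg_c": [0.5, 0.5],
}

# Use the feature of one segment's middle frame as the image query.
query = toy_features["seg_a"]
for segment_id, score in rank_by_dot_product(query, toy_features):
    print(segment_id, score)
```

    For free-text queries the text would first have to be embedded with the matching network, which this sketch does not cover.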

    We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

    References:

    [Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

    [Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

    [Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

    [Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).

  19. VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Jan 25, 2024
    Cite
    Giuseppe Amato; Paolo Bolettieri; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from MVK Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8355036
    Dataset updated: Jan 25, 2024
    Dataset provided by: CNR-ISTI
    Authors: Giuseppe Amato; Paolo Bolettieri; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK). These features were used in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

    We used a 2023 snapshot of the MVK dataset, which can be downloaded by following the instructions at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 1,372 video files. We divided each video into 1-second segments.
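The segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `split_into_segments`, the `Segment` fields, the handling of a shorter final segment, and the frame-index rounding are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float      # segment start time in seconds
    end_s: float        # segment end time in seconds
    middle_s: float     # timestamp of the middle frame
    start_frame: int    # frame index at the segment start
    end_frame: int      # frame index at the segment end


def split_into_segments(duration_s: float, fps: float, segment_s: float = 1.0) -> list[Segment]:
    """Split a video of the given duration into fixed-length segments.

    Assumption: the last segment is simply cut at the end of the video.
    """
    count = math.ceil(duration_s / segment_s)
    segments = []
    for i in range(count):
        start = i * segment_s
        end = min(start + segment_s, duration_s)
        segments.append(
            Segment(
                start_s=start,
                end_s=end,
                middle_s=(start + end) / 2,
                start_frame=int(start * fps),  # truncation is an assumption
                end_frame=int(end * fps),
            )
        )
    return segments


# A hypothetical 2.5-second clip at 25 fps yields three segments.
segments = split_into_segments(duration_s=2.5, fps=25.0)
```

The middle timestamp computed here corresponds to the kind of frame the feature archives refer to as the segment's middle frame; the exact values in the released files should be read from the .tsv files rather than recomputed.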

    This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

    @inproceedings{amato2023visione,
      title={VISIONE at Video Browser Showdown 2023},
      author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio},
      booktitle={International Conference on Multimedia Modeling},
      pages={615--621},
      year={2023},
      organization={Springer}
    }

    This repository comprises the following files:

    msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

    extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original MVK videos available.

    features-aladin.tar.gz† contains ALADIN [Messina et al. 2022] features extracted for all the segments' middle frames.

    features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segments' middle frames.

    features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segments' middle frames.

    features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the 1s video segments.

    objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on Open Images V4 [Kuznetsova et al. 2020]).

    objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].

    objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].

    *Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

    *Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Each record also has three arrays representing the detected objects, their scores, and their bounding boxes. The format of these arrays is as follows:

    "object_class_names": vector with the class name of each detected object.

    "object_scores": scores corresponding to each detected object.

    "object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).
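A minimal sketch of reading one such record is shown below. The field names come from the description above; the helper name `parse_detections`, the score threshold, the sample identifier, and the sample values are hypothetical. Whether the box coordinates are normalized or in pixels is not stated here, so the code only reorders them.

```python
import json


def parse_detections(jsonl_line: str, min_score: float = 0.5) -> dict:
    """Parse one jsonl row and keep detections at or above min_score.

    Boxes are converted from (ymin, xmin, ymax, xmax) to (xmin, ymin, xmax, ymax).
    """
    record = json.loads(jsonl_line)
    detections = []
    for name, score, box in zip(
        record["object_class_names"],
        record["object_scores"],
        record["object_boxes_yxyx"],
    ):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        detections.append(
            {"class_name": name, "score": score, "box_xyxy": (xmin, ymin, xmax, ymax)}
        )
    return {"id": record["_id"], "detections": detections}


# Hypothetical record; real identifiers and values come from the archives.
example_line = json.dumps(
    {
        "_id": "example_video_5",
        "object_class_names": ["fish", "coral"],
        "object_scores": [0.91, 0.32],
        "object_boxes_yxyx": [[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 0.3, 0.3]],
    }
)
parsed = parse_detections(example_line, min_score=0.5)
```

Reading a whole file would amount to calling the helper once per line of the per-video jsonl file.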

    †Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the MVK dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
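The query-by-image comparison mentioned above can be sketched in a few lines. This is an illustration under assumptions: the on-disk layout of the feature archives is not described here, so the index is a plain in-memory dictionary with hypothetical segment identifiers and toy 3-dimensional vectors. The vectors are L2-normalized before the dot product so that scores are cosine similarities; whether the released features are already normalized is not stated.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return vector if norm == 0 else [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(query: list[float], index: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank indexed segments by dot product with the query feature."""
    q = l2_normalize(query)
    scored = [(seg_id, dot(q, l2_normalize(vec))) for seg_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy index with hypothetical ids; real features have many more dimensions.
index = {
    "seg_a": [1.0, 0.0, 0.0],
    "seg_b": [0.6, 0.8, 0.0],
    "seg_c": [0.0, 0.0, 1.0],
}
results = search(query=[2.0, 0.0, 0.0], index=index, top_k=2)
```

The query vector must come from the same network as the indexed features; a text query would first need to be encoded by that network's text branch, which this sketch does not cover.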

    We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

    References:

    [Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

    [Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

    [Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

    [Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

    [Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CV

  20. laion2B-multi-turkish-subset

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Cemil Guney (2023). laion2B-multi-turkish-subset [Dataset]. https://huggingface.co/datasets/mcemilg/laion2B-multi-turkish-subset
    Dataset updated
    Aug 8, 2023
    Authors
    Cemil Guney
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Türkiye
    Description

    Dataset Card for laion2B-multi-turkish-subset

    Dataset Summary

    LAION-5B is a large-scale, openly accessible image-text dataset that contains text in multiple languages. This is a Turkish subset of laion/laion2B-multi. It can be used with img2dataset to fetch the images at scale.

    Data Structure

    DatasetDict({ train: Dataset({ features: ['SAMPLE_ID', 'URL', 'TEXT', 'HEIGHT', 'WIDTH', 'LICENSE', 'LANGUAGE', 'NSFW', 'similarity']… See the full description on the dataset page: https://huggingface.co/datasets/mcemilg/laion2B-multi-turkish-subset.
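Before fetching images, rows are often filtered on the metadata columns listed above. The sketch below works on plain dictionaries that use those column names; the helper name `select_for_download`, the thresholds, the value types, and the sample rows are assumptions for illustration and are not taken from the dataset card.

```python
def select_for_download(rows, min_side=256, min_similarity=0.3):
    """Return (URL, TEXT) pairs for rows that pass simple size and similarity checks.

    Rows with a missing WIDTH, HEIGHT, or similarity value are skipped.
    """
    selected = []
    for row in rows:
        width = row.get("WIDTH")
        height = row.get("HEIGHT")
        similarity = row.get("similarity")
        if width is None or height is None or similarity is None:
            continue
        if min(width, height) >= min_side and similarity >= min_similarity:
            selected.append((row["URL"], row["TEXT"]))
    return selected


# Hypothetical rows; real values come from the dataset itself.
rows = [
    {"SAMPLE_ID": 1, "URL": "https://example.com/a.jpg", "TEXT": "bir kedi",
     "HEIGHT": 512, "WIDTH": 768, "similarity": 0.34},
    {"SAMPLE_ID": 2, "URL": "https://example.com/b.jpg", "TEXT": "bir köpek",
     "HEIGHT": 120, "WIDTH": 160, "similarity": 0.41},
    {"SAMPLE_ID": 3, "URL": "https://example.com/c.jpg", "TEXT": "deniz",
     "HEIGHT": None, "WIDTH": None, "similarity": 0.36},
]
pairs = select_for_download(rows)
```

The resulting URL and caption pairs are the kind of input a bulk image downloader expects.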
