38 datasets found

laion-5b
huggingface.co
Updated Feb 14, 2024
Cite
Chenhui Zhang (2024). laion-5b [Dataset]. https://huggingface.co/datasets/danielz01/laion-5b
Explore at:
Dataset updated
Feb 14, 2024
Authors
Chenhui Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
danielz01/laion-5b dataset hosted on Hugging Face and contributed by the HF Datasets community
Laion-5b - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). Laion-5b - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/laion-5b
Explore at:
Dataset updated
Dec 2, 2024
Description
A large-scale dataset of text and images for training next-generation language models.
LAION-5B
opendatalab.com
zip
Updated Mar 31, 2022
Cite
LAION (2022). LAION-5B [Dataset]. https://opendatalab.com/OpenDataLab/LAION-5B
Explore at:
Available download formats: zip
Dataset updated
Mar 31, 2022
Dataset provided by
LAION (https://laion.ai/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LAION-5B is a large-scale dataset for research purposes consisting of 5.85B CLIP-filtered image-text pairs. Of these, 2.3B contain English-language text, 2.2B are samples from 100+ other languages, and 1B have texts that do not allow a certain language assignment (e.g. names). Additionally, we provide several nearest-neighbor indices, an improved web interface for exploration and subset creation, and detection scores for watermarks and NSFW content. The OpenDataLab website provides processed parquet files; researchers can download these and then fetch the corresponding image files according to this metadata. The LAION-5B image download code is now also open source, available on GitHub at https://github.com/opendatalab/laion5b-downloader
relaion2B-en-research
huggingface.co
aifasthub.com
Updated Mar 25, 2025
+ more versions
Cite
LAION eV (2025). relaion2B-en-research [Dataset]. https://huggingface.co/datasets/laion/relaion2B-en-research
Explore at:
Dataset updated
Mar 25, 2025
Dataset provided by
LAION (https://laion.ai/)
Authors
LAION eV
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
laion/relaion2B-en-research dataset hosted on Hugging Face and contributed by the HF Datasets community
relaion2B-en-research-safe
huggingface.co
Updated May 22, 2022
+ more versions
Cite
LAION eV (2022). relaion2B-en-research-safe [Dataset]. https://huggingface.co/datasets/laion/relaion2B-en-research-safe
Explore at:
Dataset updated
May 22, 2022
Dataset provided by
LAION (https://laion.ai/)
Authors
LAION eV
Description
laion/relaion2B-en-research-safe dataset hosted on Hugging Face and contributed by the HF Datasets community
LAION 5B 6.5 Aesthetic 512x512 Images
kaggle.com
Updated Jan 11, 2023
Cite
CookieMonsterYum (2023). LAION 5B 6.5 Aesthetic 512x512 Images [Dataset]. https://www.kaggle.com/datasets/cookiemonsteryum/laion-5b-65-aesthetic-512x512-images
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jan 11, 2023
Dataset provided by
Kaggle
Authors
CookieMonsterYum
Description
Dataset

This dataset was created by CookieMonsterYum

Contents
Recap-Long-Laion
huggingface.co
Updated Nov 26, 2024
Cite
Wei Wu (2024). Recap-Long-Laion [Dataset]. https://huggingface.co/datasets/weiwu-ww/Recap-Long-Laion
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Nov 26, 2024
Authors
Wei Wu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Recap-Long-Laion

Dataset Description

This dataset consists of long captions for ~49M images from the LAION-5B dataset. The long captions are generated by pre-trained multi-modal large language models (ShareGPT4V/InstructBLIP/LLaVA-1.5) with the text prompt "Describe the image in detail".

Licensing Information

We distribute the image URLs with long captions under a standard Creative Commons CC-BY-4.0 license. The individual images are under their own… See the full description on the dataset page: https://huggingface.co/datasets/weiwu-ww/Recap-Long-Laion.
aesthetics_v2_4.5
huggingface.co
Updated Sep 5, 2022
+ more versions
Cite
LAION eV (2022). aesthetics_v2_4.5 [Dataset]. https://huggingface.co/datasets/laion/aesthetics_v2_4.5
Explore at:
Dataset updated
Sep 5, 2022
Dataset provided by
LAION (https://laion.ai/)
Authors
LAION eV
Description
laion/aesthetics_v2_4.5 dataset hosted on Hugging Face and contributed by the HF Datasets community
LAION-400-MILLION OPEN DATASET
academictorrents.com
bittorrent
Updated Sep 14, 2021
Cite
None (2021). LAION-400-MILLION OPEN DATASET [Dataset]. https://academictorrents.com/details/34b94abbcefef5a240358b9acd7920c8b675aacc
Explore at:
Available download formats: bittorrent (1211103363514)
Dataset updated
Sep 14, 2021
Authors
None
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LAION-400M: The world's largest openly available image-text-pair dataset with 400 million samples.

Concept and Content

The LAION-400M dataset is completely open and freely accessible. All images and texts in the LAION-400M dataset have been filtered with OpenAI's CLIP by calculating the cosine similarity between the text and image embeddings and dropping those with a similarity below 0.3. The threshold of 0.3 was determined through human evaluations and seems to be a good heuristic for estimating semantic image-text-content matching. The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.

Download Information

You can find:
The CLIP image embeddings (NumPy files)
The parquet files
KNN index of image embeddings

LAION-400M Dataset Statistics

The LAION-400M and future even bigger ones are in fact datasets of datasets. For instance, it can be filtered out by image sizes into smaller datasets like th
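
The filtering rule in the LAION-400M description (drop pairs whose text/image cosine similarity is below 0.3) can be sketched as follows. This is only an illustration on toy vectors: the function names, the record layout (`url`, `image_emb`, `text_emb`), and the two-dimensional embeddings are assumptions, and real CLIP embeddings would come from a CLIP model rather than being typed in by hand.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def filter_pairs(pairs, threshold=0.3):
    """Keep pairs whose image/text similarity is at least the threshold."""
    return [
        p for p in pairs
        if cosine_similarity(p["image_emb"], p["text_emb"]) >= threshold
    ]

# Hypothetical toy records; the embeddings are made up for illustration.
pairs = [
    {"url": "a", "image_emb": [1.0, 0.0], "text_emb": [1.0, 0.0]},  # identical direction
    {"url": "b", "image_emb": [1.0, 0.0], "text_emb": [0.0, 1.0]},  # orthogonal
]
kept = filter_pairs(pairs)
```

Only the first toy pair survives the filter, because the second pair's embeddings are orthogonal and score 0.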
filtered-wit
huggingface.co
Updated Jul 4, 2017
Cite
LAION eV (2017). filtered-wit [Dataset]. https://huggingface.co/datasets/laion/filtered-wit
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jul 4, 2017
Dataset provided by
LAION (https://laion.ai/)
Authors
LAION eV
Description
Filtered WIT, an Image-Text Dataset.

A reliable dataset for running image-text models. You can find WIT, the Wikipedia Image Text Dataset, here. Data was taken from dalle-mini/wit.

Author

Aarush Katta

Data Structure

The data is stored as tars, containing 10,000 samples per tar. The parquets contain the metadata of each tar, which was created using this script. Each tar contains a .jpg, .txt, and .json. The image is stored in .jpg, the caption in .txt, and the metadata in… See the full description on the dataset page: https://huggingface.co/datasets/laion/filtered-wit.
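
A tar laid out this way can be read with Python's standard `tarfile` module. The sketch below groups members into samples by their base name. It assumes (this is not stated in the description) that the three files of one sample share a name and differ only in extension; the demo archive and its contents are made up for illustration.

```python
import io
import tarfile
from collections import defaultdict

def group_samples(tar_bytes):
    """Group tar members by base name: {key: {".jpg": bytes, ".txt": bytes, ...}}."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, dot, ext = member.name.rpartition(".")
            if not dot:
                continue  # skip members without an extension
            samples[key]["." + ext] = tar.extractfile(member).read()
    return dict(samples)

def make_demo_tar():
    """Build a tiny in-memory tar with one hypothetical sample."""
    buf = io.BytesIO()
    files = [
        ("000001.jpg", b"\xff\xd8fake-image-bytes"),
        ("000001.txt", b"a caption"),
        ("000001.json", b'{"id": 1}'),
    ]
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

demo = group_samples(make_demo_tar())
```

For a real shard, the bytes would come from the downloaded tar file instead of `make_demo_tar()`.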
laion-is-still-closed-laion-source-lmao
huggingface.co
Updated May 24, 2024
Cite
An Sung-Jae (2024). laion-is-still-closed-laion-source-lmao [Dataset]. https://huggingface.co/datasets/AnSungJae3489/laion-is-still-closed-laion-source-lmao
Explore at:
Dataset updated
May 24, 2024
Authors
An Sung-Jae
License
https://choosealicense.com/licenses/wtfpl/
Description
Dataset Card for ilovehentai9000/laion-is-still-closed-laion-source-lmao

Dataset Details

Dataset Description

Laion-5B still closed: "fuck it. we do it ourselves."

This is a dataset of raw CommonCrawl of images. We have not checked if these links even work. You're on your own. All we checked is that they contain alt text + some very simple metrics.

Uses

Just don't. Do look like I give a fuck what you do lol. If you don't like it, report the actual… See the full description on the dataset page: https://huggingface.co/datasets/AnSungJae3489/laion-is-still-closed-laion-source-lmao.
the-un-laion-temple
aifasthub.com
huggingface.co
Updated Aug 23, 2025
Cite
An Sung-Jae (2025). the-un-laion-temple [Dataset]. https://aifasthub.com/datasets/AnSungJae3489/the-un-laion-temple
Explore at:
Dataset updated
Aug 23, 2025
Authors
An Sung-Jae
License
https://choosealicense.com/licenses/wtfpl/
Description
All files uploaded. Enjoy!

Dataset Card for The Unlaion Temple

Dataset Details

Dataset Description

Laion-5B is still not public, so we decided to create our own dataset. The Unlaion Temple is a raw dataset of CommonCrawl images (estimated to total 2 billion URLs). We haven't verified whether the links in this dataset are functional. You are responsible for handling the data. We've made some improvements to the dataset based on user feedback:

All… See the full description on the dataset page: https://huggingface.co/datasets/AnSungJae3489/the-un-laion-temple.
Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu (2024). Dataset:...
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu (2024). Dataset: LAION-Aesthetics 6.5+. https://doi.org/10.57702/zvbnqhl9 [Dataset]. https://service.tib.eu/ldmservice/dataset/laion-aesthetics-6-5-
Explore at:
Dataset updated
Dec 2, 2024
Description
LAION-Aesthetics 6.5+ dataset contains 625K image-text pairs.
wuerstchen-dataset
huggingface.co
Updated Oct 18, 2023
Cite
LAION eV (2023). wuerstchen-dataset [Dataset]. https://huggingface.co/datasets/laion/wuerstchen-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Oct 18, 2023
Dataset provided by
LAION (https://laion.ai/)
Authors
LAION eV
License
https://choosealicense.com/licenses/cc0-1.0/
Description
laion/wuerstchen-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...
data.niaid.nih.gov
data.europa.eu
Updated Jan 25, 2024
Cite
Claudio Gennaro (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from VBSLHE Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10013328
Explore at:
Dataset updated
Jan 25, 2024
Dataset provided by
Amato, Giuseppe
Bolettieri, Paolo
Claudio Vairo
Nicola Messina
Lucia Vadicamo
Claudio Gennaro
Fabrizio Falchi
Fabio Carrara
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a diverse set of features extracted from the VBSLHE dataset (laparoscopic gynecology). These features will be utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] in the next editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

We used a snapshot of the dataset provided by the Medical University of Vienna and Toronto that can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 75 video files. We divided each video into video shots with a maximum duration of 5 seconds.

This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

This repository (v2) comprises the following files:

msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original VBSLHE videos available.

features-aladin.tar.gz† contains ALADIN [Messina et al. 2022] features extracted for the middle frame of every segment.

features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for the middle frame of every segment.

features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for the middle frame of every segment.

features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments.

objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on Open Images V4 [Kuznetsova et al. 2020]).

objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].

objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].

*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Additionally, there are three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

"object_class_names": vector with the class name of each detected object.

"object_scores": scores corresponding to each detected object.

"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

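As a rough illustration of the object-annotation format just described, the sketch below parses one jsonl row and keeps the detections above a score cutoff. The field names come from the description; the sample record, the `min_score` value, the helper name, and the assumption that box coordinates are plain numbers are all hypothetical.

```python
import json

def confident_objects(jsonl_line, min_score=0.5):
    """Return detections from one jsonl row whose score is at least min_score."""
    record = json.loads(jsonl_line)
    # The three arrays are parallel: entry i of each describes the same object.
    triples = zip(
        record["object_class_names"],
        record["object_scores"],
        record["object_boxes_yxyx"],
    )
    return [
        {
            "segment": record["_id"],
            "label": name,
            "score": score,
            "box_yxyx": tuple(box),  # (ymin, xmin, ymax, xmax)
        }
        for name, score, box in triples
        if score >= min_score
    ]

# Hypothetical row; real files hold one such JSON object per line.
line = json.dumps({
    "_id": "video1-0001",
    "object_class_names": ["person", "tool"],
    "object_scores": [0.91, 0.32],
    "object_boxes_yxyx": [[0.1, 0.2, 0.5, 0.6], [0.3, 0.3, 0.4, 0.4]],
})
hits = confident_objects(line)
```
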
†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the VBSLHE dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.

We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

References:

[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling, MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CV
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...
data.niaid.nih.gov
Updated Jan 25, 2024
Cite
Giuseppe Amato (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from MVK Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8355036
Explore at:
Dataset updated
Jan 25, 2024
Dataset provided by
Claudio Vairo
Nicola Messina
Lucia Vadicamo
Claudio Gennaro
Fabrizio Falchi
Paolo Bolettieri
Fabio Carrara
Giuseppe Amato
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK). These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

We used a snapshot of the MVK dataset from 2023 that can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 1,372 video files. We divided each video into 1-second segments.

This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

This repository comprises the following files:

msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original MVK videos available.

features-aladin.tar.gz† contains ALADIN [Messina et al. 2022] features extracted for the middle frame of every segment.

features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for the middle frame of every segment.

features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for the middle frame of every segment.

features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the 1-second video segments.

objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on Open Images V4 [Kuznetsova et al. 2020]).

objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].

objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].

*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Additionally, there are three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

"object_class_names": vector with the class name of each detected object.

"object_scores": scores corresponding to each detected object.

"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the MVK dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
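
The query-by-image approach mentioned in the note above (comparing features with the dot product) can be sketched in a few lines. This is a minimal illustration, not the VISIONE implementation: the segment ids, the three-dimensional vectors, and the `top_k` helper are made up, and real features would be loaded from the feature archives.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query, features, k=2):
    """Return the k (segment_id, score) pairs with the highest dot product."""
    scored = [(seg_id, dot(query, vec)) for seg_id, vec in features.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical feature store: segment id -> feature vector.
features = {
    "seg-a": [1.0, 0.0, 0.0],
    "seg-b": [0.6, 0.8, 0.0],
    "seg-c": [0.0, 0.0, 1.0],
}

# Query by image: use the features of one segment to find similar segments.
results = top_k(features["seg-a"], features, k=2)
```

The query segment ranks first against itself, followed by the segment whose vector points in the most similar direction.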

We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

References:

[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling, MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CV
laion-gpt4v
huggingface.co
Updated Jun 18, 2024
+ more versions
Cite
Quan Nguyen (2024). laion-gpt4v [Dataset]. https://huggingface.co/datasets/qnguyen3/laion-gpt4v
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jun 18, 2024
Authors
Quan Nguyen
Description
qnguyen3/laion-gpt4v dataset hosted on Hugging Face and contributed by the HF Datasets community
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...
data.niaid.nih.gov
data.europa.eu
Updated Feb 12, 2024
Cite
Claudio Vairo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from V3C1+V3C2 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8188569
Explore at:
Dataset updated
Feb 12, 2024
Dataset provided by
Claudio Vairo
Nicola Messina
Lucia Vadicamo
Claudio Gennaro
Fabrizio Falchi
Paolo Bolettieri
Fabio Carrara
Giuseppe Amato
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a diverse set of features extracted from the V3C1+V3C2 dataset, sourced from the Vimeo Creative Commons Collection. These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

The original V3C1+V3C2 dataset, provided by NIST, can be downloaded using the instructions provided at https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/.

It comprises 7,235 video files, amounting to 2,300 hours of video content and encompassing 2,508,113 predefined video segments.

We subdivided the predefined video segments longer than 10 seconds into multiple segments, with each segment spanning no longer than 16 seconds. As a result, we obtained a total of 2,648,219 segments. For each segment, we extracted one frame, specifically the middle one, and computed several features, which are described in detail below.

This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

This repository comprises the following files:

msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original V3C videos available.

features-aladin.tar.gz† contains ALADIN [Messina et al. 2022] features extracted for the middle frame of every segment.

features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for the middle frame of every segment.

features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for the middle frame of every segment.

features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments. In particular: 1) we concatenate consecutive short segments so as to create segments at least 3 seconds long; 2) we downsample the obtained segments to 2.5 fps; 3) we feed the network with the first min(36, n) frames, where n is the number of frames of the segment. Note that the minimum processed length is 7 frames, given that no segment is shorter than 3 s.

objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on Open Images V4 [Kuznetsova et al. 2020]).

objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].

objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].
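
The frame-count rule given for features-clip2video.tar.gz in the list above (downsample to 2.5 fps, then feed the first min(36, n) frames) works out as in the short sketch below. Rounding the frame count down is an assumption, chosen because it reproduces the stated minimum of 7 frames for a 3-second segment; the function name is hypothetical.

```python
import math

def frames_fed(duration_s, fps=2.5, max_frames=36):
    """Number of frames passed to the network for a segment of the given length."""
    n = math.floor(duration_s * fps)  # frames left after downsampling (assumed floor)
    return min(max_frames, n)

shortest = frames_fed(3.0)   # 3 s * 2.5 fps = 7.5, rounded down
medium = frames_fed(10.0)    # 25 frames, below the cap
capped = frames_fed(20.0)    # 50 frames available, capped at 36
```

Under this assumption the 36-frame cap takes effect for segments longer than roughly 14 seconds.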

*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Additionally, there are three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

"object_class_names": vector with the class name of each detected object.

"object_scores": scores corresponding to each detected object.

"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the V3C1+V3C2 dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used. Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.

We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

References:

[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling, MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).
Performance Analytics for 5 Lion Festival
reelmetrics.com
Updated Jun 30, 2023
Cite
ReelMetrics (2023). Performance Analytics for 5 Lion Festival [Dataset]. https://www.reelmetrics.com/games/abb2e2ad-d0db-4f16-948b-3ee4c8ae217d/5-lion-festival
Dataset updated
Jun 30, 2023
Dataset authored and provided by
ReelMetrics
License
https://www.reelmetrics.com/legal
Variables measured
Games Played, Market Share, Theoretical Win, Venue Distribution, Performance Metrics
Description
Comprehensive performance analytics and metrics for 5 Lion Festival by Konami.
Bathymétrie - golfe du Lion, canyon Lacaze-Duthiers, résolution 5 m...
sextant.ifremer.fr
doi, ogc:wms
Updated Sep 15, 2023
Cite
Ifremer (2023). Bathymétrie - golfe du Lion, canyon Lacaze-Duthiers, résolution 5 m (campagne CALADU_2021, 2021) [Dataset]. https://sextant.ifremer.fr/geonetwork/srv/api/records/74492729-a034-42ee-a517-7037cda010b4
Available download formats: doi, ogc:wms
Dataset updated
Sep 15, 2023
Dataset provided by
Ifremer
Time period covered
Jul 6, 2021 - Jul 24, 2021
Description
Bathymetry of the head of the Lacaze-Duthiers canyon (CALADU_2021 cruise, 2021), at a grid spacing of 5 metres. This dataset is a bathymetric Digital Terrain Model derived from acoustic data acquired by the EM304 multibeam echosounder of the R/V Thalassa. Vertical reference: observed sea level (no tide correction). Processing report: Gaillot Arnaud (2024). CALADU_2021. Traitement des données SMF coque et engin. Ref. ASTI-2024-331. Ifremer. Cruise description: FABRI Marie-Claire (2021) CALADU_2021 cruise, RV Thalassa, https://doi.org/10.17600/18001575

