Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
laion/relaion2B-en-research dataset hosted on Hugging Face and contributed by the HF Datasets community
The paper does not describe its dataset in detail, but it states that the authors used the LAION-5B dataset for training.
laion/relaion2B-en-research-safe dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Recap-Long-Laion
Dataset Description
This dataset consists of long captions for ~49M images from the LAION-5B dataset. The long captions are generated by pre-trained multimodal large language models (ShareGPT4V/InstructBLIP/LLaVA-1.5) with the text prompt "Describe the image in detail".
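As an illustration of how such long captions could be produced, here is a minimal sketch using the publicly released llava-hf/llava-1.5-7b-hf checkpoint via the transformers library; the checkpoint choice, prompt template, and generation settings are our assumptions and need not match the exact pipeline used to build this dataset.

# Hedged sketch: long-caption generation with a LLaVA-1.5 checkpoint (assumed setup, requires a GPU).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, not confirmed by the card
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

image = Image.open("example.jpg")  # hypothetical input image
prompt = "USER: <image>\nDescribe the image in detail. ASSISTANT:"  # prompt from the card
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))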
Licensing Information
We distribute the image URLs with long captions under a standard Creative Commons CC-BY-4.0 license. The individual images are under their own… See the full description on the dataset page: https://huggingface.co/datasets/weiwu-ww/Recap-Long-Laion.
LAION-COCO is the world's largest dataset of 600M generated high-quality captions for publicly available web images. The images are taken from the English subset of LAION-5B, and the captions are generated with an ensemble of BLIP L/14 and two CLIP versions (L/14 and RN50x64). This dataset allows models to learn to produce high-quality captions for images.
A subset of the LAION-5B samples with English captions, obtained using LAION-Aesthetics Predictor V2: 625K image-text pairs with predicted aesthetics scores of 6.5 or higher, available at https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus
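As a rough illustration, the metadata can be loaded and inspected with the Hugging Face datasets library; the column name AESTHETIC_SCORE used below is an assumption on our part and should be checked against the actual schema.

# Hedged sketch: loading the aesthetics-subset metadata and re-checking the 6.5 threshold.
from datasets import load_dataset

ds = load_dataset("ChristophSchuhmann/improved_aesthetics_6.5plus", split="train")
print(ds.column_names)  # inspect the real schema first
# "AESTHETIC_SCORE" is an assumed column name for the predicted aesthetics score.
high_quality = ds.filter(lambda row: row["AESTHETIC_SCORE"] >= 6.5)
print(len(high_quality), "pairs with predicted aesthetics score >= 6.5")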
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
laion2B-multi-chinese-subset
Github: Fengshenbang-LM Docs: Fengshenbang-Docs
Brief Introduction
The Chinese portion of the Laion2B multilingual multimodal dataset: around 143M image-text pairs (Chinese only).
Dataset Information
Around 143M Chinese image-text pairs in total, taking up about 19 GB (text information such as URLs only; images are not included).
Homepage: laion-5b Huggingface: laion/laion2B-multi
Download
mkdir laion2b_chinese_release && cd laion2b_chinese_release for i in {00000..00012}; do… See the full description on the dataset page: https://huggingface.co/datasets/IDEA-CCNL/laion2B-multi-chinese-subset.
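The shell loop above is truncated; as an alternative sketch, the metadata shards could also be fetched programmatically with the huggingface_hub library, discovering the file names at run time rather than assuming them. The parquet-only filter below is our assumption about the repository layout.

# Hedged sketch: downloading the metadata files of the IDEA-CCNL/laion2B-multi-chinese-subset repo.
from huggingface_hub import list_repo_files, hf_hub_download

repo_id = "IDEA-CCNL/laion2B-multi-chinese-subset"
files = list_repo_files(repo_id, repo_type="dataset")
for name in files:
    if name.endswith(".parquet"):  # assumption: metadata shards are parquet files
        local_path = hf_hub_download(repo_id, name, repo_type="dataset",
                                     local_dir="laion2b_chinese_release")
        print("downloaded", local_path)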
laion/aesthetics_v2_4.5 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Stable Diffusion 2 was trained on the LAION-2B-en dataset, which contains pairs of images and English-language prompts. The Stable Diffusion 2 model can generate images given prompts. This dataset contains the prompts that were used for Stable Diffusion 2 training. It is a subset of the LAION-5B dataset published in "LAION-5B: An open large-scale dataset for training next generation image-text models" (https://arxiv.org/abs/2210.08402) by Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev in 2022.
The idea behind this Kaggle dataset is that one can generate the corresponding images using the publicly available pretrained model.
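For instance, one could regenerate images from these prompts along the following lines, using the diffusers library and the stabilityai/stable-diffusion-2-1 checkpoint; the checkpoint choice, sample prompt, and generation settings are our assumptions, not part of the dataset card.

# Hedged sketch: generating an image from one of the training prompts (assumed setup, requires a GPU).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

prompt = "a photograph of an astronaut riding a horse"  # hypothetical prompt from the dataset
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("generated.png")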
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LAION-400M
The world's largest openly available image-text-pair dataset, with 400 million samples.
Concept and Content
The LAION-400M dataset is completely open and freely accessible. All images and texts in the LAION-400M dataset have been filtered with OpenAI's CLIP by calculating the cosine similarity between the text and image embeddings and dropping those with a similarity below 0.3. The threshold of 0.3 was determined through human evaluations and seems to be a good heuristic for estimating semantic image-text content matching. The image-text pairs have been extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.
Download Information
You can find: the CLIP image embeddings (NumPy files), the parquet files, and a KNN index of the image embeddings.
LAION-400M Dataset Statistics
LAION-400M and future, even bigger releases are in fact datasets of datasets. For instance, it can be filtered by image size into smaller datasets like th…
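To make the filtering criterion concrete, here is a minimal sketch of the CLIP cosine-similarity check described above, using the openai/clip-vit-base-patch32 checkpoint from transformers; this checkpoint is our choice for illustration and is smaller than the models used to build LAION-400M.

# Hedged sketch: keep an image-text pair only if CLIP cosine similarity >= 0.3.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_pair(image_path: str, caption: str, threshold: float = 0.3) -> bool:
    image = Image.open(image_path)
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    # Normalize both embeddings so the dot product equals cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    similarity = (img_emb * txt_emb).sum().item()
    return similarity >= threshold

print(keep_pair("example.jpg", "a brown dog playing in the snow"))  # hypothetical pair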
Filtered WIT, an Image-Text Dataset.
A reliable dataset for running image-text models. You can find WIT, the Wikipedia Image Text dataset, here. Data was taken from dalle-mini/wit.
Author
Aarush Katta
Data Structure
The data is stored as tars containing 10,000 samples per tar. The parquets contain the metadata of each tar, which was created using this script. Each tar contains a .jpg, .txt, and .json per sample: the image is stored in the .jpg, the caption in the .txt, and the metadata in… See the full description on the dataset page: https://huggingface.co/datasets/laion/filtered-wit.
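A minimal sketch of iterating over one such tar with the Python standard library follows; the shard name and the assumption that the three files of a sample share a common basename are ours, so check them against the actual shards.

# Hedged sketch: reading .jpg / .txt / .json triples from a WebDataset-style tar shard.
import io
import json
import tarfile
from PIL import Image

samples = {}  # basename -> {"jpg": ..., "txt": ..., "json": ...}
with tarfile.open("00000.tar") as tar:  # assumed shard name
    for member in tar.getmembers():
        if not member.isfile():
            continue
        base, _, ext = member.name.rpartition(".")
        samples.setdefault(base, {})[ext] = tar.extractfile(member).read()

for base, parts in samples.items():
    image = Image.open(io.BytesIO(parts["jpg"]))
    caption = parts["txt"].decode("utf-8")
    metadata = json.loads(parts["json"])
    print(base, image.size, caption[:60])
    break  # just show the first sample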
https://choosealicense.com/licenses/wtfpl/
All files uploaded. Enjoy!
Dataset Card for The Unlaion Temple
Dataset Details
Dataset Description
LAION-5B is still not public, so we decided to create our own dataset. The Unlaion Temple is a raw dataset of Common Crawl images (estimated to total 2 billion URLs). We haven't verified whether the links in this dataset are functional. You are responsible for handling the data. We've made some improvements to the dataset based on user feedback:
All… See the full description on the dataset page: https://huggingface.co/datasets/AnSungJae3489/the-un-laion-temple.
https://choosealicense.com/licenses/wtfpl/
Dataset Card for ilovehentai9000/laion-is-still-closed-laion-source-lmao
Dataset Details
Dataset Description
LAION-5B is still closed. "Fuck it, we do it ourselves."
This is a raw dataset of Common Crawl image links. We have not checked whether these links even work. You're on your own. All we checked is that they contain alt text + some very simple metrics.
Uses
Just don't. Do I look like I give a fuck what you do lol. If you don't like it, report the actual… See the full description on the dataset page: https://huggingface.co/datasets/AnSungJae3489/laion-is-still-closed-laion-source-lmao.
The LAION-Aesthetics 6.5+ dataset contains 625K image-text pairs.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
laion/laions_got_talent_enhanced_just_flash_annotations dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a diverse set of features extracted from the VBSLHE dataset (laparoscopic gynecology). These features will be utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] in the next editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).
We used a snapshot of the dataset provided by the Medical University of Vienna and Toronto that can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 75 video files. We divided each video into video shots with a maximum duration of 5 seconds.
This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:
@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }
This repository (v2) comprises the following files:
msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").
extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original VBSLHE videos available.
features-aladin.tar.gz† contains ALADIN [Messina N. et al. 2022] features extracted for all the segments' middle frames.
features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segments' middle frames.
features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segments' middle frames.
features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments.
objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).
objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].
objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].
*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.
*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows (a minimal parsing sketch is given after the list):
"object_class_names": vector with the class name of each detected object.
"object_scores": scores corresponding to each detected object.
"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).
†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the VBSLHE dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.
References:
[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.
[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.
[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.
[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.
[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.
[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).
[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.
[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.
[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK). These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).
We used a snapshot of the MVK dataset from 2023, which can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 1,372 video files. We divided each video into 1-second segments.
This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:
@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }
This repository comprises the following files:
msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").
extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original MVK videos available.
features-aladin.tar.gz† contains ALADIN [Messina N. et al. 2022] features extracted for all the segments' middle frames.
features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segments' middle frames.
features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segments' middle frames.
features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the 1-second video segments.
objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).
objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].
objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].
*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.
*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:
"object_class_names": vector with the class name of each detected object.
"object_scores": scores corresponding to each detected object.
"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).
†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the MVK dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
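To illustrate the query-by-image comparison mentioned above, a minimal NumPy sketch follows; the feature dimension, the toy data, and the assumption that the stored features are L2-normalized before comparison are ours.

# Hedged sketch: query-by-image search over precomputed segment features via dot product.
import numpy as np

def top_k_segments(query_feature: np.ndarray, segment_features: np.ndarray, k: int = 10):
    """Return indices and scores of the k segments most similar to the query.

    query_feature: (d,) feature of the query image
    segment_features: (n, d) features of the n video-segment middle frames
    """
    # Normalize so that the dot product equals cosine similarity.
    q = query_feature / np.linalg.norm(query_feature)
    s = segment_features / np.linalg.norm(segment_features, axis=1, keepdims=True)
    scores = s @ q                    # (n,) similarity scores
    top = np.argsort(-scores)[:k]     # indices of the k best-matching segments
    return top, scores[top]

# Toy usage with random features of dimension 1024 (the dimension is an assumption).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 1024)).astype(np.float32)
query = rng.normal(size=1024).astype(np.float32)
idx, sim = top_k_segments(query, features, k=5)
print(idx, sim)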
We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.
References:
[Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.
[Amato et al. 2022] Amato, G. et al., 2022. VISIONE at Video Browser Showdown 2022. In MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.
[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.
[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.
[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.
[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).
[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.
[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.
[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Historical price and volatility data for Lion in Taiwan New Dollar across different time periods.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Historical price and volatility data for Lion Cat in Euro across different time periods.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Historical price and volatility data for US Dollar in Lucky Lion across different time periods.