Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LAION-400M: the world’s largest openly available image-text-pair dataset, with 400 million samples.

# Concept and Content

The LAION-400M dataset is completely open and freely accessible. All image-text pairs in LAION-400M have been filtered with OpenAI’s CLIP by computing the cosine similarity between the text and image embeddings and dropping pairs with a similarity below 0.3. The threshold of 0.3 was determined through human evaluation and appears to be a good heuristic for estimating semantic image-text matching. The image-text pairs were extracted from the Common Crawl web data dump and come from random web pages crawled between 2014 and 2021.

# Download Information

You can find:
- The CLIP image embeddings (NumPy files)
- The parquet files
- A KNN index of the image embeddings

# LAION-400M Dataset Statistics

LAION-400M, and future even bigger releases, are in fact datasets of datasets. For instance, it can be filtered by image size into smaller datasets like th
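The CLIP-based filtering step can be sketched with plain NumPy on toy vectors. This is a minimal illustration of the cosine-similarity threshold, not the actual LAION pipeline (real filtering uses CLIP embeddings of the images and captions; the 2-dimensional vectors below are purely illustrative):

```python
import numpy as np

def cosine_filter(image_embs, text_embs, threshold=0.3):
    """Keep indices of pairs whose embedding cosine similarity meets the threshold."""
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = np.sum(image_embs * text_embs, axis=1)  # row-wise cosine similarity
    return np.where(sims >= threshold)[0], sims

# Toy embeddings: the first pair is well aligned, the second is orthogonal.
img = np.array([[1.0, 0.0], [0.0, 1.0]])
txt = np.array([[0.9, 0.1], [1.0, 0.0]])
keep, sims = cosine_filter(img, txt)  # only the first pair survives the 0.3 cut
```

The same thresholding generalizes directly to batches of real CLIP embeddings, since after L2 normalization the dot product equals the cosine similarity.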
jp1924/Laion400m-2 dataset hosted on Hugging Face and contributed by the HF Datasets community
Filtered WIT, an Image-Text Dataset.
A reliable dataset for running image-text models. You can find WIT, the Wikipedia Image Text dataset, here. Data was taken from dalle-mini/wit.
Author
Aarush Katta
Data Structure
The data is stored as tars, with 10,000 samples per tar. The parquet files contain the metadata of each tar, which was created using this script. Each sample in a tar consists of a .jpg, a .txt, and a .json: the image is stored in the .jpg, the caption in the .txt, and the metadata in… See the full description on the dataset page: https://huggingface.co/datasets/laion/filtered-wit.
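The per-sample .jpg/.txt/.json layout described above can be sketched with the standard library's `tarfile` module. This is a minimal reader under assumed conventions (the sample key `000000000` and the metadata field names are illustrative, not taken from the actual shards):

```python
import io
import json
import tarfile
import tempfile

def read_shard(tar_path):
    """Yield (key, caption, metadata) for each sample in a tar shard,
    grouping the .txt/.json/.jpg members that share a sample key."""
    samples = {}
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            key, ext = member.name.rsplit(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    for key, parts in sorted(samples.items()):
        yield key, parts["txt"].decode("utf-8"), json.loads(parts["json"])

# Build a tiny one-sample shard to demonstrate the layout.
tmp = tempfile.NamedTemporaryFile(suffix=".tar", delete=False)
with tarfile.open(tmp.name, "w") as tar:
    for name, data in [
        ("000000000.jpg", b"\xff\xd8 fake image bytes"),
        ("000000000.txt", b"a lion resting in the shade"),
        ("000000000.json", b'{"url": "http://example.com/lion.jpg"}'),
    ]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

samples = list(read_shard(tmp.name))
```

In practice, libraries such as `webdataset` handle this key-grouping automatically when streaming shards of this shape.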
laion/laion2B-en-aesthetic dataset hosted on Hugging Face and contributed by the HF Datasets community
laion/relaion2B-en-research-safe dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "laion-hd-subset"
More Information needed
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Lion is a dataset for object detection tasks - it contains Lion annotations for 565 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LAION COCO with aesthetic score and watermark score
This dataset contains a 10% sample of the LAION-COCO dataset, filtered by text rules (removing URLs, special tokens, etc.) and image rules (image size > 384x384, aesthetic score > 4.75, and watermark probability < 0.5). There are 8,563,753 data instances in total, and the corresponding aesthetic score and watermark score are included for each. Note: the watermark score in the table means the probability of the existence of the… See the full description on the dataset page: https://huggingface.co/datasets/guangyil/laion-coco-aesthetic.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Lion Toy is a dataset for object detection tasks - it contains Lions annotations for 1,645 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
## Overview
Lion is a dataset for object detection tasks - it contains Car Bike Motorbike annotations for 6,085 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Lion Yolo is a dataset for object detection tasks - it contains Lion annotations for 212 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Red Lion population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Red Lion. The dataset can be utilized to understand the population distribution of Red Lion by age. For example, using this dataset, we can identify the largest age group in Red Lion.
Key observations
The largest age group in Red Lion, PA was 30 to 34 years, with a population of 790 (12.17%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Red Lion, PA was 75 to 79 years, with a population of 94 (1.45%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Red Lion Population by Age. You can refer to it here
https://www.bitget.com/uk/price/lion-fai
Tracking the LION FAI price history allows crypto investors to easily monitor the performance of their investments. You can conveniently track the open, high, and close values for LION FAI, as well as trading volume, over time. In addition, you can instantly view daily percentage changes, making it easy to identify days with significant fluctuations. According to our LION FAI price history data, its value soared to an unprecedented peak on 2025-06-23, exceeding -- USD. On the other hand, the lowest point of the LION FAI price trajectory, commonly called the "LION FAI all-time low", occurred on 2025-06-23. If someone had bought LION FAI at that time, they would now be holding a profit of 0%. By design, 1,000,000,000 LION FAI will be created. The current circulating supply of LION FAI is approximately 0. All prices listed on this page come from the reliable source Bitget. It is important to rely on a single source when evaluating your investments, as values can differ between sellers. Our historical LION FAI price data includes data at 1-minute, 1-day, 1-week, and 1-month intervals (open/high/low/close/volume). These data have been rigorously tested to ensure consistency, completeness, and accuracy. They are specifically curated for trading simulations and backtesting, are available for free download, and are updated in real time.
The datasets used in the creation of the predicted Habitat Suitability models include the CWHR range maps of California's regularly-occurring vertebrates, which were digitized as GIS layers to support the predictions of the CWHR System software. These vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a lookup table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide. The models also used the CALFIRE-FRAP compiled "best available" land cover data known as FVEG. This compilation dataset was created as a single data layer to support the various analyses required for the Forest and Rangeland Assessment, a legislatively mandated function. These data are being updated to support on-going analyses and to prepare for the next FRAP assessment in 2015. An accurate depiction of the spatial distribution of habitat types within California is required for a variety of legislatively mandated government functions.
The California Department of Forestry and Fire Protection's CALFIRE Fire and Resource Assessment Program (FRAP), in cooperation with the California Department of Fish and Wildlife VegCamp program and with extensive use of USDA Forest Service Region 5 Remote Sensing Laboratory (RSL) data, has compiled the "best available" land cover data for California into a single comprehensive statewide data set. The data span a period from approximately 1990 to 2014. Typically, the most current, detailed, and consistent data were collected for various regions of the state. Decision rules were developed that controlled which layers were given priority in areas of overlap. Cross-walks were used to compile the various sources into the common classification scheme, the California Wildlife Habitat Relationships (CWHR) system. CWHR range data was used together with the FVEG vegetation maps and CWHR habitat suitability ranks to create Predicted Habitat Suitability maps for species. The Predicted Habitat Suitability maps show the mean habitat suitability score for the species, as defined in CWHR. CWHR defines habitat suitability as NO SUITABILITY (0), LOW (0.33), MEDIUM (0.66), or HIGH (1) for reproduction, cover, and feeding for each species in each habitat stage (habitat type, size, and density combination). The mean is the average of the reproduction, cover, and feeding scores, and can be interpreted as LOW (less than 0.34), MEDIUM (0.34-0.66), and HIGH (greater than 0.66) suitability. Note that habitat suitability ranks were developed based on habitat patch sizes >40 acres in size, and are best interpreted for habitat patches >200 acres in size. The CWHR Predicted Habitat Suitability rasters are named according to the 4-digit alphanumeric species CWHR ID code. The CWHR Species Lookup Table contains a record for each species including its CWHR ID, scientific name, common name, and range map revision history (available for download at https://www.wildlife.ca.gov/Data/CWHR).
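The mean-score computation and the LOW/MEDIUM/HIGH binning described above can be sketched in a few lines. The component scores and thresholds come directly from the text; the function names are illustrative:

```python
def mean_suitability(reproduction, cover, feeding):
    """Average the three CWHR component scores (each 0, 0.33, 0.66, or 1)."""
    return (reproduction + cover + feeding) / 3.0

def suitability_class(mean):
    """Bin the mean score with the thresholds given in the text:
    LOW < 0.34, MEDIUM 0.34-0.66, HIGH > 0.66."""
    if mean < 0.34:
        return "LOW"
    if mean <= 0.66:
        return "MEDIUM"
    return "HIGH"

# A stage with MEDIUM reproduction and cover but LOW feeding averages to 0.55.
m = mean_suitability(0.66, 0.66, 0.33)
label = suitability_class(m)
```

Note how a single component at 1.0 does not force a HIGH label; the classification depends on the average across all three life requisites.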
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the Red Lion, PA population pyramid, which represents the Red Lion population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Red Lion Population by Age. You can refer to it here
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Check Tiger And Lion is a dataset for object detection tasks - it contains Objects annotations for 1,277 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Red Lion by race. It includes the population of Red Lion across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Red Lion across relevant racial categories.
Key observations
The percent distribution of Red Lion population by race (across all racial categories recognized by the U.S. Census Bureau): 85.59% are white, 5.14% are Black or African American, 0.46% are American Indian and Alaska Native, 3.85% are Asian and 4.96% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Red Lion Population by Race & Ethnicity. You can refer to it here
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a lookup table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Lion_2 is a dataset for object detection tasks - it contains Lion annotations for 1,227 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK). These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).
We used a snapshot of the MVK dataset from 2023, which can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 1,372 video files. We divided each video into 1-second segments.
This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:
@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }
This repository comprises the following files:
msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").
extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original MVK videos available.
features-aladin.tar.gz† contains ALADIN [Messina et al. 2022] features extracted for the middle frame of each segment.
features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for the middle frame of each segment.
features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for the middle frame of each segment.
features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the 1-second video segments.
objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).
objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017].
objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021].
*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.
*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:
"object_class_names": vector with the class name of each detected object.
"object_scores": scores corresponding to each detected object.
"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).
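A record of this shape can be parsed by zipping the three parallel arrays back into per-detection tuples. The field names come from the description above; the sample "_id" value and its contents are illustrative:

```python
import json

def parse_object_records(jsonl_text):
    """Parse a per-video object-annotation jsonl string into
    (segment_id, [(class_name, score, box_yxyx), ...]) pairs."""
    for line in jsonl_text.splitlines():
        rec = json.loads(line)
        detections = list(zip(
            rec["object_class_names"],
            rec["object_scores"],
            rec["object_boxes_yxyx"],
        ))
        yield rec["_id"], detections

# One illustrative record with a single detection.
sample = ('{"_id": "v001_s0001", "object_class_names": ["fish"], '
          '"object_scores": [0.92], "object_boxes_yxyx": [[0.1, 0.2, 0.5, 0.6]]}')
records = dict(parse_object_records(sample))
```

Keeping the three arrays zipped together makes it harder to accidentally misalign a score with the wrong box when filtering detections by confidence.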
†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the MVK dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
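The query-by-image comparison described above (dot product over the stored features) can be sketched with NumPy. The random vectors below stand in for real ALADIN/CLIP features, and the L2 normalization is an assumption that makes the dot product equal to cosine similarity:

```python
import numpy as np

def query_by_image(query_feat, index_feats, k=3):
    """Rank indexed segment features by dot-product similarity to a query.
    Features are L2-normalized first, so the dot product is cosine similarity."""
    q = query_feat / np.linalg.norm(query_feat)
    db = index_feats / np.linalg.norm(index_feats, axis=1, keepdims=True)
    scores = db @ q
    top = np.argsort(-scores)[:k]  # indices of the k most similar segments
    return top, scores[top]

# Stand-in index of 100 segment features; query with a perturbed copy of one.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 8)).astype(np.float32)
query = db[42] + 0.01 * rng.normal(size=8)
top, scores = query_by_image(query, db)
```

For text queries the same ranking applies, but the query vector must first be produced by the matching text encoder of the chosen model, as the note above explains.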
We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.
References:
[Amato et al. 2023] Amato, G.et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.
[Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.
[Fang H. et al. 2021] Fang, H. et al., 2021. CLIP2Video: Mastering video-text retrieval via image CLIP. arXiv preprint arXiv:2106.11097.
[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.
[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.
[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).
[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.
[Schuhmann et al. 2022] Schuhmann, C. et al., 2022. LAION-5B: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.
[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CV