License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval (ArXiv)
As with most vision-related tasks, deep learning models have come to dominate content-based image retrieval (CBIR) over the last decade. However, most publications that aim to optimise neural networks for CBIR train and test their models on domain-specific datasets, so it remains unclear whether those networks can serve as general-purpose image feature extractors. After analyzing popular image retrieval test sets, we decided to manually curate GPR1200, an easy-to-use and accessible yet challenging benchmark dataset with 1200 categories and 10 examples per class. Classes and images were manually selected from six publicly available datasets covering different image domains, ensuring high class diversity and clean class boundaries.
[Image: GPR1200 overview, https://github.com/Visual-Computing/GPR1200/raw/main/images/GPR_main_pic.jpg]
Benchmark your Image Retrieval Models on It
[Image: GPR1200 retrieval results table, https://github.com/Visual-Computing/GPR1200/raw/main/images/result_table.JPG]
Did you ever go through your vacation photos and ask yourself: What is the name of this temple I visited in China? Who created this monument I saw in France? Landmark recognition can help! This technology can predict landmark labels directly from image pixels, to help people better understand and organize their photo collections. Today, a great obstacle to landmark recognition research is the lack of large annotated datasets. This motivated us to release Google-Landmarks, the largest worldwide dataset to date, to foster progress in this problem.
The dataset is divided into two sets of images, to evaluate two different computer vision tasks: recognition and retrieval. The data was originally described in [1], and published as part of the Google Landmark Recognition Challenge and Google Landmark Retrieval Challenge. Additionally, to spur research in this field, we have open-sourced Deep Local Features (DELF), an attentive local feature descriptor that we believe is especially suited for this kind of task. DELF's code is available on GitHub.
UPDATE: We have now also made available the Google Landmark Boxes dataset, containing 86 thousand bounding boxes.
If you make use of the Google Landmarks dataset in your research, please consider citing:
H. Noh, A. Araujo, J. Sim, T. Weyand, B. Han, "Large-Scale Image Retrieval with Attentive Deep Local Features", Proc. ICCV'17
If you make use of the Google Landmark Boxes dataset in your research, please consider citing:
M. Teichmann*, A. Araujo*, M. Zhu and J. Sim, “Detect-to-Retrieve: Efficient Regional Aggregation for Image Search”, Proc. CVPR'19
The two challenges associated with this dataset can be found at the following links:
The Landmark Recognition Workshop at CVPR 2018 will discuss recent progress on landmark recognition and image retrieval, taking into account the results of the above-mentioned challenges. Top submissions for the challenges will be invited to give talks at the workshop.
The dataset contains URLs of images which are publicly available online (this Python script may be useful to download the images). Note that no image data is released, only URLs.
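For orientation, here is a minimal download sketch, assuming a CSV with 'id' and 'url' columns as in the released train/index/test files; the official Python script mentioned above handles retries and parallelism and remains the better starting point.

```python
import csv
import os
import requests

def download_images(csv_path, out_dir):
    """Fetch each image URL listed in the CSV into out_dir, skipping failures."""
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            target = os.path.join(out_dir, row["id"] + ".jpg")
            if os.path.exists(target):
                continue  # already downloaded
            try:
                resp = requests.get(row["url"], timeout=10)
                resp.raise_for_status()
                with open(target, "wb") as out:
                    out.write(resp.content)
            except requests.RequestException:
                pass  # URLs may have gone stale; skip and move on

download_images("train.csv", "train_images")
```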
The dataset contains test images, training images and index images. The test images are used in both tasks: for the recognition task, a landmark label may be predicted for each test image; for the retrieval task, relevant index images may be retrieved for each test image. The training images are associated to landmark labels, and can be used to train models for the recognition and retrieval challenges (for a visualization of the geographic distribution of training images, see [3]). The index images are used in the retrieval task, composing the set from which images should be retrieved.
Note that the test set for both the recognition and retrieval tasks is the same, to encourage researchers to experiment with both. We also encourage participants to use the training data from the recognition task to train models which could be useful for the retrieval task. Note, however, that there are no landmarks in common between the training/index sets of the two tasks.
The images listed in the dataset are not directly in our control, so their availability may change over time, and the dataset files may be updated to remove URLs which no longer work.
The training and index sets were constructed by clustering photos with respect to their geolocation and visual similarity using an algorithm similar to the one described in [4]. Matches between training images were established using local feature matching. Note that there may be multiple clusters per landmark, which typically correspond to different views or different parts of the landmark. To avoid bias, no computer vision algorithms were used for ground truth generation. Instead, we established ground truth correspondences between test images and landmarks using human annotators.
The images listed in this dataset are publicly available on the web, and may have different licenses. Google does not own their copyright.
...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NeSy4VRD
NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.
Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.
The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.
NeSy4VRD on Zenodo: the NeSy4VRD dataset package
This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.
The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations. The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
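Purely for illustration (the exact on-disk layout and bounding-box convention of the NeSy4VRD annotation files may differ), one such visual relationship could be represented in Python as follows.

```python
# One annotated visual relationship for one image: the 'subject' and
# 'object' each carry an object class and a bounding box. The coordinate
# convention [ymin, ymax, xmin, xmax] is an assumption for illustration only.
relationship = {
    "subject": {"class": "person", "bbox": [120, 340, 60, 210]},
    "predicate": "ride",
    "object": {"class": "horse", "bbox": [200, 460, 40, 300]},
}
```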
Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.
The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.
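As a rough sketch of how such reasoning might be exercised from Python with owlready2, assuming the ontology has been saved locally as vrd_world.owl (the actual filename in the dataset package may differ):

```python
from owlready2 import get_ontology, sync_reasoner

# Load the VRD-World ontology from a local file (filename is an assumption).
onto = get_ontology("file://./vrd_world.owl").load()

# Run an OWL reasoner (HermiT, bundled with owlready2) so that inferred
# class memberships and property assertions are materialised in 'onto'.
with onto:
    sync_reasoner()

print(list(onto.classes())[:10])             # some classes from the hierarchy
print(list(onto.object_properties())[:10])   # some predicates as object properties
```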
Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, of course, purely optional, however. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.
All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.
NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code
The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package here on Zenodo. These companion components are available at NeSy4VRD on GitHub. These companion components consist of:
The NeSy4VRD infrastructure supporting extensibility consists of:
The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations or by tailoring them to introduce new or additional data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.
The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.
To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LAION-400M dataset is completely open and freely accessible.
Check https://laion.ai/laion-400-open-dataset/ for the full description of this dataset.
All images and texts in the LAION-400M dataset have been filtered with OpenAI's CLIP by computing the cosine similarity between the text and image embeddings and dropping pairs with a similarity below 0.3.
The threshold of 0.3 was determined through human evaluation and appears to be a good heuristic for estimating semantic image-text matching.
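A minimal sketch of this kind of filtering with OpenAI's reference CLIP implementation; the ViT-B/32 checkpoint and the single-pair loop shown here are illustrative assumptions, not necessarily the exact configuration LAION used.

```python
import torch
import clip  # OpenAI's CLIP reference implementation
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["alt text scraped for this image"]).to(device)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)

# Cosine similarity = dot product of L2-normalised embeddings.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
similarity = (img_emb * txt_emb).sum(dim=-1).item()

keep = similarity >= 0.3  # the LAION-400M filtering threshold
```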
The image-text-pairs have been extracted from the Common Crawl webdata dump and are from random web pages crawled between 2014 and 2021.
Use img2dataset to download subsets of this.
LAION-400M, and the even bigger releases to follow, are in fact datasets of datasets. For instance, LAION-400M can be filtered by image size into smaller datasets like this:
Number of unique samples: 413M
Height or width >= 1024: 26M
Height and width >= 1024: 9.6M
Height or width >= 512: 112M
Height and width >= 512: 67M
Height or width >= 256: 268M
Height and width >= 256: 211M
Using the KNN index, specialized datasets can also be extracted by domain of interest. These are (or will be) sufficient in size to train domain-specialized models.
http://gallerytest.christoph-schuhmann.de/photos/index.php?/category/4 (todo: replace link with local gallery) and https://rom1504.github.io/clip-retrieval/ offer a simple visualization of the dataset. There you can search the dataset using CLIP and a KNN index.
We produced the dataset in several formats to address the various use cases:
In this Kaggle dataset we provide the URL and caption metadata. Check https://laion.ai/laion-400-open-dataset/ for the other formats and the full explanation.
We provide 32 parquet files of size around 1GB (total 50GB) with the image URLs, the associated texts and additional metadata in the following format:
SAMPLE_ID | URL | TEXT | LICENSE | NSFW | similarity | WIDTH | HEIGHT
where
SAMPLE_ID: a unique identifier
LICENSE: if a Creative Commons license could be extracted from the image data, it is named here, e.g. "creativecommons.org/licenses/by-nc-sa/3.0/"; otherwise the field contains "?"
NSFW: CLIP was used to estimate whether the image has NSFW content. The estimation is deliberately conservative, reducing the number of false negatives at the cost of more false positives. Possible values are "UNLIKELY", "UNSURE" and "NSFW"
similarity: the cosine similarity between the text and image embeddings
WIDTH and HEIGHT: the image size at the time the image was embedded; originals larger than 4K were resized to 4K
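For example, one of the subsets from the table above (height and width >= 1024) can be carved out of this metadata with pandas; the shard filename below is a placeholder.

```python
import pandas as pd

# Read one of the 32 metadata shards (placeholder filename).
df = pd.read_parquet("laion400m_meta_part_00000.parquet")

# Keep large, well-matched, SFW-classified image-text pairs.
subset = df[
    (df["WIDTH"] >= 1024)
    & (df["HEIGHT"] >= 1024)
    & (df["similarity"] >= 0.3)
    & (df["NSFW"] == "UNLIKELY")
]
subset[["SAMPLE_ID", "URL", "TEXT"]].to_parquet("laion_subset.parquet", index=False)
```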
This metadata dataset is best used to redownload the whole dataset or a subset of it. The img2dataset tool can be used to efficiently download such subsets.
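A sketch of downloading such a filtered subset with img2dataset's Python entry point; the parameter names follow the img2dataset documentation at the time of writing, so check the current README before relying on them.

```python
from img2dataset import download

download(
    url_list="laion_subset.parquet",   # filtered metadata produced above
    input_format="parquet",
    url_col="URL",
    caption_col="TEXT",
    output_format="webdataset",
    output_folder="laion_subset_images",
    image_size=256,
    processes_count=8,
    thread_count=32,
)
```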
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MuMu is a Multimodal Music dataset with multi-label genre annotations that combines information from the Amazon Reviews dataset and the Million Song Dataset (MSD). The former contains millions of album customer reviews and album metadata gathered from Amazon.com. The latter is a collection of metadata and precomputed audio features for a million songs.
To map the information from both datasets we use MusicBrainz. This process yields a final set of 147,295 songs belonging to 31,471 albums. For the mapped set of albums, there are 447,583 customer reviews from the Amazon dataset. The dataset has been used for multi-label music genre classification experiments in the related publication. In addition to genre annotations, this dataset provides further information about each album, such as the average rating, selling rank, similar products, and cover image URL. For every text review, it also provides the review's helpfulness score, rating, and summary.
The mapping between the three datasets (Amazon, MusicBrainz and MSD), genre annotations, metadata, data splits, text reviews and links to images are available here. Images and audio files cannot be released due to copyright issues.
MuMu dataset (mapping, metadata, annotations and text reviews)
Data splits and multimodal feature embeddings for ISMIR multi-label classification experiments
These data can be used together with the Tartarus deep learning library https://github.com/sergiooramas/tartarus.
NOTE: This version provides simplified files with metadata and splits.
Scientific References
Please cite the following papers if using MuMu dataset or Tartarus library.
Oramas, S., Barbieri, F., Nieto, O., and Serra, X. (2018). Multimodal Deep Learning for Music Genre Classification. Transactions of the International Society for Music Information Retrieval, V(1).
Oramas S., Nieto O., Barbieri F., & Serra X. (2017). Multi-label Music Genre Classification from audio, text and images using Deep Features. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). https://arxiv.org/abs/1707.04916
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This brain tumor dataset contains 3064 T1-weighted contrast-enhanced images from 233 patients with three kinds of brain tumor: meningioma (708 slices), glioma (1426 slices), and pituitary tumor (930 slices). Due to the file size limit of the repository, we split the whole dataset into 4 subsets and archive them in 4 .zip files, each containing 766 slices. The 5-fold cross-validation indices are also provided.
This data is organized in MATLAB data format (.mat files). Each file stores a struct containing the following fields for an image:
This data was used in the following paper: Cheng, Jun, et al. "Enhanced Performance of Brain Tumor Classification via Tumor Region Augmentation and Partition." PLoS ONE 10(10) (2015).
MATLAB source code is available on GitHub: https://github.com/chengjun583/brainTumorRetrieval
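For Python users, here is a minimal sketch for reading one slice. It assumes the .mat files are saved in MATLAB v7.3 (HDF5) format and that the struct and field names ('cjdata', 'image', 'label', 'tumorMask'), as well as the label coding in the comments, match the dataset description; verify these against the files and README you actually download.

```python
import h5py
import numpy as np

with h5py.File("1.mat", "r") as f:  # example filename
    data = f["cjdata"]
    image = np.array(data["image"], dtype=np.float32)   # T1-weighted contrast-enhanced slice
    label = int(np.array(data["label"]).item())          # assumed: 1=meningioma, 2=glioma, 3=pituitary
    mask = np.array(data["tumorMask"], dtype=np.uint8)   # binary tumor region

print(image.shape, label, mask.sum())
```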
This is a reference global feature extraction model for the Google Landmark Retrieval 2020 Competition. You can use it as an initial submission to the competition or to better understand the model submission format and requirements.
To create a submission to the competition, download the dataset, and zip its contents.
This dataset contains a simplified version of DELG (a ResNet-101 backbone with ArcFace). It outputs global features only, and has been exported as a TensorFlow SavedModel with the competition's required serving signature, serving_default (the default when creating a SavedModel), and the required output, global_descriptor.
The model takes as input a single arbitrarily sized uint8 tensor of an RGB image, and outputs the embedding for the image as a float tensor with shape (2048,) to global_descriptor.
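A minimal inference sketch, assuming the SavedModel has been unpacked to a local directory (the path and image filename below are placeholders):

```python
import tensorflow as tf

model = tf.saved_model.load("delg_global_savedmodel")   # placeholder path
serving_fn = model.signatures["serving_default"]

# An arbitrarily sized RGB image as a uint8 tensor, no batch dimension.
image = tf.image.decode_jpeg(tf.io.read_file("query.jpg"), channels=3)

# If a positional call is rejected, inspect serving_fn.structured_input_signature
# and pass the tensor by its input name instead.
embedding = serving_fn(image)["global_descriptor"]   # float tensor, shape (2048,)
print(embedding.shape)
```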
DELG (GitHub):
"Unifying Deep Local and Global Features for Image Search",
B. Cao*, A. Araujo* and J. Sim,
arxiv:2001.05027
"Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval",
T. Weyand*, A. Araujo*, B. Cao and J. Sim,
Proc. CVPR'20
GLDv2-clean (Kaggle dataset):
"Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset",
K. Ozaki, S. Yokoo
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Indonesian textile craftsmanship has evolved over millennia, progressing from basic utilitarian weaving techniques around 2500 BC to intricate patterns carrying religious and social symbolism, with production hubs across regions such as Sumatra, Borneo, Java, Celebes, Nusa Tenggara, and Bali. These textiles evolved from utilitarian items into carriers of sacred meaning, divided into secular and sacred cloths, both renowned for their aesthetic beauty. They played a pivotal role in individuals' cultural journeys, symbolizing life stages such as maternity, matrimony, and mortality, with designs reflecting religious beliefs and the influences of their era. The Batik technique, a hallmark of Indonesian textile artistry, involves creating intricate patterns using a wax-resist method. Traditionally, artisans used a tool called a canting to draw patterns on fabric, a process known as batik tulis (drawn batik). After the drawing phase, the cloth was dyed using natural dyes and then subjected to the "lorot" process, in which the wax is boiled out of the fabric.
Batik making is revered for its complexity and demands high craftsmanship, requiring precise hand gestures and mastery of the canting tool. It stands as one of the most challenging pattern-making techniques in textile artistry. [1]
The primary objective of this dataset is to serve as a resource for research, academic, and educational purposes rather than commercial endeavors. The dataset was meticulously compiled to include high-quality images representative of various types of Batik, encompassing the rich diversity of Batik Nusantara (Indonesian Batik) from the Aceh to Papua regions.
Andrew has noted that the cornerstone of effective machine learning lies in the quality of the data: meticulously curated datasets have the power to unlock valuable insights and drive meaningful results; in other words, data is more important than models. In contrast, datasets lacking in quality may hinder the learning process and lead to suboptimal outcomes. Prioritizing data quality is therefore paramount, as it lays the foundation for successful machine learning initiatives [2]. Sebastian has likewise observed that the effectiveness of a machine learning algorithm depends greatly on the quality of the data and the richness of the information it encapsulates [3].
This dataset was carefully collected with the assistance of Ultralytics. The ownership of all images within this dataset belongs to the respective parties, to whom we extend our gratitude for contributing these visually captivating images.
[Dataset creator's name]. ([Year & Month of dataset creation]). [Name of the dataset], [Version of the dataset]. Retrieved [Date Retrieved] from [URL of the dataset].
Comprising 40 raw images per class with image dimension of 224 x 224, this dataset encompasses a wide array of Batik designs, each representing a distinct category. The classes include 'Aceh PintuAceh', 'Bali Barong', 'Bali Merak', 'DKI OndelOndel', 'JawaBarat Megamendung', 'JawaTimur Pring', 'Kalimantan Dayak', 'Lampung Gajah', 'Madura Mataketeran', 'Maluku Pala', 'NTB Lumbung', 'Papua Asmat', 'Papua Cendrawasih', 'Papua Tifa', 'Solo Parang', 'SulawesiSelatan Lontara', 'SumateraBarat Rumah Minang', 'SumateraUtara Boraspati', 'Yogyakarta Kawung', and 'Yogyakarta Parang' [2][3][4][5][6][7]. These classes collectively portray the rich heritage of Batik Nusantara or Batik Indonesia, spanning from the Aceh to Papua regions.
Feel free to explore image augmentation techniques to further enhance the dataset.
Simple example code is available in the accompanying Git repository and is written with Colab in mind. For reference, the following pre-trained architectures have been added: VGG16, ResNet50, Xception, and MobileNetV2, along with Content-Based Image Retrieval (CBIR), Random Forest, a CNN architecture and modeling, and an MLP. The code is also available in the Kaggle Dataset Notebooks (Code).
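As a starting point, here is a minimal transfer-learning sketch with one of the listed backbones (MobileNetV2); the directory name and the 80/20 train/validation split are assumptions to adapt to however you extract the archive.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)   # matches the dataset's image dimensions
BATCH = 16

# On-the-fly 80/20 train/validation split from the extracted training folder
# ("batik/train" is a placeholder path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "batik/train", validation_split=0.2, subset="training",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "batik/train", validation_split=0.2, subset="validation",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH)

base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet", pooling="avg")
base.trainable = False   # use the backbone as a frozen feature extractor

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNetV2 expects inputs in [-1, 1]
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(len(train_ds.class_names), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=5)
```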
Below are steps to utilise the dataset using either Google Colab or Jupyter Notebook:
1. Begin by downloading the dataset.
2. Upon extraction, you'll find separate folders for training and testing data. Should you require validation data, either manually split a portion (approximately 20%) from the training set and store it separately, or perform an on-the-fly split during coding.
3. If splitting validation data manually, remember to re-zip the dataset after the separation process.
4....