28 datasets found
  1. LAS&T: Large Shape And Texture Dataset

    • zenodo.org
    jpeg, zip
    Updated May 26, 2025
    Cite
    Sagi Eppel (2025). LAS&T: Large Shape And Texture Dataset [Dataset]. http://doi.org/10.5281/zenodo.15453634
    Explore at:
    Available download formats: jpeg, zip
    Dataset updated
    May 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sagi Eppel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Large Shape And Texture Dataset (LAS&T)

    LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D with 650,000 images, based on real world shapes and textures.

    Overview

    The LAS&T Dataset aims to test the most basic aspect of vision in the most general way: the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D when every other element of the shape is changed between images, including the shape's material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when it appears on different objects, in different environments, and under different light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of natural patterns. Each section of the dataset (shapes and textures) contains a 3D part, which relies on physics-based scenes with realistic light, material, and object simulation, and an abstract 2D part. In addition, a real-world image benchmark for 3D shapes is included.

    Main Dataset webpage

    The dataset contains four parts:

    3D shape recognition and retrieval.

    2D shape recognition and retrieval.

    3D Materials recognition and retrieval.

    2D Texture recognition and retrieval.

    Each can be used independently for training and testing.

    Additional assets include a set of 350,000 natural 2D shapes extracted from real-world images (SHAPES_COLLECTION_350k.zip) and a real-world image benchmark for 3D shape recognition.

    The scripts used to generate and test the dataset are supplied in the SCRIPT* files.

    Shapes Recognition and Retrieval:

    For shape recognition, the goal is to identify the same shape in different images, where the material/texture/color of the shape is changed, the shape is rotated, and the background is replaced; hence, only the shape remains the same in both images. This is tested for 3D shapes/objects with realistic light simulation and for abstract 2D shapes. All files with 3D shapes contain samples of the 3D shape dataset, and all files with 2D shapes contain samples of the 2D shape dataset. Example files contain sample images for each set.

    Main files:

    Real_Images_3D_shape_matching_Benchmarks.zip contains real-world image benchmarks for 3D shapes.

    3D_Shape_Recognition_Synthethic_GENERAL_LARGE_SET_76k.zip: a large number of synthetic examples of 3D shapes with maximum variability; can be used for training/testing 3D shape/object recognition/retrieval.

    2D_Shapes_Recognition_Textured_Synthetic_Resize2_GENERAL_LARGE_SET_61k.zip: a large number of synthetic examples of 2D shapes with maximum variability; can be used for training/testing 2D shape recognition/retrieval.

    SHAPES_2D_365k.zip: 365,000 2D shapes extracted from real-world images, saved as black-and-white .png image files.

    File structure:

    All jpg images that are in the exact same subfolder contain the exact same shape (but with different texture/color/background/orientation).
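
    This subfolder convention makes it easy to enumerate positive pairs for training or evaluating retrieval models. A minimal Python sketch (the directory path and helper name below are illustrative, not part of the dataset):

    import itertools
    from pathlib import Path

    def positive_shape_pairs(root):
        """Yield pairs of image paths that depict the same shape, following the
        folder convention described above (one subfolder per shape)."""
        for shape_dir in sorted(Path(root).iterdir()):
            if not shape_dir.is_dir():
                continue
            images = sorted(shape_dir.glob("*.jpg"))
            yield from itertools.combinations(images, 2)

    # Hypothetical path to one extracted GENERAL_LARGE_SET archive.
    for img_a, img_b in positive_shape_pairs("path/to/extracted_set"):
        print(img_a, img_b)
        break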

    Textures and Materials Recognition and Retrieval

    For textures and materials, the goal is to identify and match images containing the same material or texture; however, the shape/object on which the material or texture is applied is different, and so are the background and lighting.

    This is done for physics-based materials in 3D and for abstract 2D textures.

    3D_Materials_PBR_Synthetic_GENERAL_LARGE_SET_80K.zip: a large number of examples of 3D materials in physics-grounded scenes; can be used for training or testing of material recognition/retrieval.

    2D_Textures_Recogition_GENERAL_LARGE_SET_Synthetic_53K.zip

    A large number of images of 2D textures in maximally varied settings; can be used for training/testing 2D texture recognition/retrieval.

    File structure:

    All jpg images that are in the exact same subfolder contain the exact same texture/material (but overlay on different objects with different background/and illumination/orientation).

    Data Generation:

    The images in the synthetic part of the dataset were created by automatically extracting shapes and textures from natural images and combining them into synthetic images. The resulting synthetic images rely entirely on real-world patterns, yielding extremely diverse and complex shapes and textures. As far as we know, this is the largest and most diverse shape and texture recognition/retrieval dataset. The 3D data was generated using physics-based materials and rendering (Blender), making the images physically grounded and enabling the data to be used for training toward real-world examples. The scripts for generating the data are supplied in files with the word SCRIPTS in their names.

    Real-world image data:

    For 3D shape recognition and retrieval, we also supply a real-world natural image benchmark, with a variety of natural images containing the exact same 3D shape but made of or coated with different materials, and in different environments and orientations. The goal is again to identify the same shape in different images. The benchmark is available in Real_Images_3D_shape_matching_Benchmarks.zip.

    File structure:

    Files containing the word 'GENERAL_LARGE_SET' contain synthetic images that can be used for training or testing; the type of data (2D shapes, 3D shapes, 2D textures, 3D materials) and the number of images appear in the file name. Files containing 'MultiTests' contain a number of different tests in which only a single aspect of the instance is changed (for example, only the background). Files containing 'SCRIPTS' contain data generation and testing scripts. Files containing 'examples' show examples of each test.

    Shapes Collections

    The file SHAPES_COLLECTION_350k.zip contains 350,000 2D shapes extracted from natural images and used for the dataset generation.

    Evaluating and Testing

    For evaluating and testing see: SCRIPTS_Testing_LVLM_ON_LAST_VQA.zip
    These scripts can be used to test leading LVLMs via API, create human tests, and in general turn the dataset into multiple-choice question images similar to the ones in the paper.

  2. Shape Detector | InceptionV3 | Acc : 99.99%

    • kaggle.com
    Updated Oct 24, 2022
    Cite
    DeepNets (2022). Shape Detector | InceptionV3 | Acc : 99.99% [Dataset]. https://www.kaggle.com/datasets/utkarshsaxenadn/shape-detector-inceptionv3-acc-9999
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 24, 2022
    Dataset provided by
    Kaggle
    Authors
    DeepNets
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains the file with the model weights trained on the shape detector dataset. It is an InceptionV3 model which achieved almost 100% accuracy on both the training and testing datasets.

  3. NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep...

    • zenodo.org
    text/x-python, zip
    Updated Apr 16, 2025
    Cite
    Giulio Del Corso; Volpini Federico; Claudia Caudai; Davide Moroni; Sara Colantonio (2025). NADA-SynShapes: A synthetic shape benchmark for testing probabilistic deep learning models [Dataset]. http://doi.org/10.5281/zenodo.15194187
    Explore at:
    Available download formats: zip, text/x-python
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulio Del Corso; Volpini Federico; Claudia Caudai; Davide Moroni; Sara Colantonio
    License

    Attribution-NonCommercial-NoDerivs 2.5 (CC BY-NC-ND 2.5): https://creativecommons.org/licenses/by-nc-nd/2.5/
    License information was derived automatically

    Time period covered
    Dec 18, 2024
    Description

    NADA (Not-A-Database) is an easy-to-use geometric shape data generator that allows users to define non-uniform multivariate parameter distributions to test novel methodologies. The full open-source package is provided at GIT:NA_DAtabase. See Technical Report for details on how to use the provided package.

    This database includes 3 repositories:

    • NADA_Dis: Is the model able to correctly characterize/Disentangle a complex latent space?
      The repository contains 3x100,000 synthetic black and white images to test the ability of the models to correctly define a proper latent space (e.g., autoencoders) and disentangle it. The first 100,000 images contain 4 shapes and uniform parameter space distributions, while the other images have a more complex underlying distribution (truncated Gaussian and correlated marginal variables).

    • NADA_OOD: Does the model identify Out-Of-Distribution images?
      The repository contains 100,000 training images (4 different shapes with 3 possible colors located in the upper left corner of the canvas) and 6x100,000 increasingly different sets of images (changing the color class balance, reducing the radius of the shape, moving the shape to the lower left corner) providing increasingly challenging out-of-distribution images.
      This can help to test not only the capability of a model, but also methods that produce reliability estimates and should correctly classify OOD elements as "unreliable" as they are far from the original distributions.

    • NADA_AlEp: Does the model distinguish between different types (Aleatoric/Epistemic) of uncertainties?
      The repository contains 5x100,000 images with different type of noise/uncertainties:
      • NADA_AlEp_0_Clean: Dataset clean of noise to use as a possible training set.
      • NADA_AlEp_1_White_Noise: Epistemic white noise dataset. Each image is perturbed with an amount of white noise randomly sampled from 0% to 90%.
      • NADA_AlEp_2_Deformation: Dataset with epistemic deformation noise. Each image is deformed by a random amount uniformly sampled between 0% and 90%; 0% corresponds to the original image, while 100% is a full deformation to the circumscribing circle.
      • NADA_AlEp_3_Label: Dataset with label noise. Formally, 20% of triangles of a given color are misclassified as a square with a random color (among blue, orange, and brown) and vice versa (squares to triangles). Label noise introduces aleatoric uncertainty because it is inherent in the data and cannot be reduced.
      • NADA_AlEp_4_Combined: Combined dataset with all previous sources of uncertainty.

    Each image can be used for classification (shape/color) or regression (radius/area) tasks.

    All datasets can be modified and adapted to the user's research question using the included open source data generator.

  4. LAS&T: Large Shape & Texture Dataset

    • paperswithcode.com
    Cite
    LAS&T: Large Shape & Texture Dataset [Dataset]. https://paperswithcode.com/dataset/las-t-large-shape-texture-dataset
    Explore at:
    Description

    The Large Shape and Texture dataset (LAS&T) is a giant dataset of shapes and textures for tasks of visual shape and texture identification and retrieval from a single image.

    LAS&T is the largest and most diverse dataset for shape, texture, and material recognition and retrieval in 2D and 3D, with 650,000 images based on real-world shapes and textures.

    Overview: The LAS&T Dataset aims to test the most basic aspect of vision in the most general way: the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D when every other element of the shape is changed between images, including the shape's material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when it appears on different objects, in different environments, and under different light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of natural patterns. Each section of the dataset (shapes and textures) contains a 3D part, which relies on physics-based scenes with realistic light, material, and object simulation, and an abstract 2D part. In addition, a real-world image benchmark for 3D shapes is included.

    The dataset is divided into several parts:

    3D shape recognition and retrieval.

    2D shape recognition and retrieval.

    3D Materials recognition and retrieval.

    2D Texture recognition and retrieval.

    Each can be used independently for training and testing.

    Additional assets include a set of 350,000 natural 2D shapes extracted from real-world images and a real-world image benchmark for 3D shape recognition.

  5. Training and testing data for deep learning assisted jet tomography

    • figshare.com
    hdf
    Updated Jun 1, 2023
    Cite
    LongGang Pang; zhong yang; Yayun He; wei chen; WeiYao Ke; Xin-Nian Wang (2023). Training and testing data for deep learning assisted jet tomography [Dataset]. http://doi.org/10.6084/m9.figshare.20422500.v1
    Explore at:
    Available download formats: hdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    LongGang Pang; zhong yang; Yayun He; wei chen; WeiYao Ke; Xin-Nian Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When energetic partons traverse the quark gluon plasma (QGP), they deposit energy and momentum into the medium. Mach cones are expected to form whose opening angles are tightly related to the speed of sound of the QGP. This provides a way to probe the QGP equation of state. However, the Mach cones are distorted by the collective expansion of the QGP. The distortions depend on the initial jet production positions and travelling directions.

    We trained a deep point cloud neural network to locate the initial jet production positions using the momenta of final-state hadrons with transverse momentum pt > 2 GeV. This folder contains training and testing data for this AI4Science interdisciplinary study.

    There are 3 files in HDF5 format.

    1. CoLBT_Hadrons_Frag.h5 (734.17 MB): stores training and testing data from the CoLBT model using fragmentation for particlization.

    The data tables contained are listed below:

      gamma_pt_phi_eta_test   Dataset {97908, 3}
      gamma_pt_phi_eta_train  Dataset {78334, 3}
      hadrons_test            Dataset {97908, 90, 6}
      hadrons_train           Dataset {78334, 90, 6}
      ids_test                Dataset {97908}
      ids_train               Dataset {78334}
      jet_pt_phi_eta_test     Dataset {97908, 3}
      jet_pt_phi_eta_train    Dataset {78334, 3}
      jetxy_test              Dataset {97908, 2}
      jetxy_train             Dataset {78334, 2}

    The data are split into training and testing sets. In the training set, gamma_pt_phi_eta_train is a 2D numpy array which stores the global information (pt, phi, pseudo-rapidity) of 78334 gamma triggers. hadrons_train is a numpy array of shape {78334, 90, 6}, where 78334 is the number of events, 90 is the maximum number of hadrons in the jet cone, and 6 is the number of features of each hadron. jet_pt_phi_eta_train is a numpy array of shape {78334, 3}, where 3 stands for (pt, phi, eta) of the jet obtained with a jet-finding algorithm. jetxy_train is a numpy array of shape {78334, 2}, where 2 stands for (x, y): the jet production positions that the neural network is going to predict.

    2. CoLBT_Hadrons_Comb.h5 (408.4 MB): stores training and testing data from the CoLBT model using combination for particlization.

    The data tables contained are listed below:

      gamma_pt_phi_eta_test   Dataset {19615, 3}
      gamma_pt_phi_eta_train  Dataset {78334, 3}
      hadrons_test            Dataset {19615, 90, 6}
      hadrons_train           Dataset {78334, 90, 6}
      ids_test                Dataset {19615}
      ids_train               Dataset {78334}
      jet_pt_phi_eta_test     Dataset {19615, 3}
      jet_pt_phi_eta_train    Dataset {78334, 3}
      jetxy_test              Dataset {19615, 2}
      jetxy_train             Dataset {78334, 2}

    The data are split into training and testing sets. In the training set, gamma_pt_phi_eta_train is a 2D numpy array which stores the global information (pt, phi, pseudo-rapidity) of 78334 gamma triggers. hadrons_train is a numpy array of shape {78334, 90, 6}, where 78334 is the number of events, 90 is the maximum number of hadrons in the jet cone, and 6 is the number of features of each hadron. jet_pt_phi_eta_train is a numpy array of shape {78334, 3}, where 3 stands for (pt, phi, eta) of the jet obtained with a jet-finding algorithm. jetxy_train is a numpy array of shape {78334, 2}, where 2 stands for (x, y): the jet production positions that the neural network is going to predict.

    3. Lido_Hadrons_Frag.h5 (186.22 MB): stores the testing data used for deep-learning-assisted jet tomography from the LIDO Monte Carlo model, which is different from the CoLBT model used for training.

      gamma_pt_phi_eta_test  Dataset {44867, 3}: global information of the gamma trigger (pt, phi, pseudo-rapidity) for 44867 events
      hadrons_test           Dataset {44867, 90, 6}: final-state hadrons for 44867 events; the maximum number of hadrons is 90 per event and the number of features is 6 per hadron
      jet_pt_phi_eta_test    Dataset {44867, 3}: the global information of the jet hadrons inside the cone
      jetxy_test             Dataset {44867, 2}: the production positions of the initial jet in the transverse plane
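
    The per-split layout above maps directly onto h5py. A minimal sketch for inspecting one of the files (assuming CoLBT_Hadrons_Frag.h5 has been downloaded to the working directory):

    import h5py

    # List every table and its shape, then load two arrays used for training.
    with h5py.File("CoLBT_Hadrons_Frag.h5", "r") as f:
        for name, dset in f.items():
            print(name, dset.shape)
        hadrons_train = f["hadrons_train"][:]  # (n_events, 90, 6) final-state hadrons
        jetxy_train = f["jetxy_train"][:]      # (n_events, 2) jet production positions (x, y)

    print(hadrons_train.shape, jetxy_train.shape)
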
  6. basic_shapes_object_detection

    • huggingface.co
    Updated Mar 23, 2024
    Cite
    Dries Verachtert (2024). basic_shapes_object_detection [Dataset]. https://huggingface.co/datasets/driesverachtert/basic_shapes_object_detection
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 23, 2024
    Authors
    Dries Verachtert
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Basic Shapes Object Detection

      Description
    

    This Basic Shapes Object Detection dataset has been created to test fine-tuning of object detection models. Fine-tuning a model to detect the basic shapes should be rather easy: just a bit of training should be enough to get the model to perform correct object detection quickly. Each entry in the dataset has an RGB PNG image with a white background and 3 basic geometric shapes:

    A blue square
    A red circle
    A green triangle

    All… See the full description on the dataset page: https://huggingface.co/datasets/driesverachtert/basic_shapes_object_detection.
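
    A minimal sketch for loading the dataset with the Hugging Face datasets library (the split name is an assumption; check the dataset page for the exact features):

    from datasets import load_dataset

    # Load the object detection dataset directly from the Hugging Face Hub.
    ds = load_dataset("driesverachtert/basic_shapes_object_detection", split="train")
    example = ds[0]
    print(example.keys())  # expected to include the PNG image and the shape annotations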

  7. fashion_mnist

    • tensorflow.org
    • opendatalab.com
    • +3more
    Updated Jun 1, 2024
    Cite
    (2024). fashion_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/fashion_mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('fashion_mnist', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png

  8. Port Smoke Synthetic Dataset

    • kaggle.com
    Updated Jan 15, 2025
    Cite
    NeuroBot (2025). Port Smoke Synthetic Dataset [Dataset]. https://www.kaggle.com/datasets/neurobotdata/port-smoke-synthetic-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NeuroBot
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview This dataset contains synthetic images of port scenes for training and testing port smoke monitoring AI systems. Each image simulates various smoke conditions in various port operating environments, as well as key information such as smoke of different shapes and concentrations. In particular, some smoke effects in rare scenarios are added to the images, such as dense black smoke caused by sudden fires and colored smoke caused by chemical leaks, aiming to challenge the machine learning model's ability to identify and analyze complex smoke conditions. This dataset is very valuable for projects focusing on computer vision, smoke detection and recognition, and port environment simulation. If you want to see more practical application cases of synthetic data in the port field, you can visit www.neurobot.co to schedule a demo, or register to upload personal images to generate customized synthetic data that meets your project needs.

    Note Important disclaimer: This dataset is not part of any official port research, nor does it appear in peer-reviewed articles reviewed by port experts or security researchers. It is recommended for educational purposes only. The synthetic smoke and other elements in the images are not generated based on real port data. Do not use them in actual port smoke monitoring production systems without proper review by experts in the field of AI safety and port operation regulations. Please be responsible when using this dataset and fully consider the possible ethical implications.

  9. Arabic Handwritten Characters Dataset

    • figshare.com
    • kaggle.com
    txt
    Updated May 31, 2023
    Cite
    Mohamed Loey (2023). Arabic Handwritten Characters Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.12236960.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mohamed Loey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Handwritten Arabic character recognition systems face several challenges, including the unlimited variation in human handwriting and large public databases. In this work, we model a deep learning architecture that can be effectively applied to recognizing Arabic handwritten characters. A Convolutional Neural Network (CNN) is a special type of feed-forward multilayer network trained in supervised mode. The CNN was trained and tested on our database of 16,800 handwritten Arabic characters. In this paper, optimization methods are implemented to increase the performance of the CNN. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of the CNN leads to significant improvements across different machine-learning classification algorithms. Our proposed CNN gives an average 5.1% misclassification error on testing data.

    Context: The motivation of this study is to use cross-knowledge learned from multiple works to enhance the performance of Arabic handwritten character recognition. In recent years, Arabic handwritten character recognition has had to cope with many different handwriting styles, making it important to find and work on a new and advanced solution for handwriting recognition. A deep learning system needs a huge number of data (images) to be able to make good decisions.

    Content: The dataset is composed of 16,800 characters written by 60 participants; the age range is between 19 and 40 years, and 90% of participants are right-handed. Each participant wrote each character (from 'alef' to 'yeh') ten times on two forms, as shown in Fig. 7(a) & 7(b). The forms were scanned at a resolution of 300 dpi. Each block is segmented automatically using Matlab 2016a to determine the coordinates of each block. The database is partitioned into two sets: a training set (13,440 characters, 480 images per class) and a test set (3,360 characters, 120 images per class). Writers of the training set and test set are exclusive, and the assignment of writers to the test set is randomized to make sure that writers of the test set are not from a single institution (to ensure variability of the test set). In the experimental section we show that the results were promising, with a 94.9% classification accuracy rate on testing images. In future work, we plan to work on improving the performance of handwritten Arabic character recognition.

    Acknowledgements: Ahmed El-Sawy, Mohamed Loey, Hazem EL-Bakry, "Arabic Handwritten Characters Recognition using Convolutional Neural Network", WSEAS, 2017.

    Inspiration: Creating the proposed database presents more challenges because it deals with many issues such as style of writing, thickness, and number and position of dots. Some characters have different shapes while written in the same position; for example, the teh character has different shapes in the isolated position.

    Benha University: http://bu.edu.eg/staff/mloey
    https://mloey.github.io/

  10. cifar10

    • tensorflow.org
    • opendatalab.com
    • +3more
    Updated Jun 1, 2024
    Cite
    (2024). cifar10 [Dataset]. https://www.tensorflow.org/datasets/catalog/cifar10
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('cifar10', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/cifar10-3.0.2.png

  11. Confusion matrix training and test data set.

    • plos.figshare.com
    xls
    Updated Apr 24, 2025
    + more versions
    Cite
    Yvan J. Garcia-Lopez; Patricia Henostroza Marquez; Nicolas Nuñez Morales (2025). Confusion matrix training and test data set. [Dataset]. http://doi.org/10.1371/journal.pone.0321989.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Yvan J. Garcia-Lopez; Patricia Henostroza Marquez; Nicolas Nuñez Morales
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study is about what matters: predicting when microfinance institutions might fail, especially in places where financial stability is closely linked to economic inclusion. The challenge? Creating something practical and usable. This is where the Adjusted Gross Granular Model (ARGM) comes in. It combines clever techniques, such as granular computing and machine learning, to handle messy and imbalanced data, ensuring that the model is not just a theoretical concept but a practical tool that can be used in the real world. Data from 56 financial institutions in Peru was analyzed over almost a decade (2014–2023). The results were quite promising: the model detected failures with nearly 90% accuracy and was right more than 95% of the time in identifying safe institutions. But what does this mean in practice? When tested, it flagged six institutions (20% of the total) as high risk. This tool’s impact on emerging markets would be very significant. Financial regulators could act in advance with this model, potentially preventing financial disasters. This is not just a theoretical exercise but a practical solution to a pressing problem in these markets, where every failure has domino effects on small businesses and clients in local communities, who may see their life savings affected and lost due to the failure of these institutions. Ultimately, this research is not just about a machine learning model or using statistics to evaluate results. It is about giving regulators and supervisors of financial institutions a tool they can rely on to help them take action before it is too late, when microfinance institutions get into bad financial shape, and to make immediate decisions in the event of a possible collapse.

  12. Gemstones Images Dataset

    • paperswithcode.com
    Updated Jun 22, 2025
    + more versions
    Cite
    (2025). Gemstones Images Dataset [Dataset]. https://paperswithcode.com/dataset/https-github-com-orgs-dataset-ninja
    Explore at:
    Dataset updated
    Jun 22, 2025
    Description

    Description:

    Classes: 87 distinct gemstone classes for accurate classification

    Context: This dataset aims to assist algorithms in distinguishing between various gemstones, such as ruby, amethyst, and emerald, by providing labeled images.

    Content:

    Overview: The dataset comprises over 3,200 high-quality images of gemstones, organized into 87 classes. These images are pre-sorted into training and testing sets, facilitating easy implementation in machine learning projects.

    Details:

    Shapes Included: The images capture gemstones in various shapes including round, oval, square, rectangle, and heart.

    Structure:

    Train Folder (~56 MB): Contains 87 subfolders, each representing a different class, with approximately 2,800 images in .jpeg format.

    Test Folder (~8 MB): Also contains 87 subfolders, with around 400 images in total, ensuring a balanced test set for evaluating model performance.
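
    Because the archive uses one subfolder per class, it can be loaded with standard folder-based dataset utilities. A hedged sketch with torchvision (the "gemstones/train" and "gemstones/test" paths are assumptions about where the archives were extracted):

    from torchvision import datasets, transforms

    # The folder-per-class layout maps directly onto torchvision's ImageFolder.
    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_ds = datasets.ImageFolder("gemstones/train", transform=tfm)
    test_ds = datasets.ImageFolder("gemstones/test", transform=tfm)
    print(len(train_ds.classes))  # expected: 87 gemstone classes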

    Use Cases:

    Classification Tasks: Ideal for training classification models to identify gemstone types.

    Computer Vision Projects: Suitable for projects requiring detailed image analysis and pattern recognition in diverse shapes and colors.

    E-commerce Applications: Can be used to develop systems for automatic gemstone recognition and cataloging in online marketplaces.

    Additional Information:

    File Formats: All images are provided in .jpeg format, ensuring compatibility with various image processing tools.

    Data Quality: Images are carefully labeled and curated to maintain high quality and relevance.

    This dataset is sourced from Kaggle.

  13. Arabic Handwritten Digits Dataset

    • figshare.com
    bin
    Updated May 31, 2023
    Cite
    Mohamed Loey (2023). Arabic Handwritten Digits Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.12236948.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mohamed Loey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: In recent years, handwritten digit recognition has been an important area due to its applications in several fields. This work focuses on the recognition part of handwritten Arabic digit recognition, which faces several challenges, including the unlimited variation in human handwriting and the large public databases. The paper provides a deep learning technique that can be effectively applied to recognizing Arabic handwritten digits. LeNet-5, a Convolutional Neural Network (CNN), was trained and tested on the MADBase database (Arabic handwritten digit images), which contains 60,000 training and 10,000 testing images. A comparison is held amongst the results, and it is shown that the use of the CNN led to significant improvements across different machine-learning classification algorithms. The CNN gives an average recognition accuracy of 99.15%.

    Context: The motivation of this study is to use cross-knowledge learned from multiple works to enhance the performance of Arabic handwritten digit recognition. In recent years, Arabic handwritten digit recognition has had to cope with many different handwriting styles, making it important to find and work on a new and advanced solution for handwriting recognition. A deep learning system needs a huge number of data (images) to be able to make good decisions.

    Content: MADBase is a modified Arabic handwritten digits database that contains 60,000 training images and 10,000 test images. MADBase was written by 700 writers; each writer wrote each digit (from 0 to 9) ten times. To ensure the inclusion of different writing styles, the database was gathered from different institutions: Colleges of Engineering and Law, School of Medicine, the Open University (whose students span a wide range of ages), a high school, and a governmental institution. MADBase is available for free and can be downloaded from http://datacenter.aucegypt.edu/shazeem/ .

    Acknowledgements: "CNN for Handwritten Arabic Digits Recognition Based on LeNet-5", http://link.springer.com/chapter/10.1007/978-3-319-48308-5_54, Ahmed El-Sawy, Hazem El-Bakry, Mohamed Loey, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016, Volume 533 of the series Advances in Intelligent Systems and Computing, pp. 566-575.

    Inspiration: Creating the proposed database presents more challenges because it deals with many issues such as style of writing, thickness, and number and position of dots. Some characters have different shapes while written in the same position; for example, the teh character has different shapes in the isolated position. See also the Arabic Handwritten Characters Dataset: https://www.kaggle.com/mloey1/ahcd1

    Benha University: http://bu.edu.eg/staff/mloey
    https://mloey.github.io/

  14. Global Wheat Head Dataset 2021

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 13, 2021
    Cite
    DAVID Etienne (2021). Global Wheat Head Dataset 2021 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5092308
    Explore at:
    Dataset updated
    Jul 13, 2021
    Dataset authored and provided by
    DAVID Etienne
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the full Global Wheat Head Dataset 2021. Labels are included in csv.

    Tutorials available here: https://www.aicrowd.com/challenges/global-wheat-challenge-2021

    🕵️ Introduction

    Wheat is the basis of the diet of a large part of humanity. Therefore, this cereal is widely studied by scientists to ensure food security. A tedious, yet important part of this research is the measurement of different characteristics of the plants, also known as plant phenotyping. Monitoring plant architectural characteristics allows breeders to grow better varieties and farmers to make better decisions, but this critical step is still done manually. The emergence of UAVs, cameras and smartphones makes in-field RGB images more available and could be a solution to manual measurement. For instance, the counting of wheat heads can be done with deep learning. However, this task can be visually challenging. There is often an overlap of dense wheat plants, and the wind can blur the photographs, making it difficult to identify single heads. Additionally, appearances vary due to maturity, colour, genotype, and head orientation. Finally, because wheat is grown worldwide, different varieties, planting densities, patterns, and field conditions must be considered. To end manual counting, a robust algorithm must be created to address all these issues.

    💾 Dataset

    The dataset is composed of more than 6000 images of 1024x1024 pixels containing 300k+ unique wheat heads, with the corresponding bounding boxes. The images come from 11 countries and cover 44 unique measurement sessions. A measurement session is a set of images acquired at the same location, during a coherent timestamp (usually a few hours), with a specific sensor. In comparison to the 2020 competition on Kaggle, this represents 4 new countries, 22 new measurement sessions, 1200 new images and 120k new wheat heads. This amount of new situations will help to reinforce the quality of the test dataset. The 2020 dataset was labelled by researchers and students from 9 institutions across 7 countries. The additional data have been labelled by Human in the Loop, an ethical AI labelling company. We hope these changes will help in finding the most robust algorithms possible!

    The task is to localize the wheat head contained in each image. The goal is to obtain a model which is robust to variation in shape, illumination, sensor and locations. A set of boxes coordinates is provided for each image.

    The training dataset will be the images acquired in Europe and Canada, which cover approximately 4000 images, and the test dataset will be composed of the images from North America (except Canada), Asia, Oceania and Africa, covering approximately 2000 images. This represents 7 new measurement sessions available for training but 17 new measurement sessions for the test!

    📁 Files

    Following files are available in the resources section:

    images: the folder contains all images

    competition_train.csv, competition_val.csv, competition_test.csv: contain the splits used for the 2021 Global Wheat Challenge

    Val contains the "public test", which is the test set of Global Wheat Head 2020

    Test contains the "private test".

    Metadata.csv: contains additional metadata for each domain

    💻 Labels

    All boxes are contained in a csv with three columns: image_name, BoxesString and domain.

    image_name is the name of the image, without the suffix. All images have a .png extension.

    BoxesString is a string containing all boxes for the image in the format [x_min, y_min, x_max, y_max]. To concatenate a list of boxes into a BoxesString, join the coordinates of each box with one space (" ") and separate boxes with one semicolon (";"). If there is no box, BoxesString is equal to "no_box".

    domain gives the domain of each image.
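
    A small sketch for parsing the BoxesString column in Python (the helper name is illustrative):

    import numpy as np

    def parse_boxes_string(boxes_string):
        """Parse a BoxesString into an (N, 4) array of [x_min, y_min, x_max, y_max]."""
        if boxes_string == "no_box":
            return np.zeros((0, 4))
        return np.array([[float(v) for v in box.split(" ")] for box in boxes_string.split(";")])

    # Two boxes: coordinates separated by spaces, boxes separated by a semicolon.
    print(parse_boxes_string("10 20 110 140;200 50 260 120"))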

    If you use the dataset for your research, please do not forget to quote:

    @article{david2020global,
      title={Global Wheat Head Detection (GWHD) dataset: a large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods},
      author={David, Etienne and Madec, Simon and Sadeghi-Tehran, Pouria and Aasen, Helge and Zheng, Bangyou and Liu, Shouyang and Kirchgessner, Norbert and Ishikawa, Goro and Nagasawa, Koichi and Badhon, Minhajul A and others},
      journal={Plant Phenomics},
      volume={2020},
      year={2020},
      publisher={Science Partner Journal}
    }

    @misc{david2021global,
      title={Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods},
      author={Etienne David and Mario Serouart and Daniel Smith and Simon Madec and Kaaviya Velumani and Shouyang Liu and Xu Wang and Francisco Pinto Espinosa and Shahameh Shafiee and Izzat S. A. Tahir and Hisashi Tsujimoto and Shuhei Nasuda and Bangyou Zheng and Norbert Kichgessner and Helge Aasen and Andreas Hund and Pouria Sadhegi-Tehran and Koichi Nagasawa and Goro Ishikawa and Sébastien Dandrifosse and Alexis Carlier and Benoit Mercatoris and Ken Kuroki and Haozhou Wang and Masanori Ishii and Minhajul A. Badhon and Curtis Pozniak and David Shaner LeBauer and Morten Lilimo and Jesse Poland and Scott Chapman and Benoit de Solan and Frédéric Baret and Ian Stavness and Wei Guo},
      year={2021},
      eprint={2105.07660},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }

  15. Cat Annotation Dataset Merged

    • academictorrents.com
    bittorrent
    Updated Jul 2, 2014
    Cite
    Weiwei Zhang and Jian Sun and Xiaoou Tang (2014). Cat Annotation Dataset Merged [Dataset]. https://academictorrents.com/details/c501571c29d16d7f41d159d699d0e7fb37092cbd
    Explore at:
    Available download formats: bittorrent (1980831996)
    Dataset updated
    Jul 2, 2014
    Dataset authored and provided by
    Weiwei Zhang and Jian Sun and Xiaoou Tang
    License

    No license specified: https://academictorrents.com/nolicensespecified

    Description

    The CAT dataset includes 10,000 cat images. For each image, we annotate the head of the cat with nine points: two for the eyes, one for the mouth, and six for the ears. The detailed configuration of the annotation is shown in Figure 6 of the original paper: Weiwei Zhang, Jian Sun, and Xiaoou Tang, "Cat Head Detection - How to Effectively Exploit Shape and Texture Features", Proc. of European Conf. Computer Vision, vol. 4, pp. 802-816, 2008.

    Format: The annotation data are stored in a file with the name of the corresponding cat image plus ".cat", one annotation file for each cat image. For each annotation file, the annotation data are stored in the following sequence:
    1. Number of points (always 9)
    2. Left Eye
    3. Right Eye
    4. Mouth
    5. Left Ear-1
    6. Left Ear-2
    7. Left Ear-3
    8. Right Ear-1
    9. Right Ear-2
    10. Right Ear-3

    Training, Validation, and Testing: We randomly divide the data into three sets: 5,000 images for training, 2,000 images for validation, and the remaining images for testing.
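
    A hedged sketch for reading one ".cat" annotation file in Python, assuming each file stores whitespace-separated integers in the sequence listed above, with each point given as an (x, y) pair (the example file name is hypothetical):

    POINT_NAMES = [
        "left_eye", "right_eye", "mouth",
        "left_ear_1", "left_ear_2", "left_ear_3",
        "right_ear_1", "right_ear_2", "right_ear_3",
    ]

    def read_cat_annotation(path):
        """Return a dict mapping point names to (x, y) coordinates."""
        with open(path) as f:
            values = [int(v) for v in f.read().split()]
        n_points = values[0]  # documented as always 9
        coords = values[1:1 + 2 * n_points]
        points = list(zip(coords[0::2], coords[1::2]))
        return dict(zip(POINT_NAMES, points))

    print(read_cat_annotation("00000001_000.jpg.cat"))  # hypothetical file name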

  16. Helsinki Tomography Challenge 2022 open tomographic dataset (HTC 2022)

    • zenodo.org
    bin, png
    Updated Jul 16, 2024
    + more versions
    Cite
    Alexander Meaney; Fernando Silva de Moura; Samuli Siltanen (2024). Helsinki Tomography Challenge 2022 open tomographic dataset (HTC 2022) [Dataset]. http://doi.org/10.5281/zenodo.6967128
    Explore at:
    Available download formats: bin, png
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Meaney; Fernando Silva de Moura; Samuli Siltanen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Helsinki
    Description

    This dataset is primarily designed for the Helsinki Tomography Challenge 2022 (HTC 2022), but it can be used for generic algorithm research and development in 2D CT reconstruction.

    The dataset contains 2D tomographic measurements, i.e., sinograms and the affiliated metadata containing measurement geometry and other specifications. The sinograms have already been pre-processed with background and flat-field corrections, and compensated for a slightly misaligned center of rotation in the cone-beam computed tomography scanner. The log-transforms from intensity measurements to attenuation data have also been already computed. The data has been stored as MATLAB structs and saved in .mat file format.

    The purpose of HTC 2022 is to develop algorithms for limited angle tomography. The challenge data consists of tomographic measurements of a set of plastic phantoms with a diameter of 7 cm and with holes of differing shapes cut into them.

    The currently available dataset contains five training phantoms with full angular data. These are designed to facilitate algorithm development and benchmarking for the challenge itself. Four of the training phantoms contain holes. These are labeled ta, tb, tc, and td. A fifth training phantom is a solid disc with no holes. We encourage subsampling these datasets to create limited data sinograms and comparing the reconstruction results to the ground truth obtainable from the full-data sinograms. Note that the phantoms are not all identically centered.

    The actual challenge data will be arranged into seven different difficulty levels, labeled 1-7, with each level containing three different phantoms, labeled A-C. As the difficulty level increases, the number of holes increases and their shapes become increasingly complex. Furthermore, the view angle is reduced as the difficulty level increases, starting with a 90 degree field of view at level 1, and reducing by 10 degrees at each increasing level of difficulty. The view-angles in the challenge data will not all begin from 0 degrees.

    As the orientation of CT reconstructions can depend on the tools used, we have included example reconstructions for each of the phantoms to demonstrate how the reconstructions obtained from the sinograms and the specified geometry should be oriented. The reconstructions have been computed using the filtered back-projection algorithm provided by the ASTRA Toolbox.

    We have also included segmentation examples of the reconstructions to demonstrate the desired format for the final competition entries. The segmentation images were obtained by the following steps:
    1) Set all negative pixel values in the reconstruction to zero.
    2) Determine a threshold level using Otsu's method.
    3) Globally threshold the image using the threshold level.
    4) Perform a morphological closing on the image using a disc with a radius of 3 pixels.

    The competitors do not need to follow the above procedure, and are encouraged to explore various segmentation techniques for the limited angle reconstructions.
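
    For reference, a rough Python equivalent of the four segmentation steps, assuming scikit-image is available (the example reconstructions themselves were produced with MATLAB and the ASTRA Toolbox):

    import numpy as np
    from skimage.filters import threshold_otsu
    from skimage.morphology import binary_closing, disk

    def segment_reconstruction(recon):
        """Segment a 2D reconstruction following the four steps described above."""
        img = np.clip(recon, 0, None)           # 1) set negative pixel values to zero
        level = threshold_otsu(img)             # 2) Otsu threshold level
        binary = img > level                    # 3) global thresholding
        return binary_closing(binary, disk(3))  # 4) morphological closing, radius-3 disc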


    Also included in this dataset is a MATLAB example script for how to work with the CT data.

    For getting started, we recommend the following MATLAB toolboxes:

    HelTomo - Helsinki Tomography Toolbox
    https://github.com/Diagonalizable/HelTomo/

    The ASTRA Toolbox
    https://www.astra-toolbox.com/

    Spot – A Linear-Operator Toolbox
    https://www.cs.ubc.ca/labs/scl/spot/

    Note that using the above toolboxes for the Challenge is by no means compulsory: the metadata for each dataset contains a full specification of the measurement geometry, and the competitors are free to use any and all computational tools they want to in computing the reconstructions and segmentations.

    The full data for all the test phantoms will be released after the Helsinki Tomography Challenge 2022 has ended.

    All measurements were conducted at the Industrial Mathematics Computed Tomography Laboratory at the University of Helsinki.

  17. Style Transfer for Object Detection in Art

    • kaggle.com
    Updated Mar 11, 2021
    Cite
    David Kadish (2021). Style Transfer for Object Detection in Art [Dataset]. https://www.kaggle.com/datasets/davidkadish/style-transfer-for-object-detection-in-art/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 11, 2021
    Dataset provided by
    Kaggle
    Authors
    David Kadish
    Description

    Context

    Despite recent advances in object detection using deep learning neural networks, these neural networks still struggle to identify objects in art images such as paintings and drawings. This challenge is known as the cross depiction problem and it stems in part from the tendency of neural networks to prioritize identification of an object's texture over its shape. In this paper we propose and evaluate a process for training neural networks to localize objects - specifically people - in art images. We generated a large dataset for training and validation by modifying the images in the COCO dataset using AdaIn style transfer (style-coco.tar.xz). This dataset was used to fine-tune a Faster R-CNN object detection network (2020-12-10_09-45-15_58672_resnet152_stylecoco_epoch_15.pth), which is then tested on the existing People-Art testing dataset (PeopleArt-Coco.tar.xz). The result is a significant improvement on the state of the art and a new way forward for creating datasets to train neural networks to process art images.

    Content

    2020-12-10_09-45-15_58672_resnet152_stylecoco_epoch_15.pth: trained object detection network (Faster R-CNN using a ResNet152 backbone pretrained on ImageNet) for use with PyTorch

    PeopleArt-Coco.tar.xz: People-Art dataset with COCO-formatted annotations (original at https://github.com/BathVisArtData/PeopleArt)

    style-coco.tar.xz: Stylized COCO dataset containing only the person category; used to train 2020-12-10_09-45-15_58672_resnet152_stylecoco_epoch_15.pth
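
    A hedged sketch for restoring the released checkpoint with PyTorch; num_classes=2 (background + person) and the backbone-builder arguments are assumptions that may need adjusting to your torchvision version and to how the weights were saved:

    import torch
    from torchvision.models.detection import FasterRCNN
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

    # Rebuild a Faster R-CNN with a ResNet-152 FPN backbone and load the weights.
    backbone = resnet_fpn_backbone(backbone_name="resnet152", pretrained=False)
    model = FasterRCNN(backbone, num_classes=2)

    ckpt = torch.load("2020-12-10_09-45-15_58672_resnet152_stylecoco_epoch_15.pth",
                      map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
    model.load_state_dict(state_dict)
    model.eval()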

    Code

    The code is available on github at https://github.com/dkadish/Style-Transfer-for-Object-Detection-in-Art

    Citing

    If you are using this code or the concept of style transfer for object detection in art, please cite our paper (https://arxiv.org/abs/2102.06529):

    D. Kadish, S. Risi, and A. S. Løvlie, “Improving Object Detection in Art Images Using Only Style Transfer,” Feb. 2021.

  18. ModelNet Dataset

    • paperswithcode.com
    Updated Jan 29, 2023
    Cite
    Zhirong Wu; Shuran Song; Aditya Khosla; Fisher Yu; Linguang Zhang; Xiaoou Tang; Jianxiong Xiao (2023). ModelNet Dataset [Dataset]. https://paperswithcode.com/dataset/modelnet
    Explore at:
    Dataset updated
    Jan 29, 2023
    Authors
    Zhirong Wu; Shuran Song; Aditya Khosla; Fisher Yu; Linguang Zhang; Xiaoou Tang; Jianxiong Xiao
    Description

    The ModelNet40 dataset contains synthetic object point clouds. As the most widely used benchmark for point cloud analysis, ModelNet40 is popular because of its various categories, clean shapes, and well-constructed data. The original ModelNet40 consists of 12,311 CAD-generated meshes in 40 categories (such as airplane, car, plant, lamp), of which 9,843 are used for training while the remaining 2,468 are reserved for testing. The corresponding point cloud data points are uniformly sampled from the mesh surfaces and then further preprocessed by moving them to the origin and scaling them into a unit sphere.
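
    The preprocessing described above (centering at the origin and scaling into a unit sphere) is a few lines of numpy; a minimal sketch:

    import numpy as np

    def normalize_point_cloud(points):
        """Center an (N, 3) point cloud at the origin and scale it into a unit sphere."""
        centered = points - points.mean(axis=0)
        return centered / np.linalg.norm(centered, axis=1).max()

    cloud = np.random.rand(1024, 3)  # stand-in for points sampled from a mesh
    print(np.linalg.norm(normalize_point_cloud(cloud), axis=1).max())  # ~1.0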

  19. Data from: SIDDA: SInkhorn Dynamic Domain Adaptation for Image...

    • data.niaid.nih.gov
    Updated Jan 23, 2025
    Cite
    Pandya, Sneh (2025). SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14583106
    Explore at:
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    Pandya, Sneh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets used in the paper "SIDDA: SInkhorn Dynamic Domain Adaptation for Image Classification with Equivariant Neural Networks"

    Abstract:

    Modern deep learning models often do not generalize well in the presence of a "covariate shift"; that is, in situations where the training and test data distributions differ, but the conditional distribution of classification labels given the data remains unchanged. In such cases, neural network (NN) generalization can be reduced to a problem of learning more robust, domain-invariant features that enable the correct alignment of the two datasets in the network's latent space. Domain adaptation (DA) methods include a broad range of techniques aimed at achieving this, which allows the model to perform well on multiple datasets. However, these methods have struggled with the need for extensive hyperparameter tuning, which then incurs significant computational costs. In this work, we introduce SIDDA, an out-of-the-box DA training algorithm built upon the Sinkhorn divergence, that can achieve effective domain alignment with minimal hyperparameter tuning and computational overhead. We demonstrate the efficacy of our method on multiple simulated and real datasets of varying complexity, including simple shapes, handwritten digits, and real astronomical observational data. These datasets include covariate shifts induced by noise and blurring, as well as more complex differences between real astronomical data observed by different telescopes. SIDDA is compatible with a variety of NN architectures, and it works particularly well in improving classification accuracy and model calibration when paired with equivariant neural networks (ENNs), which respect data symmetries by design. We find that SIDDA consistently improves the generalization capabilities of NNs, enhancing classification accuracy in unlabeled target data by up to 40%. Simultaneously, the inclusion of SIDDA during training can improve performance on the labeled source data, though with a more modest enhancement of approximately 1%. We also study the efficacy of DA on ENNs with respect to the varying group orders of the dihedral group D_N, and find that the model performance improves as the degree of equivariance increases. Finally, we find that SIDDA can also improve the model calibration on both source and target data. The largest improvements are obtained when the model is applied to the unlabeled target domain, reaching more than an order of magnitude improvement in both the expected calibration error and the Brier score. SIDDA's versatility across various NN models and datasets, combined with its automated approach to domain alignment, has the potential to significantly advance multi-dataset studies by enabling the development of highly generalizable models.

    Datasets:

    Dataset directories include train and test subdirectories, which contain the source and target domain data. The simulated datasets of shapes and astronomical objects were generated using DeepBench, with code for noise and PSF blurring available on our Github. The MNIST-M dataset is publicly available, and the Galaxy Zoo Evo dataset can be accessed following the steps on HuggingFace. Data was split into an 80%/20% train/test split.

Simulated shapes:

    • train: source, target (noise)
    • test: source, target (noise)

    Simulated astronomical objects:

    • train: source, target (noise)
    • test: source, target (noise)

    MNIST-M:

    • train: source, target (noise), target (PSF)
    • test: source, target (noise), target (PSF)

    Galaxy Zoo Evo:

    • train: source (GZ SDSS), target (GZ DESI)
    • test: source (GZ SDSS), target (GZ DESI)
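
Assuming the images are stored as individual files under the split/domain directories listed above (the exact file formats and names are not stated here, so the paths below are only illustrative), the files for one domain can be gathered with a few lines of Python:

    from pathlib import Path

    def list_domain_files(root, split, domain, exts=(".png", ".jpg", ".npy")):
        # Collect the files for one split ("train" or "test") and one domain
        # directory (e.g. "source" or "target (noise)") of a dataset directory.
        domain_dir = Path(root) / split / domain
        return sorted(p for p in domain_dir.rglob("*") if p.suffix.lower() in exts)

    # Hypothetical usage for the simulated-shapes dataset:
    src_train = list_domain_files("simulated_shapes", "train", "source")
    tgt_train = list_domain_files("simulated_shapes", "train", "target (noise)")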

    Paper Data:

Data for generating Figures 4 and 5 in the paper are included in isomap_plot_data.zip and js_distances_group_order.zip, respectively. The code for generating the figures can be found in the notebooks on our GitHub. Figures 2 and 3 are visualizations of the datasets included here.

  20. Data from: MPOSE2021: a Dataset for Short-time Pose-based Human Action...

    • zenodo.org
    zip
    Updated Jan 23, 2023
    Cite
    Vittorio Mazzia; Vittorio Mazzia; Simone Angarano; Simone Angarano; Francesco Salvetti; Francesco Salvetti; Federico Angelini; Federico Angelini; Marcello Chiaberge; Marcello Chiaberge (2023). MPOSE2021: a Dataset for Short-time Pose-based Human Action Recognition [Dataset]. http://doi.org/10.5281/zenodo.5506689
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 23, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Vittorio Mazzia; Vittorio Mazzia; Simone Angarano; Simone Angarano; Francesco Salvetti; Francesco Salvetti; Federico Angelini; Federico Angelini; Marcello Chiaberge; Marcello Chiaberge
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MPOSE2021

MPOSE2021 is a dataset specifically designed for short-time, pose-based Human Action Recognition (HAR), as presented in [12].

MPOSE2021 is developed as an evolution of the MPOSE Dataset [1-3]. It consists of human pose data detected by OpenPose [4] and PoseNet [11] on popular HAR datasets, i.e. Weizmann [5], i3DPost [6], IXMAS [7], KTH [8], UTKinetic-Action3D (RGB only) [9] and UTD-MHAD (RGB only) [10], alongside original video datasets, i.e. ISLD and ISLD-Additional-Sequences [1]. Since these datasets have heterogeneous action labels, each dataset's labels are remapped to a common, homogeneous list of actions.

    To properly use MPOSE2021 and all the functionalities developed by the authors, we recommend using the official repository MPOSE2021_Dataset.

    Dataset Description

The repository contains 3 datasets (namely 1, 2 and 3), which consist of the same data divided into different train/test splits. Each dataset contains X and y NumPy arrays for both training and testing (a quick shape check is sketched after the list below). X has the following shape:

    (number_of_samples, time_window, number_of_keypoints, x_y_p)

    where

    • time_window = 30
    • number_of_keypoints = 17 (PoseNet) or 13 (OpenPose)
    • x_y_p contains 2D keypoint coordinates (x,y) in the original video reference frame and the keypoint confidence (p <= 1)
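
For example, a sanity check of the array shapes after loading one split might look like the following; the file names here are illustrative placeholders, and the official MPOSE2021_Dataset repository provides the actual loading utilities.

    import numpy as np

    # Hypothetical file names; the archives define their own naming scheme.
    X_train = np.load("X_train.npy")   # (num_samples, 30, num_keypoints, 3)
    y_train = np.load("y_train.npy")   # (num_samples,) action labels

    num_samples, time_window, num_keypoints, xyp = X_train.shape
    assert time_window == 30
    assert num_keypoints in (17, 13)   # 17 for PoseNet, 13 for OpenPose
    assert xyp == 3                    # (x, y, keypoint confidence p)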

    References

    [1] F. Angelini, Z. Fu, Y. Long, L. Shao and S. M. Naqvi, "2D Pose-based Real-time Human Action Recognition with Occlusion-handling," in IEEE Transactions on Multimedia. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8853267&isnumber=4456689

    [2] F. Angelini, J. Yan and S. M. Naqvi, "Privacy-preserving Online Human Behaviour Anomaly Detection Based on Body Movements and Objects Positions," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 8444-8448. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8683026&isnumber=8682151

[3] F. Angelini and S. M. Naqvi, "Joint RGB-Pose Based Human Action Recognition for Anomaly Detection Applications," 2019 22nd International Conference on Information Fusion (FUSION), Ottawa, ON, Canada, 2019, pp. 1-7. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9011277&isnumber=9011156

    [4] Cao, Zhe, et al. "OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields." IEEE transactions on pattern analysis and machine intelligence 43.1 (2019): 172-186.

    [5] Gorelick, Lena, et al. "Actions as space-time shapes." IEEE transactions on pattern analysis and machine intelligence 29.12 (2007): 2247-2253.

    [6] Starck, Jonathan, and Adrian Hilton. "Surface capture for performance-based animation." IEEE computer graphics and applications 27.3 (2007): 21-31.

    [7] Weinland, Daniel, Mustafa Özuysal, and Pascal Fua. "Making action recognition robust to occlusions and viewpoint changes." European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2010.

    [8] Schuldt, Christian, Ivan Laptev, and Barbara Caputo. "Recognizing human actions: a local SVM approach." Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004. Vol. 3. IEEE, 2004.

[9] L. Xia, C.C. Chen and J.K. Aggarwal. "View invariant human action recognition using histograms of 3D joints," 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 20-27, 2012.

    [10] C. Chen, R. Jafari, and N. Kehtarnavaz. "UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing a Depth Camera and a Wearable Inertial Sensor". Proceedings of IEEE International Conference on Image Processing, Canada, 2015.

    [11] G. Papandreou, T. Zhu, L.C. Chen, S. Gidaris, J. Tompson, K. Murphy. "PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model". Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 269-286

    [12] V. Mazzia, S. Angarano, F. Salvetti, F. Angelini, M. Chiaberge. "Action Transformer: A Self-Attention Model for Short-Time Human Action Recognition". arXiv preprint (https://arxiv.org/abs/2107.00606), 2021.

Cite
Sagi Eppel; Sagi Eppel (2025). LAS&T: Large Shape And Texture Dataset [Dataset]. http://doi.org/10.5281/zenodo.15453634

LAS&T: Large Shape And Texture Dataset

Explore at:
jpeg, zipAvailable download formats
Dataset updated
May 26, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sagi Eppel; Sagi Eppel
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Large Shape And Texture Dataset (LAS&T)

LAS&T is the largest and most diverse dataset for shape, texture and material recognition and retrieval in 2D and 3D with 650,000 images, based on real world shapes and textures.

Overview

The LAS&T Dataset aims to test the most basic aspect of vision in the most general way. Mainly the ability to identify any shape, texture, and material in any setting and environment, without being limited to specific types or classes of objects, materials, and environments. For shapes, this means identifying and retrieving any shape in 2D or 3D with every element of the shape changed between images, including the shape material and texture, orientation, size, and environment. For textures and materials, the goal is to recognize the same texture or material when appearing on different objects, environments, and light conditions. The dataset relies on shapes, textures, and materials extracted from real-world images, leading to an almost unlimited quantity and diversity of real-world natural patterns. Each section of the dataset (shapes, and textures), contains 3D parts that rely on physics-based scenes with realistic light materials and object simulation and abstract 2D parts. In addition, the real-world benchmark for 3D shapes.

Main Dataset webpage

The dataset contains four parts:

3D shape recognition and retrieval.

2D shape recognition and retrieval.

3D Materials recognition and retrieval.

2D Texture recognition and retrieval.

Each can be used independently for training and testing.

Additional assets are a set of 350,000 natural 2D shapes extracted from real-world images (SHAPES_COLLECTION_350k.zip)

3D shape recognition real-world images benchmark

The scripts used to generate and test the dataset are supplied as in SCRIPT** files.

Shapes Recognition and Retrieval:

For shape recognition the goal is to identify the same shape in different images, where the material/texture/color of the shape is changed, the shape is rotated, and the background is replaced. Hence, only the shape remains the same in both images. All files with 3D shapes contain samples of the 3D shape dataset. This is tested for 3D shapes/objects with realistic light simulation. All files with 2D shapes contain samples of the 2D shape dataset. Examples files contain images with examples for each set.

Main files:

Real_Images_3D_shape_matching_Benchmarks.zip contains real-world image benchmarks for 3D shapes.

3D_Shape_Recognition_Synthethic_GENERAL_LARGE_SET_76k.zip A Large number of synthetic examples 3D shapes with max variability can be used for training/testing 3D shape/objects recognition/retrieval.

2D_Shapes_Recognition_Textured_Synthetic_Resize2_GENERAL_LARGE_SET_61k.zip A Large number of synthetic examples for 2D shapes with max variability can be used for training/testing 2D shape recognition/retrieval.

SHAPES_2D_365k.zip 365,000 2D shapes extracted from real-world images saved as black and white .png image files.

File structure:

All .jpg images in the exact same subfolder contain the exact same shape (but with different texture/color/background/orientation).
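
Because each subfolder corresponds to a single shape identity, retrieval ground truth can be read directly off the directory layout; the same holds for the texture/material archives described further below. A minimal sketch, with the extracted folder name used as a placeholder:

    from pathlib import Path
    from collections import defaultdict

    def group_by_subfolder(root):
        # Map each subfolder (one shape or texture identity) to its images.
        # Images sharing a key are positive matches for recognition/retrieval;
        # images from different subfolders are negatives.
        groups = defaultdict(list)
        for img in Path(root).rglob("*.jpg"):
            groups[img.parent].append(img)
        return groups

    groups = group_by_subfolder("3D_Shape_Recognition_Synthethic_GENERAL_LARGE_SET_76k")
    pairs = [(imgs[0], imgs[1]) for imgs in groups.values() if len(imgs) >= 2]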

Textures and Materials Recognition and Retrieval

For textures and materials, the goal is to identify and match images containing the same material or texture; however, the shape/object on which the texture or material appears is different, as are the background and lighting.

This is done both for physics-based materials in 3D and for abstract 2D textures.

3D_Materials_PBR_Synthetic_GENERAL_LARGE_SET_80K.zip A large number of examples of 3D materials in physics-grounded scenes; can be used for training or testing material recognition/retrieval.

2D_Textures_Recogition_GENERAL_LARGE_SET_Synthetic_53K.zip

A large number of images of 2D textures in maximally varied settings; can be used for training/testing 2D texture recognition/retrieval.

File structure:

All .jpg images in the exact same subfolder contain the exact same texture/material (but overlaid on different objects, with different backgrounds/illumination/orientations).

Data Generation:

The images in the synthetic part of the dataset were created by automatically extracting shapes and textures from natural images and combining them into synthetic images. The resulting synthetic images therefore rely entirely on real-world patterns, yielding extremely diverse and complex shapes and textures. As far as we know, this is the largest and most diverse shape and texture recognition/retrieval dataset. The 3D data was generated using physics-based materials and rendering (Blender), making the images physically grounded and enabling the data to be used for training toward real-world examples. The scripts for generating the data are supplied in the files with the word "SCRIPTS" in their names.
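
As a rough illustration of that compositing step (not the authors' generation scripts, which are provided in the SCRIPTS archives), a black-and-white shape mask from the shape collection can be filled with a texture crop and pasted onto a background image, e.g. with Pillow:

    from PIL import Image

    def composite(shape_mask_path, texture_path, background_path, size=(512, 512)):
        # Toy version of the shape/texture compositing described above:
        # the mask selects where the texture appears on top of the background.
        mask = Image.open(shape_mask_path).convert("L").resize(size)
        texture = Image.open(texture_path).convert("RGB").resize(size)
        background = Image.open(background_path).convert("RGB").resize(size)
        background.paste(texture, (0, 0), mask)
        return background

    # Hypothetical usage with one shape from the shape collection:
    # composite("shape_000123.png", "texture.jpg", "background.jpg").save("synthetic.jpg")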

Real-world image data:

For 3D shape recognition and retrieval, we also supply a real-world natural-image benchmark, with a variety of natural images containing the exact same 3D shape but made of or coated with different materials, in different environments and orientations. The goal is again to identify the same shape in different images. The benchmark is available in Real_Images_3D_shape_matching_Benchmarks.zip.

File structure:

Files whose names contain "GENERAL_LARGE_SET" contain synthetic images that can be used for training or testing; the type of data (2D shapes, 3D shapes, 2D textures, 3D materials) and the number of images appear in the file name. Files whose names contain "MultiTests" contain a number of different tests in which only a single aspect of the instance is changed (for example, only the background). Files whose names contain "SCRIPTS" contain the data generation and testing scripts. Images whose names contain "examples" are examples of each test.

Shapes Collections

The file SHAPES_COLLECTION_350k.zip contains 350,000 2D shapes extracted from natural images and used for the dataset generation.

Evaluating and Testing

For evaluating and testing, see: SCRIPTS_Testing_LVLM_ON_LAST_VQA.zip
This can be used to test leading LVLMs via their APIs, to create human tests, and in general to turn the dataset into multiple-choice question images similar to those in the paper.
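
The exact question format is defined by those scripts; purely as an illustration, a multiple-choice image of the kind described (a query image on top and lettered candidate images below, only one of which shows the same shape) could be assembled like this:

    from PIL import Image, ImageDraw

    def make_multichoice_image(query_path, candidate_paths, tile=256):
        # Illustrative only, not the dataset's official VQA script: paste the
        # query image in the top-left and the candidates, labeled A, B, C, ...,
        # in a row underneath.
        canvas = Image.new("RGB", (tile * len(candidate_paths), tile * 2), "white")
        canvas.paste(Image.open(query_path).convert("RGB").resize((tile, tile)), (0, 0))
        draw = ImageDraw.Draw(canvas)
        for i, path in enumerate(candidate_paths):
            img = Image.open(path).convert("RGB").resize((tile, tile))
            canvas.paste(img, (i * tile, tile))
            draw.text((i * tile + 5, tile + 5), chr(ord("A") + i), fill="red")
        return canvas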
