License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.
Perfect for:
- Object detection model training
- Computer vision research
- Retro gaming AI projects
- YOLO algorithm benchmarking
- Educational purposes
| Metric | Value |
|---|---|
| Total Images | 1,004 |
| Dataset Size | 12 MB |
| Image Format | PNG |
| Annotation Format | YOLO (.txt) |
| Classes | 4 |
| Train/Val Split | 711/260 (73%/27%) |
| Class ID | Class Name | Count | Description |
|---|---|---|---|
| 0 | dog | 252 | The hunting dog in various poses (jumping, laughing, sniffing, etc.) |
| 1 | duck_dead | 256 | Dead ducks (both black and red variants) |
| 2 | duck_shot | 248 | Ducks in the moment of being shot |
| 3 | duck_flying | 248 | Flying ducks in all directions (left, right, diagonal) |
yolo_dataset_augmented/
├── images/
│   ├── train/   # 711 training images
│   └── val/     # 260 validation images
├── labels/
│   ├── train/   # 711 YOLO annotation files
│   └── val/     # 260 YOLO annotation files
├── classes.txt                     # Class names mapping
├── dataset.yaml                    # YOLO configuration file
└── augmented_dataset_stats.json    # Detailed statistics
The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:
{
'rotation_range': (-15, 15), # Small rotations for game sprites
'brightness_range': (0.7, 1.3), # Brightness variations
'contrast_range': (0.8, 1.2), # Contrast adjustments
'saturation_range': (0.8, 1.2), # Color saturation
'noise_intensity': 0.02, # Gaussian noise
'horizontal_flip_prob': 0.5, # 50% chance horizontal flip
'scaling_range': (0.8, 1.2), # Scale variations
}
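The exact augmentation script is not included with the dataset; the parameters above are reported as a plain configuration dictionary. As a rough, hedged sketch, a comparable bounding-box-aware pipeline could be assembled with Albumentations (names and parameter values below are illustrative, not from the dataset):

```python
# Minimal sketch, not the original augmentation code: approximate the listed
# parameters with Albumentations while keeping YOLO-format boxes consistent.
import albumentations as A

augment = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),                        # rotation_range (-15, 15)
        A.ColorJitter(brightness=0.3, contrast=0.2,
                      saturation=0.2, hue=0.0, p=0.8),    # brightness/contrast/saturation
        A.GaussNoise(var_limit=(5.0, 15.0), p=0.3),       # mild Gaussian noise
        A.HorizontalFlip(p=0.5),                          # horizontal_flip_prob 0.5
        A.RandomScale(scale_limit=0.2, p=0.5),            # scaling_range (0.8, 1.2)
    ],
    bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']),
)

# Usage: augmented = augment(image=img, bboxes=yolo_boxes, class_labels=class_ids)
```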
from ultralytics import YOLO
# Load and train
model = YOLO('yolov8n.pt') # Load pretrained model
results = model.train(data='dataset.yaml', epochs=100, imgsz=640)
# Validate
metrics = model.val()
# Predict
results = model('path/to/test/image.png')
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os
class DuckHuntDataset(Dataset):
    def __init__(self, images_dir, labels_dir, transform=None):
        self.images_dir = images_dir
        self.labels_dir = labels_dir
        self.transform = transform
        self.images = sorted(os.listdir(images_dir))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.images_dir, self.images[idx])
        label_path = os.path.join(self.labels_dir,
                                  self.images[idx].replace('.png', '.txt'))
        image = Image.open(img_path)
        # Load YOLO annotations (one "class_id cx cy w h" line per object)
        with open(label_path, 'r') as f:
            labels = f.readlines()
        if self.transform:
            image = self.transform(image)
        return image, labels

# Usage (note: variable-length label lists need a custom collate_fn for batching)
dataset = DuckHuntDataset('images/train', 'labels/train')
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
Each .txt file contains one line per object:
class_id center_x center_y width height
Example annotation:
0 0.492 0.403 0.212 0.315
Where values are normalized (0-1) relative to image dimensions.
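To make the normalized format concrete, here is a small self-contained helper (not part of the dataset) that parses one annotation file and converts the boxes to pixel coordinates:

```python
def load_yolo_labels(label_path, img_width, img_height):
    """Parse a YOLO .txt file and return (class_id, x1, y1, x2, y2) boxes in pixels."""
    boxes = []
    with open(label_path, 'r') as f:
        for line in f:
            cls, cx, cy, w, h = line.split()
            cx, cy = float(cx) * img_width, float(cy) * img_height
            w, h = float(w) * img_width, float(h) * img_height
            boxes.append((int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# Example: the annotation "0 0.492 0.403 0.212 0.315" on a 256x240 NES frame
# becomes roughly (0, 98.8, 58.9, 153.1, 134.5).
```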
This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The SalmonScan dataset is a collection of images of salmon fish, including healthy fish and infected fish. The dataset consists of two classes of images:
Fresh salmon and infected salmon
This dataset is ideal for various computer vision tasks in machine learning and deep learning applications. Whether you are a researcher, developer, or student, the SalmonScan dataset offers a rich and diverse data source to support your projects and experiments.
So, dive in and explore the fascinating world of salmon health and disease!
The SalmonScan dataset consists of approximately 1,208 images of salmon fish, classified into two classes:
Each class contains a representative and diverse collection of images, capturing a range of different perspectives, scales, and lighting conditions. The images have been carefully curated to ensure that they are of high quality and suitable for use in a variety of computer vision tasks.
Data Preprocessing
The input images were preprocessed to enhance their quality and suitability for further analysis. The following steps were taken:
Resizing: All images were resized to a uniform size of 600 pixels in width and 250 pixels in height to ensure compatibility with the learning algorithm.
Image Augmentation: To compensate for the small number of images, various image augmentation techniques were applied to the input images (a code sketch illustrating these operations follows the list):
- Horizontal Flip: images were horizontally flipped to create additional samples.
- Vertical Flip: images were vertically flipped to create additional samples.
- Rotation: images were rotated to create additional samples.
- Cropping: a portion of each image was randomly cropped to create additional samples.
- Gaussian Noise: Gaussian noise was added to the images to create additional samples.
- Shearing: the images were sheared to create additional samples.
- Contrast Adjustment (Gamma): gamma correction was applied to adjust image contrast.
- Contrast Adjustment (Sigmoid): a sigmoid function was applied to adjust image contrast.
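The original preprocessing code is not included; the following hedged sketch shows how the listed operations could be reproduced with scikit-image and NumPy (all parameter values are illustrative assumptions):

```python
# Minimal sketch of the listed augmentation operations, not the authors' pipeline.
import numpy as np
from skimage import exposure, transform, util

def augment_salmon(img):
    """Return a list of augmented variants of one image (float images in [0, 1])."""
    resized = transform.resize(img, (250, 600))               # 600 px wide x 250 px high
    return [
        np.fliplr(resized),                                   # horizontal flip
        np.flipud(resized),                                   # vertical flip
        transform.rotate(resized, angle=15),                  # rotation
        resized[20:230, 50:550],                              # crop (fixed here for brevity)
        util.random_noise(resized, mode='gaussian'),          # Gaussian noise
        transform.warp(resized,
                       transform.AffineTransform(shear=0.2)), # shearing
        exposure.adjust_gamma(resized, gamma=0.8),            # contrast (gamma)
        exposure.adjust_sigmoid(resized),                     # contrast (sigmoid)
    ]
```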
Usage
To use the salmon scan dataset in your ML and DL projects, follow these steps:
License: CDLA-Permissive-1.0 (https://cdla.io/permissive-1-0/)
This dataset contains a comprehensive collection of waste images designed for training machine learning models to classify different types of waste materials, with a strong focus on electronic waste (e-waste) and mixed materials. The dataset includes 7 electronic device categories alongside traditional recyclable materials, making it ideal for modern waste management challenges where electronic devices constitute a significant portion of waste streams. The dataset has been carefully curated and balanced to ensure optimal performance for multi-category waste classification tasks using deep learning approaches.
The dataset includes 17 distinct waste categories covering various types of materials commonly found in waste management scenarios:
balanced_waste_images/
├── category_1/
│   ├── image_001.jpg
│   ├── image_002.jpg
│   └── ... (400 images)
├── category_2/
│   ├── image_001.jpg
│   └── ... (400 images)
└── ... (17 categories total)
Note: Dataset is not pre-split. Users need to create train/validation/test splits as needed.
Since the dataset is not pre-split, you'll need to create train/validation/test splits:
import splitfolders
# Split dataset: 80% train, 10% val, 10% test
splitfolders.ratio(
input='balanced_waste_images',
output='split_data',
seed=42,
ratio=(.8, .1, .1),
group_prefix=None,
move=False
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Data generators with preprocessing
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'split_data/train/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
val_generator = val_datagen.flow_from_directory(
    'split_data/val/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
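With the generators in place, a hedged sketch of a 17-class transfer-learning model follows; MobileNetV2 and all hyperparameters here are illustrative choices, not prescribed by the dataset:

```python
# Minimal sketch of a 17-class classifier on top of a frozen pretrained backbone.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base.trainable = False  # freeze the pretrained backbone initially

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(17, activation='softmax'),  # 17 waste categories
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_generator, validation_data=val_generator, epochs=10)
```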
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
We present a Bayesian method for multivariate changepoint detection that allows for simultaneous inference on the location of a changepoint and the coefficients of a logistic regression model for distinguishing pre-changepoint data from post-changepoint data. In contrast to many methods for multivariate changepoint detection, the proposed method is applicable to data of mixed type and avoids strict assumptions regarding the distribution of the data and the nature of the change. The regression coefficients provide an interpretable description of a potentially complex change. For posterior inference, the model admits a simple Gibbs sampling algorithm based on Pólya-gamma data augmentation. We establish conditions under which the proposed method is guaranteed to recover the true underlying changepoint. As a testing ground for our method, we consider the problem of detecting topological changes in time series of images. We demonstrate that our proposed method BCLR, combined with a topological feature embedding, performs well on both simulated and real image data. The method also successfully recovers the location and nature of changes in more traditional changepoint tasks. An implementation of our method is available in the Python package bclr.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset provides a collection of images and extracted landmark features for 48 fundamental static signs in Bangla Sign Language (BSL), including 38 alphabets and 10 digits (0-9). It was created to support research in isolated sign language recognition (SLR) for BSL and provide a benchmark resource for the research community. In total, the dataset comprises 14,566 raw images, 14,566 mirrored images, and 29,132 processed feature samples.
Data Contents:
The dataset is organized into two main folders:
01_Images: Contains 29,132 images in .jpg format (14,566 raw + 14,566 mirrored).
- Raw_Images: Contains 14,566 original images collected from participants.
- Mirrored_Images: Contains 14,566 horizontally flipped versions of the raw images for data augmentation purposes.
- Privacy Note: Facial regions in all images within this folder have been anonymized (blurred) to protect participant privacy, as formal informed consent for sharing identifiable images was not obtained prior to collection.

02_Processed_Features_NPY: Contains 29,132 126-dimensional hand landmark features saved as NumPy arrays in .npy format. Features were extracted using MediaPipe Holistic (capturing 21 landmarks each for the left and right hands, resulting in 63 + 63 = 126 features per image). These feature files are pre-split into train (23,293 samples), val (2,911 samples), and test (2,928 samples) subdirectories (approximately 80%/10%/10%) for standardized model evaluation and benchmarking.
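The 126-dimensional layout (63 values per hand) can be reproduced with MediaPipe Holistic; the sketch below is a hedged illustration of that extraction step, not the dataset's original script:

```python
# Minimal sketch: build a 126-dim vector (21 landmarks x 3 coords x 2 hands)
# for one image with MediaPipe Holistic. Missing hands are zero-filled.
import cv2
import numpy as np
import mediapipe as mp

def extract_features(image_path):
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    with mp.solutions.holistic.Holistic(static_image_mode=True) as holistic:
        results = holistic.process(image)

    def hand_vector(hand_landmarks):
        if hand_landmarks is None:
            return np.zeros(63, dtype=np.float32)
        return np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark],
                        dtype=np.float32).flatten()

    return np.concatenate([hand_vector(results.left_hand_landmarks),
                           hand_vector(results.right_hand_landmarks)])  # shape (126,)
```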
Data Collection: Images were collected from 5 volunteers using a Macbook Air M3 camera. Data collection took place indoors under room lighting conditions against a white background. Images were captured manually using a Python script to ensure clarity.
Potential Use: Researchers can utilize the anonymized raw and mirrored images (01_Images) to develop or test novel feature extraction techniques or multimodal recognition systems. Alternatively, the pre-processed and split .npy feature files (02_Processed_Features_NPY) can be directly used to efficiently train and evaluate machine learning models for static BSL recognition, facilitating reproducible research and benchmarking.
Further Details: Please refer to the README.md file included within the dataset for detailed class mapping (e.g., L1='ঠ', D0='০'), comprehensive file statistics per class, specifics on the data processing pipeline, and citation guidelines.
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains hand gesture images for sign language recognition, focusing on 5 commonly used phrases. The images are preprocessed, cropped, and ready for training deep learning models for real-time sign language detection applications.
| Class ID | Meaning | Description |
|---|---|---|
| 0 | Yes | Affirmative gesture |
| 1 | No | Negative gesture |
| 2 | I Love You | Expression of affection |
| 3 | Hello | Greeting gesture |
| 4 | Thank You | Gratitude expression |
data_final/
├── train/
│   ├── 0/   # Yes (~150 images)
│   ├── 1/   # No (~150 images)
│   ├── 2/   # I Love You (~150 images)
│   ├── 3/   # Hello (~150 images)
│   └── 4/   # Thank You (~150 images)
├── val/
│   ├── 0/
│   ├── 1/
│   ├── 2/
│   ├── 3/
│   └── 4/
└── test/
    ├── 0/
    ├── 1/
    ├── 2/
    ├── 3/
    └── 4/
This dataset is suitable for:
Sign Language Recognition Models
Computer Vision Research
Educational Projects
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255)
train_gen = datagen.flow_from_directory(
'data_final/train',
target_size=(224, 224),
batch_size=32,
class_mode='categorical'
)
val_gen = datagen.flow_from_directory(
'data_final/val',
target_size=(224, 224),
batch_size=32,
class_mode='categorical'
)
from torchvision import datasets, transforms
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
train_dataset = datasets.ImageFolder('data_final/train', transform=transform)
val_dataset = datasets.ImageFolder('data_final/val', transform=transform)
Using transfer learning with MobileNetV2/EfficientNetB0:
- Expected Accuracy: 90-97%
- Training Time: 20-40 minutes (GPU)
- Model Size: ~15 MB
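Building on the PyTorch loaders above, here is a hedged sketch of such a transfer-learning setup (MobileNetV2 head swapped for 5 classes; hyperparameters are illustrative and the torchvision >= 0.13 weights API is assumed):

```python
# Minimal sketch, not a prescribed training recipe.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.last_channel, 5)  # 5 sign classes
model = model.to(device)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```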
For better generalization, use these augmentation techniques:
```python
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=25,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    zoom_range=0.2,
    horizontal_flip=True,
    brightness_range=[0.7, 1.3]
)
```
If you use this dataset in your research or project, please cite:
@dataset{sign_language_5phrases_2025,
title={Sign Language Recognition Dataset - 5 Essential Phrases},
author={[Your Name]},
year={2025},
publisher={Kaggle},
url={[Dataset URL]}
}
This dataset is released under [Choose one]: - CC BY 4.0 (Attribution) - Recommended - CC BY-SA 4.0 (Attribution-ShareAlike) - CC0 1.0 (Public Domain)
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration in existing deep learning projects based on this framework.
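BIRD ships with its own PyTorch-compatible loading code (not reproduced here). As a rough, hedged illustration of what online augmentation with room impulse responses means, the sketch below convolves dry speech with a randomly chosen multichannel RIR; the array names and data layout are assumptions, not the official BIRD API:

```python
# Minimal sketch: online reverberation augmentation with multichannel RIRs.
import random
import numpy as np
import torch
from scipy.signal import fftconvolve
from torch.utils.data import Dataset

class ReverbAugmentDataset(Dataset):
    def __init__(self, dry_signals, rirs):
        self.dry_signals = dry_signals   # list of 1-D arrays (dry speech)
        self.rirs = rirs                 # list of (n_channels, rir_len) arrays

    def __len__(self):
        return len(self.dry_signals)

    def __getitem__(self, idx):
        dry = self.dry_signals[idx]
        rir = random.choice(self.rirs)
        # Convolve the dry signal with each channel's impulse response.
        wet = np.stack([fftconvolve(dry, rir[ch])[:len(dry)]
                        for ch in range(rir.shape[0])])
        return torch.from_numpy(wet.astype(np.float32))
```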
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
BioEncoder: a metric learning toolkit for comparative organismal biology
Abstract - In the realm of biological image analysis, deep learning (DL) has become a core toolkit, e.g., for segmentation and classification. However, conventional DL methods are challenged by large biodiversity datasets characterized by unbalanced classes and hard-to-distinguish phenotypic differences between them. Here we present BioEncoder, a user-friendly toolkit for metric learning, which overcomes these challenges by focussing on learning relationships between individual data points rather than on the separability of classes. BioEncoder is released as a Python package, created for ease of use and flexibility across diverse datasets. It features taxon-agnostic data loaders, custom augmentation options, and simple hyperparameter adjustments through text-based configuration files. The toolkit's significance lies in its potential to unlock new research avenues in biological image analysis while democratizing access to advanced deep metric learning techniques. BioEncoder focuses on the urgent need for toolkits bridging the gap between complex DL pipelines and practical applications in biological research.
Dataset - This data repository includes two things: a snapshot of the BioEncoder package (BioEncoder-main.zip, version 1.0.0, downloaded from https://github.com/agporto/BioEncoder on 2024-07-19 at 17:20), and the damselfly dataset used for the case study presented in the paper (bioencoder_data.zip). The dataset archive also encompasses the configuration files and the final model checkpoints from the case study, as well as a script to reproduce the results and figures presented in the paper.
How to use - Get started by consulting the GitHub repository for information on how to install BioEncoder, then download the data archive and run the script. Some parts of the script can be executed using the model checkpoints; for other parts the training routine needs to be run.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains a collection of car and bike images scraped from the web using the Bing Image Crawler (icrawler library in Python). It was created for educational and research purposes, especially for projects involving computer vision, deep learning, and image classification.
Each image was retrieved from publicly available Bing search results and organized into two folders:
cars/ - contains images of different types and models of cars
bikes/ - contains images of various motorcycles and scooters
Usage
This dataset is suitable for:
Training and testing CNNs or transfer learning models (e.g., ResNet, VGG, EfficientNet)
Practicing image preprocessing and augmentation techniques
Developing vehicle recognition or classification systems
Data Collection
Images were automatically collected using:
from icrawler.builtin import BingImageCrawler
with filters={'type': 'photo'} to ensure only photographic content.
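A hedged sketch of that collection step is shown below; the keyword strings, folder names, and image counts are illustrative assumptions, not a record of the exact crawl:

```python
# Minimal sketch of the crawling step with icrawler's Bing crawler.
from icrawler.builtin import BingImageCrawler

for keyword, folder in [('car', 'cars'), ('motorcycle', 'bikes')]:
    crawler = BingImageCrawler(storage={'root_dir': folder})
    crawler.crawl(keyword=keyword, max_num=500, filters={'type': 'photo'})
```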
License
All images are shared under the CC0: Public Domain license. They are intended solely for non-commercial, academic, and research use.
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
This dataset is divided into two main directories, train and test, each of which contains two subdirectories, breast_malignant and breast_benign, following this structure:
```
train/
├── breast_malignant/   # 4,000 images
└── breast_benign/      # 4,000 images
test/
├── breast_malignant/   # 1,000 images
└── breast_benign/      # 1,000 images
```

# Dataset details:
|Path| Subclass|Description|
|-----|-----------|--------------|
|breast_benign| Benign| Non-cancerous breast tissues|
|breast_malignant| Malignant| Cancerous breast tissues|
*Source: Collected from the Breast Cancer dataset by Anas Elmasry on Kaggle.*
# Data augmentation:
The data was augmented by the original author of the dataset using Keras' `ImageDataGenerator` *[1]*. The augmentations include:
- Rotation: Up to 10 degrees.
- Width & Height Shift: Up to 10% of the total image size.
- Shearing & Zooming: 10% variation.
- Horizontal Flip: Randomly flips images for additional diversity.
- Brightness Adjustment: Ranges from 0.2 to 1.2 for varying light conditions.
The parameters used for augmentation:
```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
    brightness_range=[0.2, 1.2]
)
```
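As a usage note (not part of the original pipeline), such a generator is typically pointed at the train directory shown above, for example:

```python
# Hedged usage sketch: stream the training images; the directory name 'train'
# follows the structure above, other parameters are illustrative.
train_generator = datagen.flow_from_directory(
    'train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'   # breast_benign vs. breast_malignant
)
```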
License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Accurate species identification is a prerequisite to assess the medical relevance of a mosquito specimen. In monitoring or surveillance programs, mosquitoes are typically identified based on morphological characters, which can be supported by molecular biological assays. Both methods require intensive experience of the observers and well-equipped laboratories. The use of convolutional neural networks (CNNs) to identify species based on images may be a cost-effective and reliable alternative. In this proof-of-concept study, we developed a CNN to identify seven Aedes species by wing images only. While previous studies used images of the whole mosquito body, the nearly two-dimensional wings may facilitate standardized image capture and thereby reduce the complexity of the CNN implementation. Mosquitoes were sampled from different sites in Germany. Their wings were mounted and photographed with a professional stereomicroscope. The data set consisted of 1,155 wing images from seven Aedes species, including the exotic species Aedes albopictus and six native Aedes species, as well as 554 wings from different non-Aedes mosquitoes. The wing images were used to train a CNN to differentiate between Aedes and non-Aedes mosquitoes and to classify the seven Aedes species. The training was conducted separately for grayscale and RGB images. Image processing, data augmentation, training, validation and testing were conducted in Python using the deep-learning framework PyTorch. For both input types, i.e. grayscale and RGB images, our best-performing CNN configuration achieved an accuracy of 100% in discriminating Aedes from non-Aedes mosquito species. The accuracy in predicting the Aedes species reached 93% for grayscale images and 96% for RGB images. Aedes albopictus could be identified with an accuracy of 100%. In conclusion, wing images are sufficient to identify mosquito species by CNN-based image classification. Thus, wing images can represent a useful complement for CNN-based image classification, e.g. for damaged mosquito specimens. Larger training data sets with further mosquito species and a greater variety of images are required to improve and test broad applicability.

Methods: The study was based on 1,155 wing photos from female Aedes specimens, including 165 Ae. albopictus, 165 Ae. cinereus, 165 Ae. communis, 165 Ae. punctor, 165 Ae. rusticus, 165 Ae. sticticus and 165 Ae. vexans. As an unknown class we integrated a further 554 wing photos from common non-Aedes mosquito species in Germany, including 61 Anopheles claviger (Meigen, 1804), 196 Anopheles maculipennis s.l., 11 Anopheles plumbeus Stephens, 1828, 214 Culex pipiens s.s./Cx. torrentium and 72 Coquillettidia richiardii (Ficalbi, 1889). The field-sampled mosquitoes were directly killed and stored at -20 °C until further preparation. All specimens were identified by morphology. After the morphological species identification, the right wing of each specimen was removed and mounted with euparal (Carl Roth, Karlsruhe, Germany) on microscopic slides. Subsequently, the mounted wings were photographed with a stereomicroscope (Leica M205 C, Leica Microsystems, Wetzlar, Germany) under 20× magnification using standardized illumination and exposure time (279 ms).
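The training code itself is not part of this record. A minimal hedged sketch of the grayscale versus RGB input pipelines in PyTorch is shown below; the folder layout 'wings/train' and the image size are assumptions for illustration:

```python
# Minimal sketch: two torchvision input pipelines, one RGB and one grayscale.
from torchvision import datasets, transforms

rgb_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
gray_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),  # keep 3 channels for pretrained CNNs
    transforms.ToTensor(),
])

rgb_train = datasets.ImageFolder('wings/train', transform=rgb_tf)
gray_train = datasets.ImageFolder('wings/train', transform=gray_tf)
```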
1. Framework overview. This paper proposes a pipeline to construct high-quality datasets for text mining in materials science. First, we utilize a traceable automatic literature-acquisition scheme to ensure the traceability of the textual data. Then, a data processing method driven by downstream tasks is applied to generate high-quality pre-annotated corpora conditioned on the characteristics of materials texts. On this basis, we define a general annotation scheme derived from the materials science tetrahedron to complete high-quality annotation. Finally, a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) is constructed to augment the data quantity.

2. Dataset information. The experimental datasets used in this paper include the Matscholar dataset publicly published by Weston et al. (DOI: 10.1021/acs.jcim.9b00470) and the NASICON entity recognition dataset constructed by ourselves. Herein, we mainly introduce the details of the NASICON entity recognition dataset.

2.1 Data collection and preprocessing. First, 55 materials science articles related to the NASICON system, which contain a wealth of structure-activity relationship information, are collected through Crystallographic Information Files (CIF). Note that materials science literature is mostly stored in portable document format (PDF), with content arranged in columns and mixed with tables, images, and formulas, which significantly compromises the readability of the text sequence. To tackle this issue, we employ the text parser PDFMiner (a Python toolkit) to standardize, segment, and parse the original documents, thereby converting PDF literature into plain text. In this process, the entire textual information of each article, encompassing title, authors, abstract, keywords, institution, publisher, and publication year, is retained and stored as a unified TXT document. Subsequently, we apply rules based on Python regular expressions to remove redundant information, such as garbled characters and line breaks caused by figures, tables, and formulas. This results in a cleaner text corpus, enhancing its readability and enabling more efficient data analysis. Note that special symbols may also appear as garbled characters, but we refrain from directly deleting them, as they may contain valuable information such as chemical units. Therefore, we converted all such symbols to a special token
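As an illustration of the parsing and cleaning steps described above (not the authors' exact code), pdfminer.six and Python regular expressions can be combined like this; the file name and the specific cleanup rules are assumptions:

```python
# Hedged sketch of PDF-to-text conversion and regex cleanup.
import re
from pdfminer.high_level import extract_text

text = extract_text('nasicon_paper.pdf')   # hypothetical file name

# Collapse hyphenation at line breaks and normalize whitespace.
text = re.sub(r'-\s*\n\s*', '', text)
text = re.sub(r'\s+', ' ', text)

with open('nasicon_paper.txt', 'w', encoding='utf-8') as f:
    f.write(text)
```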
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
The Kidney Cancer Dataset is a subdataset of Multi Cancer Dataset
The data was split into two main directories, train and test, following this structure:
train/
├── kidney_normal/   (4,000 images)
└── kidney_tumor/    (4,000 images)
test/
├── kidney_normal/   (1,000 images)
└── kidney_tumor/    (1,000 images)
You can cite this dataset as: https://www.kaggle.com/datasets/djaidwalid/kidney-cancer-dataset/data
| Cancer Type | Classes | Images |
|---|---|---|
| Acute Lymphoblastic Leukemia | 4 | 20,000 |
| Brain Cancer | 3 | 15,000 |
| Breast Cancer | 2 | 10,000 |
| Cervical Cancer | 5 | 25,000 |
| Kidney Cancer | 2 | 10,000 |
| Lung and Colon Cancer | 5 | 25,000 |
| Lymphoma | 3 | 15,000 |
| Oral Cancer | 2 | 10,000 |
For this dataset, I selected only the Kidney Cancer directory.
The data was augmented by the original author of the dataset using Keras' ImageDataGenerator [1]. The augmentations include:
- Rotation: Up to 10 degrees.
- Width & Height Shift: Up to 10% of the total image size.
- Shearing & Zooming: 10% variation.
- Horizontal Flip: Randomly flips images for additional diversity.
- Brightness Adjustment: Ranges from 0.2 to 1.2 for varying light conditions.
The parameters used for augmentation:

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
    brightness_range=[0.2, 1.2]
)
```
# ***References:***
*[1]* Obuli Sai Naren. (2022). Multi Cancer Dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/3415848
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
PROGRAM SUMMARY No. of lines in distributed program, including test data, etc.: 481 No. of bytes in distributed program, including test data, etc.: 14540.8 Distribution format: .py, .csv Programming language: Python Computer: Any workstation or laptop computer running TensorFlow, Google Colab, Anaconda, Jupyter, pandas, NumPy, Microsoft Azure and Alteryx. Operating system: Windows and Mac OS, Linux.
Nature of problem: Navier-Stokes equations are solved numerically in ANSYS Fluent using the Reynolds stress model for turbulence. The simulated values of friction factor are validated with theoretical and experimental data obtained from the literature. Artificial neural networks are then used for a prediction-based augmentation of friction factor. The capabilities of the neural networks are discussed with regard to computational cost and domain limitations.
Solution method: The simulation data is obtained through Reynolds stress modelling of fluid flow through a pipe. This data is augmented using an artificial neural network model that predicts both within and outside the data domain.
Restrictions: The code used in this research is limited to smooth pipe bends, in which friction factor is analysed using a steady state incompressible fluid flow.
Runtime: The artificial neural network produces results within a span of 20 seconds for three-dimensional geometry, using the allocated free computational resources of Google Colaboratory cloud-based computing system.
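The distributed .py file is not reproduced here. As a minimal hedged sketch of a TensorFlow regression network of the kind described in the solution method, with feature and file names as illustrative assumptions:

```python
# Hedged sketch: a small dense network mapping flow/geometry features to a
# predicted friction factor. Column names and architecture are illustrative,
# not the distributed code.
import pandas as pd
import tensorflow as tf

data = pd.read_csv('friction_factor.csv')           # hypothetical CSV name
X = data[['reynolds_number', 'bend_radius_ratio']].values
y = data['friction_factor'].values

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, batch_size=32, validation_split=0.2)
```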
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset originates from a research project investigating microgesture-based text editing in virtual reality (VR). The dataset was collected as part of an evaluation of the MicroGEXT system, which enables precise and efficient text editing using small, subtle hand movements. The research aims to explore lightweight, ergonomic alternatives to traditional mid-air gesture interactions.
Data Collection Methods
- Hardware: The dataset was collected using the Meta Quest Pro VR headset, utilizing its XR Hand Tracking package to capture hand skeleton data at 72 Hz.
- Participants: 10 participants were recruited for gesture elicitation and evaluation.
- Procedure:

Technical & Non-Technical Information for Reusability
- The dataset is suitable for:
  - Gesture recognition research (static/dynamic gestures, sub-state segmentation).
  - Human-computer interaction (HCI) studies focusing on XR input methods.
  - Machine learning applications, including deep learning-based gesture classification.
- Reuse Considerations:
  - Compatible with Unity's XR Hand Tracking package and Python-based deep learning frameworks (e.g., PyTorch, TensorFlow).
  - Includes data augmentation scripts for expanding training datasets.
  - The Null class helps mitigate false activations in real-time applications.
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
About Dataset
Overview: The Sindhi Handwritten Alphabet Dataset is a comprehensive collection of handwritten Sindhi alphabet images, developed to support research in handwriting recognition, OCR, and computer vision for regional scripts. This dataset emphasizes diversity, authenticity, and real-world handwriting variations, making it highly suitable for AI-based Sindhi character recognition systems.
Dataset Summary
Diversity & Realism: The dataset captures handwriting from contributors across multiple generations and genders, reflecting a wide range of writing styles and characteristics.
Generations Covered: Gen X, Millennials, Gen Z, Gen Alpha
Writing Styles: Cursive, Bold, Thin, Uneven and natural strokes
This diversity ensures models trained on this dataset can generalize well to unseen handwriting and different personal writing habits.
Usage & Applications
This dataset is ideal for:
- Optical Character Recognition (OCR) for Sindhi script
- Handwritten Character Classification
- Handwriting Style Analysis across generations
- AI-based Sindhi Language Digitization & Preservation
- Computer Vision Research for regional and low-resource languages
Model Development A ResNet-50 based deep learning model was trained on this dataset with four additional layers and strong augmentation strategies.
Model Performance:
- Training Accuracy: 97%
- Validation Accuracy: 98%
- Testing Accuracy: 92%
These results demonstrate the dataset's effectiveness for developing high-performing Sindhi alphabet recognition models.
Development Team: Shayan Ali Shaikh, Muhammad Hamza. Under the supervision of Dr. Attaullah Sahito.
Data Collection: Handwriting samples were collected from students in 45 schools across Sindh, covering Classes 3ā7. This ensures authentic, naturally varied handwriting samples representing real-world conditions.
Technical Notes
Tools Used: OpenCV, RoboFlow, Python (for preprocessing and manipulation)
Augmentation Techniques: Rotation, noise addition, brightness/contrast variation, and blurring for robustness
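A hedged sketch of the listed augmentation operations with OpenCV follows; all parameter values are illustrative, not the project's actual settings:

```python
# Minimal sketch of the listed augmentation operations.
import cv2
import numpy as np

def augment_character(img):
    h, w = img.shape[:2]
    rotated = cv2.warpAffine(img, cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0), (w, h))
    noisy = np.clip(img + np.random.normal(0, 10, img.shape), 0, 255).astype(np.uint8)
    adjusted = cv2.convertScaleAbs(img, alpha=1.2, beta=20)   # brightness/contrast
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    return [rotated, noisy, adjusted, blurred]
```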
License & Attribution
This dataset is released under the Creative Commons CC BY 4.0 License, allowing free use, sharing, and modification with proper attribution.
Contributors: Shayan Ali Shaikh Muhammad Hamza
Acknowledgment
This dataset was developed to digitize and preserve the Sindhi language through AI. We encourage students, researchers, and developers to use this dataset to advance Sindhi handwriting recognition and OCR technologies. Note: this is the first dataset of its kind written by students from primary to secondary level, on real pages with a pen. There is no computerized manipulation.
License: ODC Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
The dataset consists of images of famous (or not-so-famous) landmarks. The collection is organized into a two-level hierarchy structure. The first level is the categories for the landmarks, and the second level is the individual landmarks. There are 6 categories, and categories are: 1. Gothic 2. Modern 3. Mughal 4. Neoclassical 5. Pagodas 6. Pyramids
For each category, there are 5 landmarks, for a total of 30 landmarks. Each landmark has 14 images.
The landmarks dataset is too small to train convolutional neural networks (CNNs) from scratch. The resulting network will overfit the data. Instead, use transfer learning by reusing part of a pre-trained CNN. In transfer learning, instead of training the neural network starting from random weights, the weights for the lower parts of the network are taken from a pre-trained network. Only the higher parts of the network will have to be learned. Chapter 14 of Géron discusses how to apply pre-trained models for transfer learning.
For this group project, the only allowed pre-trained networks are EfficientNetB0 and VGG16, which are smaller CNNs. The objective of this restriction is to avoid penalizing groups that do not have access to powerful machines and/or machines with GPUs. Groups are allowed to use Google Colab with GPUs to train the models, but be aware of resource usage limitations.
Data augmentation is another way to overcome the problem of small datasets. Keras/TensorFlow provides various image manipulation functions (https://www.tensorflow.org/api_docs/python/tf/image) that can be used to generate additional images. Refer to Lecture 9 slides and Chapter 14 of Géron.
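For example, a hedged sketch of a tf.image-based augmentation function (the specific factors are illustrative, and `train_ds` is assumed to be a tf.data.Dataset of (image, label) pairs):

```python
# Minimal sketch: random augmentations with tf.image.
import tensorflow as tf

def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, label

# train_ds = train_ds.map(augment)
```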
Yet another way to overcome the small dataset problem is experimenting with various ways of combining the models for the two tasks. It is possible to train two distinct models, one for category classification and one for landmark classification. But would landmark classification benefit from knowing the output of category classification? Or vice versa? One possible shared-backbone architecture is sketched below.
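The following is a sketch under the stated EfficientNetB0 restriction, not a prescribed solution: a single shared backbone with a 6-way category head and a 30-way landmark head, with labels assumed to be available for both outputs.

```python
# Hedged sketch: one EfficientNetB0 backbone with two classification heads.
import tensorflow as tf

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights='imagenet', input_shape=(224, 224, 3))
base.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
category_out = tf.keras.layers.Dense(6, activation='softmax', name='category')(x)
landmark_out = tf.keras.layers.Dense(30, activation='softmax', name='landmark')(x)

model = tf.keras.Model(inputs=base.input, outputs=[category_out, landmark_out])
model.compile(optimizer='adam',
              loss={'category': 'sparse_categorical_crossentropy',
                    'landmark': 'sparse_categorical_crossentropy'},
              metrics=['accuracy'])
```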
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains 1,023 multimedia items created by a virtual influencer focused on cultural heritage, featuring a diverse range of content including images, videos, and texts representing historical landmarks, traditional attire, cultural festivals, and architectural symbols. The data was collected through content generation and real-world cultural representations, and preprocessed with techniques such as resizing, normalization, and data augmentation to ensure consistency and diversity. The dataset is extracted and flattened into CSV file format. The data originates from Shanghai, China.
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
SpaceNet, attained via a novel double-stage augmentation framework, FLARE (https://arxiv.org/pdf/2405.13267), is a hierarchically structured and high-quality astronomical image dataset designed for fine-grained and macro classification tasks. Comprising approximately 12,900 samples, SpaceNet integrates lower-resolution (LR) to higher-resolution (HR) conversion with standard augmentations and a diffusion approach for synthetic sample generation. This dataset enables superior generalization on various recognition tasks such as classification.
Total Samples: Approximately 12,900 images.
Fine-Grained Class Distribution:
- Asteroid: 283 files
- Black Hole: 656 files
- Comet: 416 files
- Constellation: 1,552 files
- Galaxy: 3,984 files
- Nebula: 1,192 files
- Planet: 1,472 files
- Star: 3,269 files
SpaceNet is suitable for:
If you use SpaceNet in your research, please cite it as follows:
```bibtex
@misc{alamimam2024flare,
  title={FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging},
  author={Mohammed Talha Alam and Raza Imam and Mohsen Guizani and Fakhri Karray},
  year={2024},
  eprint={2405.13267},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
A diverse dataset is crucial for training deep learning models, especially in the context of currency note recognition. Factors such as diverse backgrounds, lighting, orientation, and blur can significantly impact model outcomes. While high-quality scans of different currencies are accessible on collectors' websites, these often lack the variety seen in real-world scenarios. Additionally, publicly available datasets, which primarily feature old Thai currency notes, are limited, containing up to only 1000 images.
Recognizing the scarcity of comprehensive datasets for new Thai currency notes, we curated a collection of 3,600 images spanning five denominations:
- 20 baht
- 50 baht
- 100 baht
- 500 baht
- 1000 baht
These images depict the notes in various orientations and settings, including different backgrounds and lighting conditions, such as illuminated and dark environments.
We used two iPhone models to capture this diversity:
- iPhone 13 Pro Max (12-megapixel, f/1.8 rear camera)
- iPhone 12 (12-megapixel, f/1.6 rear camera)
Unique scenarios were also included, such as half-folded notes against contrasting backgrounds. For consistency, the iPhone 12 captured 4032×3024 resolution shots of the 50 and 1000 baht notes, while the iPhone 13 Pro Max was used for the same resolution images of the other denominations. Our data collection team followed clear guidelines to ensure various image captures.
Each denomination class included 720 images. Specifically, the 20 baht note images were captured in various orientations and settings, such as front views with dark, white, and cluttered backgrounds and front views rotated 180 degrees with the same background variations. The same approach was applied to the 50, 100, 500, and 1000 baht notes. Additionally, images of the back of each note, both normal and rotated 180 degrees, and half-folded top and bottom states, were captured under the same diverse background conditions.
The collected images were meticulously examined during data preparation to address inconsistencies in labeling and variations. Images were organized into folders according to the denominations of the new Thai currency. Given that most images were originally captured with an iPhone in HEIC format, they were converted to JPEG using the 'pyheif' Python module.
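That conversion step can be reproduced roughly as follows (a hedged sketch using pyheif and Pillow; the file names are illustrative):

```python
# Minimal sketch: HEIC-to-JPEG conversion with pyheif and Pillow.
import pyheif
from PIL import Image

heif = pyheif.read('IMG_0001.HEIC')               # hypothetical file name
img = Image.frombytes(heif.mode, heif.size, heif.data,
                      'raw', heif.mode, heif.stride)
img.save('IMG_0001.jpg', format='JPEG')
```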
The data was divided into training and validation subsets with a 70%:30% ratio. There are a total of 2520 images for training and 1080 for validation.
Here's a brief overview of each objective, research question, and type of analysis you can try to perform: