License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Source codes and dataset of the research "Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation". Our work employed PyTorch, a framework for training Deep Learning models with GPU support and automatic back-propagation, to load the MViTv2_S models with Kinetics-400 weights. To simplify the implementation, eliminating the need for an explicit training loop and automating some hyperparameters, we used the PyTorch Lightning module (a minimal loading sketch, with assumed hyperparameters, is given at the end of this description). The inputs were batches of 10 samples, each a sequence of 16 three-channel images resized to 224 × 224 pixels and normalized to the range 0 to 1. Most of the papers in our literature survey split the original dataset chronologically, and some authors also apply k-fold cross-validation to emphasize the evaluation of model stability. We instead adopt a hybrid split: the first 50,000 samples are used for 5-fold cross-validation between the training and validation sets (known data), with 40,000 samples for training and 10,000 for validation. Thus, we can evaluate performance and stability by analyzing the mean and standard deviation of all trained models on the test set, composed of the last 9,834 samples in chronological order (simulating unknown data). We developed three distinct models to evaluate the impact of oversampling magnetogram sequences in the dataset. The first model, Solar Flare MViT (SF MViT), was trained only with the original data from our base dataset, without oversampling. In the second model, Solar Flare MViT over Train (SF MViT oT), we apply oversampling only to the training data, keeping the original validation set. In the third model, Solar Flare MViT over Train and Validation (SF MViT oTV), we apply oversampling to both the training and validation sets. We also trained a model that oversamples the entire dataset, called "SF_MViT_oTV Test", to verify how resampling or adopting a test set with unreal data may positively bias the results.

GitHub version

The .zip hosted here contains all files from the project, including the checkpoint and output files generated by the codes. A clean version is hosted on GitHub (https://github.com/lfgrim/SFF_MagSeq_MViTs), without the magnetogram_jpg folder (which can be downloaded directly from https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip) and without the output and checkpoint files. Most code files hosted here also contain comments in Portuguese, which are being updated to English in the GitHub version.

Folders Structure

In the root directory of the project, we have two folders:
magnetogram_jpg: holds the source images provided by the Space Environment Artificial Intelligence Early Warning Innovation Workshop through the link https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip. It comprises 73,810 samples of high-quality magnetograms captured by HMI/SDO from 2010 May 4 to 2019 January 26. The HMI instrument provides these data (stored in the hmi.sharp_720s dataset), making new samples available every 12 minutes; however, the images in this dataset were collected every 96 minutes. Each image has an associated magnetogram comprising a ready-made snippet of one or more solar active regions (ARs). It is essential to note that the magnetograms cropped by SHARP can contain one or more solar ARs classified by the National Oceanic and Atmospheric Administration (NOAA).
Seq_Magnetogram: contains the references to the source images with the corresponding labels for the next 24 h and 48 h in the M24 and M48 sub-folders, respectively.
M24/M48: both present the following sub-folder structure:
Seqs16; SF_MViT; SF_MViT_oT; SF_MViT_oTV; SF_MViT_oTV_Test. There are also two files in root:
inst_packages.sh: installs the packages and dependencies needed to run the models. download_MViTS.py: downloads the pre-trained MViTv2_S from PyTorch and stores it in the cache.
The M24 and M48 folders hold reference text files (flare_Mclass...) linking the images in the magnetogram_jpg folder, or the sequences (Seq16_flare_Mclass...) in the Seqs16 folders, with their respective labels. They also hold "cria_seqs.py", which is responsible for creating the sequences, and "test_pandas.py", which verifies the header info and checks the number of samples per label in the text files. All the text files with the prefix "Seq16" and inside the Seqs16 folder were created by the "cria_seqs.py" code based on the corresponding "flare_Mclass"-prefixed text files. The Seqs16 folder holds reference text files, each of which lists a sequence of images pointing to the magnetogram_jpg folder. All SF_MViT... folders hold the model training code itself (SF_MViT...py) and the corresponding job submission (jobMViT...), temporary input (Seq16_flare...), output (saida_MVIT... and MViT_S...), error (err_MViT...) and checkpoint files (sample-FLARE...ckpt). Executed model training codes generate the output, error, and checkpoint files. There is also a folder called "lightning_logs" that stores the logs of the trained models.
Naming pattern for the files:
magnetogram_jpg: follows the format "hmi.sharp_720s...magnetogram.fits.jpg" and Seqs16: follows the format "hmi.sharp_720s...to.", where:
is the date-time when the sequence ends, and follows the same format of . Reference text files in M24 and M48 or inside the SF_MViT... folders follow the format "flare_Mclass_.txt", where:
is Seq16 if it refers to a sequence, or void if it refers directly to images.
"24h" or "48h".
is "TrainVal" or "Test". The refers to the split of Train/Val.
void or "_over" after the extension (...txt_over): means temporary input reference that was over-sampled by a training model. All SF_MViT...folders:
void or "oT" (over Train) or "oTV" (over Train and Val) or "oTV_Test" (over Train, Val and Test);
"24h" or "48h";
"oneSplit" for a specific split or "allSplits" if run all splits.
void is default to run 1 GPU or "2gpu" to run into 2 gpus systems; Job submission files: "jobMViT_", where:
point the queue in Lovelace environment hosted on CENAPAD-SP (https://www.cenapad.unicamp.br/parque/jobsLovelace) Temporary inputs: "Seq16_flare_Mclass_.txt:
train or val;
void or "_over" after the extension (...txt_over): means temporary input reference that was over-sampled by a training model. Outputs: "saida_MViT_Adam_10-7", where:
k0 to k4, means the correlated split of the output, or void if the output is from all splits. Error files: "err_MViT_Adam_10-7", where:
k0 to k4, means the correlated split of the error log file, or void if the error file is from all splits. Checkpoint files: "sample-FLARE_MViT_S_10-7-epoch=-valid_loss=-Wloss_k=.ckpt", where:
epoch number of the checkpoint;
corresponding valid loss;
0 to 4.
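As referenced above, a minimal loading sketch follows. It is not the authors' training script: the two-class head, the learning rate, and the head replacement are assumptions; only the MViTv2_S backbone with Kinetics-400 weights, the PyTorch Lightning wrapper, and the input shape (batches of 10 sequences of 16 three-channel 224 × 224 images) come from the description above.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torchvision.models.video import mvit_v2_s, MViT_V2_S_Weights


class FlareMViT(pl.LightningModule):
    """Minimal sketch: MViTv2_S pre-trained on Kinetics-400 for flare forecasting."""

    def __init__(self, num_classes: int = 2, lr: float = 1e-4):  # values are illustrative
        super().__init__()
        self.model = mvit_v2_s(weights=MViT_V2_S_Weights.KINETICS400_V1)
        # Swap the Kinetics-400 classifier for a flare/no-flare head (assumed binary task).
        in_features = self.model.head[1].in_features
        self.model.head[1] = nn.Linear(in_features, num_classes)
        self.criterion = nn.CrossEntropyLoss()
        self.lr = lr

    def forward(self, x):
        # x: (batch=10, channels=3, frames=16, height=224, width=224), values in [0, 1]
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.criterion(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)
```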
A big thank you to my GitHub Sponsors for their support!
In addition to the sponsors at the link above, I've received hardware and/or cloud resources from:
* Nvidia (https://www.nvidia.com/en-us/)
* TFRC (https://www.tensorflow.org/tfrc)
I'm fortunate to be able to dedicate significant time and money of my own supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.
* timm bits branch: .data, a bit more consistency, unit tests for all!
* efficientnetv2_rw_t weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res).
* ViT B/16 (vit_base_patch16_sam_224) and B/32 (vit_base_patch32_sam_224) models.
* jx_nest_base - 83.534, jx_nest_small - 83.120, jx_nest_tiny - 81.426.
* gmlp_s16_224 trained to 79.6 top-1, matching paper. Hparams for this and other recent MLP training here.
* vit_large_patch16_384 (87.1 top-1), vit_large_r50_s32_384 (86.2 top-1), vit_base_patch16_384 (86.0 top-1).
* vit_deit_* renamed to just deit_*.
* gmixer_24_224 MLP w/ GLU, 78.1 top-1 w/ 25M params.
* eca_nfnet_l2 weights from my 'lightweight' series. 84.7 top-1 at 384x384.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration in existing deep learning projects based on this framework.
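As a minimal sketch of such online augmentation (not the code shipped with BIRD; the `load_rir` helper and array shapes are assumptions), a PyTorch Dataset could convolve dry speech with a randomly chosen impulse response on the fly:

```python
import random

import numpy as np
import torch
from scipy.signal import fftconvolve
from torch.utils.data import Dataset


class ReverbAugmentDataset(Dataset):
    """Sketch of online reverb augmentation with a bank of room impulse responses.

    `load_rir(i)` is a hypothetical helper returning one multichannel RIR as a
    numpy array of shape (channels, taps); the actual BIRD loading code differs.
    """

    def __init__(self, speech_clips, load_rir, num_rirs):
        self.speech_clips = speech_clips   # list of 1-D numpy arrays (dry speech)
        self.load_rir = load_rir
        self.num_rirs = num_rirs

    def __len__(self):
        return len(self.speech_clips)

    def __getitem__(self, idx):
        dry = self.speech_clips[idx]
        rir = self.load_rir(random.randrange(self.num_rirs))
        # Convolve the dry signal with each microphone's impulse response.
        wet = np.stack([fftconvolve(dry, h)[: len(dry)] for h in rir])
        return torch.from_numpy(wet.astype(np.float32))
```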
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This database contains the reference data used for direct force training of Artificial Neural Network (ANN) interatomic potentials using the atomic energy network (ænet) and ænet-PyTorch packages (https://github.com/atomisticnet/aenet-PyTorch). It also includes the GPR-augmented data used for indirect force training via Gaussian Process Regression (GPR) surrogate models using the ænet-GPR package (https://github.com/atomisticnet/aenet-gpr). Each data file contains atomic structures, energies, and atomic forces in XCrySDen Structure Format (XSF). The dataset includes all reference training/test data and corresponding GPR-augmented data used in the four benchmark examples presented in the reference paper, "Scalable Training of Neural Network Potentials for Complex Interfaces Through Data Augmentation". A hierarchy of the dataset is described in the README.txt file, and an overview of the dataset is also summarized in supplementary Table S1 of the reference paper.
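For orientation only, a sketch of reading one such XSF file is shown below. It assumes the common ænet convention of a "# total energy = ... eV" comment followed by atom lines of the form "symbol x y z fx fy fz", and it ignores any periodic-cell blocks, so it may need adjustment for the actual files described in README.txt.

```python
import numpy as np


def read_xsf(path):
    """Rough sketch of an XSF reader for aenet-style reference files.

    Assumes the total energy is stored in a leading '# total energy = ... eV'
    comment and that each atom line lists: symbol  x y z  fx fy fz.
    """
    energy, symbols, coords, forces = None, [], [], []
    with open(path) as fh:
        for line in fh:
            tokens = line.split()
            if line.startswith("#") and "total energy" in line:
                energy = float(tokens[4])          # '# total energy = X eV'
            elif len(tokens) == 7 and tokens[0].isalpha():
                symbols.append(tokens[0])
                coords.append([float(v) for v in tokens[1:4]])
                forces.append([float(v) for v in tokens[4:7]])
    return energy, symbols, np.array(coords), np.array(forces)
```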
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This repository contains the Wallhack1.8k dataset for WiFi-based long-range activity recognition in Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS)/Through-Wall scenarios, as proposed in [1,2], as well as the CAD models (of 3D-printable parts) of the WiFi systems proposed in [2].
PyTorch Dataloader
A minimal PyTorch dataloader for the Wallhack1.8k dataset is provided at: https://github.com/StrohmayerJ/wallhack1.8k
Dataset Description
The Wallhack1.8k dataset comprises 1,806 CSI amplitude spectrograms (and raw WiFi packet time series) corresponding to three activity classes: "no presence," "walking," and "walking + arm-waving." WiFi packets were transmitted at a frequency of 100 Hz, and each spectrogram captures a temporal context of approximately 4 seconds (400 WiFi packets).
To assess cross-scenario and cross-system generalization, WiFi packet sequences were collected in LoS and through-wall (NLoS) scenarios, utilizing two different WiFi systems (BQ: biquad antenna and PIFA: printed inverted-F antenna). The dataset is structured accordingly:
LOS/BQ/ <- WiFi packets collected in the LoS scenario using the BQ system
LOS/PIFA/ <- WiFi packets collected in the LoS scenario using the PIFA system
NLOS/BQ/ <- WiFi packets collected in the NLoS scenario using the BQ system
NLOS/PIFA/ <- WiFi packets collected in the NLoS scenario using the PIFA system
These directories contain the raw WiFi packet time series (see Table 1). Each row represents a single WiFi packet with the complex CSI vector H being stored in the "data" field and the class label being stored in the "class" field. H is of the form [I, R, I, R, ..., I, R], where two consecutive entries represent imaginary and real parts of complex numbers (the Channel Frequency Responses of subcarriers). Taking the absolute value of H (e.g., via numpy.abs(H)) yields the subcarrier amplitudes A.
To extract the 52 L-LTF subcarriers used in [1], the following indices of A are to be selected:
csi_valid_subcarrier_index = []
csi_valid_subcarrier_index += [i for i in range(6, 32)]
csi_valid_subcarrier_index += [i for i in range(33, 59)]
Additional 56 HT-LTF subcarriers can be selected via:
csi_valid_subcarrier_index += [i for i in range(66, 94)]
csi_valid_subcarrier_index += [i for i in range(95, 123)]
For more details on subcarrier selection, see ESP-IDF (Section Wi-Fi Channel State Information) and esp-csi.
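As a rough example (the CSV path and the exact serialization of the "data" field are assumptions; the reference dataloader linked above shows the canonical way), the subcarrier amplitudes of a single packet could be computed like this:

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical file; any of the raw packet CSVs should follow the same layout.
df = pd.read_csv("LOS/BQ/w1.csv")

# 52 L-LTF subcarrier indices (see above).
csi_valid_subcarrier_index = []
csi_valid_subcarrier_index += [i for i in range(6, 32)]
csi_valid_subcarrier_index += [i for i in range(33, 59)]

# Assumption: the "data" field is a bracketed list of interleaved [I, R, I, R, ...] values.
raw = np.array(ast.literal_eval(df.loc[0, "data"]), dtype=np.float32)
H = raw[1::2] + 1j * raw[0::2]                 # real + j*imaginary per subcarrier
A = np.abs(H)[csi_valid_subcarrier_index]      # subcarrier amplitudes
label = df.loc[0, "class"]                     # activity class of this packet
```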
Extracted amplitude spectrograms, with the corresponding label files of the train/validation/test split ("trainLabels.csv", "validationLabels.csv", and "testLabels.csv"), can be found in the spectrograms/ directory.
The columns in the label files correspond to the following: [Spectrogram index, Class label, Room label]
Spectrogram index: [0, ..., n]
Class label: [0,1,2], where 0 = "no presence", 1 = "walking", and 2 = "walking + arm-waving."
Room label: [0,1,2,3,4,5], where labels 1-5 correspond to the room number in the NLoS scenario (see Fig. 3 in [1]). The label 0 corresponds to no room and is used for the "no presence" class.
Dataset Overview:
Table 1: Raw WiFi packet sequences.
| Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2 | Total |
|---|---|---|---|---|---|
| LoS | BQ | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | |
| LoS | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | |
| NLoS | BQ | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | |
| NLoS | PIFA | b1.csv | w1.csv, w2.csv, w3.csv, w4.csv, w5.csv | ww1.csv, ww2.csv, ww3.csv, ww4.csv, ww5.csv | |
| Total (files) | | 4 | 20 | 20 | 44 |
Table 2: Sample/Spectrogram distribution across activity classes in Wallhack1.8k.
| Scenario | System | "no presence" / label 0 | "walking" / label 1 | "walking + arm-waving" / label 2 | Total |
|---|---|---|---|---|---|
| LoS | BQ | 149 | 154 | 155 | |
| LoS | PIFA | 149 | 160 | 152 | |
| NLoS | BQ | 148 | 150 | 152 | |
| NLoS | PIFA | 143 | 147 | 147 | |
| Total | | 589 | 611 | 606 | 1,806 |
Download and Use
This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to one of our papers [1,2].
[1] Strohmayer, Julian, and Martin Kampel. (2024). “Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition”, In IFIP International Conference on Artificial Intelligence Applications and Innovations (pp. 42-56). Cham: Springer Nature Switzerland, doi: https://doi.org/10.1007/978-3-031-63211-2_4.
[2] Strohmayer, Julian, and Martin Kampel., “Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition,” 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024, pp. 3594-3599, doi: https://doi.org/10.1109/ICIP51287.2024.10647666.
BibTeX citations:
@inproceedings{strohmayer2024data,
  title={Data Augmentation Techniques for Cross-Domain WiFi CSI-Based Human Activity Recognition},
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={IFIP International Conference on Artificial Intelligence Applications and Innovations},
  pages={42--56},
  year={2024},
  organization={Springer}
}

@INPROCEEDINGS{10647666,
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={2024 IEEE International Conference on Image Processing (ICIP)},
  title={Directional Antenna Systems for Long-Range Through-Wall Human Activity Recognition},
  year={2024},
  volume={},
  number={},
  pages={3594-3599},
  keywords={Visualization;Accuracy;System performance;Directional antennas;Directive antennas;Reflector antennas;Sensors;Human Activity Recognition;WiFi;Channel State Information;Through-Wall Sensing;ESP32},
  doi={10.1109/ICIP51287.2024.10647666}
}
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Summary
We provide imagery used to train LROCNet -- our Convolutional Neural Network classifier of orbital imagery of the moon. Images are divided into train, validation, and test zip files, which contain class specific sub-folders. We have three classes: "fresh crater", "old crater", and "none". Classes are described in detail in the attached labeling guide.
Directory Contents
We include the labeling guide and training, testing, and validation data. Training data was split to avoid upload timeouts.
Data Description
We use CDR (Calibrated Data Record) browse imagery (50% resolution) from the Lunar Reconnaissance Orbiter's Narrow Angle Cameras (NACs). Data we get from the NACs are 5-km swaths, at nominal orbit, so we perform a saliency detection step to find surface features of interest. A detector developed for Mars HiRISE (Wagstaff et al.) worked well for our purposes, after updating based on LROC NAC image resolution. We use this detector to create a set of image chipouts (small 227x277 cutouts) from the larger image, sampling the lunar globe.
Class Labeling
We select classes of interest based on what is visible at the NAC resolution, consulting with scientists and performing a literature review. Initially, we have 7 classes: "fresh crater", "old crater", "overlapping craters", "irregular mare patches", "rockfalls and landfalls", "of scientific interest", and "none".
Using the Zooniverse platform, we set up a labeling tool and labeled 5,000 images. We found that "fresh crater" makes up 11% of the data and "old crater" 18%, with the vast majority being "none". Due to limited examples of the other classes, we reduce our initial class set to: "fresh crater" (with impact ejecta), "old crater", and "none".
We divide the images into train/validation/test sets making sure no image swaths span multiple sets.
Data Augmentation
Using PyTorch, we apply the following augmentation on the training set only: horizontal flip, vertical flip, rotation by 90/180/270 degrees, and brightness adjustment (0.5, 2). In addition, we use weighted sampling so that each class is weighted equally. The training set included here does not include augmentation since that was performed within PyTorch.
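A hedged torchvision sketch of the augmentation and class weighting described above (the folder path, batch size, and exact parameter handling are illustrative, not taken from the released code):

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms
from torchvision.transforms import functional as TF

train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    # Rotation by a random multiple of 90 degrees.
    transforms.Lambda(lambda img: TF.rotate(img, 90 * int(torch.randint(0, 4, (1,))))),
    transforms.ColorJitter(brightness=(0.5, 2.0)),   # brightness adjustment (0.5, 2)
    transforms.ToTensor(),
])

# Class-specific sub-folders, as in the provided train zip (path is a placeholder).
train_set = datasets.ImageFolder("train/", transform=train_tf)

# Weight samples inversely to class frequency so each class is sampled equally.
class_counts = torch.bincount(torch.tensor(train_set.targets))
sample_weights = (1.0 / class_counts.float())[train_set.targets]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights))
train_loader = DataLoader(train_set, batch_size=64, sampler=sampler)
```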
Acknowledgements
The author would like to thank the volunteers who provided annotations for this data set, as well as others who contributed to this work (as in the Contributor list). We would also like to thank the PDS Imaging Node for support of this work.
The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).
CL#22-4763
© 2022 California Institute of Technology. Government sponsorship acknowledged.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Data supplement: Detection of Drainage Ditches from LiDAR DTM Using U-Net and Transfer Learning
Holger Virro, Alexander Kmoch, William Lidberg, Wai Tik Chan, Evelyn Uuemaa
Accurate mapping of ditches is essential for effective hydrological modeling and land management. Traditional methods, such as manual digitization or threshold-based extraction, utilize LiDAR-derived digital terrain model (DTM) data but are labor-intensive and impractical to apply for large-scale applications. Deep learning offers a promising alternative but requires extensive labeled data, often unavailable. To address this, we developed a transfer learning approach using a U-Net model pre-trained on a large high-quality Swedish dataset and fine-tuned on a smaller localized Estonian dataset. The model uses a single-band LiDAR DTM raster as input, minimizing preprocessing. We identified the optimal model configuration by systematically testing kernel sizes and data augmentation. The best fine-tuned model achieved an overall F1 score of 0.766, demonstrating its effectiveness in detecting drainage ditches in training data-scarce regions. Performance varied by land use, with higher accuracy in peatlands (F1=0.822) than in forests (F1=0.752) and arable land (F1=0.779). These findings underscore the model's suitability for large-scale ditch mapping and its adaptability to different landscapes.
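As an illustrative sketch only (the paper's exact architecture, encoder, loss, and file names are not specified here, so these are assumptions), transfer learning with a U-Net on a single-band DTM input could be set up with the segmentation_models_pytorch library like this:

```python
import segmentation_models_pytorch as smp
import torch

# U-Net with a single-band LiDAR DTM input and a binary ditch mask output.
model = smp.Unet(encoder_name="resnet34", in_channels=1, classes=1)

# Hypothetical transfer-learning step: load weights pre-trained on the large
# Swedish dataset, then fine-tune on the smaller Estonian tiles.
model.load_state_dict(torch.load("unet_pretrained_sweden.pt"))  # assumed file name

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)       # illustrative learning rate
loss_fn = smp.losses.DiceLoss(mode="binary")
```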
A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated in training pipelines in e.g. Tensorflow/Keras or Pytorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.
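A typical usage example with the library's own API (parameter values here are illustrative, not recommendations from any specific project):

```python
import numpy as np
from audiomentations import AddGaussianNoise, Compose, PitchShift, Shift, TimeStretch

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Mono waveform as a float32 numpy array; multichannel arrays are also supported.
samples = np.random.uniform(low=-0.2, high=0.2, size=32000).astype(np.float32)
augmented = augment(samples=samples, sample_rate=16000)
```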
Need a Pytorch-specific alternative with GPU support? Check out torch-audiomentations!
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Structural planes decrease the strength and stability of rock masses, severely affecting their mechanical properties and their deformation and failure characteristics. Therefore, the investigation and analysis of structural planes are crucial tasks in mining rock mechanics. The drilling camera obtains image information of deep structural planes of rock masses through high-definition imaging, providing an important data source for the analysis of deep structural planes of rock masses. This paper addresses the heavy workload, low efficiency, high subjectivity, and poor accuracy of manual processing in current borehole image analysis and conducts an intelligent segmentation study of borehole image structural planes based on the U2-Net network. By collecting data from 20 different borehole images in different lithological regions, a dataset consisting of 1,013 borehole images with structural plane type, lithology, and color was established. Data augmentation methods such as image flipping, color jittering, blurring, and mixup were applied to expand the dataset to 12,421 images, meeting the requirements for deep network training data. Based on the PyTorch deep learning framework, the initial U2-Net network weights were set, the learning rate was set to 0.001, the training batch size was 4, and the Adam optimizer adaptively adjusted the learning rate during training. A dedicated network model for segmenting structural planes was obtained; the model achieved a maximum F-measure of 0.749 at a confidence threshold of 0.7, with an accuracy of up to 0.85 in the range where recall exceeds 0.5. Overall, the model segments structural planes with high accuracy and a very low mean absolute error, indicating good segmentation quality and a degree of generalization. The research method in this paper can serve as a reference for the intelligent identification of structural planes in borehole images.
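The training configuration described above could look roughly like the sketch below. The U2NET import follows the public reference implementation of U²-Net, and the placeholder tensors stand in for the actual borehole images and structural-plane masks; all of this is an assumption, not the authors' code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from model import U2NET  # assumed import path from the reference U^2-Net repository

device = "cuda" if torch.cuda.is_available() else "cpu"
model = U2NET(in_ch=3, out_ch=1).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # learning rate from the paper
criterion = torch.nn.BCELoss()

# Placeholder tensors; replace with the augmented borehole images and masks.
images = torch.rand(8, 3, 320, 320)
masks = torch.rand(8, 1, 320, 320)
loader = DataLoader(TensorDataset(images, masks), batch_size=4, shuffle=True)  # batch size 4

for x, y in loader:
    x, y = x.to(device), y.to(device)
    d0, d1, d2, d3, d4, d5, d6 = model(x)                      # multi-scale side outputs
    loss = sum(criterion(d, y) for d in (d0, d1, d2, d3, d4, d5, d6))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```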
License: Public Domain (CC0 1.0), https://creativecommons.org/publicdomain/zero/1.0/
It includes 2,822 images. Weeds are annotated in YOLO v5 PyTorch format.
The following pre-processing was applied to each image:
* Auto-orientation of pixel data (with EXIF-orientation stripping)
The following augmentation was applied to create 3 versions of each source image:
* Equal probability of one of the following 90-degree rotations: none, clockwise, counter-clockwise
* Random shear of between -15° and +15° horizontally and -15° and +15° vertically
* Random brightness adjustment of between -25 and +25 percent
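Roboflow applied these augmentations offline when exporting the dataset; a roughly equivalent on-the-fly pipeline could be written with the albumentations library (the exact parameters and the bbox handling here are approximations, not the export settings):

```python
import albumentations as A

transform = A.Compose(
    [
        # Equal probability of no rotation, clockwise, or counter-clockwise 90 degrees.
        A.OneOf(
            [A.NoOp(), A.Rotate(limit=(90, 90), p=1.0), A.Rotate(limit=(-90, -90), p=1.0)],
            p=1.0,
        ),
        A.Affine(shear={"x": (-15, 15), "y": (-15, 15)}, p=1.0),              # random shear
        A.RandomBrightnessContrast(brightness_limit=0.25, contrast_limit=0.0, p=1.0),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# usage: out = transform(image=img, bboxes=yolo_boxes, class_labels=labels)
```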
Crop, Weed
Identifying weeds and distinguishing them from crops is essential in farming.
This dataset is derived from the following publication:
Kaspars Sudars, Janis Jasko, Ivars Namatevs, Liva Ozola, Niks Badaukis, Dataset of annotated food crops and weed images for robotic computer vision control, Data in Brief, Volume 31, 2020, 105833, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2020.105833. (https://www.sciencedirect.com/science/article/pii/S2352340920307277)

Abstract: Weed management technologies that can identify weeds and distinguish them from crops are in need of artificial intelligence solutions based on a computer vision approach, to enable the development of precisely targeted and autonomous robotic weed management systems. A prerequisite of such systems is to create robust and reliable object detection that can unambiguously distinguish weed from food crops. One of the essential steps towards precision agriculture is using annotated images to train convolutional neural networks to distinguish weed from food crops, which can be later followed using mechanical weed removal or selected spraying of herbicides. In this data paper, we propose an open-access dataset with manually annotated images for weed detection. The dataset is composed of 1118 images in which 6 food crops and 8 weed species are identified, altogether 7853 annotations were made in total. Three RGB digital cameras were used for image capturing: Intel RealSense D435, Canon EOS 800D, and Sony W800. The images were taken on food crops and weeds grown in controlled environment and field conditions at different growth stages.

Keywords: Computer vision; Object detection; Image annotation; Precision agriculture; Crop growth and development
Many thanks to the Roboflow team for sharing this data.
Tips: if you want to apply data augmentation, there is no need to convert these tensors back to images first; you can apply Torchvision Transforms (or a Compose of transforms) directly to the tensors, and it does work :)
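For example (purely illustrative tensor shapes and transform choices):

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(224, antialias=True),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

img = torch.rand(3, 256, 256)   # an image tensor with values in [0, 1]
augmented = augment(img)        # transforms are applied to the tensor directly
```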
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Fires are characterized by their sudden onset, rapid spread, and destructive nature, often causing irreversible damage to ecosystems. To address the challenges in forest fire detection, including the varying scales and complex features of flame and smoke, as well as false positives and missed detections caused by environmental interference, we propose a novel object detection model named CBAM-SSD. Firstly, data augmentation techniques involving geometric and color transformations are employed to enrich the dataset, effectively mitigating issues of insufficient and incomplete data collected in real-world scenarios. This significantly enhances the SSD model’s ability to detect flames, which exhibit highly variable morphological characteristics. Furthermore, the CBAM module is integrated into the SSD backbone network to reconstruct its feature extraction structure. This module adaptively weights flame color and smoke texture along the channel dimension and highlights critical fire regions in the spatial dimension, substantially improving the model’s perception of key fire features. Experimental results demonstrate that the CBAM-SSD model is lightweight and suitable for real-time detection, achieving a mAP@0.5 of 97.55% for flames and smoke, a 1.53% improvement over the baseline SSD. Specifically, the AP50 for flame detection reaches 96.61%, a 3.01% increase compared to the baseline, with a recall of 96.40%; while the AP50 for smoke detection reaches 98.49%, with a recall of 98.80%. These results indicate that the improved model delivers higher detection accuracy and lower false and missed detection rates, offering an efficient, convenient, and accurate solution for forest fire detection.
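For reference, a standard CBAM block (channel attention followed by spatial attention) looks roughly like the sketch below; the paper's exact placement inside the SSD backbone and its hyperparameters are not reproduced here.

```python
import torch
import torch.nn as nn


class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)                        # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))              # spatial attention
```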
License: MIT License, https://opensource.org/licenses/MIT
Download and Extract:
Download the dataset from Kaggle.
Extract the ZIP file if needed; images are organized into folders, where each folder name is the class label (like snake, lizard, frog, etc.).
Understand the Structure:
The dataset contains 9 major classes of reptiles and amphibians.
Each class folder contains multiple high-quality images belonging to that species or group.
Load the Dataset into Your Project:
If using PyTorch, use torchvision.datasets.ImageFolder to load images directly (a combined sketch appears after these steps).
If using TensorFlow, use tf.keras.utils.image_dataset_from_directory.
You can also manually read images using OpenCV or PIL if needed.
Preprocessing:
Resize images if needed (e.g., 224x224 for ResNet models).
Normalize pixel values (e.g., divide by 255) to prepare for training.
Splitting the Data:
Optionally split the dataset into train, validation, and test sets.
You can split randomly or based on a percentage (e.g., 80% training, 20% validation/testing).
Training Your Model:
You can use any CNN model like ResNet, MobileNet, EfficientNet, etc.
Fine-tune pre-trained models using transfer learning for faster results.
Use the class folders for automatic label generation.
Handling the Dataset Easily:
Use batch processing and data augmentation (flip, rotate, zoom) during training.
Use GPU if available for faster training.
Keep your classes in a list if needed for mapping predictions back to names.
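Putting the PyTorch route together, a minimal sketch might look as follows (the dataset folder name, split ratio, batch size, and augmentation choices are all placeholders; in practice you would give the validation split its own non-augmenting transform):

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Resize, augment, and normalize; adjust to your model's requirements.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),          # also scales pixel values to [0, 1]
])

# "dataset/" is a placeholder for the extracted folder with one sub-folder per class.
full_set = datasets.ImageFolder("dataset/", transform=train_tf)
class_names = full_set.classes      # mapping from indices back to class names

# 80/20 train/validation split.
n_train = int(0.8 * len(full_set))
train_set, val_set = random_split(full_set, [n_train, len(full_set) - n_train])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
```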
License: MIT License, https://opensource.org/licenses/MIT
This dataset contains hand gesture images for sign language recognition, focusing on 5 commonly used phrases. The images are preprocessed, cropped, and ready for training deep learning models for real-time sign language detection applications.
| Class ID | Meaning | Description |
|---|---|---|
| 0 | Yes | Affirmative gesture |
| 1 | No | Negative gesture |
| 2 | I Love You | Expression of affection |
| 3 | Hello | Greeting gesture |
| 4 | Thank You | Gratitude expression |
data_final/
├── train/
│ ├── 0/ # Yes (~150 images)
│ ├── 1/ # No (~150 images)
│ ├── 2/ # I Love You (~150 images)
│ ├── 3/ # Hello (~150 images)
│ └── 4/ # Thank You (~150 images)
├── val/
│ ├── 0/
│ ├── 1/
│ ├── 2/
│ ├── 3/
│ └── 4/
└── test/
├── 0/
├── 1/
├── 2/
├── 3/
└── 4/
This dataset is suitable for:
Sign Language Recognition Models
Computer Vision Research
Educational Projects
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255)
train_gen = datagen.flow_from_directory(
'data_final/train',
target_size=(224, 224),
batch_size=32,
class_mode='categorical'
)
val_gen = datagen.flow_from_directory(
'data_final/val',
target_size=(224, 224),
batch_size=32,
class_mode='categorical'
)
from torchvision import datasets, transforms
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
train_dataset = datasets.ImageFolder('data_final/train', transform=transform)
val_dataset = datasets.ImageFolder('data_final/val', transform=transform)
Using transfer learning with MobileNetV2/EfficientNetB0:
- Expected Accuracy: 90-97%
- Training Time: 20-40 minutes (GPU)
- Model Size: ~15 MB
For better generalization, use these augmentation techniques:
train_datagen = ImageDataGenerator(
rescale=1./255,
rotation_range=25,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.15,
zoom_range=0.2,
horizontal_flip=True,
brightness_range=[0.7, 1.3]
)
If you use this dataset in your research or project, please cite:
@dataset{sign_language_5phrases_2025,
title={Sign Language Recognition Dataset - 5 Essential Phrases},
author={[Your Name]},
year={2025},
publisher={Kaggle},
url={[Dataset URL]}
}
This dataset is released under [Choose one]:
- CC BY 4.0 (Attribution) - Recommended
- CC BY-SA 4.0 (Attribution-ShareAlike)
- CC0 1.0 (Public Domain)