19 datasets found
  1. Data from: Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 4, 2023
    Cite
    Grim, Luís Fernando Lopes; Sampaio Gradvohl, André Leon (2023). Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_10246576
    Explore at:
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Universidade Estadual de Campinas
    Universidade Estadual de Campinas (UNICAMP)
    Authors
    Grim, Luís Fernando Lopes; Sampaio Gradvohl, André Leon
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Source codes and dataset of the research "Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation". Our work employed PyTorch, a framework for training Deep Learning models with GPU support and automatic back-propagation, to load the MViTv2-S models with Kinetics-400 weights. To simplify the code implementation, eliminating the need for an explicit training loop and automating some hyperparameters, we use the PyTorch Lightning module. The inputs were batches of 10 samples, each with 16 sequenced 3-channel images resized to 224 × 224 pixels and normalized from 0 to 1. Most of the papers in our literature survey split the original dataset chronologically. Some authors also apply k-fold cross-validation to emphasize the evaluation of model stability. However, we adopt a hybrid split, taking the first 50,000 samples to apply 5-fold cross-validation between the training and validation sets (known data), with 40,000 samples for training and 10,000 for validation. Thus, we can evaluate performance and stability by analyzing the mean and standard deviation of all trained models on the test set, composed of the last 9,834 samples, preserving the chronological order (simulating unknown data). We developed three distinct models to evaluate the impact of oversampling magnetogram sequences throughout the dataset. The first model, Solar Flare MViT (SF MViT), was trained only with the original data from our base dataset, without oversampling. In the second model, Solar Flare MViT over Train (SF MViT oT), we apply oversampling only on the training data, maintaining the original validation dataset. In the third model, Solar Flare MViT over Train and Validation (SF MViT oTV), we apply oversampling in both the training and validation sets. We also trained a model oversampling the entire dataset, called the "SF_MViT_oTV Test", to verify how resampling or adopting a test set with unreal data may bias the results positively.

    GitHub version

    The .zip hosted here contains all files from the project, including the checkpoint and output files generated by the codes. We have a clean version hosted on GitHub (https://github.com/lfgrim/SFF_MagSeq_MViTs), without the magnetogram_jpg folder (which can be downloaded directly from https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip) and without the output and checkpoint files. Most code files hosted here also contain comments in Portuguese, which are being updated to English in the GitHub version.

    Folders Structure

    In the Root directory of the project, we have two folders:

    magnetogram_jpg: holds the source images provided by the Space Environment Artificial Intelligence Early Warning Innovation Workshop through the link https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip. It comprises 73,810 samples of high-quality magnetograms captured by HMI/SDO from 2010 May 4 to 2019 January 26. The HMI instrument provides these data (stored in the hmi.sharp_720s dataset), making new samples available every 12 minutes; however, the images in this dataset were collected every 96 minutes. Each image has an associated magnetogram comprising a ready-made snippet of one or more solar ARs. Note that the magnetograms cropped by SHARP can contain one or more solar ARs classified by the National Oceanic and Atmospheric Administration (NOAA).

    Seq_Magnetogram: contains the references to the source images with the corresponding labels for the next 24 h and 48 h, in the M24 and M48 sub-folders respectively.

    M24/M48: both have the following sub-folder structure:

    Seqs16; SF_MViT; SF_MViT_oT; SF_MViT_oTV; SF_MViT_oTV_Test. There are also two files in root:

    inst_packages.sh: installs the packages and dependencies needed to run the models.

    download_MViTS.py: downloads the pre-trained MViTv2_S from PyTorch and stores it in the cache.

    The M24 and M48 folders hold reference text files (flare_Mclass...) linking the images in the magnetogram_jpg folder, or the sequences (Seq16_flare_Mclass...) in the Seqs16 folders, with their respective labels. They also hold "cria_seqs.py", which was responsible for creating the sequences, and "test_pandas.py", used to verify head info and check the number of samples per label in the text files. All the text files with the prefix "Seq16" inside the Seqs16 folder were created by the "criaseqs.py" code based on the corresponding "flare_Mclass"-prefixed text files. The Seqs16 folder holds reference text files, in which each file contains a sequence of images pointing to the magnetogram_jpg folder. All SF_MViT... folders hold the model training code itself (SF_MViT...py) and the corresponding job submission (jobMViT...), temporary input (Seq16_flare...), output (saida_MVIT... and MViT_S...), error (err_MViT...) and checkpoint files (sample-FLARE...ckpt). Executed model training codes generate the output, error, and checkpoint files. There is also a folder called "lightning_logs" that stores the logs of trained models.

    Naming pattern for the files:

    magnetogram_jpg: files follow the format "hmi.sharp_720s...magnetogram.fits.jpg", and Seqs16: files follow the format "hmi.sharp_720s...to.", where:

    hmi: the instrument that captured the image.
    sharp_720s: the database source of SDO/HMI.
    The SHARP region identification, which can contain one or more solar ARs classified by NOAA.
    The date-time at which the instrument captured the image, in the format yyyymmdd_hhnnss_TAI (y: year, m: month, d: day, h: hours, n: minutes, s: seconds).
    The date-time at which the sequence starts, following the same format as the capture date-time.

    The date-time at which the sequence ends, following the same format. Reference text files in M24 and M48, or inside the SF_MViT... folders, follow the format "flare_Mclass_.txt", where:

    Seq16 if the file refers to a sequence, or void if it refers directly to images.

    "24h" or "48h", according to the forecasting horizon.

    "TrainVal" or "Test"; a further field indicates the Train/Val split.

    void or "_over" after the extension (...txt_over): indicates a temporary input reference that was over-sampled by a training model. All SF_MViT... folders:

    Model training codes: "SF_MViT_M+_", where:

    void, or "oT" (over Train), or "oTV" (over Train and Val), or "oTV_Test" (over Train, Val and Test);

    "24h" or "48h";

    "oneSplit" for a specific split, or "allSplits" if all splits are run;

    void (default) to run on 1 GPU, or "2gpu" to run on 2-GPU systems. Job submission files: "jobMViT_", where:

    points to the queue in the Lovelace environment hosted at CENAPAD-SP (https://www.cenapad.unicamp.br/parque/jobsLovelace). Temporary inputs: "Seq16_flare_Mclass_.txt", where:

    train or val;

    void or "_over" after the extension (...txt_over): indicates a temporary input reference that was over-sampled by a training model. Outputs: "saida_MViT_Adam_10-7", where:

    k0 to k4 indicates the corresponding split of the output, or void if the output is from all splits. Error files: "err_MViT_Adam_10-7", where:

    k0 to k4 indicates the corresponding split of the error log file, or void if the error file is from all splits. Checkpoint files: "sample-FLARE_MViT_S_10-7-epoch=-valid_loss=-Wloss_k=.ckpt", where:

    the epoch number of the checkpoint;

    the corresponding validation loss;

    the split index, 0 to 4.
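
    The description above mentions loading torchvision's MViTv2-S with Kinetics-400 weights under PyTorch Lightning. Below is a minimal, hypothetical sketch of that loading step (class name, head replacement, and hyperparameters are illustrative assumptions, not the authors' released code):

    import torch
    import pytorch_lightning as pl
    from torchvision.models.video import mvit_v2_s, MViT_V2_S_Weights

    class SolarFlareMViT(pl.LightningModule):
        """Illustrative sketch: MViTv2-S backbone initialized with Kinetics-400 weights."""

        def __init__(self, num_classes: int = 2, lr: float = 1e-4):
            super().__init__()
            self.model = mvit_v2_s(weights=MViT_V2_S_Weights.KINETICS400_V1)
            # Swap the Kinetics-400 head for a flare / no-flare classifier (assumption).
            in_features = self.model.head[-1].in_features
            self.model.head[-1] = torch.nn.Linear(in_features, num_classes)
            self.lr = lr

        def forward(self, x):
            # x: (batch=10, channels=3, frames=16, height=224, width=224), values in [0, 1]
            return self.model(x)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = torch.nn.functional.cross_entropy(self(x), y)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.lr)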

  2. pytorch_image_models

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Cite
    HyeongChan Kim (2025). pytorch_image_models [Dataset]. https://www.kaggle.com/datasets/kozistr/pytorch-image-models
    Explore at:
    zip (3469394 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    HyeongChan Kim
    Description

    PyTorch Image Models

    Sponsors

    A big thank you to my GitHub Sponsors for their support!

    In addition to the sponsors at the link above, I've received hardware and/or cloud resources from:
    • Nvidia (https://www.nvidia.com/en-us/)
    • TFRC (https://www.tensorflow.org/tfrc)

    I'm fortunate to be able to dedicate significant time and money of my own to supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.

    What's New

    Aug 18, 2021

    • Optimizer bonanza!
      • Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ timm bits branch)
      • Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
      • Some cleanup on all optimizers and factory. No more .data, a bit more consistency, unit tests for all!
      • SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
    • EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit diff and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
    • Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE so more of a curiosity for those interested.

    July 12, 2021

    July 5-9, 2021

    • Add efficientnetv2_rw_t weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res)
      • top-1 82.34 @ 288x288 and 82.54 @ 320x320
    • Add SAM pretrained in1k weight for ViT B/16 (vit_base_patch16_sam_224) and B/32 (vit_base_patch32_sam_224) models.
    • Add 'Aggregating Nested Transformer' (NesT) w/ weights converted from official Flax impl. Contributed by Alexander Soare.
      • jx_nest_base - 83.534, jx_nest_small - 83.120, jx_nest_tiny - 81.426

    June 23, 2021

    • Reproduce gMLP model training, gmlp_s16_224 trained to 79.6 top-1, matching paper. Hparams for this and other recent MLP training here

    June 20, 2021

    • Release Vision Transformer 'AugReg' weights from How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
      • .npz weight loading support added, can load any of the 50K+ weights from the AugReg series
      • See example notebook from official impl for navigating the augreg weights
      • Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
      • Highlights: vit_large_patch16_384 (87.1 top-1), vit_large_r50_s32_384 (86.2 top-1), vit_base_patch16_384 (86.0 top-1)
      • vit_deit_* renamed to just deit_*
      • Remove my old small model, replace with DeiT compatible small w/ AugReg weights
    • Add 1st training of my gmixer_24_224 MLP w/ GLU, 78.1 top-1 w/ 25M params.
    • Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
    • Add eca_nfnet_l2 weights from my 'lightweight' series. 84.7 top-1 at 384x384.
    • Add distilled BiT 50x1 student and 152x2 Teacher weights from Knowledge distillation: A good teacher is patient and consistent
    • NFNets and ResNetV2-BiT models work w/ Pytorch XLA now
      • weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered)
      • eps values adjusted, will be slight differences but should be quite close
    • Improve test coverage and classifier interface of non-conv (vision transformer and mlp) models ...
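
    The changelog above refers to timm model names such as vit_base_patch16_384. As a usage illustration only (assuming a current timm install; not part of this dataset's files), loading one of these pretrained models looks like:

    import timm
    import torch

    # Create one of the ViT variants mentioned above with its pretrained weights.
    model = timm.create_model("vit_base_patch16_384", pretrained=True)
    model.eval()

    # Dummy forward pass: ImageNet-1k logits with shape (1, 1000).
    with torch.no_grad():
        logits = model(torch.randn(1, 3, 384, 384))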
  3. Data supplement: Detection of Drainage Ditches from LiDAR DTM Using U-Net and Transfer Learning

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Feb 21, 2025
    Cite
    Holger Virro; Holger Virro; Alexander Kmoch; Alexander Kmoch; William Lidberg; William Lidberg; Merle Muru; Merle Muru; Wai Tik Chan; Wai Tik Chan; Desalew Meseret Moges; Desalew Meseret Moges; Evelyn Uuemaa; Evelyn Uuemaa (2025). Data supplement: Detection of Drainage Ditches from LiDAR DTM Using U-Net and Transfer Learning [Dataset]. http://doi.org/10.5281/zenodo.14893004
    Explore at:
    zip, bin
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Holger Virro; Holger Virro; Alexander Kmoch; Alexander Kmoch; William Lidberg; William Lidberg; Merle Muru; Merle Muru; Wai Tik Chan; Wai Tik Chan; Desalew Meseret Moges; Desalew Meseret Moges; Evelyn Uuemaa; Evelyn Uuemaa
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data supplement: Detection of Drainage Ditches from LiDAR DTM Using U-Net and Transfer Learning

    Holger Virro, Alexander Kmoch, William Lidberg, Wai Tik Chan, Evelyn Uuemaa

    Accurate mapping of ditches is essential for effective hydrological modeling and land management. Traditional methods, such as manual digitization or threshold-based extraction, utilize LiDAR-derived digital terrain model (DTM) data but are labor-intensive and impractical to apply for large-scale applications. Deep learning offers a promising alternative but requires extensive labeled data, often unavailable. To address this, we developed a transfer learning approach using a U-Net model pre-trained on a large high-quality Swedish dataset and fine-tuned on a smaller localized Estonian dataset. The model uses a single-band LiDAR DTM raster as input, minimizing preprocessing. We identified the optimal model configuration by systematically testing kernel sizes and data augmentation. The best fine-tuned model achieved an overall F1 score of 0.766, demonstrating its effectiveness in detecting drainage ditches in training data-scarce regions. Performance varied by land use, with higher accuracy in peatlands (F1=0.822) than in forests (F1=0.752) and arable land (F1=0.779). These findings underscore the model's suitability for large-scale ditch mapping and its adaptability to different landscapes.

  4. Database of scalable training of neural network potentials for complex interfaces through data augmentation

    • archive.materialscloud.org
    • materialscloud-archive-failover.cineca.it
    bz2, text/markdown +1
    Updated Aug 13, 2025
    Cite
    In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith; In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith (2025). Database of scalable training of neural network potentials for complex interfaces through data augmentation [Dataset]. http://doi.org/10.24435/materialscloud:w6-9a
    Explore at:
    bz2, txt, text/markdown
    Dataset updated
    Aug 13, 2025
    Dataset provided by
    Materials Cloud
    Authors
    In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith; In Won Yeu; Annika Stuke; Alexander Urban; Nongnuch Artrith
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database contains the reference data used for direct force training of Artificial Neural Network (ANN) interatomic potentials using the atomic energy network (ænet) and ænet-PyTorch packages (https://github.com/atomisticnet/aenet-PyTorch). It also includes the GPR-augmented data used for indirect force training via Gaussian Process Regression (GPR) surrogate models using the ænet-GPR package (https://github.com/atomisticnet/aenet-gpr). Each data file contains atomic structures, energies, and atomic forces in XCrySDen Structure Format (XSF). The dataset includes all reference training/test data and corresponding GPR-augmented data used in the four benchmark examples presented in the reference paper, "Scalable Training of Neural Network Potentials for Complex Interfaces Through Data Augmentation". A hierarchy of the dataset is described in the README.txt file, and an overview of the dataset is also summarized in supplementary Table S1 of the reference paper.

  5. Lunar Reconnaissance Orbiter Imagery for LROCNet Moon Classifier

    • zenodo.org
    bin, zip
    Updated Nov 1, 2022
    Cite
    Emily Dunkel; Emily Dunkel (2022). Lunar Reconnaissance Orbiter Imagery for LROCNet Moon Classifier [Dataset]. http://doi.org/10.5281/zenodo.7041842
    Explore at:
    zip, bin
    Dataset updated
    Nov 1, 2022
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Emily Dunkel; Emily Dunkel
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    We provide imagery used to train LROCNet -- our Convolutional Neural Network classifier of orbital imagery of the moon. Images are divided into train, validation, and test zip files, which contain class specific sub-folders. We have three classes: "fresh crater", "old crater", and "none". Classes are described in detail in the attached labeling guide.

    Directory Contents

    We include the labeling guide and training, testing, and validation data. Training data was split to avoid upload timeouts.

    • LROC_Labeling_Intro_for_release.ppt: Labeling guide
    • val: Validation images divided into class sub-folders
      • ejecta: "fresh crater" class
      • oldcrater: "old crater" class
      • none: "none" class
    • test: Testing images divided into class sub-folders
      • ejecta: "fresh crater" class
      • oldcrater: "old crater" class
      • none: "none" class
    • ejecta_train: Training images of "fresh crater" class
    • oldcrater_train: Training images of "old crater" class
    • none_train1-4: Training images of "none" class (divided into 4 just for uploading)

    Data Description

    We use CDR (Calibrated Data Record) browse imagery (50% resolution) from the Lunar Reconnaissance Orbiter's Narrow Angle Cameras (NACs). Data we get from the NACs are 5-km swaths, at nominal orbit, so we perform a saliency detection step to find surface features of interest. A detector developed for Mars HiRISE (Wagstaff et al.) worked well for our purposes, after updating based on LROC NAC image resolution. We use this detector to create a set of image chipouts (small 227x277 cutouts) from the larger image, sampling the lunar globe.

    Class Labeling

    We select classes of interest based on what is visible at the NAC resolution, consulting with scientists and performing a literature review. Initially, we have 7 classes: "fresh crater", "old crater", "overlapping craters", "irregular mare patches", "rockfalls and landfalls", "of scientific interest", and "none".

    Using the Zooniverse platform, we set up a labeling tool and labeled 5,000 images. We found that "fresh crater" makes up 11% of the data and "old crater" 18%, with the vast majority being "none". Due to limited examples of the other classes, we reduced our initial class set to: "fresh crater" (with impact ejecta), "old crater", and "none".

    We divide the images into train/validation/test sets making sure no image swaths span multiple sets.

    Data Augmentation

    Using PyTorch, we apply the following augmentation on the training set only: horizontal flip, vertical flip, rotation by 90/180/270 degrees, and brightness adjustment (0.5, 2). In addition, we use weighted sampling so that each class is weighted equally. The training set included here does not include augmentation since that was performed within PyTorch.
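
    The augmentations listed above map onto standard torchvision transforms plus a class-balanced sampler. A hedged sketch (transform choices, folder layout, and batch size are illustrative assumptions, not the project's released code):

    import torch
    from torch.utils.data import DataLoader, WeightedRandomSampler
    from torchvision import datasets, transforms

    # Training-set-only augmentation, as described: flips, 90/180/270-degree rotations,
    # and brightness jitter in the range (0.5, 2).
    train_transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.RandomChoice([
            transforms.RandomRotation((0, 0)),
            transforms.RandomRotation((90, 90)),
            transforms.RandomRotation((180, 180)),
            transforms.RandomRotation((270, 270)),
        ]),
        transforms.ColorJitter(brightness=(0.5, 2.0)),
        transforms.ToTensor(),
    ])

    # Placeholder path: assumes the training images were reorganized into
    # class sub-folders (ejecta / oldcrater / none), as in the val and test sets.
    train_set = datasets.ImageFolder("train", transform=train_transform)

    # Weighted sampling so that each class is weighted equally.
    class_counts = torch.bincount(torch.tensor(train_set.targets))
    sample_weights = (1.0 / class_counts.float())[train_set.targets]
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(train_set), replacement=True)
    loader = DataLoader(train_set, batch_size=64, sampler=sampler)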

    Acknowledgements

    The author would like to thank the volunteers who provided annotations for this data set, as well as others who contributed to this work (as in the Contributor list). We would also like to thank the PDS Imaging Node for support of this work.

    The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).

    CL#22-4763

    © 2022 California Institute of Technology. Government sponsorship acknowledged.

  6. BIRD: Big Impulse Response Dataset

    • data.niaid.nih.gov
    • kaggle.com
    Updated Oct 29, 2020
    Cite
    Grondin, François; Lauzon, Jean-Samuel; Michaud, Simon; Ravanelli, Mirco; Michaud, François (2020). BIRD: Big Impulse Response Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4139415
    Explore at:
    Dataset updated
    Oct 29, 2020
    Dataset provided by
    Université de Sherbrooke
    Mila - Université de Montréal
    Authors
    Grondin, François; Lauzon, Jean-Samuel; Michaud, Simon; Ravanelli, Mirco; Michaud, François
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration in existing deep learning projects based on this framework.
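
    The description notes that the accompanying Python code works with the PyTorch Dataset class for online augmentation. As a generic illustration only (this is not BIRD's own helper code, and the array inputs are placeholders), reverberation augmentation by convolving dry speech with a randomly chosen impulse response can be written as:

    import numpy as np
    import torch
    from scipy.signal import fftconvolve
    from torch.utils.data import Dataset

    class ReverbAugmentDataset(Dataset):
        """Generic sketch: convolve dry speech with a random room impulse response."""

        def __init__(self, speech_clips, impulse_responses):
            self.speech_clips = speech_clips            # list of 1-D float32 numpy arrays
            self.impulse_responses = impulse_responses  # list of 1-D float32 numpy arrays

        def __len__(self):
            return len(self.speech_clips)

        def __getitem__(self, idx):
            dry = self.speech_clips[idx]
            rir = self.impulse_responses[np.random.randint(len(self.impulse_responses))]
            wet = fftconvolve(dry, rir, mode="full")[: len(dry)]
            return torch.from_numpy(wet.astype(np.float32))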

  7. Dataset

    • kaggle.com
    zip
    Updated Nov 10, 2025
    Cite
    Daniel Fowler (2025). Dataset [Dataset]. https://www.kaggle.com/datasets/vacantdaniel/dataset
    Explore at:
    zip (41539094 bytes)
    Dataset updated
    Nov 10, 2025
    Authors
    Daniel Fowler
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ECG Image Digitization - Deep Learning Solution

    Overview

    This dataset contains a complete deep learning pipeline for digitizing ECG images into time series data for the PhysioNet ECG Image Digitization Challenge. The solution extracts 12-lead ECG signals from degraded images including scans, photos, and physically damaged printouts.

    Model Architecture

    Encoder-Decoder Design

    • Encoder: ResNet-inspired CNN with residual connections

      • Progressive channel expansion: 64 → 128 → 256 → 512
      • Extracts robust spatial features from preprocessed ECG images
      • Handles rotation, noise, and physical artifacts
    • Decoder: Bidirectional LSTM with attention

      • 3-layer BiLSTM with 512 hidden units per direction
      • 12 independent prediction heads for lead-specific reconstruction
      • Dynamic sequence length: 10s for Lead II, 2.5s for other leads

    Preprocessing Pipeline

    The solution includes comprehensive image enhancement (a hedged code sketch follows the numbered list):

    1. Denoising: Non-local means filtering removes scanning artifacts
    2. Rotation Correction: Hough transform-based automatic alignment
    3. Contrast Enhancement: CLAHE (Contrast Limited Adaptive Histogram Equalization)
    4. Normalization: ImageNet-style standardization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    5. Resizing: Standardized to 512×1024 pixels for consistent processing
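
    A minimal OpenCV-based sketch of these preprocessing steps (function choices and parameters are assumptions for illustration; the dataset's own preprocessing.py may differ):

    import cv2
    import numpy as np

    IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def preprocess_ecg_image(bgr_image: np.ndarray) -> np.ndarray:
        # 1. Denoising: non-local means removes scanning artifacts.
        denoised = cv2.fastNlMeansDenoisingColored(bgr_image, None, 10, 10, 7, 21)

        # 2. Rotation correction is omitted here; a Hough-transform estimate of the
        #    dominant grid-line angle would drive cv2.warpAffine in the full pipeline.

        # 3. Contrast enhancement: CLAHE on the lightness channel.
        lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
        l, a, b = cv2.split(lab)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        lab = cv2.merge((clahe.apply(l), a, b))
        enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

        # 4-5. Resize to 512x1024 (height x width), then ImageNet-style normalization
        #      (resize done before normalization here for convenience).
        resized = cv2.resize(enhanced, (1024, 512), interpolation=cv2.INTER_AREA)
        rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
        return (rgb - IMAGENET_MEAN) / IMAGENET_STD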

    Training Strategy

    Optimization

    • Loss Function: Mean Squared Error (MSE) between predicted and ground truth signals
    • Optimizer: AdamW with weight decay (1e-5) for regularization
    • Learning Rate: 1e-3 with ReduceLROnPlateau scheduler (patience=5, factor=0.5)
    • Batch Size: 4 (optimized for memory efficiency)
    • Epochs: 50 with early stopping based on validation SNR

    Data Augmentation

    Training robustness is achieved through realistic augmentations (a hedged albumentations sketch follows this list):
    • Gaussian noise injection (simulates scanning noise)
    • Motion and Gaussian blur (simulates camera shake)
    • Grid distortion (simulates paper warping)
    • Random brightness/contrast adjustments
    • Elastic transformations (simulates physical deformation)
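
    The description does not name the augmentation library; albumentations happens to provide matching transforms, so a hypothetical equivalent pipeline (probabilities are placeholders) could be:

    import albumentations as A

    train_augment = A.Compose([
        A.GaussNoise(p=0.3),                                            # scanning noise
        A.OneOf([A.MotionBlur(p=1.0), A.GaussianBlur(p=1.0)], p=0.2),   # camera shake
        A.GridDistortion(p=0.2),                                        # paper warping
        A.RandomBrightnessContrast(p=0.3),
        A.ElasticTransform(p=0.2),                                      # physical deformation
    ])

    # Usage: augmented = train_augment(image=image)["image"]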

    Validation Strategy

    • 90/10 train-validation split
    • Modified SNR metric for evaluation
    • Time alignment: ±0.2 second tolerance
    • Vertical alignment: Automatic offset correction

    Advanced Features

    Ensemble Methods

    • Weighted averaging of multiple models
    • Stacking ensemble support for improved accuracy
    • Configurable ensemble weights

    Post-Processing

    • Wavelet denoising for signal smoothing
    • Savitzky-Golay filtering
    • Baseline drift correction
    • ECG lead relationship constraints (Einthoven's and Goldberger's laws)

    Transfer Learning

    • Pre-trained ResNet and EfficientNet backbones from ImageNet
    • Fine-tuning on ECG-specific features
    • Faster convergence and better generalization

    Cross-Validation

    • K-fold validation framework (default k=5)
    • Stratified splits for robust evaluation
    • Performance metrics aggregated across folds

    Key Capabilities

    Artifact Handling

    • Rotation Errors: Automatic detection and correction up to ±45 degrees
    • Scanning Noise: Robust denoising without signal degradation
    • Physical Damage: Resilient to stains, tears, and occlusions
    • Variable Quality: Handles both high-quality scans and low-quality photos

    Multi-Lead Consistency

    • Independent prediction heads maintain lead-specific characteristics
    • Enforces physiological relationships between leads
    • Consistent temporal alignment across all leads

    Signal Quality Metrics

    • Modified SNR calculation with automatic alignment
    • Lead-specific performance tracking
    • Comprehensive validation reporting

    Dataset Contents

    Core Implementation Files

    • model.py: Neural network architecture (ResNet encoder + LSTM decoder)
    • preprocessing.py: Image preprocessing and augmentation pipeline
    • dataset.py: PyTorch data loaders for training and inference
    • metrics.py: Modified SNR evaluation metric
    • train.py: Complete training loop with checkpointing
    • inference.py: Batch prediction and submission generation
    • app.py: Streamlit interactive visualization interface

    Advanced Features

    • ensemble.py: Multi-model ensemble methods
    • postprocessing.py: Signal refinement and constraint enforcement
    • transfer_learning.py: Pre-trained backbone integration
    • cross_validation.py: K-fold validation framework
    • hyperparameter_tuning.py: Automated grid search

    Utilities

    • config.py: Centralized configuration management
    • utils.py: Helper functions for data handling
    • kaggle_submission_standalone.py: Self-contained Kaggle notebook

    Performance Characteristics

    Training Performance

    • Training Time: 2-4 hours on GPU (50 epochs, full dataset)
    • Memory Usage: ~6GB GPU memory (batch size 4)
    • Convergence: Typically reaches optimal SNR within 30-40 epochs

    Inference Performance

    • Speed: ~1-2 sec...
  8. Drill image dataset for training part II.

    • plos.figshare.com
    zip
    Updated Mar 7, 2024
    + more versions
    Cite
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu (2024). Drill image dataset for training part II. [Dataset]. http://doi.org/10.1371/journal.pone.0299471.s002
    Explore at:
    zip
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Qingjun Yu; Guannan Wang; Hai Cheng; Wenzhi Guo; Yanbiao Liu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Structural planes decrease the strength and stability of rock masses, severely affecting their mechanical properties and their deformation and failure characteristics. Therefore, investigation and analysis of structural planes are crucial tasks in mining rock mechanics. The drilling camera obtains image information of deep structural planes of rock masses through high-definition camera methods, providing an important data source for the analysis of deep structural planes of rock masses. This paper addresses the problems of high workload, low efficiency, high subjectivity, and poor accuracy brought about by manual processing in current borehole image analysis, and conducts an intelligent segmentation study of borehole image structural planes based on the U2-Net network. By collecting data from 20 different borehole images in different lithological regions, a dataset consisting of 1,013 borehole images with structural plane type, lithology, and color was established. Data augmentation methods such as image flipping, color jittering, blurring, and mixup were applied to expand the dataset to 12,421 images, meeting the requirements for deep network training data. Based on the PyTorch deep learning framework, the initial U2-Net network weights were set, the learning rate was set to 0.001, the training batch size was 4, and the Adam optimizer adaptively adjusted the learning rate during the training process. A dedicated network model for segmenting structural planes was obtained; the model achieved a maximum F-measure of 0.749 when the confidence threshold was set to 0.7, with an accuracy of up to 0.85 in the range of recall greater than 0.5. Overall, the model has high accuracy for segmenting structural planes and a very low mean absolute error, indicating good segmentation accuracy and a degree of generalization. The research method in this paper can serve as a reference for the study of intelligent identification of structural planes in borehole images.
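
    The training setup reported above (PyTorch, learning rate 0.001, batch size 4, Adam) corresponds to a fairly standard configuration. A generic sketch under those assumptions (the model object and dataset are placeholders, not the authors' code):

    import torch
    from torch.utils.data import DataLoader

    def build_training_setup(model: torch.nn.Module, train_dataset):
        """Generic sketch matching the reported hyperparameters (lr=0.001, batch size 4, Adam)."""
        loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
        criterion = torch.nn.BCEWithLogitsLoss()  # binary structural-plane mask (assumption)
        return loader, optimizer, criterion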

  9. Improved-ssd-pytorch.

    • plos.figshare.com
    zip
    Updated Nov 18, 2025
    Cite
    Diansheng Zhang; Yueyuan Zhang; Leilei Dong; Shifeng Ruan; Zhiwei Liu (2025). Improved-ssd-pytorch. [Dataset]. http://doi.org/10.1371/journal.pone.0333574.s001
    Explore at:
    zip
    Dataset updated
    Nov 18, 2025
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Diansheng Zhang; Yueyuan Zhang; Leilei Dong; Shifeng Ruan; Zhiwei Liu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fires are characterized by their sudden onset, rapid spread, and destructive nature, often causing irreversible damage to ecosystems. To address the challenges in forest fire detection, including the varying scales and complex features of flame and smoke, as well as false positives and missed detections caused by environmental interference, we propose a novel object detection model named CBAM-SSD. Firstly, data augmentation techniques involving geometric and color transformations are employed to enrich the dataset, effectively mitigating issues of insufficient and incomplete data collected in real-world scenarios. This significantly enhances the SSD model’s ability to detect flames, which exhibit highly variable morphological characteristics. Furthermore, the CBAM module is integrated into the SSD backbone network to reconstruct its feature extraction structure. This module adaptively weights flame color and smoke texture along the channel dimension and highlights critical fire regions in the spatial dimension, substantially improving the model’s perception of key fire features. Experimental results demonstrate that the CBAM-SSD model is lightweight and suitable for real-time detection, achieving a mAP@0.5 of 97.55% for flames and smoke, a 1.53% improvement over the baseline SSD. Specifically, the AP50 for flame detection reaches 96.61%, a 3.01% increase compared to the baseline, with a recall of 96.40%; while the AP50 for smoke detection reaches 98.49%, with a recall of 98.80%. These results indicate that the improved model delivers higher detection accuracy and lower false and missed detection rates, offering an efficient, convenient, and accurate solution for forest fire detection.

  10. Smart Wardrobe Clothing Dataset

    • kaggle.com
    zip
    Updated Sep 1, 2025
    Cite
    Hizkia Siregar (2025). Smart Wardrobe Clothing Dataset [Dataset]. https://www.kaggle.com/datasets/hizkiasiregar/smart-wardrobe-clothing-dataset
    Explore at:
    zip (299723916 bytes)
    Dataset updated
    Sep 1, 2025
    Authors
    Hizkia Siregar
    Description

    This dataset was created to support machine learning research in clothing classification, particularly for smart wardrobe and laundry applications. Inspired by the digital wardrobe concept popularized in media such as Clueless (1995), the dataset contains three primary categories of clothing items:
    • Tops: t-shirts, button-up shirts, sweaters, hoodies, and other upper garments.
    • Bottoms: jeans, shorts, formal pants, long trousers, and other lower garments.
    • Socks: long socks and short socks photographed in pairs and individually.

    All images were self-collected using an iPhone camera in HEIC format and later converted to JPG/PNG. Backgrounds were removed manually using Canva and programmatically using Rembg with the U²-Net model. Augmentation techniques (rotation, flipping, cropping, brightness and contrast adjustments) were applied to increase dataset diversity.
    • Raw images: 521 (200 tops, 200 bottoms, 121 socks)
    • Final images after augmentation: ~1,900 (balanced across all classes)
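
    For the programmatic background-removal step mentioned above, Rembg exposes a simple remove() call backed by U²-Net; a small illustrative sketch (file names are placeholders):

    from rembg import remove
    from PIL import Image

    # Remove the background from one garment photo; the output keeps transparency.
    with Image.open("tops/shirt_001.jpg") as garment:
        cutout = remove(garment)
        cutout.save("tops_nobg/shirt_001.png")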

    This dataset can be used for experiments in:
    • Image classification
    • Data augmentation pipelines
    • Transfer learning (e.g., Teachable Machine, TensorFlow, PyTorch)
    • Applied computer vision in smart wardrobe and smart home systems

  11. WeedCrop Image Dataset

    • kaggle.com
    zip
    Updated Jun 30, 2022
    Cite
    Vinayak Shanawad (2022). WeedCrop Image Dataset [Dataset]. https://www.kaggle.com/datasets/vinayakshanawad/weedcrop-image-dataset/data
    Explore at:
    zip (263674982 bytes)
    Dataset updated
    Jun 30, 2022
    Authors
    Vinayak Shanawad
    License

    CC0 1.0 (Public Domain) https://creativecommons.org/publicdomain/zero/1.0/

    Description

    WeedCrop Image Dataset

    Data Description

    It includes 2822 images. Weeds are annotated in YOLO v5 PyTorch format.
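
    For reference, YOLO-format annotation files contain one object per line as "class x_center y_center width height", with coordinates normalized to the image size. A hypothetical label-file excerpt (the class-index mapping shown is an assumption) might look like:

    # One annotated object per line: class x_center y_center width height (all normalized 0-1)
    # Hypothetical example, assuming class 0 = Crop and class 1 = Weed:
    0 0.512 0.430 0.210 0.180
    1 0.145 0.782 0.095 0.120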

    The following pre-processing was applied to each image:
    • Auto-orientation of pixel data (with EXIF-orientation stripping)

    The following augmentation was applied to create 3 versions of each source image:
    • Equal probability of one of the following 90-degree rotations: none, clockwise, counter-clockwise
    • Random shear of between -15° and +15° horizontally and -15° and +15° vertically
    • Random brightness adjustment of between -25 and +25 percent

    Classes

    Crop, Weed

    Inspiration

    Identifying weeds and distinguishing them from crops is essential in farming.

    Acknowledgements

    This dataset is derived from the following publication:

    Kaspars Sudars, Janis Jasko, Ivars Namatevs, Liva Ozola, Niks Badaukis, Dataset of annotated food crops and weed images for robotic computer vision control, Data in Brief, Volume 31, 2020, 105833, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2020.105833 (https://www.sciencedirect.com/science/article/pii/S2352340920307277)

    Abstract: Weed management technologies that can identify weeds and distinguish them from crops are in need of artificial intelligence solutions based on a computer vision approach, to enable the development of precisely targeted and autonomous robotic weed management systems. A prerequisite of such systems is to create robust and reliable object detection that can unambiguously distinguish weed from food crops. One of the essential steps towards precision agriculture is using annotated images to train convolutional neural networks to distinguish weed from food crops, which can later be followed by mechanical weed removal or selective spraying of herbicides. In this data paper, we propose an open-access dataset with manually annotated images for weed detection. The dataset is composed of 1118 images in which 6 food crops and 8 weed species are identified; altogether, 7853 annotations were made. Three RGB digital cameras were used for image capturing: Intel RealSense D435, Canon EOS 800D, and Sony W800. The images were taken of food crops and weeds grown in controlled-environment and field conditions at different growth stages.

    Keywords: Computer vision; Object detection; Image annotation; Precision agriculture; Crop growth and development

    Many thanks to Roboflow team for sharing this data.

  12. Sign4all: a Spanish Sign Language Dataset

    • scidb.cn
    • observatorio-cientifico.ua.es
    Updated Jul 30, 2025
    Cite
    Francisco Morillas-Espejo; Ester Martinez-Martin (2025). Sign4all: a Spanish Sign Language Dataset [Dataset]. http://doi.org/10.57760/sciencedb.28304
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Francisco Morillas-Espejo; Ester Martinez-Martin
    Description

    The Sign4all dataset is designed for research in Isolated Sign Language Recognition (ISLR), with a focus on Spanish Sign Language (Lengua de Signos Española, LSE). It includes high-resolution RGB video recordings and corresponding skeletal keypoints for 24 signs related to daily activities, particularly within the context of dining and catering. The dataset captures both right-handed and left-handed sign executions, offering a balanced and diverse collection aimed at developing inclusive SLR systems. In total, the dataset includes 7,756 manually segmented video samples and skeletal annotations, with an augmented version expanding to 61,409 samples to support deep learning applications.

    Data generation procedures

    Eight participants (4 male, 4 female) recorded signs using an Azure Kinect DK camera at 2560×1440 resolution and 30 fps. Each participant performed all signs with both their dominant and non-dominant hands to simulate variability in signer handedness. Signs were performed from a neutral starting position and followed a structured protocol for consistency. Recordings were conducted in a controlled indoor environment with stable lighting and no clothing restrictions, ensuring realistic visual diversity. All signs are dynamic (in-motion) gestures. Each was recorded approximately 20 times per hand. The camera was fixed at a height of 117 cm, and participants stood at variable distances (100–170 cm) to account for individual height differences.

    Data structure and format

    The dataset is organized into four versions:
    • RGB Original: Raw segmented videos without background removal or normalization.
    • RGB Normalized: Background-cropped, square videos temporally normalized to 48 frames.
    • RGB Normalized + Augmentation: Additional visual augmentations applied.
    • Skeletal Keypoints: 48×100 matrices per sample, representing 2D keypoints in HDF5 format.
    The RGB samples are distributed in AVI format.

    Use cases and reusability

    The Sign4all dataset supports a variety of applications in Sign Language Technology:
    • Isolated Sign Language Recognition
    • Skeletal-based gesture recognition
    • Signer-independent recognition model training
    Researchers can use RGB or skeletal data directly with deep learning models in PyTorch, TensorFlow or other frameworks. Due to privacy concerns, dataset access is restricted and requires a Data Usage Agreement (DUA) request.

  13. research on soyabean leaves

    • figshare.com
    pdf
    Updated Apr 15, 2025
    Cite
    Prajwal Bawankar (2025). research on soyabean leaves [Dataset]. http://doi.org/10.6084/m9.figshare.28797590.v1
    Explore at:
    pdf
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Prajwal Bawankar
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project focuses on developing an intelligent system capable of detecting and classifying diseases in plant leaves using image processing and deep learning techniques. Leveraging Convolutional Neural Networks (CNNs) and transfer learning, the system analyzes leaf images to identify signs of infection with high accuracy. It supports smart agriculture by enabling early disease detection, reducing crop loss, and providing actionable insights to farmers. The project uses datasets such as PlantVillage and integrates frameworks like TensorFlow, Keras, and PyTorch. The model can be deployed as a web or mobile application, offering a real-time solution for plant health monitoring in agricultural environments.

  14. Audiomentations

    • kaggle.com
    zip
    Updated Apr 22, 2022
    Cite
    atfujita (2022). Audiomentations [Dataset]. https://www.kaggle.com/datasets/atsunorifujita/audiomentations
    Explore at:
    zip (62619 bytes)
    Dataset updated
    Apr 22, 2022
    Authors
    atfujita
    Description

    A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated in training pipelines in e.g. Tensorflow/Keras or Pytorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.

    Need a Pytorch-specific alternative with GPU support? Check out torch-audiomentations!
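
    A typical usage pattern for the library, following the style of its documented examples (the exact parameter values here are illustrative):

    import numpy as np
    from audiomentations import AddGaussianNoise, Compose, PitchShift, TimeStretch

    augment = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    ])

    # Apply to a mono waveform (float32 numpy array) at a given sample rate.
    samples = np.random.uniform(low=-0.5, high=0.5, size=16000).astype(np.float32)
    augmented = augment(samples=samples, sample_rate=16000)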

  15. cifar_10_in_tensor

    • kaggle.com
    zip
    Updated Oct 28, 2022
    Cite
    KKaiWWang (2022). cifar_10_in_tensor [Dataset]. https://www.kaggle.com/datasets/kkaiwwang/cifar-10-in-tensor
    Explore at:
    zip (1454680895 bytes)
    Dataset updated
    Oct 28, 2022
    Authors
    KKaiWWang
    Description

    CIFAR-10 Dataset with format of Pytorch Tensor.

    You can directly use torch.load('---File_Path---') to load data.

    The whole dataset is separated into 3 parts: train_X, train_y, and test_X. Specifically, train_X contains 50,000 'images' and test_X contains 300,000 'images'. In more detail, train_X has shape (50000, 3, 32, 32), train_y has shape (50000,), and test_X has shape (300000, 3, 32, 32).

    Tips: If you want to use data augmentation, you don't need to convert these tensors back to images first; you can apply Torchvision transforms (or a Compose of transforms) directly on the tensors.
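
    A small sketch of that workflow (file names are placeholders, since the listing only shows '---File_Path---'):

    import torch
    from torchvision import transforms

    # Load the provided tensors (paths are placeholders).
    train_X = torch.load("train_X.pt")   # shape (50000, 3, 32, 32)
    train_y = torch.load("train_y.pt")   # shape (50000,)

    # Torchvision transforms accept tensor images directly.
    augment = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
    ])
    augmented_image = augment(train_X[0])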

  16. Grape-Instance-Segmentation-For-Viticulture

    • kaggle.com
    zip
    Updated Aug 25, 2025
    Cite
    kaannarik (2025). Grape-Instance-Segmentation-For-Viticulture [Dataset]. https://www.kaggle.com/datasets/kaanarikkk/grape-instance-segmentation-for-viticulture
    Explore at:
    zip (241118192 bytes)
    Dataset updated
    Aug 25, 2025
    Authors
    kaannarik
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Grape-Instance-Segmentation-For-Viticulture

    Using a unique dataset (the FGVL Dataset) collected from Sultana seedless grape vineyards in the Aegean Region of Turkey, an instance segmentation model has been developed to classify frost-damaged leaves and grape clusters at the pixel level. The dataset includes 418 frost-damaged grapes, 510 frost-damaged leaves, 395 healthy grapes, and 698 healthy leaves, collected after a severe frost event in April 2025 at a vineyard in Manisa. The images were captured in high resolution under natural lighting conditions and manually labeled by experts.

    Instructions

    Participants must use the FGVL Dataset to develop deep learning models for instance segmentation of frost-damaged and healthy grape leaves and clusters.

    You are free to use any image processing or deep learning framework (e.g., YOLOv11, PyTorch, TensorFlow) and apply data augmentation, model tuning, and evaluation techniques.

    Submissions will be evaluated based on mAP@50 and mAP@50-95 metrics on the test set.

  17. cars_wagonr_swift

    • kaggle.com
    zip
    Updated Sep 11, 2019
    Cite
    Ajay (2019). cars_wagonr_swift [Dataset]. https://www.kaggle.com/ajaykgp12/cars-wagonr-swift
    Explore at:
    zip (44486490 bytes)
    Dataset updated
    Sep 11, 2019
    Authors
    Ajay
    License

    CC0 1.0 (Public Domain) https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data science beginners start with curated sets of data, but it's a well-known fact that in a real data science project, most of the time is spent on collecting, cleaning, and organizing data. Domain expertise is also considered an important aspect of creating good ML models. Being an automobile enthusiast, I took up the challenge of collecting images of two popular car models from a used-car website, where users upload pictures of the car they want to sell, and then training a deep neural network to identify the model of a car from its images. In my search for images I found that approximately 10 percent of the pictures did not represent the intended car correctly, and those pictures had to be deleted from the final data.

    Content

    There are 4000 images of two popular cars (Swift and WagonR) in India of make Maruti Suzuki, with 2000 pictures belonging to each model. The data is divided into a training set with 2400 images, a validation set with 800 images, and a test set with 800 images. The data was randomized before splitting into training, test, and validation sets.

    A starter kernel is provided for Keras with a CNN. I have also created a GitHub project documenting advanced techniques in PyTorch and Keras for image classification, such as data augmentation, dropout, batch normalization, and transfer learning.

    Inspiration

    1. With a small dataset like this, how much accuracy can we achieve, and is more data always better? The baseline model trained in Keras achieves 88% accuracy on the validation set; can we achieve even better performance, and by how much?

    2. Is the data collected for the two car models representative of all such cars from across the country, or is there sample bias?

    3. I would also like someone to extend the concept to build a use case so that if a user uploads an incorrect car picture, the ML model can automatically flag it, for example a user uploading the wrong model or an image that is not a car.

  18. Sign Language Dataset - 5 Essential Phrases

    • kaggle.com
    zip
    Updated Oct 25, 2025
    Cite
    Mohamed Hamdey (2025). Sign Language Dataset - 5 Essential Phrases [Dataset]. https://www.kaggle.com/datasets/mohamedhamdey/5-basic-signes
    Explore at:
    zip (22115208 bytes)
    Dataset updated
    Oct 25, 2025
    Authors
    Mohamed Hamdey
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Sign Language Recognition Dataset - 5 Essential Phrases

    🎯 Overview

    This dataset contains hand gesture images for sign language recognition, focusing on 5 commonly used phrases. The images are preprocessed, cropped, and ready for training deep learning models for real-time sign language detection applications.

    📊 Dataset Statistics

    • Total Images: ~1,000 images
    • Number of Classes: 5
    • Image Format: JPG
    • Image Size: 224×224 pixels (standardized)
    • Split: 75% Train / 15% Validation / 10% Test

    🏷️ Classes

    Class ID   Meaning       Description
    0          Yes           Affirmative gesture
    1          No            Negative gesture
    2          I Love You    Expression of affection
    3          Hello         Greeting gesture
    4          Thank You     Gratitude expression

    📂 Dataset Structure

    data_final/
    ├── train/
    │  ├── 0/  # Yes (~150 images)
    │  ├── 1/  # No (~150 images)
    │  ├── 2/  # I Love You (~150 images)
    │  ├── 3/  # Hello (~150 images)
    │  └── 4/  # Thank You (~150 images)
    ├── val/
    │  ├── 0/
    │  ├── 1/
    │  ├── 2/
    │  ├── 3/
    │  └── 4/
    └── test/
      ├── 0/
      ├── 1/
      ├── 2/
      ├── 3/
      └── 4/
    

    🎨 Data Collection & Preprocessing

    Collection Process:

    • Images collected using webcam in controlled environment
    • Hand gestures detected using MediaPipe hand tracking
    • Multiple angles, positions, and lighting conditions
    • Various hand positions and distances from camera

    Preprocessing:

    • Hand region detection using MediaPipe
    • Automatic cropping to hand bounding box
    • Resized to 224×224 pixels
    • Padding added around hand region
    • Quality control and manual cleaning performed

    🔧 Image Characteristics

    • Resolution: 224×224 pixels
    • Color: RGB
    • Background: Various (natural backgrounds)
    • Lighting: Mixed (natural and artificial)
    • Hand Orientation: Multiple angles
    • Distance: Varied (close, medium, far)

    💡 Use Cases

    This dataset is suitable for:

    1. Sign Language Recognition Models

      • Real-time gesture recognition
      • Sign-to-speech applications
      • Accessibility tools
    2. Computer Vision Research

      • Hand gesture classification
      • Transfer learning experiments
      • Mobile ML applications
    3. Educational Projects

      • Learning deep learning basics
      • Building gesture recognition systems
      • Prototyping accessibility solutions

    🚀 Quick Start

    Load Data with TensorFlow:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    
    datagen = ImageDataGenerator(rescale=1./255)
    
    train_gen = datagen.flow_from_directory(
      'data_final/train',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    
    val_gen = datagen.flow_from_directory(
      'data_final/val',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    

    Load Data with PyTorch:

    from torchvision import datasets, transforms
    
    transform = transforms.Compose([
      transforms.Resize((224, 224)),
      transforms.ToTensor(),
      transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    
    train_dataset = datasets.ImageFolder('data_final/train', transform=transform)
    val_dataset = datasets.ImageFolder('data_final/val', transform=transform)
    

    📈 Baseline Performance

    Using transfer learning with MobileNetV2/EfficientNetB0:
    • Expected Accuracy: 90-97%
    • Training Time: 20-40 minutes (GPU)
    • Model Size: ~15 MB

    🎓 Recommended Augmentation

    For better generalization, use these augmentation techniques:

    train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=25,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.15,
      zoom_range=0.2,
      horizontal_flip=True,
      brightness_range=[0.7, 1.3]
    )

    ⚠️ Limitations

    • Limited vocabulary: Only 5 signs (not comprehensive)
    • Single person: Images from one individual (limited diversity)
    • Static gestures: No motion-based signs
    • Controlled environment: May need adaptation for real-world scenarios
    • Hand dominance: Mix of left and right hands

    🔮 Future Improvements

    • Expand to 20+ common signs
    • Include multiple signers (diverse skin tones, ages, genders)
    • Add motion-based gestures (video data)
    • Regional sign language variations
    • More challenging backgrounds

    📜 Citation

    If you use this dataset in your research or project, please cite:

    @dataset{sign_language_5phrases_2025,
      title={Sign Language Recognition Dataset - 5 Essential Phrases},
      author={[Your Name]},
      year={2025},
      publisher={Kaggle},
      url={[Dataset URL]}
    }

    📄 License

    This dataset is released under [Choose one]:
    • CC BY 4.0 (Attribution) - Recommended
    • CC BY-SA 4.0 (Attribution-ShareAlike)
    • CC0 1.0 (Public Domain)

    🤝 Acknowledgments

    • MediaPipe by Google for hand tracking
    • TensorFlow/Keras for deep learning fr...
  19. Bone Fracture Detection: Computer Vision Project

    • kaggle.com
    zip
    Updated Feb 25, 2024
    Cite
    Hina Ismail (2024). Bone Fracture Detection: Computer Vision Project [Dataset]. https://www.kaggle.com/datasets/sonialikhan/bone-fracture-detection-computer-vision-project
    Explore at:
    zip (43644754 bytes)
    Dataset updated
    Feb 25, 2024
    Authors
    Hina Ismail
    License

    CC0 1.0 (Public Domain) https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Building a bone fracture detection system using computer vision involves several steps. Here's a general outline to get you started:

    1. Dataset Collection: Gather a dataset of X-ray images with labeled fractures. You can explore datasets like MURA, NIH Chest X-ray Dataset, or create your own dataset with proper ethical considerations.

    2. Data Preprocessing: Clean and preprocess the X-ray images. This may involve resizing, normalization, and data augmentation to increase the diversity of your dataset.

    3. Model Selection: Choose a suitable pre-trained deep learning model for image classification. Models like ResNet, DenseNet, or custom architectures have shown good performance in medical image analysis tasks.

    4. Transfer Learning: Fine-tune the selected model on your X-ray dataset using transfer learning. This helps leverage the knowledge gained from pre-training on a large dataset.

    5. Model Training: Split your dataset into training, validation, and test sets. Train your model on the training set and validate its performance on the validation set to fine-tune hyperparameters.

    6. Evaluation Metrics: Choose appropriate evaluation metrics such as accuracy, precision, recall, F1-score, or area under the ROC curve (AUC) to assess the model's performance.

    7. Post-processing: Implement any necessary post-processing steps, such as non-maximum suppression, to refine the model's output and reduce false positives.

    8. Deployment: Deploy the trained model as part of a computer vision application. This could be a web-based application, mobile app, or integrated into a healthcare system.

    9. Continuous Improvement: Regularly update and improve your model based on new data or advancements in the field. Monitoring its performance in real-world scenarios is crucial.

    10. Ethical Considerations: Ensure that your project follows ethical guidelines and regulations for handling medical data. Implement privacy measures and obtain necessary approvals if you are using patient data.
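    Putting steps 3 to 6 together, here is a minimal, hedged sketch of transfer learning and evaluation in Keras. The DenseNet121 backbone, binary fracture label, 224x224 input size, learning rate, and the evaluate() helper are illustrative assumptions, not a reference implementation.

        # Sketch of model selection, transfer learning, and evaluation (steps 3-6);
        # the datasets train_ds/val_ds and arrays x_test/y_test are assumed to exist.
        import tensorflow as tf
        from sklearn.metrics import classification_report, roc_auc_score

        def build_fracture_model(input_shape=(224, 224, 3)):
            # Pre-trained backbone with frozen weights (transfer learning).
            base = tf.keras.applications.DenseNet121(
                include_top=False, weights="imagenet",
                input_shape=input_shape, pooling="avg"
            )
            base.trainable = False
            outputs = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
            model = tf.keras.Model(base.input, outputs)
            model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                          loss="binary_crossentropy", metrics=["accuracy"])
            return model

        def evaluate(model, x_test, y_test):
            # Report precision, recall, F1, and AUC on the held-out test set.
            probs = model.predict(x_test).ravel()
            preds = (probs >= 0.5).astype(int)
            print(classification_report(y_test, preds))
            print("AUC:", roc_auc_score(y_test, probs))

        # model = build_fracture_model()
        # model.fit(train_ds, validation_data=val_ds, epochs=10)  # step 5
        # evaluate(model, x_test, y_test)                         # step 6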

    Tools and Libraries: Python, TensorFlow, PyTorch, Keras for deep learning implementation. OpenCV for image processing. Flask/Django for building a web application. Docker for containerization. GitHub for version control.
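    For step 8, a minimal Flask deployment sketch is shown below; the endpoint name, saved-model path, input size, and preprocessing are assumptions that would need to match the trained model.

        # Hypothetical Flask endpoint: accepts an uploaded X-ray image and
        # returns a fracture probability from a previously saved Keras model.
        import numpy as np
        import tensorflow as tf
        from flask import Flask, request, jsonify
        from PIL import Image

        app = Flask(__name__)
        model = tf.keras.models.load_model("fracture_model.keras")  # assumed path

        @app.route("/predict", methods=["POST"])
        def predict():
            img = Image.open(request.files["image"].stream).convert("RGB")
            img = img.resize((224, 224))                      # assumed input size
            x = np.asarray(img, dtype=np.float32)[None, ...] / 255.0
            prob = float(model.predict(x)[0][0])
            return jsonify({"fracture_probability": prob})

        if __name__ == "__main__":
            app.run(host="0.0.0.0", port=5000)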


Cite
Grim, Luís Fernando Lopes; Sampaio Gradvohl, André Leon (2023). Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_10246576

Data from: Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation

Dataset updated
Dec 4, 2023
Dataset provided by
Universidade Estadual de Campinas
Universidade Estadual de Campinas (UNICAMP)
Authors
Grim, Luís Fernando Lopes; Sampaio Gradvohl, André Leon
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Source codes and dataset of the research "Solar flare forecasting based on magnetogram sequences learning with MViT and data augmentation". Our work employed PyTorch, a framework for training Deep Learning models with GPU support and automatic back-propagation, to load the MViTv2 s models with Kinetics-400 weights. To simplify the code implementation, eliminating the need for an explicit loop to train and the automation of some hyperparameters, we use the PyTorch Lightning module. The inputs were batches of 10 samples with 16 sequenced images in 3-channel resized to 224 × 224 pixels and normalized from 0 to 1. Most of the papers in our literature survey split the original dataset chronologically. Some authors also apply k-fold cross-validation to emphasize the evaluation of the model stability. However, we adopt a hybrid split taking the first 50,000 to apply the 5-fold cross-validation between the training and validation sets (known data), with 40,000 samples for training and 10,000 for validation. Thus, we can evaluate performance and stability by analyzing the mean and standard deviation of all trained models in the test set, composed of the last 9,834 samples, preserving the chronological order (simulating unknown data). We develop three distinct models to evaluate the impact of oversampling magnetogram sequences through the dataset. The first model, Solar Flare MViT (SF MViT), has trained only with the original data from our base dataset without using oversampling. In the second model, Solar Flare MViT over Train (SF MViT oT), we only apply oversampling on training data, maintaining the original validation dataset. In the third model, Solar Flare MViT over Train and Validation (SF MViT oTV), we apply oversampling in both training and validation sets. We also trained a model oversampling the entire dataset. We called it the "SF_MViT_oTV Test" to verify how resampling or adopting a test set with unreal data may bias the results positively. GitHub version The .zip hosted here contains all files from the project, including the checkpoint and the output files generated by the codes. We have a clean version hosted on GitHub (https://github.com/lfgrim/SFF_MagSeq_MViTs), without the magnetogram_jpg folder (which can be downloaded directly on https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip) and the output and checkpoint files. Most code files hosted here also contain comments on the Portuguese language, which are being updated to English in the GitHub version. Folders Structure In the Root directory of the project, we have two folders:

magnetogram_jpg: holds the source images provided by the Space Environment Artificial Intelligence Early Warning Innovation Workshop through the link https://tianchi-competition.oss-cn-hangzhou.aliyuncs.com/531804/dataset_ss2sff.zip. It comprises 73,810 samples of high-quality magnetograms captured by SDO/HMI from 2010 May 4 to 2019 January 26. The HMI instrument provides these data (stored in the hmi.sharp_720s data series), making new samples available every 12 minutes; however, the images in this dataset were collected every 96 minutes. Each image is a ready-made magnetogram snippet cropped by SHARP, which can contain one or more solar ARs classified by the National Oceanic and Atmospheric Administration (NOAA).

Seq_Magnetogram: contains the references to the source images with the corresponding labels for the next 24 h and 48 h, in the M24 and M48 sub-folders, respectively.

M24/M48: both contain the same sub-folder structure: Seqs16, SF_MViT, SF_MViT_oT, SF_MViT_oTV, and SF_MViT_oTV_Test.

There are also two files in the root directory:

inst_packages.sh: installs the packages and dependencies needed to run the models.

download_MViTS.py: downloads the pre-trained MViTv2_S from PyTorch and stores it in the cache (see the sketch below).

The M24 and M48 folders hold reference text files (flare_Mclass...) linking the images in the magnetogram_jpg folder, or the sequences (Seq16_flare_Mclass...) in the Seqs16 folders, with their respective labels. They also hold "cria_seqs.py", responsible for creating the sequences, and "test_pandas.py", which verifies the head info and checks the number of samples per label in the text files. All text files prefixed with "Seq16" inside the Seqs16 folder were created by the "cria_seqs.py" code from the corresponding "flare_Mclass"-prefixed text files. The Seqs16 folder holds reference text files in which each file lists a sequence of images pointing to the magnetogram_jpg folder.

All SF_MViT... folders hold the model training code itself (SF_MViT...py) and the corresponding job submission (jobMViT...), temporary input (Seq16_flare...), output (saida_MViT... and MViT_S...), error (err_MViT...), and checkpoint (sample-FLARE...ckpt) files. Executed model training codes generate the output, error, and checkpoint files. There is also a folder called "lightning_logs" that stores the logs of the trained models.
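As an illustration of what a helper like download_MViTS.py may do (a sketch based on the public torchvision API, not the repository's actual script), the Kinetics-400 pre-trained MViTv2-S can be fetched and cached as follows:

    from torchvision.models.video import mvit_v2_s, MViT_V2_S_Weights

    # Download (or reuse from the local torch hub cache) the MViTv2-S video
    # model pre-trained on Kinetics-400.
    weights = MViT_V2_S_Weights.KINETICS400_V1
    model = mvit_v2_s(weights=weights)
    model.eval()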

Naming pattern for the files:

Files in magnetogram_jpg follow the format "hmi.sharp_720s...magnetogram.fits.jpg", and files in Seqs16 follow the format "hmi.sharp_720s...to.", where (an illustrative file-name parser follows this list):

hmi: the instrument that captured the image.
sharp_720s: the source data series of SDO/HMI.
The SHARP region identification: can contain one or more solar ARs classified by NOAA.
The capture date-time: when the instrument captured the image, in the format yyyymmdd_hhnnss_TAI (y: year, m: month, d: day, h: hours, n: minutes, s: seconds).
The sequence start date-time: follows the same format as the capture date-time.
The sequence end date-time: follows the same format as the capture date-time.
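As an illustration only, a file name of this form can be split into the SHARP region identifier and the capture date-time; the exact token layout assumed below is inferred from the description above and may not match every file in the repository:

    import re

    # Assumed layout: hmi.sharp_720s.<SHARP id>.<yyyymmdd_hhnnss>_TAI.magnetogram.fits.jpg
    MAG_NAME = re.compile(
        r"hmi\.sharp_720s\.(?P<sharp_id>\d+)\."
        r"(?P<captured>\d{8}_\d{6})_TAI\.magnetogram\.fits\.jpg$"
    )

    def parse_magnetogram_name(name):
        match = MAG_NAME.match(name)
        if match is None:
            return None
        return match.group("sharp_id"), match.group("captured")

    # Hypothetical example:
    # parse_magnetogram_name("hmi.sharp_720s.401.20110101_000000_TAI.magnetogram.fits.jpg")
    # -> ("401", "20110101_000000")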

Reference text files in M24 and M48, or inside the SF_MViT... folders, follow the format "flare_Mclass_.txt", where:

The prefix is Seq16 if the file refers to a sequence, or empty if it refers directly to images.
The horizon is "24h" or "48h".
The set is "TrainVal" or "Test"; a split index refers to the Train/Val split.
A "_over" suffix after the extension (...txt_over) indicates a temporary input reference that was over-sampled by a training model; otherwise there is no suffix.

All SF_MViT... folders contain the following files:

Model training codes: "SF_MViT_M+_", where:

The oversampling tag is empty, "oT" (over Train), "oTV" (over Train and Val), or "oTV_Test" (over Train, Val and Test).
The horizon is "24h" or "48h".
The split tag is "oneSplit" for a specific split or "allSplits" to run all splits.
The GPU tag is empty by default (run on 1 GPU) or "2gpu" to run on 2-GPU systems.

Job submission files: "jobMViT_", where:

The queue identifier points to the queue in the Lovelace environment hosted at CENAPAD-SP (https://www.cenapad.unicamp.br/parque/jobsLovelace).

Temporary inputs: "Seq16_flare_Mclass_.txt", where:

The set is train or val.
A "_over" suffix after the extension (...txt_over) indicates a temporary input reference that was over-sampled by a training model; otherwise there is no suffix.

Outputs: "saida_MViT_Adam_10-7", where:

k0 to k4 indicates the corresponding split of the output, or it is omitted if the output covers all splits.

Error files: "err_MViT_Adam_10-7", where:

k0 to k4 indicates the corresponding split of the error log file, or it is omitted if the error file covers all splits.

Checkpoint files: "sample-FLARE_MViT_S_10-7-epoch=-valid_loss=-Wloss_k=.ckpt", where:

epoch= takes the epoch number of the checkpoint;
valid_loss= takes the corresponding validation loss;
Wloss_k= takes the split index, 0 to 4.
