100+ datasets found
  1. 2025-24679-hw1-text-dataset-mkarthik

    • huggingface.co
    Updated Oct 2, 2025
    Cite
    Madhav Karthikeyakannan (2025). 2025-24679-hw1-text-dataset-mkarthik [Dataset]. https://huggingface.co/datasets/madhavkarthi/2025-24679-hw1-text-dataset-mkarthik
    Explore at:
    Dataset updated
    Oct 2, 2025
    Authors
    Madhav Karthikeyakannan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Synthetic Text Dataset - Augmentation Example

      Dataset Summary
    

    This dataset demonstrates text data augmentation. Starting from 100 original short text samples, multiple augmentation techniques were applied to expand the dataset to 1,000 samples.

      Purpose
    

    The dataset was created as part of a course exercise to explore text augmentation and its effect on classification tasks.

      Composition
    

    Instances: 100 original + 1200 augmented = 1,300… See the full description on the dataset page: https://huggingface.co/datasets/madhavkarthi/2025-24679-hw1-text-dataset-mkarthik.
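    The card does not say which augmentation techniques were applied. As a rough illustration only, the sketch below uses two common token-level operations, random swap and random deletion. All function names and parameters here are hypothetical and are not taken from the dataset.

```python
import random

def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_delete(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def augment(text, n_copies, seed=0):
    """Produce n_copies perturbed variants of a short text (hypothetical helper)."""
    rng = random.Random(seed)
    tokens = text.split()
    variants = []
    for k in range(n_copies):
        # Alternate between the two operations so both are represented
        op = random_swap if k % 2 == 0 else random_delete
        variants.append(" ".join(op(tokens, rng)))
    return variants

if __name__ == "__main__":
    for variant in augment("the quick brown fox jumps over the lazy dog", 4):
        print(variant)
```

    Operations like these keep the label of the original sample, which is what lets a small labeled set be expanded for a classification exercise.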

  2. augmentation_example

    • kaggle.com
    zip
    Updated Dec 20, 2021
    + more versions
    Cite
    Thomas S Visser (2021). augmentation_example [Dataset]. https://www.kaggle.com/thomassvisser/augmentation-example
    Explore at:
    Available download formats: zip (54513 bytes)
    Dataset updated
    Dec 20, 2021
    Authors
    Thomas S Visser
    Description

    Dataset

    This dataset was created by Thomas S Visser

    Contents

  3. Examples Dataset

    • universe.roboflow.com
    zip
    Updated Apr 24, 2024
    + more versions
    Cite
    Data Augmentation (2024). Examples Dataset [Dataset]. https://universe.roboflow.com/data-augmentation-ngnku/examples-nan9p/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 24, 2024
    Dataset authored and provided by
    Data Augmentation
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Ex Bounding Boxes
    Description

    Examples

    ## Overview
    
    Examples is a dataset for object detection tasks - it contains Ex annotations for 902 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. SVD-Generated Video Dataset

    • kaggle.com
    zip
    Updated May 11, 2025
    Cite
    Afnan Algharbi (2025). SVD-Generated Video Dataset [Dataset]. https://www.kaggle.com/datasets/afnanalgarby/svd-generated-video-dataset
    Explore at:
    Available download formats: zip (102546508 bytes)
    Dataset updated
    May 11, 2025
    Authors
    Afnan Algharbi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains synthetic video samples generated from a 10-class subset of Tiny ImageNet using Stable Video Diffusion (SVD). It is designed to evaluate the impact of generative temporal augmentation on image classification performance.

    Each training and validation video corresponds to a single image augmented into a sequence of frames.

    Videos are stored in .mp4 format and labeled via train.csv and val.csv.

    Sources:

    Tiny ImageNet: Stanford CS231n

    SVD model: Stable Video Diffusion

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

  5. Data Augmentation Tools Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 23, 2025
    Cite
    Growth Market Reports (2025). Data Augmentation Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-augmentation-tools-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Augmentation Tools Market Outlook



    As per our latest research, the global Data Augmentation Tools market size reached USD 1.47 billion in 2024, reflecting the rapidly increasing adoption of artificial intelligence and machine learning across diverse sectors. The market is experiencing robust momentum, registering a CAGR of 25.3% from 2025 to 2033. By the end of 2033, the Data Augmentation Tools market is forecasted to reach a substantial value of USD 11.6 billion. This impressive growth is primarily driven by the escalating need for high-quality, diverse datasets to train advanced AI models, coupled with the proliferation of digital transformation initiatives across industries.
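    The quoted figures can be cross-checked with the standard compound-growth formula. The sketch below assumes the 2024 value is the base and that growth compounds over nine years to 2033; the report does not state its exact convention. Under that assumption, 25.3% a year gives roughly USD 11.2 billion, slightly below the quoted USD 11.6 billion, so the report may be rounding or using a different base year.

```python
def project(value, rate, years):
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1.0 + rate) ** years

def implied_cagr(start, end, years):
    """Annual growth rate that takes start to end over the given years."""
    return (end / start) ** (1.0 / years) - 1.0

if __name__ == "__main__":
    # Figures quoted in the report; the 9-year horizon is an assumption.
    print(round(project(1.47, 0.253, 9), 2))
    print(round(implied_cagr(1.47, 11.6, 9), 4))
```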




    The primary growth factor fueling the Data Augmentation Tools market is the exponential rise in AI and machine learning applications, which require vast amounts of labeled data for effective training. As organizations strive to develop more accurate and robust models, the demand for data augmentation solutions that can synthetically expand and diversify datasets has surged. This trend is particularly pronounced in sectors such as healthcare, automotive, and retail, where the quality and quantity of data directly impact the performance and reliability of AI systems. The market is further propelled by the increasing complexity of data types, including images, text, audio, and video, necessitating sophisticated augmentation tools capable of handling multimodal data.




    Another significant driver is the growing focus on reducing model bias and improving generalization capabilities. Data augmentation tools enable organizations to generate synthetic samples that account for various real-world scenarios, thereby minimizing overfitting and enhancing the robustness of AI models. This capability is critical in regulated industries like BFSI and healthcare, where the consequences of biased or inaccurate models can be severe. Furthermore, the rise of edge computing and IoT devices has expanded the scope of data augmentation, as organizations seek to deploy AI solutions in resource-constrained environments that require optimized and diverse training datasets.




    The proliferation of cloud-based solutions has also played a pivotal role in shaping the trajectory of the Data Augmentation Tools market. Cloud deployment offers scalability, flexibility, and cost-effectiveness, allowing organizations of all sizes to access advanced augmentation capabilities without significant infrastructure investments. Additionally, the integration of data augmentation tools with popular machine learning frameworks and platforms has streamlined adoption, enabling seamless workflow integration and accelerating time-to-market for AI-driven products and services. These factors collectively contribute to the sustained growth and dynamism of the global Data Augmentation Tools market.




    From a regional perspective, North America currently dominates the Data Augmentation Tools market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading technology companies, robust investment in AI research, and early adoption of digital transformation initiatives have established North America as a key hub for data augmentation innovation. Meanwhile, Asia Pacific is poised for the fastest growth over the forecast period, driven by the rapid expansion of the IT and telecommunications sector, burgeoning e-commerce industry, and increasing government initiatives to promote AI adoption. Europe also maintains a significant market presence, supported by stringent data privacy regulations and a strong focus on ethical AI development.





    Component Analysis



    The Component segment of the Data Augmentation Tools market is bifurcated into Software and Services, each playing a critical role in enabling organizations to leverage data augmentation for AI and machine learning initiatives. The software sub-segment comprises

  6. Hard Hat Workers Object Detection Dataset - resize-416x416-reflectEdges

    • public.roboflow.com
    zip
    Updated Sep 30, 2022
    + more versions
    Cite
    Northeastern University - China (2022). Hard Hat Workers Object Detection Dataset - resize-416x416-reflectEdges [Dataset]. https://public.roboflow.com/object-detection/hard-hat-workers/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 30, 2022
    Dataset authored and provided by
    Northeastern University - China
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of Workers
    Description

    Overview

    The Hard Hat dataset is an object detection dataset of workers in workplace settings that require a hard hat. Annotations also include examples of just "person" and "head," for when an individual may be present without a hard hat.

    The original dataset has a 75/25 train-test split.

    Example Image: https://i.imgur.com/7spoIJT.png

    Use Cases

    One could use this dataset to, for example, build a classifier that separates workers who are abiding by the workplace safety code from those who may not be. It is also a good general dataset for practice.

    Using this Dataset

    Use the fork or Download this Dataset button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.

    Dataset Versions:

    Image Preprocessing | Image Augmentation | Modify Classes

    • v1 (resize-416x416-reflect): generated with the original 75/25 train-test split | No augmentations
    • v2 (raw_75-25_trainTestSplit): generated with the original 75/25 train-test split | These are the raw, original images
    • v3 (v3): generated with the original 75/25 train-test split | Modify Classes used to drop person class | Preprocessing and Augmentation applied
    • v5 (raw_HeadHelmetClasses): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class
    • v8 (raw_HelmetClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and person classes
    • v9 (raw_PersonClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and helmet classes
    • v10 (raw_AllClasses): generated with a 70/20/10 train/valid/test split | These are the raw, original images
    • v11 (augmented3x-AllClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied | 3x image generation | Trained with Roboflow's Fast Model
    • v12 (augmented3x-HeadHelmetClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Fast Model
    • v13 (augmented3x-HeadHelmetClasses-AccurateModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Accurate Model
    • v14 (raw_HeadClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class, and remap/relabel helmet class to head

    Choosing Between Computer Vision Model Sizes | Roboflow Train

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.


  7. kaggle-mbti-cleaned-augmented

    • huggingface.co
    Updated Aug 9, 2023
    Cite
    Shunian Chen (2023). kaggle-mbti-cleaned-augmented [Dataset]. https://huggingface.co/datasets/Shunian/kaggle-mbti-cleaned-augmented
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Aug 9, 2023
    Authors
    Shunian Chen
    Description

    Dataset Card for "kaggle-mbti-cleaned-augmented"

    This dataset is built upon Shunian/kaggle-mbti-cleaned to address the sample imbalance problem. Thanks to the Parrot Paraphraser and NLP AUG, some of the skewness issues in the training data are addressed, growing it from 328,660 samples to 478,389 samples in total. View GitHub for more information.

  8. Data augmentation process example

    • figshare.com
    pdf
    Updated Jun 27, 2025
    Cite
    Antonia Egli; Theo Lynn; Gary Sinclair; Pierangelo Rosati; Guto Leoni Santos; Vitor Gaboardi dos Santos (2025). Data augmentation process example [Dataset]. http://doi.org/10.6084/m9.figshare.29425658.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Antonia Egli; Theo Lynn; Gary Sinclair; Pierangelo Rosati; Guto Leoni Santos; Vitor Gaboardi dos Santos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An example and methodological note describing the process of generating paraphrases on the basis of a single tweet.

  9. Car Highway Dataset

    • universe.roboflow.com
    zip
    Updated Sep 13, 2023
    Cite
    Sallar (2023). Car Highway Dataset [Dataset]. https://universe.roboflow.com/sallar/car-highway/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 13, 2023
    Dataset authored and provided by
    Sallar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Vehicles Bounding Boxes
    Description

    Car-Highway Data Annotation Project

    Introduction

    In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.

    Project Goals

    • Collect a diverse dataset of car images from highway scenes.
    • Annotate the dataset to identify and label cars within each image.
    • Organize and format the annotated data for machine learning model training.

    Tools and Technologies

    For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.

    Annotation Process

    1. Upload the raw car images to the Roboflow platform.
    2. Use the annotation tools in Roboflow to draw bounding boxes around each car in the images.
    3. Label each bounding box with the corresponding class (e.g., car).
    4. Review and validate the annotations for accuracy.

    Data Augmentation

    Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.
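    For object detection, a geometric augmentation has to transform the bounding boxes together with the pixels. The sketch below shows this for a horizontal flip, using plain Python lists as a stand-in for an image. It illustrates the idea only and is not Roboflow's implementation; the function names are hypothetical.

```python
def hflip_image(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]

def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) pixel box horizontally."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa
    return (image_width - x_max, y_min, image_width - x_min, y_max)

if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    print(hflip_image(image))
    print(hflip_box((0, 0, 1, 2), image_width=4))
```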

    Data Export

    Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.
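    Export formats differ mainly in how a box is encoded. COCO stores a box as [x_min, y_min, width, height] in pixels, while YOLO stores a normalized center point plus width and height. A minimal conversion sketch, with hypothetical function names, might look like this:

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO [x_min, y_min, width, height] pixel box to a YOLO
    (center_x, center_y, width, height) box normalized to the 0-1 range."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / image_width,
        (y_min + h / 2) / image_height,
        w / image_width,
        h / image_height,
    )

def yolo_line(class_id, bbox, image_width, image_height):
    """Format one COCO box as a line of a YOLO .txt label file."""
    cx, cy, w, h = coco_to_yolo(bbox, image_width, image_height)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

if __name__ == "__main__":
    print(yolo_line(0, [100, 50, 200, 100], image_width=400, image_height=200))
```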

    Milestones

    1. Data Collection and Preprocessing
    2. Annotation of Car Images
    3. Data Augmentation
    4. Data Export
    5. Model Training

    Conclusion

    By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.

  10. The samples of a single category in training set before and after...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 7, 2023
    Cite
    Wei, Biyun; Shen, Xiaole; Wang, Hongfeng; Cao, Jinzhou (2023). The samples of a single category in training set before and after augmentation. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000942207
    Explore at:
    Dataset updated
    Jun 7, 2023
    Authors
    Wei, Biyun; Shen, Xiaole; Wang, Hongfeng; Cao, Jinzhou
    Description

    The samples of a single category in training set before and after augmentation.

  11. Examples of SA selection rules (negative results).

    • plos.figshare.com
    xls
    Updated Sep 26, 2024
    + more versions
    Cite
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Examples of SA selection rules (negative results). [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t016
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Examples of SA selection rules (negative results).

  12. hw1-24679-tabular-dataset

    • huggingface.co
    Updated Sep 20, 2025
    Cite
    Mary Zhang (2025). hw1-24679-tabular-dataset [Dataset]. https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset
    Explore at:
    Dataset updated
    Sep 20, 2025
    Authors
    Mary Zhang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Shoe Size Measurements Tabular Dataset

      Dataset Summary
    

    Purpose: This dataset was created for tabular data analysis and prediction tasks involving shoe measurements, developed as part of CMU 24-679 coursework to explore tabular data augmentation techniques.

    Quick Stats:

    • 338 total samples (30 original + 308 augmented)
    • 3 numerical features + 3 categorical features
    • High correlation between size measurements (>0.97)
    • ~10x augmentation factor
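    The summary does not describe how the augmented rows were produced. One simple option for tabular data is to add small noise to numeric columns. The sketch below is an assumption, not the author's method: it scales all numeric fields of a synthetic row by one shared Gaussian factor, so that strongly correlated measurements stay correlated. The column names are made up for illustration.

```python
import random

def jitter_rows(rows, numeric_keys, copies, rel_noise=0.02, seed=0):
    """Return the original rows plus `copies` noisy variants of each row.

    Every synthetic row uses a single scale factor for all numeric fields;
    other fields (for example categorical ones) are copied unchanged.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in rows]
    for row in rows:
        for _ in range(copies):
            factor = 1.0 + rng.gauss(0.0, rel_noise)
            new = dict(row)
            for key in numeric_keys:
                new[key] = row[key] * factor
            out.append(new)
    return out

if __name__ == "__main__":
    # Hypothetical columns; they are not taken from the dataset.
    original = [{"length_cm": 26.0, "width_cm": 9.5, "brand": "A"}]
    augmented = jitter_rows(original, ["length_cm", "width_cm"], copies=10)
    print(len(augmented))
```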

    Contact: maryzhang@cmu.edu… See the full description on the dataset page: https://huggingface.co/datasets/maryzhang/hw1-24679-tabular-dataset.

  13. Multilingual Handwritten Digits Datasets

    • kaggle.com
    zip
    Updated Feb 22, 2023
    Cite
    Ayush Singh (2023). Multilingual Handwritten Digits Datasets [Dataset]. https://www.kaggle.com/datasets/ayu8sh/hdr-datasets/code
    Explore at:
    Available download formats: zip (427659986 bytes)
    Dataset updated
    Feb 22, 2023
    Authors
    Ayush Singh
    Description

    Collection of handwritten digits from various languages like English, Hindi, Telugu and Arabic. Each of these 4 datasets contains 4 files: x_train.npy, x_test.npy, y_train.npy and y_test.npy

    Training data has 80,000 samples and testing data has 20,000 samples. Total 100,000 samples (10,000 for each digit).

    The English dataset was obtained from MNIST (70k samples). Hindi and Telugu were obtained from CMATERdb (3k samples each). Arabic was obtained from MADBase (70k samples).

    Then data augmentation was applied to each dataset to add new samples to make a total of 100,000 samples of each.

    Finally, these 4 were combined to form a new dataset, THEA (Telugu, Hindi, English, Arabic), which has 400,000 samples, of which 320,000 are in the training set and the remaining 80,000 are in the testing set.

  14. Data from: Duck Hunt

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Cite
    Hugo Zanini (2025). Duck Hunt [Dataset]. https://www.kaggle.com/datasets/hugozanini1/duck-hunt
    Explore at:
    Available download formats: zip (7379197 bytes)
    Dataset updated
    Jul 26, 2025
    Authors
    Hugo Zanini
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Duck Hunt Object Detection Dataset

    This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.

    Perfect for:

    • Object detection model training
    • Computer vision research
    • Retro gaming AI projects
    • YOLO algorithm benchmarking
    • Educational purposes

    🎯 Dataset Statistics

    Metric | Value
    Total Images | 1,004
    Dataset Size | 12 MB
    Image Format | PNG
    Annotation Format | YOLO (.txt)
    Classes | 4
    Train/Val Split | 711/260 (73%/27%)

    Class Distribution

    Class ID | Class Name | Count | Description
    0 | dog | 252 | The hunting dog in various poses (jumping, laughing, sniffing, etc.)
    1 | duck_dead | 256 | Dead ducks (both black and red variants)
    2 | duck_shot | 248 | Ducks in the moment of being shot
    3 | duck_flying | 248 | Flying ducks in all directions (left, right, diagonal)

    📁 Dataset Structure

    yolo_dataset_augmented/
    ├── images/
    │  ├── train/      # 711 training images
    │  └── val/       # 260 validation images
    ├── labels/
    │  ├── train/      # 711 YOLO annotation files
    │  └── val/       # 260 YOLO annotation files
    ├── classes.txt     # Class names mapping
    ├── dataset.yaml     # YOLO configuration file
    └── augmented_dataset_stats.json # Detailed statistics
    

    🔧 Data Augmentation Details

    The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:

    Augmentation Techniques Applied:

    • Geometric Transformations: Rotation (±15°), horizontal/vertical flipping, scaling (0.8-1.2x), translation
    • Color Adjustments: Brightness (0.7-1.3x), contrast (0.8-1.2x), saturation (0.8-1.2x)
    • Quality Variations: Gaussian noise, slight blur for robustness
    • Advanced Techniques: Mosaic augmentation (YOLO-style 4-image combination)

    Augmentation Parameters:

    {
      'rotation_range': (-15, 15),    # Small rotations for game sprites
      'brightness_range': (0.7, 1.3),  # Brightness variations
      'contrast_range': (0.8, 1.2),   # Contrast adjustments
      'saturation_range': (0.8, 1.2),  # Color saturation
      'noise_intensity': 0.02,      # Gaussian noise
      'horizontal_flip_prob': 0.5,    # 50% chance horizontal flip
      'scaling_range': (0.8, 1.2),    # Scale variations
    }
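    To show how ranges like these are typically consumed, the sketch below draws one random setting per augmentation for each new image. Uniform sampling within each range is an assumption, since the card does not say how values were drawn, and the names `AUG_RANGES` and `sample_augmentation` are hypothetical.

```python
import random

# Ranges copied from the parameter listing above
AUG_RANGES = {
    "rotation_range": (-15, 15),
    "brightness_range": (0.7, 1.3),
    "contrast_range": (0.8, 1.2),
    "saturation_range": (0.8, 1.2),
    "scaling_range": (0.8, 1.2),
}
HORIZONTAL_FLIP_PROB = 0.5

def sample_augmentation(rng):
    """Draw one random setting for each augmentation (uniform, by assumption)."""
    params = {
        name.replace("_range", ""): rng.uniform(low, high)
        for name, (low, high) in AUG_RANGES.items()
    }
    params["horizontal_flip"] = rng.random() < HORIZONTAL_FLIP_PROB
    return params

if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(sample_augmentation(rng))
```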
    

    🚀 Usage Examples

    Loading with YOLOv8 (Ultralytics)

    from ultralytics import YOLO
    
    # Load and train
    model = YOLO('yolov8n.pt') # Load pretrained model
    results = model.train(data='dataset.yaml', epochs=100, imgsz=640)
    
    # Validate
    metrics = model.val()
    
    # Predict
    results = model('path/to/test/image.png')
    

    Loading with PyTorch

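    import os

    import torch
    from PIL import Image
    from torch.utils.data import Dataset, DataLoader

    class DuckHuntDataset(Dataset):
        def __init__(self, images_dir, labels_dir, transform=None):
            self.images_dir = images_dir
            self.labels_dir = labels_dir
            self.transform = transform
            # Keep only PNG files, sorted for a deterministic order
            self.images = sorted(f for f in os.listdir(images_dir) if f.endswith('.png'))

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            name = self.images[idx]
            img_path = os.path.join(self.images_dir, name)
            label_path = os.path.join(self.labels_dir, os.path.splitext(name)[0] + '.txt')

            image = Image.open(img_path).convert('RGB')
            # Load YOLO annotations: one "class_id center_x center_y width height" row per object
            with open(label_path, 'r') as f:
                rows = [[float(v) for v in line.split()] for line in f if line.strip()]
            labels = torch.tensor(rows, dtype=torch.float32).reshape(-1, 5)

            if self.transform:
                image = self.transform(image)

            return image, labels

    # Usage: image sizes and object counts vary per sample, so batch them as tuples
    dataset = DuckHuntDataset('images/train', 'labels/train')
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True,
                            collate_fn=lambda batch: tuple(zip(*batch)))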
    

    YOLO Annotation Format

    Each .txt file contains one line per object: class_id center_x center_y width height

    Example annotation: 0 0.492 0.403 0.212 0.315, where all four box values are normalized (0-1) relative to the image dimensions.
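    To draw or crop a box, the normalized values have to be scaled back to pixels. The sketch below parses one annotation line and converts it to corner coordinates. The 640x480 image size is only an assumed example, since the card says image dimensions vary, and the function names are hypothetical.

```python
def parse_yolo_line(line):
    """Split one YOLO label line into (class_id, cx, cy, w, h)."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    return class_id, cx, cy, w, h

def yolo_to_pixel_box(cx, cy, w, h, image_width, image_height):
    """Convert a normalized center-format box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (cx - w / 2) * image_width
    y_min = (cy - h / 2) * image_height
    x_max = (cx + w / 2) * image_width
    y_max = (cy + h / 2) * image_height
    return x_min, y_min, x_max, y_max

if __name__ == "__main__":
    class_id, *box = parse_yolo_line("0 0.492 0.403 0.212 0.315")
    # Assumed image size, for illustration only
    print(class_id, yolo_to_pixel_box(*box, image_width=640, image_height=480))
```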

    📊 Technical Specifications

    • Image Dimensions: Variable (original sprite sizes preserved)
    • Color Channels: RGB (3 channels)
    • Annotation Precision: Float32 (normalized coordinates)
    • File Naming: Descriptive names indicating class and augmentation type
    • Quality: High-resolution pixel art sprites

    🎮 Dataset Context

    This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:

    • The Dog: Your hunting companion who retrieves ducks and ...
  15. Data from: S1 Dataset -

    • plos.figshare.com
    xlsx
    Updated Jul 10, 2024
    Cite
    Tianyi Deng; Chengqi Xue; Gengpei Zhang (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0305038.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 10, 2024
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Tianyi Deng; Chengqi Xue; Gengpei Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The meta-learning method proposed in this paper addresses the issue of small-sample regression in the application of engineering data analysis, which is a highly promising direction for research. By integrating traditional regression models with optimization-based data augmentation from meta-learning, the proposed deep neural network demonstrates excellent performance in optimizing glass fiber reinforced plastic (GFRP) for wrapping concrete short columns. When compared with traditional regression models, such as Support Vector Regression (SVR), Gaussian Process Regression (GPR), and Radial Basis Function Neural Networks (RBFNN), the meta-learning method proposed here performs better in modeling small data samples. The success of this approach illustrates the potential of deep learning in dealing with limited amounts of data, offering new opportunities in the field of material data analysis.

  16. Data from: Anotado Dataset

    • universe.roboflow.com
    zip
    Updated Oct 9, 2021
    Cite
    new-workspace-lzcyx (2021). Anotado Dataset [Dataset]. https://universe.roboflow.com/new-workspace-lzcyx/anotado/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 9, 2021
    Dataset authored and provided by
    new-workspace-lzcyx
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Anotado Bounding Boxes
    Description

    https://www.youtube.com/watch?v=4MA_6oZQz7s&ab_channel=tektronix475

    Spotted caps are the normal (OK) class: fully closed. Clean caps are the bad, or anomaly, target class: partially closed. One double prediction occurs at 3:59. 100x100 classification accuracy, out of 200 samples. Inference was run on an unseen test dataset. Training ran for 150 epochs on a 700-sample training dataset, with no data augmentation.

    PREPROCESSING

    • Auto-Orient: Applied
    • Resize: Stretch to 416x416
    • Grayscale: Applied

    AUGMENTATIONS

    No augmentations were applied.

    Anomaly detection with: Roboflow, TensorFlow, Google Colab, Ultralytics, YOLOv5, CVAT.

  17. rag-dataset-12000

    • huggingface.co
    + more versions
    Cite
    Neural Bridge AI, rag-dataset-12000 [Dataset]. https://huggingface.co/datasets/neural-bridge/rag-dataset-12000
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset authored and provided by
    Neural Bridge AI
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Retrieval-Augmented Generation (RAG) Dataset 12000

    Retrieval-Augmented Generation (RAG) Dataset 12000 is an English dataset designed for RAG-optimized models, built by Neural Bridge AI, and released under Apache license 2.0.

      Dataset Description

      Dataset Summary

    Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by allowing them to consult an external authoritative knowledge base before generating responses. This approach significantly… See the full description on the dataset page: https://huggingface.co/datasets/neural-bridge/rag-dataset-12000.
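    The retrieve-then-generate flow described here can be sketched in a few lines. The example below is a toy under stated assumptions: it ranks passages by plain word overlap instead of a real retriever, stops at building the prompt rather than calling a model, and uses hypothetical function names that are not part of the dataset or any library.

```python
def tokenize(text):
    """Lowercase a text and split it into a set of whitespace-separated words."""
    return set(text.lower().split())

def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question (a toy retriever)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Place the retrieved context ahead of the question, ready for an LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

if __name__ == "__main__":
    knowledge_base = [
        "Paris is the capital of France.",
        "The Nile is a river in Africa.",
    ]
    print(build_prompt("What is the capital of France?", knowledge_base))
```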

  18. Examples of EA selection rules (positive results).

    • plos.figshare.com
    xls
    Updated Sep 26, 2024
    + more versions
    Cite
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Examples of EA selection rules (positive results). [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t013
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Examples of EA selection rules (positive results).

  19. Augmented Alzheimer MRI Dataset

    • kaggle.com
    Updated Sep 20, 2022
    + more versions
    Cite
    uraninjo (2022). Augmented Alzheimer MRI Dataset [Dataset]. https://www.kaggle.com/datasets/uraninjo/augmented-alzheimer-mri-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    uraninjo
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The data consists of MRI images in four classes, present in both the training and the testing set:

    1. Mild Demented
    2. Moderate Demented
    3. Non Demented
    4. Very Mild Demented

    The data contains two folders: one holds the augmented images and the other holds the originals. The originals could be used as a validation or test dataset...

    Data is augmented from an existing dataset. Original images can be seen in Data Explorer. https://www.kaggle.com/datasets/tourist55/alzheimers-dataset-4-class-of-images

    My purpose in publishing this dataset is to encourage the use of augmented images alongside the originals. The importance of augmentation can be a little underrated.
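The suggestion above, training on the augmented folder and holding out the originals for validation or testing, might look like the following sketch. The folder names `augmented` and `originals`, the one-subfolder-per-class layout, and the helper names are assumptions made for illustration; check the actual directory names in the Kaggle Data Explorer before using it.

```python
from pathlib import Path

# File extensions treated as images (assumption; adjust to the real files).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_images(folder):
    """Return sorted (path, label) pairs; the label is the class subfolder name."""
    folder = Path(folder)
    return sorted(
        (p, p.parent.name)
        for p in folder.glob("*/*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def make_splits(root, augmented_dir="augmented", originals_dir="originals"):
    """Use augmented images for training and untouched originals for validation.

    Both directory names are hypothetical placeholders.
    """
    root = Path(root)
    return {
        "train": index_images(root / augmented_dir),
        "val": index_images(root / originals_dir),
    }
```

Evaluating on the originals only keeps augmented copies of a validation image out of the score, provided the augmented folder was not itself generated from those same originals.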

  20. Datasets GO ID/attribute p-value q-value.

    • figshare.com
    xls
    Updated Jul 22, 2024
    Cite
    Sifan Feng; Zhenyou Wang; Yinghua Jin; Shengbin Xu (2024). Datasets GO ID/attribute p-value q-value. [Dataset]. http://doi.org/10.1371/journal.pone.0305857.t004
    Explore at:
    xls (available download formats)
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sifan Feng; Zhenyou Wang; Yinghua Jin; Shengbin Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Traditional models for identifying differentially expressed genes (DEGs) have limitations on small-sample datasets because they require distribution assumptions to be met; otherwise, sample variation leads to high false positive/negative rates. In contrast, tabular data models based on deep learning (DL) frameworks do not need to consider the data distribution type or sample variation. However, applying DL to RNA-Seq data is still a challenge due to the lack of proper labeling and the small sample size compared to the number of genes. Data augmentation (DA) extracts data features using different methods and procedures, which can significantly increase complementary pseudo-values from limited data without significant additional cost. Based on this, we combine DA with a DL framework-based tabular data model and propose a model, TabDEG, to predict DEGs and their up-regulation/down-regulation directions from gene expression data obtained from the Cancer Genome Atlas database. Compared to five counterpart methods, TabDEG has high sensitivity and low misclassification rates. Experiments show that TabDEG is robust and effective in enhancing data features to facilitate classification of high-dimensional, small-sample datasets, and validate that TabDEG-predicted DEGs are mapped to important gene ontology terms and pathways associated with cancer.

