53 datasets found
  1. audiomentations

    • kaggle.com
    zip
    Updated Jan 10, 2023
    Cite
    HyeongChan Kim (2023). audiomentations [Dataset]. https://www.kaggle.com/kozistr/audiomentations
    Explore at:
    zip (35911 bytes). Available download formats
    Dataset updated
    Jan 10, 2023
    Authors
    HyeongChan Kim
    Description

    Audiomentations

    A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

    Official: https://github.com/iver56/audiomentations
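
    For context, here is a minimal usage sketch in the style of the library's README (the transform names below exist in audiomentations; exact parameter defaults may vary across versions):

    import numpy as np
    from audiomentations import AddGaussianNoise, Compose, PitchShift, Shift, TimeStretch

    # Chain of randomized transforms; each is applied with probability p.
    augment = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
        Shift(p=0.5),
    ])

    # Placeholder mono signal standing in for real audio (2 s at 16 kHz).
    samples = np.random.uniform(low=-0.2, high=0.2, size=32000).astype(np.float32)
    augmented = augment(samples=samples, sample_rate=16000)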

  2. A literature review of data augmentation techniques for audio...

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Cite
    Jane Saldanha; Shaunak Chakraborty; Shruti Patil; Ketan Kotecha; Satish Kumar; Anand Nayyar (2023). A literature review of data augmentation techniques for audio classification. [Dataset]. http://doi.org/10.1371/journal.pone.0266467.t002
    Explore at:
    xls. Available download formats
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jane Saldanha; Shaunak Chakraborty; Shruti Patil; Ketan Kotecha; Satish Kumar; Anand Nayyar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A literature review of data augmentation techniques for audio classification.

  3. Data Augmentation Tools Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 23, 2025
    Cite
    Growth Market Reports (2025). Data Augmentation Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-augmentation-tools-market
    Explore at:
    csv, pdf, pptx. Available download formats
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Augmentation Tools Market Outlook



    As per our latest research, the global Data Augmentation Tools market size reached USD 1.47 billion in 2024, reflecting the rapidly increasing adoption of artificial intelligence and machine learning across diverse sectors. The market is experiencing robust momentum, registering a CAGR of 25.3% from 2025 to 2033. By the end of 2033, the Data Augmentation Tools market is forecasted to reach a substantial value of USD 11.6 billion. This impressive growth is primarily driven by the escalating need for high-quality, diverse datasets to train advanced AI models, coupled with the proliferation of digital transformation initiatives across industries.




    The primary growth factor fueling the Data Augmentation Tools market is the exponential rise in AI and machine learning applications, which require vast amounts of labeled data for effective training. As organizations strive to develop more accurate and robust models, the demand for data augmentation solutions that can synthetically expand and diversify datasets has surged. This trend is particularly pronounced in sectors such as healthcare, automotive, and retail, where the quality and quantity of data directly impact the performance and reliability of AI systems. The market is further propelled by the increasing complexity of data types, including images, text, audio, and video, necessitating sophisticated augmentation tools capable of handling multimodal data.




    Another significant driver is the growing focus on reducing model bias and improving generalization capabilities. Data augmentation tools enable organizations to generate synthetic samples that account for various real-world scenarios, thereby minimizing overfitting and enhancing the robustness of AI models. This capability is critical in regulated industries like BFSI and healthcare, where the consequences of biased or inaccurate models can be severe. Furthermore, the rise of edge computing and IoT devices has expanded the scope of data augmentation, as organizations seek to deploy AI solutions in resource-constrained environments that require optimized and diverse training datasets.




    The proliferation of cloud-based solutions has also played a pivotal role in shaping the trajectory of the Data Augmentation Tools market. Cloud deployment offers scalability, flexibility, and cost-effectiveness, allowing organizations of all sizes to access advanced augmentation capabilities without significant infrastructure investments. Additionally, the integration of data augmentation tools with popular machine learning frameworks and platforms has streamlined adoption, enabling seamless workflow integration and accelerating time-to-market for AI-driven products and services. These factors collectively contribute to the sustained growth and dynamism of the global Data Augmentation Tools market.




    From a regional perspective, North America currently dominates the Data Augmentation Tools market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading technology companies, robust investment in AI research, and early adoption of digital transformation initiatives have established North America as a key hub for data augmentation innovation. Meanwhile, Asia Pacific is poised for the fastest growth over the forecast period, driven by the rapid expansion of the IT and telecommunications sector, burgeoning e-commerce industry, and increasing government initiatives to promote AI adoption. Europe also maintains a significant market presence, supported by stringent data privacy regulations and a strong focus on ethical AI development.





    Component Analysis



    The Component segment of the Data Augmentation Tools market is bifurcated into Software and Services, each playing a critical role in enabling organizations to leverage data augmentation for AI and machine learning initiatives. The software sub-segment comprises

  4. Data Augmentation Tools Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Data Augmentation Tools Market Research Report 2033 [Dataset]. https://dataintelo.com/report/data-augmentation-tools-market
    Explore at:
    pdf, pptx, csv. Available download formats
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Augmentation Tools Market Outlook



    According to our latest research, the global Data Augmentation Tools market size reached USD 1.62 billion in 2024, with a robust year-on-year growth trajectory. The market is poised for accelerated expansion, projected to achieve a CAGR of 26.4% from 2025 to 2033. By the end of 2033, the market is forecasted to reach approximately USD 12.34 billion. This dynamic growth is primarily driven by the rising demand for artificial intelligence (AI) and machine learning (ML) applications across diverse industry verticals, which necessitate vast quantities of high-quality training data. The proliferation of data-centric AI models and the increasing complexity of real-world datasets are compelling enterprises to invest in advanced data augmentation tools to enhance data diversity and model robustness, as per the latest research insights.




    One of the principal growth factors fueling the Data Augmentation Tools market is the intensifying adoption of AI-driven solutions across industries such as healthcare, automotive, retail, and finance. Organizations are increasingly leveraging data augmentation to overcome the challenges posed by limited or imbalanced datasets, which are often a bottleneck in developing accurate and reliable AI models. By synthetically expanding training datasets through augmentation techniques, enterprises can significantly improve the generalization capabilities of their models, leading to enhanced performance and reduced risk of overfitting. Furthermore, the surge in computer vision, natural language processing, and speech recognition applications is creating a fertile environment for the adoption of specialized augmentation tools tailored to image, text, and audio data.




    Another significant factor contributing to market growth is the rapid evolution of augmentation technologies themselves. Innovations such as Generative Adversarial Networks (GANs), automated data labeling, and domain-specific augmentation pipelines are making it easier for organizations to deploy and scale data augmentation strategies. These advancements are not only reducing the manual effort and expertise required but also enabling the generation of highly realistic synthetic data that closely mimics real-world scenarios. As a result, businesses across sectors are able to accelerate their AI/ML development cycles, reduce costs associated with data collection and labeling, and maintain compliance with stringent data privacy regulations by minimizing the need to use sensitive real-world data.




    The growing integration of data augmentation tools within cloud-based AI development platforms is also acting as a major catalyst for market expansion. Cloud deployment offers unparalleled scalability, accessibility, and collaboration capabilities, allowing organizations of all sizes to harness the power of data augmentation without significant upfront infrastructure investments. This democratization of advanced data engineering tools is especially beneficial for small and medium enterprises (SMEs) and academic research institutes, which often face resource constraints. The proliferation of cloud-native augmentation solutions is further supported by strategic partnerships between technology vendors and cloud service providers, driving broader market penetration and innovation.




    From a regional perspective, North America continues to dominate the Data Augmentation Tools market, driven by the presence of leading AI technology companies, a mature digital infrastructure, and substantial investments in research and development. However, the Asia Pacific region is emerging as the fastest-growing market, fueled by rapid digital transformation initiatives, a burgeoning startup ecosystem, and increasing government support for AI innovation. Europe also holds a significant share, underpinned by strong regulatory frameworks and a focus on ethical AI development. Meanwhile, Latin America and the Middle East & Africa are witnessing steady adoption, particularly in sectors such as BFSI and healthcare, where data-driven insights are becoming increasingly critical.



    Component Analysis



    The Data Augmentation Tools market by component is bifurcated into Software and Services. The software segment currently accounts for the largest share of the market, owing to the widespread deployment of standalone and integrated augmentation solutions across enterprises and research institutions. These software plat

  5. Audio Data Preparation and Augmentation

    • kaggle.com
    zip
    Updated Aug 28, 2024
    Cite
    Subho117 (2024). Audio Data Preparation and Augmentation [Dataset]. https://www.kaggle.com/datasets/subho117/audio-data-preparation-and-augmentation
    Explore at:
    zip (109957 bytes). Available download formats
    Dataset updated
    Aug 28, 2024
    Authors
    Subho117
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Subho117

    Released under MIT


  6. Audio Piano Triads Dataset

    • data.niaid.nih.gov
    Updated Aug 18, 2021
    + more versions
    Cite
    Agustín Macaya (2021). Audio Piano Triads Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4740876
    Explore at:
    Dataset updated
    Aug 18, 2021
    Dataset provided by
    Pontificia Universidad Católica de Chile
    Authors
    Agustín Macaya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Created by: Agustín Macaya Valladares. Date: May 5th, 2021.

    • Dataset contains 43,200 examples of piano triads in .wav format.

    Details:
    • Sample rate: 16000 Hz
    • Data type: 16-bit PCM (int16)
    • File size: 128 kB per example (5.53 GB for the complete dataset)
    • Duration: 4 seconds (3 seconds pressed, 1 second released)
    • Sound: digital piano; chords were played by a human on a velocity-sensitive piano keyboard

    • 3 octaves (2,3,4).
    • 12 base notes per octave: Cn, Df, Dn, Ef, En, Fn, Gf, Gn, Af, An, Bf, Bn. (n is natural, f is flat).
    • 4 triad types per note: major (j), minor (n), diminished (d), augmented (a). No inversions.
    • 3 volumes per triad: forte (f), mezzoforte (m), piano (p).

    • 10 original examples per combination of octave, base note, triad type, and volume (10 × 3 × 12 × 4 × 3 = 4,320 examples).

    • x10 data augmentation for each example (4,320 × 10 = 43,200 total examples).

    • Data augmentation through random temporal and amplitude shifts.

    • Metadata is in the name of the chord. For example: "piano_3_Af_d_m_45.wav" is a piano chord, (3) 3rd octave, (Af) A flat base note, (d) diminished, (m) mezzoforte, 45th example.

    Note: The audio files are stored as 16-bit PCM (int16) to reduce file size. This means the dynamic range of values in the array is -32768 to 32767, integers. To normalize the audio to the range -1 to 1, just divide by 32768.
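
    A minimal loading sketch based on the note above (the filename is the example given in the description; SciPy is an assumed dependency):

    import numpy as np
    from scipy.io import wavfile

    # Read one 4-second example: 16-bit PCM at 16000 Hz.
    sr, pcm = wavfile.read("piano_3_Af_d_m_45.wav")
    assert pcm.dtype == np.int16 and sr == 16000

    # Normalize to [-1, 1) as described: divide by 32768.
    audio = pcm.astype(np.float32) / 32768.0

    # The metadata is encoded in the filename:
    # <sound>_<octave>_<base note>_<triad type>_<volume>_<example index>
    sound, octave, base, triad, volume, idx = "piano_3_Af_d_m_45".split("_")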

  7. AugLy Facebook Research

    • kaggle.com
    zip
    Updated Dec 6, 2021
    Cite
    Mathurin Aché (2021). AugLy Facebook Research [Dataset]. https://www.kaggle.com/mathurinache/augly-facebook-research
    Explore at:
    zip (87198042 bytes). Available download formats
    Dataset updated
    Dec 6, 2021
    Authors
    Mathurin Aché
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    AugLy: A new data augmentation library to help build more robust AI models

    Content

    AugLy was developed by Facebook researchers and engineers based in our Seattle and Paris offices. It has four sub-libraries, each corresponding to a different modality. Each library follows the same interface: we provide transforms in both function-based and class-based formats, and we provide intensity functions that help you understand how intense a transformation is (based on the given parameters). AugLy can also generate useful metadata to help you understand how your data was transformed.

    Acknowledgements

    Code comes from https://github.com/facebookresearch/AugLy

    Inspiration

    See the notebooks to see AugLy in practice.
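
    As an illustration of the function-based format, a sketch using AugLy's audio sub-library (function names follow the project's audio README; treat the exact signatures as assumptions and verify against the repository):

    import numpy as np
    import augly.audio as audaugs

    # Placeholder one-second mono clip at 16 kHz.
    sr = 16000
    audio = np.random.uniform(-0.1, 0.1, sr).astype(np.float32)

    # Each functional transform returns (augmented_audio, sample_rate).
    audio, sr = audaugs.pitch_shift(audio, sample_rate=sr, n_steps=2.0)
    audio, sr = audaugs.time_stretch(audio, sample_rate=sr, rate=0.9)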

  8. Firearms Audio Dataset – 58 Gun Types

    • kaggle.com
    zip
    Updated Aug 30, 2025
    Cite
    ZACKY_ZAC (2025). Firearms Audio Dataset – 58 Gun Types [Dataset]. https://www.kaggle.com/zackyzac/firearms-audio-dataset-58-gun-types
    Explore at:
    zip (55975198 bytes). Available download formats
    Dataset updated
    Aug 30, 2025
    Authors
    ZACKY_ZAC
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🔫 Firearms Audio Classification Dataset – 58 Gun Types (Standardized & Augmented)

    This dataset is a comprehensive collection of firearm audio recordings curated and standardized from multiple open-source datasets, including:

    📌 Dataset Highlights

    • 58 firearm classes (e.g., AK-47, M16, Glock, Desert Eagle, MP5, M249, etc.).
    • Each class contains 300 standardized audio files.
    • Audio standardized to:
      • 0.5 seconds per clip
      • 44.1 kHz sampling rate
      • .wav format (mono)
    • Data augmentation applied (pitch shifting, time stretching, noise injection) to balance classes.
    • Focused on peak-segment extraction (0.5s window around the loudest gunshot).

    🎯 Purpose

    The main goal is to provide a clean and balanced dataset for firearm audio classification and clustering tasks.

    Potential applications include:
    - 🔊 Gunshot sound recognition
    - 🎯 Firearm type classification
    - 📊 Acoustic clustering of firearms
    - 🛡️ Forensic audio analysis
    - 🤖 Deep learning experiments (CNNs, RNNs, Transformers on audio features)

    ⚡ Usage Notes

    • Recommended preprocessing: MFCCs, Mel-spectrograms, or wav2vec embeddings.
    • Each audio file contains a single gunshot or burst segment, normalized in duration.
    • Designed for both supervised classification and unsupervised clustering.

    📊 Dataset Size

    • 58 classes × 300 files each = 17,400 audio files
    • Total duration ≈ 2.5 hours of gunshot audio

    🚀 Example Workflow

    1. Load an audio file with librosa.
    2. Extract MFCCs or Mel-spectrograms.
    3. Train a classifier or cluster embeddings.

    import librosa, librosa.display
    import matplotlib.pyplot as plt
    
    # Load audio
    y, sr = librosa.load("Firearms_Dataset/ak-47/ak-47_001.wav", sr=44100)
    
    # Extract MFCCs
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    
    # Plot
    plt.figure(figsize=(10, 4))
    librosa.display.specshow(mfccs, x_axis='time', sr=sr)  # pass sr so the time axis matches 44.1 kHz
    plt.colorbar()
    plt.title("MFCC - AK-47 Gunshot")
    plt.tight_layout()
    plt.show()
    
  9. Deepfake Audio Mel-spectrograms

    • kaggle.com
    zip
    Updated Nov 12, 2024
    Cite
    codeofwhite (2024). Deepfake Audio Mel-spectrograms [Dataset]. https://www.kaggle.com/datasets/hejuncheung/mel-plot-processed
    Explore at:
    zip (440039753 bytes). Available download formats
    Dataset updated
    Nov 12, 2024
    Authors
    codeofwhite
    Description

    Folder Structure

    1. output_hifigen_augmentated

    This subdirectory contains Mel-spectrograms generated using Hifi-GAN augmented data. The data is split into training, validation, and test subsets:

    mel_spectrograms/
      test/
        fake/
        real/
      train/
        fake/
        real/
      val/
        fake/
        real/

    2. output_jp_augmentated

    This subdirectory includes Mel-spectrograms with a different augmentation strategy (likely JP augmentation). It follows the same structure as above:

    mel_spectrograms/ with test/, train/, and val/ subfolders, each split into fake/ and real/.

    Highlights

    Custom Mel-Spectrograms: All audio samples are represented in Mel-spectrogram format, facilitating their use in deep learning tasks like binary classification of fake vs. real audio.

    Augmented Data: Two augmentation techniques are applied, offering diverse data for robust model training and evaluation.

    Balanced Splits: Each dataset partition (train, val, test) includes separate folders for fake and real samples to simplify dataset handling.
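
    Because each split keeps fake and real samples in separate folders, a directory-based image loader maps straight onto this layout. A minimal sketch, assuming the Mel-spectrograms are stored as image files (paths are illustrative):

    from torchvision import datasets, transforms

    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    # Labels are inferred from the folder names: ['fake', 'real'].
    train_ds = datasets.ImageFolder("output_hifigen_augmentated/mel_spectrograms/train", transform=tfm)
    val_ds = datasets.ImageFolder("output_hifigen_augmentated/mel_spectrograms/val", transform=tfm)
    print(train_ds.classes)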

  10. Z

    WaivOps WRLD-SMB: Open Audio Resources for Machine Learning in Music

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Oct 12, 2024
    Cite
    Patchbanks; WaivOps (2024). WaivOps WRLD-SMB: Open Audio Resources for Machine Learning in Music [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_13921289
    Explore at:
    Dataset updated
    Oct 12, 2024
    Authors
    Patchbanks; WaivOps
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WRLD-SMB Dataset

    WRLD-SMB is an open audio dataset featuring a collection of synthetic drum recordings in the style of Brazilian samba music. It includes 1,100 audio loops recorded in uncompressed stereo WAV format, along with paired JSON files intended for the supervised training of generative AI audio models.

    Overview

    This dataset was developed using multi-velocity audio samples and a paired MIDI dataset. The intended use of this dataset is to train or fine-tune AI models in learning high-performance drum notations, aiming to replicate the live sound of a small drum ensemble. To facilitate augmentation and supervised training with labeled audio data, a dropout technique was employed on the rendered audio files to generate variational mixes of the drum tracks.

    The primary purpose of this dataset is to provide accessible content for machine learning applications in music and audio. Potential use cases include generative music, feature extraction, tempo detection, audio classification, rhythm analysis, drum synthesis, music information retrieval (MIR), sound design and signal processing.

    Specifications

    1,100 audio loops (approximately 5.5 hours)

    16-bit 44.1kHz WAV format

    Tempo range: 90–120 BPM

    Paired label data (WAV + JSON)

    Variational drum patterns

    Subgenre styles (Traditional and modern samba, bossa nova, fusion)

    A JSON file is provided for referencing and converting MIDI note numbers to text labels. You can update the text labels to suit your preferences.
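
    A small sketch of loading and editing such a mapping (the filename and JSON structure here are hypothetical; check the dataset for the actual file):

    import json

    # Hypothetical filename; the dataset ships a JSON file mapping MIDI note
    # numbers to text labels.
    with open("midi_note_labels.json") as f:
        note_labels = {int(k): v for k, v in json.load(f).items()}

    # Update a label to suit your preferences, e.g. rename note 36.
    note_labels[36] = "kick"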

    License

    This dataset was compiled by WaivOps, a crowdsourced music project managed by the sound label company Patchbanks. All recordings have been compiled by verified sources for copyright clearance.

    The WRLD-SMB dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

    Additional Info

    For audio examples or more information about this dataset, please refer to the GitHub repository.

  11. Audio Cartography

    • openneuro.org
    Updated Aug 8, 2020
    Cite
    Megen Brittell (2020). Audio Cartography [Dataset]. http://doi.org/10.18112/openneuro.ds001415.v1.0.0
    Explore at:
    Dataset updated
    Aug 8, 2020
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Megen Brittell
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Audio Cartography project investigated the influence of temporal arrangement on the interpretation of information from a simple spatial data set. I designed and implemented three auditory map types (audio types), and evaluated differences in the responses to those audio types.

    The three audio types represented simplified raster data (eight rows x eight columns). First, a "sequential" representation read values one at a time from each cell of the raster, following an English reading order, and encoded the data value as the loudness of a single fixed-duration, fixed-frequency note. Second, an augmented-sequential ("augmented") representation used the same reading order, but encoded the data value as volume, the row as frequency, and the column as the rate at which the notes play (constant total cell duration). Third, a "concurrent" representation used the same encoding as the augmented type, but allowed the notes to overlap in time.

    Participants completed a training session in a computer-lab setting, where they were introduced to the audio types and practiced making a comparison between data values at two locations within the display based on what they heard. The training sessions, including associated paperwork, lasted up to one hour. In a second study session, participants listened to the auditory maps and made decisions about the data they represented while the fMRI scanner recorded digital brain images.

    The task consisted of listening to an auditory representation of geospatial data ("map"), and then making a decision about the relative values of data at two specified locations. After listening to the map ("listen"), a graphic depicted two locations within a square (white background). Each location was marked with a small square (size: 2x2 grid cells); one square had a black solid outline and transparent black fill, the other had a red dashed outline and transparent red fill. The decision ("response") was made under one of two conditions. Under the active listening condition ("active") the map was played a second time while participants made their decision; in the memory condition ("memory"), a decision was made in relative quiet (general scanner noises and intermittent acquisition noise persisted). During the initial map listening, participants were aware of neither the locations of the response options within the map extent, nor the response conditions under which they would make their decision. Participants could respond any time after the graphic was displayed; once a response was entered, the playback stopped (active response condition only) and the presentation continued to the next trial.

    Data was collected in accordance with a protocol approved by the Institutional Review Board at the University of Oregon.

    • Additional details about the specific maps used in this are available through University of Oregon's ScholarsBank (DOI 10.7264/3b49-tr85).

    • Details of the design process and evaluation are provided in the associated dissertation, which is available from ProQuest and University of Oregon's ScholarsBank.

    • Scripts that created the experimental stimuli and automated processing are available through University of Oregon's ScholarsBank (DOI 10.7264/3b49-tr85).

    Preparation of fMRI Data

    Conversion of the DICOM files produced by the scanner to NIfTI format was performed by MRIConvert (LCNI). Orientation to standard axes was performed and recorded in the NIfTI header (FMRIB, fslreorient2std). The excess slices in the anatomical images that represented tissue in the neck were trimmed (FMRIB, robustfov). Participant identity was protected through automated defacing of the anatomical data (FreeSurfer, mri_deface), with additional post-processing to ensure that no brain voxels were erroneously removed from the image (FMRIB, BET; brain mask dilated with three iterations of "fslmaths -dilM").

    Preparation of Metadata

    The dcm2niix tool (Rorden) was used to create draft JSON sidecar files with metadata extracted from the DICOM headers. The draft sidecar files were revised to augment the JSON elements with additional tags (e.g., "Orientation" and "TaskDescription") and to make a more human-friendly version of tag contents (e.g., "InstitutionAddress" and "DepartmentName"). The device serial number was constant throughout the data collection (i.e., all data collection was conducted on the same scanner), and the respective metadata values were replaced with an anonymous identifier: "Scanner1".

    Preparation of Behavioral Data

    The stimuli consisted of eighteen auditory maps. Spatial data were generated with the rgeos, sp, and spatstat libraries in R; auditory maps were rendered with the Pyo (Belanger) library for Python and prepared for presentation in Audacity. Stimuli were presented using PsychoPy (Peirce, 2007), which produced log files from which event details were extracted. The log files included timestamped entries for stimulus timing and trigger pulses from the scanner.

    • Log files are available in "sourcedata/behavioral".
    • Extracted event details accompany BOLD images in "sub-NN/func/*events.tsv".
    • Three-column explanatory variable files are in "derivatives/ev/sub-NN".

    References

    Audacity® software is copyright © 1999-2018 Audacity Team. Web site: https://audacityteam.org/. The name Audacity® is a registered trademark of Dominic Mazzoni.

    FMRIB (Functional Magnetic Resonance Imaging of the Brain). FMRIB Software Library (FSL; fslreorient2std, robustfov, BET). Oxford, v5.0.9, Available: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/

    FreeSurfer (mri_deface). Harvard, v1.22, Available: https://surfer.nmr.mgh.harvard.edu/fswiki/AutomatedDefacingTools

    LCNI (Lewis Center for Neuroimaging). MRIConvert (mcverter), v2.1.0 build 440, Available: https://lcni.uoregon.edu/downloads/mriconvert/mriconvert-and-mcverter

    Peirce, JW. PsychoPy–psychophysics software in Python. Journal of Neuroscience Methods, 162(1–2):8 – 13, 2007. Software Available: http://www.psychopy.org/

    Python software is copyright © 2001-2015 Python Software Foundation. Web site: https://www.python.org

    Pyo software is copyright © 2009-2015 Olivier Belanger. Web site: http://ajaxsoundstudio.com/software/pyo/.

    R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available: https://www.R-project.org/.

    rgeos software is copyright © 2016 Bivand and Rundel. Web site: https://CRAN.R-project.org/package=rgeos

    Rorden, C. dcm2niix, v1.0.20171215, Available: https://github.com/rordenlab/dcm2niix

    spatstat software is copyright © 2016 Baddeley, Rubak, and Turner. Web site: https://CRAN.R-project.org/package=spatstat

    sp software is copyright © 2016 Pebesma and Bivand. Web site: https://CRAN.R-project.org/package=sp

  12. Data from: Birds of a Feather Augment Together: Exploring Sonic Links...

    • zenodo.org
    Updated Oct 8, 2025
    Cite
    Jacob Bhattacharyya (2025). Birds of a Feather Augment Together: Exploring Sonic Links Between Real and Virtual Worlds in Audio Augmented Reality [Dataset]. http://doi.org/10.5281/zenodo.16614796
    Explore at:
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jacob Bhattacharyya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quantitative dataset presented in ISMAR 2025 submission "Birds of a Feather Augment Together: Exploring Sonic Links Between Real and Virtual Worlds in Audio Augmented Reality".

    Dataset covers the evaluation questionnaires presented to participants. Data has been pre-processed to flip negatively phrased questions as appropriate, and ensure that 7-step Likert data is scaled from -3 to 3.
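
    A tiny sketch of that pre-processing with hypothetical item names (on a 7-step scale recoded to -3..3, flipping a negatively phrased question is simply negation):

    import pandas as pd

    # Hypothetical questionnaire items on a 1..7 Likert scale.
    df = pd.DataFrame({"item_pos": [7, 4, 1], "item_neg": [1, 4, 6]})

    df = df - 4                        # rescale 1..7 to -3..3
    df["item_neg"] = -df["item_neg"]   # flip the negatively phrased item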

  13. MIT_environmental_impulse_responses

    • huggingface.co
    • opendatalab.com
    Updated Aug 19, 2023
    Cite
    David Scripka (2023). MIT_environmental_impulse_responses [Dataset]. https://huggingface.co/datasets/davidscripka/MIT_environmental_impulse_responses
    Explore at:
    Croissant. (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.)
    Dataset updated
    Aug 19, 2023
    Authors
    David Scripka
    License

    https://choosealicense.com/licenses/unknown/

    Description

    MIT Environmental Impulse Response Dataset

    The audio recordings in this dataset were originally created by the Computational Audition Lab at MIT. The source of the data can be found at https://mcdermottlab.mit.edu/Reverb/IR_Survey.html. The audio files have been resampled to 16 kHz to reduce the size of the dataset while making it more suitable for various tasks, including data augmentation. The dataset consists of 271 audio files… See the full description on the dataset page: https://huggingface.co/datasets/davidscripka/MIT_environmental_impulse_responses.
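
    For augmentation, reverberation is typically applied by convolving dry audio with one of these impulse responses. A minimal sketch with placeholder arrays (SciPy assumed):

    import numpy as np
    from scipy.signal import fftconvolve

    speech = np.random.randn(16000).astype(np.float32)  # 1 s of dry audio at 16 kHz
    ir = np.random.randn(8000).astype(np.float32)       # one impulse response

    # Convolve, trim back to the original length, and renormalize to avoid clipping.
    wet = fftconvolve(speech, ir, mode="full")[: len(speech)]
    wet /= np.max(np.abs(wet)) + 1e-9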

  14. Synthetic Training Data Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 29, 2025
    Cite
    Growth Market Reports (2025). Synthetic Training Data Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/synthetic-training-data-market
    Explore at:
    csv, pdf, pptx. Available download formats
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Synthetic Training Data Market Outlook



    According to our latest research, the global synthetic training data market size in 2024 is valued at USD 1.45 billion, demonstrating robust momentum as organizations increasingly adopt artificial intelligence and machine learning solutions. The market is projected to grow at a remarkable CAGR of 38.7% from 2025 to 2033, reaching an estimated USD 22.46 billion by 2033. This exponential growth is primarily driven by the rising demand for high-quality, diverse, and privacy-compliant datasets that fuel advanced AI models, as well as the escalating need for scalable data solutions across various industries.




    One of the primary growth factors propelling the synthetic training data market is the escalating complexity and diversity of AI and machine learning applications. As organizations strive to develop more accurate and robust AI models, the need for vast amounts of annotated and high-quality training data has surged. Traditional data collection methods are often hampered by privacy concerns, high costs, and time-consuming processes. Synthetic training data, generated through advanced algorithms and simulation tools, offers a compelling alternative by providing scalable, customizable, and bias-mitigated datasets. This enables organizations to accelerate model development, improve performance, and comply with evolving data privacy regulations such as GDPR and CCPA, thus driving widespread adoption across sectors like healthcare, finance, autonomous vehicles, and robotics.




    Another significant driver is the increasing adoption of synthetic data for data augmentation and rare event simulation. In sectors such as autonomous vehicles, manufacturing, and robotics, real-world data for edge-case scenarios or rare events is often scarce or difficult to capture. Synthetic training data allows for the generation of these critical scenarios at scale, enabling AI systems to learn and adapt to complex, unpredictable environments. This not only enhances model robustness but also reduces the risk associated with deploying AI in safety-critical applications. The flexibility to generate diverse data types, including images, text, audio, video, and tabular data, further expands the applicability of synthetic data solutions, making them indispensable tools for innovation and competitive advantage.




    The synthetic training data market is also experiencing rapid growth due to the heightened focus on data privacy and regulatory compliance. As data protection regulations become more stringent worldwide, organizations face increasing challenges in accessing and utilizing real-world data for AI training without violating user privacy. Synthetic data addresses this challenge by creating realistic yet entirely artificial datasets that preserve the statistical properties of original data without exposing sensitive information. This capability is particularly valuable for industries such as BFSI, healthcare, and government, where data sensitivity and compliance requirements are paramount. As a result, the adoption of synthetic training data is expected to accelerate further as organizations seek to balance innovation with ethical and legal responsibilities.




    From a regional perspective, North America currently leads the synthetic training data market, driven by the presence of major technology companies, robust R&D investments, and early adoption of AI technologies. However, the Asia Pacific region is anticipated to witness the highest growth rate during the forecast period, fueled by expanding AI initiatives, government support, and the rapid digital transformation of industries. Europe is also emerging as a key market, particularly in sectors where data privacy and regulatory compliance are critical. Latin America and the Middle East & Africa are gradually increasing their market share as awareness and adoption of synthetic data solutions grow. Overall, the global landscape is characterized by dynamic regional trends, with each region contributing uniquely to the market's expansion.



    The introduction of a Synthetic Data Generation Engine has revolutionized the way organizations approach data creation and management. This engine leverages cutting-edge algorithms to produce high-quality synthetic datasets that mirror real-world data without compromising privacy. By sim

  15. Audiomentations

    • kaggle.com
    zip
    Updated Apr 22, 2022
    Cite
    atfujita (2022). Audiomentations [Dataset]. https://www.kaggle.com/datasets/atsunorifujita/audiomentations
    Explore at:
    zip (62619 bytes). Available download formats
    Dataset updated
    Apr 22, 2022
    Authors
    atfujita
    Description

    A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning.

    • Runs on CPU.
    • Supports mono audio and multichannel audio.
    • Can be integrated into training pipelines in e.g. TensorFlow/Keras or PyTorch.
    • Has helped people get world-class results in Kaggle competitions.
    • Is used by companies making next-generation audio products.

    Need a Pytorch-specific alternative with GPU support? Check out torch-audiomentations!

  16. multi_accent_speech

    • huggingface.co
    Updated Sep 10, 2025
    Cite
    Cagatay Nufer (2025). multi_accent_speech [Dataset]. https://huggingface.co/datasets/cagatayn/multi_accent_speech
    Explore at:
    Dataset updated
    Sep 10, 2025
    Authors
    Cagatay Nufer
    Description

    Multi-Accent English Speech Corpus (Augmented & Speaker-Disjoint)

    This dataset is a curated and augmented multi-accent English speech corpus designed for speech recognition, accent classification, and representation learning. It consolidates multiple open-source accent corpora, converts all audio to a unified format, applies targeted data augmentation, and exports in a tidy, Hugging Face–ready structure.

    ✨ Key Features

    Accents covered (12 total): american_english… See the full description on the dataset page: https://huggingface.co/datasets/cagatayn/multi_accent_speech.

  17. Dataset Licensing For AI Training Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Dataset Licensing For AI Training Market Research Report 2033 [Dataset]. https://dataintelo.com/report/dataset-licensing-for-ai-training-market
    Explore at:
    pdf, pptx, csv. Available download formats
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Dataset Licensing for AI Training Market Outlook



    According to our latest research, the global Dataset Licensing for AI Training market size reached USD 2.1 billion in 2024, with a robust CAGR of 22.4% projected through the forecast period. By 2033, the market is expected to achieve a value of USD 15.2 billion. This remarkable growth is primarily fueled by the exponential rise in demand for high-quality, diverse, and ethically sourced datasets required to train increasingly sophisticated artificial intelligence (AI) models across industries. As organizations continue to scale their AI initiatives, the need for compliant, scalable, and customizable licensing solutions has never been more critical, driving significant investments and innovation in the dataset licensing ecosystem.




    A primary growth factor for the Dataset Licensing for AI Training market is the proliferation of AI applications across sectors such as healthcare, finance, automotive, and government. As AI models become more complex, their hunger for diverse and representative datasets intensifies, making data acquisition and licensing a strategic priority for enterprises. The increasing adoption of machine learning, deep learning, and generative AI technologies further amplifies the need for specialized datasets, pushing both data providers and consumers to seek flexible and secure licensing arrangements. Additionally, regulatory developments such as GDPR in Europe and similar data privacy frameworks worldwide are compelling organizations to prioritize licensed, compliant datasets over ad hoc or unlicensed data sources, further accelerating market growth.




    Another significant driver is the growing sophistication of dataset licensing models themselves. Vendors are moving beyond traditional open-source or proprietary licenses, introducing hybrid, creative commons, and custom-negotiated agreements tailored to specific use cases and industries. This evolution is enabling AI developers to access a broader variety of data types—text, image, audio, video, and multimodal—while ensuring legal clarity and minimizing risk. Moreover, the rise of data marketplaces and third-party platforms is streamlining the process of dataset discovery, negotiation, and compliance monitoring, making it easier for organizations of all sizes to source and license the data they need for AI training at scale.




    The surging demand for high-quality annotated datasets is also fostering partnerships between data providers, annotation service vendors, and AI developers. These collaborations are leading to the creation of bespoke datasets that cater to niche applications, such as autonomous driving, medical diagnostics, and advanced robotics. At the same time, advances in synthetic data generation and data augmentation are expanding the universe of licensable datasets, offering new avenues for licensing and monetization. As the market matures, we expect to see increased standardization, transparency, and interoperability in licensing frameworks, further lowering barriers to entry and accelerating innovation in AI model development.




    Regionally, North America continues to dominate the Dataset Licensing for AI Training market, accounting for the largest share in 2024, driven by the presence of leading technology companies, robust regulatory frameworks, and a mature AI ecosystem. Europe follows closely, with significant investments in ethical AI and data governance initiatives. Asia Pacific is emerging as a high-growth region, fueled by rapid digital transformation, government-backed AI strategies, and a burgeoning startup landscape. Latin America and the Middle East & Africa are also witnessing increased adoption of licensed datasets, particularly in sectors such as healthcare and public administration, although their market shares remain comparatively smaller. This global momentum underscores the universal need for high-quality, licensed datasets as the foundation of responsible and effective AI training.



    License Type Analysis



    The License Type segment in the Dataset Licensing for AI Training market is characterized by a diverse range of options, including Open Source, Proprietary, Creative Commons, and Custom/Negotiated licenses. Open source licenses have long been favored by academic and research communities due to their accessibility and collaborative ethos. However, their adoption in commercial AI projects is often tempered by concerns over data provenance, usage restrictions, a

  18. singaporean_district_noise

    • huggingface.co
    Updated Jul 10, 2025
    + more versions
    Cite
    DANG VAN THUC (2025). singaporean_district_noise [Dataset]. https://huggingface.co/datasets/thucdangvan020999/singaporean_district_noise
    Explore at:
    Dataset updated
    Jul 10, 2025
    Authors
    DANG VAN THUC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Singapore
    Description

    Singaporean district with noise

    Dataset Description

    Singaporean district speech dataset with controlled noise augmentation for ASR training.

    Dataset Summary

    • Language: EN
    • Task: Automatic Speech Recognition
    • Total Samples: 2,288
    • Audio Sample Rate: 16 kHz
    • Base Dataset: Custom dataset
    • Processing: Noise-augmented

    Data Fields

    • audio: Audio file (16 kHz WAV format)
    • text: Transcription text
    • noise_type: Type of background noise…

    See the full description on the dataset page: https://huggingface.co/datasets/thucdangvan020999/singaporean_district_noise.
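
    The fields listed above map directly onto the Hugging Face datasets API; a minimal loading sketch (the split name "train" is an assumption):

    from datasets import Audio, load_dataset

    ds = load_dataset("thucdangvan020999/singaporean_district_noise", split="train")
    ds = ds.cast_column("audio", Audio(sampling_rate=16000))

    sample = ds[0]
    print(sample["text"], sample["noise_type"])
    print(sample["audio"]["array"].shape, sample["audio"]["sampling_rate"])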

  19. Sudden Queen Loss Event in an Africanized Honeybee Colony

    • data.mendeley.com
    Updated May 21, 2024
    + more versions
    Cite
    Ícaro de Lima Rodrigues (2024). Sudden Queen Loss Event in an Africanized Honeybee Colony [Dataset]. http://doi.org/10.17632/j97khfj656.1
    Explore at:
    Dataset updated
    May 21, 2024
    Authors
    Ícaro de Lima Rodrigues
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 23 features extracted from audio recordings of an Africanized honeybee hive in Fortaleza-CE, Brazil. The first feature is the recording date, and the last is the label indicating the queen's presence status. The label can take two values: "QR" for queenright (presence of queen) or "QL" for queenless (absence of queen). The remaining features are directly extracted from the audio signal, divided into three groups: time-domain features (zcr, energy, and energy entropy), spectral features (centroid, spread, entropy, flux, and rolloff), and 13 MFCC coefficients. For further details on the meaning of each feature, please refer to https://doi.org/10.1371/journal.pone.0144610.t002.

    The data were collected from daily recordings over a 6-day period, with the queen bee removed from the hive on the last day. Consequently, the QR and QL classes are unbalanced, with QL representing only 1/6 of the data. This situation is common in this type of monitoring, where the hive's functioning is expected to remain within normal well-being parameters most of the time. Naturally, anomalies such as sudden queen loss are uncommon and therefore represent a smaller portion of the data. The experiment and the data aim to replicate and incorporate these conditions for greater fidelity to the addressed problem.

    Such issues can be addressed using techniques such as anomaly detection, one-class classification, or incremental learning. Additionally, techniques for handling unbalanced data in classification problems, such as data augmentation and resampling, can be employed. Using OC-SVM, we achieved results with 96% accuracy and 99% precision.
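
    A sketch of the one-class setup described above, using scikit-learn and synthetic placeholders for the 21 acoustic features (3 time-domain + 5 spectral + 13 MFCC) per recording:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_qr = rng.normal(0.0, 1.0, size=(500, 21))  # queenright (normal) recordings
    X_ql = rng.normal(1.5, 1.0, size=(100, 21))  # queenless (anomalous) recordings

    # Train on queenright data only; queenless recordings should surface as anomalies.
    scaler = StandardScaler().fit(X_qr)
    ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(scaler.transform(X_qr))

    pred = ocsvm.predict(scaler.transform(X_ql))  # -1 flags an anomaly (queenless)
    print((pred == -1).mean())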

  20. Data from: ARAUS: A Large-Scale Dataset and Baseline Models of Affective...

    • researchdata.ntu.edu.sg
    zip
    Updated Nov 1, 2023
    Cite
    Kenneth Ooi; Zhen-Ting Ong; Karn N. Watcharasupat; Bhan Lam; Joo Young Hong; Woon-Seng Gan (2023). ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes [Dataset]. http://doi.org/10.21979/N9/9OTEVX
    Explore at:
    zip (9362005, 29322354, 1854668318, 586577999, 221886207, 12150441, 3891054, and 2242857218 bytes). Available download formats
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    DR-NTU (Data)
    Authors
    Kenneth Ooi; Zhen-Ting Ong; Karn N. Watcharasupat; Bhan Lam; Joo Young Hong; Woon-Seng Gan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Dataset funded by
    Ministry of National Development (MND)
    National Research Foundation (NRF)
    Description

    This repository contains the ARAUS dataset, a publicly-available dataset (comprising a 5-fold training/validation set and an independent test set) of 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. This mimics a real-life soundscape augmentation system, whereby a speaker (or some other sound source) is used to add "maskers" to an actual urban soundscape. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was.

    The data in this repository aims to form a benchmark for fair comparisons of models for the prediction and analysis of perceptual attributes of soundscapes. Please refer to our publication submitted to IEEE Transactions on Affective Computing for more details regarding the data collection, annotation, and processing methodologies for the creation of the dataset: Kenneth Ooi, Zhen-Ting Ong, Karn N. Watcharasupat, Bhan Lam, Joo Young Hong, Woon-Seng Gan, "ARAUS: A large-scale dataset and baseline models of affective responses to augmented urban soundscapes," IEEE Transactions on Affective Computing, doi: 10.1109/TAFFC.2023.3247914.

    Replication code and baseline models that we have trained using the ARAUS dataset can be found at our GitHub repository: https://github.com/ntudsp/araus-dataset-baseline-models
