Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
frugal-ai-challenge/public-leaderboard-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RGTM-PNO Dataset
RGTM-PNO is an open audio dataset featuring a collection of vintage piano songs in the style of ragtime, a genre that flourished around the turn of the 20th century. The dataset contains 262 audio tracks recorded in uncompressed stereo WAV format, synthetically generated using a custom soundfont and MIDI files sourced from public resources online.
Dataset
The primary objective of this dataset is to provide accessible content for machine learning applications in music and audio research. Potential use cases for this dataset include audio classification, automatic music transcription (AMT), music information retrieval (MIR), melody analysis, AI music generation, sound design, and signal processing.
Specifications
262 piano songs (approximately 13.5 hours)
16-bit WAV format
Tempo: 120 bpm (live performance in absolute time)
Variational chorus detuning (vintage piano sound)
Paired audio and MIDI data
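For reference, a minimal loading sketch for one paired audio/MIDI example is shown below, assuming librosa and pretty_midi are available; the file names are placeholders, since the actual layout is documented in the GitHub repository.

# Minimal sketch: load one paired WAV/MIDI example from RGTM-PNO.
# File names are placeholders; see the GitHub repository for the real layout.
import librosa
import pretty_midi

# Load the stereo WAV at its native sample rate (mono=False keeps both channels).
audio, sr = librosa.load("rgtm_pno_0001.wav", sr=None, mono=False)

# Load the paired MIDI file and collect its note events.
midi = pretty_midi.PrettyMIDI("rgtm_pno_0001.mid")
notes = [(n.start, n.end, n.pitch) for inst in midi.instruments for n in inst.notes]

print(f"audio shape: {audio.shape}, sample rate: {sr} Hz, notes: {len(notes)}")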
License
This dataset was compiled by WaivOps, a crowdsourced music project managed by the sound label company Patchbanks. The audio recordings were sonified from MIDI files containing historical musical compositions believed to be in the public domain and copyright free.
The RGTM-PNO dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Additional Info
For audio examples or more information about this dataset, please refer to the GitHub repository.
As respiratory diseases continue to burden societies worldwide, this paper proposes a high-quality and reliable dataset of human sounds for studying respiratory illnesses, including pneumonia and COVID-19. It consists of coughing, mouth breathing, and nose breathing sounds together with metadata on related clinical characteristics. We also develop a proof-of-concept system for establishing baselines and benchmarking against multiple datasets, such as Coswara and COUGHVID. Our comprehensive experiments show that the Sound-Dr dataset has richer features, better performance, and is more robust to dataset shifts in various machine learning tasks. It is promising for a wide range of real-time applications on mobile devices. The proposed dataset and system will serve as practical tools to support healthcare professionals in diagnosing respiratory disorders. The dataset and code are publicly available here: https://github.com/ReML-AI/Sound-Dr/.
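As an illustration only (not the authors' proof-of-concept system), a simple screening baseline on such audio could look like the sketch below, assuming WAV clips and binary labels taken from the dataset's metadata:

# Illustrative baseline only, not the Sound-Dr authors' system:
# MFCC summary statistics + logistic regression for a binary screening task.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def mfcc_stats(path, sr=16000, n_mfcc=20):
    # Summarise a clip as the mean and standard deviation of its MFCCs.
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def run_baseline(paths, labels):
    # `paths` and `labels` are assumed to come from the dataset's metadata.
    X = np.stack([mfcc_stats(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])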
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
In this dataset, we scraped the public playlists that can be found through the search tab of Suno. We identified 98 playlists and 4,418 songs. We have also shared the video links in another subset of this dataset; please check that out.
https://dataintelo.com/privacy-and-policy
The global AI training dataset market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 6.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 20.5% from 2024 to 2032. This substantial growth is driven by the increasing adoption of artificial intelligence across various industries, the necessity for large-scale and high-quality datasets to train AI models, and the ongoing advancements in AI and machine learning technologies.
One of the primary growth factors in the AI training dataset market is the exponential increase in data generation across multiple sectors. With the proliferation of internet usage, the expansion of IoT devices, and the digitalization of industries, there is an unprecedented volume of data being generated daily. This data is invaluable for training AI models, enabling them to learn and make more accurate predictions and decisions. Moreover, the need for diverse and comprehensive datasets to improve AI accuracy and reliability is further propelling market growth.
Another significant factor driving the market is the rising investment in AI and machine learning by both public and private sectors. Governments around the world are recognizing the potential of AI to transform economies and improve public services, leading to increased funding for AI research and development. Simultaneously, private enterprises are investing heavily in AI technologies to gain a competitive edge, enhance operational efficiency, and innovate new products and services. These investments necessitate high-quality training datasets, thereby boosting the market.
The proliferation of AI applications in various industries, such as healthcare, automotive, retail, and finance, is also a major contributor to the growth of the AI training dataset market. In healthcare, AI is being used for predictive analytics, personalized medicine, and diagnostic automation, all of which require extensive datasets for training. The automotive industry leverages AI for autonomous driving and vehicle safety systems, while the retail sector uses AI for personalized shopping experiences and inventory management. In finance, AI assists in fraud detection and risk management. The diverse applications across these sectors underline the critical need for robust AI training datasets.
As the demand for AI applications continues to grow, the role of AI data resource services becomes increasingly vital. These services provide the necessary infrastructure and tools to manage, curate, and distribute datasets efficiently. By leveraging AI data resource services, organizations can ensure that their AI models are trained on high-quality and relevant data, which is crucial for achieving accurate and reliable outcomes. Such a service acts as a bridge between raw data and AI applications, streamlining the process of data acquisition, annotation, and validation. This not only enhances the performance of AI systems but also accelerates the development cycle, enabling faster deployment of AI-driven solutions across various sectors.
Regionally, North America currently dominates the AI training dataset market due to the presence of major technology companies and extensive R&D activities in the region. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid technological advancements, increasing investments in AI, and the growing adoption of AI technologies across various industries in countries like China, India, and Japan. Europe and Latin America are also anticipated to experience significant growth, supported by favorable government policies and the increasing use of AI in various sectors.
The data type segment of the AI training dataset market encompasses text, image, audio, video, and others. Each data type plays a crucial role in training different types of AI models, and the demand for specific data types varies based on the application. Text data is extensively used in natural language processing (NLP) applications such as chatbots, sentiment analysis, and language translation. As the use of NLP is becoming more widespread, the demand for high-quality text datasets is continually rising. Companies are investing in curated text datasets that encompass diverse languages and dialects to improve the accuracy and efficiency of NLP models.
Image data is critical for computer vision applications…
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sound datasets from real farming environments are scarce, prompting us to release a portion of 5-second data segments with labels after multiple rounds of data cleaning. We have disclosed 2,000 segments for each of the three categories (Healthy, Sick, None - no chicken sound), totaling 6,000 five-second audio clips. We make this dataset publicly available, to contribute to the advancements in research related to the detection of respiratory diseases based on poultry vocalizations.
In large-scale poultry farming, respiratory diseases affect the health of chickens, leading to a decline in the quality and yield of both meat and eggs. Effective monitoring of these diseases is crucial to reducing their impact and enhancing quality and yield. Currently, most monitoring methods still rely on manual monitoring of chicken vocalizations, which is time-consuming, labor-intensive, and requires specialized personnel, making 24/7 monitoring unfeasible. Existing intelligent methods are often limited to laboratory environments where individual chickens are monitored separately. These approaches do not meet the industrial and commercial requirements of poultry farms, where a diverse set of complex auditory signals may be captured. These signals include not only chicken vocalizations but also complex noises from cages, chicken behaviors, human activities, mechanical ventilation systems, and other background noises.

In this study, we design a deep learning-based intelligent recognition algorithm capable of accurately distinguishing abnormal chicken vocalizations among complex sound signals. Furthermore, we integrate this algorithm into a distributed health monitoring system, SmartEars, enabling continuous collection of various sound signals and real-time recognition, thereby providing round-the-clock monitoring of chicken respiratory diseases in real production environments. We collected 11,686 audio slices from actual farming environments, which were labeled through multiple rounds of annotation by veterinary experts, resulting in a high-quality dataset for model training. Additionally, we used Logfbank features to capture critical audio characteristics and assist model learning, and designed five data augmentation techniques to prevent overfitting and improve model performance. Finally, we compared multiple models on an independent test dataset and selected RegNet as the best model, which achieved the highest accuracy of 96.03%. To validate the effectiveness of our approach, we compared the annotation results of SmartEars with those of seven veterinarians on the same dataset: SmartEars achieved an accuracy of 93%, outperforming the human veterinary experts, whose accuracies ranged from 85% to 93%. SmartEars has been deployed in three large poultry farms in Hebei, China, and has successfully identified a number of disease outbreaks, such as a confirmed event around March 19, 2024, demonstrating its effectiveness.
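Since the description above names Logfbank features as the model input, the sketch below shows one way to extract them from a 5-second clip with the python_speech_features package; the file name and parameter values are assumptions, not the authors' exact configuration.

# Sketch: log Mel filterbank (logfbank) features for one 5-second clip.
# File name and parameter values are illustrative, not the paper's exact setup.
from scipy.io import wavfile
from python_speech_features import logfbank

rate, signal = wavfile.read("chicken_clip_0001.wav")

# 25 ms windows with a 10 ms hop and 40 Mel filters.
feats = logfbank(signal, samplerate=rate, winlen=0.025, winstep=0.01, nfilt=40)
print(feats.shape)  # (num_frames, 40), ready to feed a CNN classifier such as RegNet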
The TAME Pain Dataset contains data collected during a study with 51 individuals. It encompasses a collection of 7,039 annotated utterances derived from 51 participants, totalling approximately 311 minutes of audio recordings. Each utterance within the dataset is labeled with a self-reported pain level on a 1-10 scale. These pain levels are further categorized into three distinct classifications: binary (No Pain vs. Pain), three-class (Mild, Moderate, Severe), and condition-based (Cold vs. Warm), facilitating diverse analytical approaches. By making this dataset publicly available, we aim to advance AI-driven pain assessment technologies by enabling the analysis of audio features to objectively identify pain.
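To illustrate how a 1-10 self-report can be projected onto the binary and three-class schemes, a small mapping sketch is given below; the cut points are assumptions for illustration, so consult the dataset documentation for the thresholds actually used.

# Illustrative mapping from a 1-10 self-reported pain score to label schemes.
# The thresholds below are assumptions, not the dataset's documented cut points.
def binary_label(score: int) -> str:
    return "No Pain" if score <= 1 else "Pain"

def three_class_label(score: int) -> str:
    if score <= 3:
        return "Mild"
    if score <= 6:
        return "Moderate"
    return "Severe"

print(binary_label(2), three_class_label(7))  # Pain Severe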
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator’s Location Sentiment Data for Mauritania
Techsalerator’s Location Sentiment Data for Mauritania provides deep insights into how people perceive different locations across urban, rural, and industrial areas. This dataset is crucial for businesses, researchers, and policymakers aiming to understand sentiment trends across various regions in Mauritania.
For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.
Techsalerator’s Location Sentiment Data for Mauritania offers a structured analysis of public sentiment across cities, towns, and remote areas. This dataset is essential for market research, urban development, AI sentiment analysis, and regional planning.
To obtain Techsalerator’s Location Sentiment Data for Mauritania, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.
For in-depth insights into public sentiment and regional perception in Mauritania, Techsalerator’s dataset is an invaluable resource for businesses, researchers, policymakers, and urban planners.
An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. While in the test set the silence segments are regular 1-second files, in the training set they are provided as long segments under the "background_noise" folder. Here we split this background noise into 1-second clips, and also keep one of the files for the validation set.
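A minimal sketch of that kind of preprocessing (splitting a long background-noise recording into 1-second clips) is shown below; the paths are placeholders and this is not the exact script used to build the split.

# Sketch: split a long background-noise recording into 1-second clips.
# Paths are placeholders; this is not the exact preprocessing script.
import soundfile as sf

audio, sr = sf.read("background_noise/white_noise.wav")
clip_len = sr  # one second of samples

for i in range(len(audio) // clip_len):
    clip = audio[i * clip_len:(i + 1) * clip_len]
    sf.write(f"background_clips/white_noise_{i:04d}.wav", clip, sr)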
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('speech_commands', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CrowdSpeech is a publicly available large-scale dataset of crowdsourced audio transcriptions. It contains annotations for more than 50 hours of English speech transcriptions from more than 1,000 crowd workers.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Off-the-shelf English audio dataset: a total volume of 700 hours of 16 kHz public-domain media and podcast audio/video conversations. Topics include Agriculture, Art, Aviation, Banking, Consumer, Crime, Culture, Delivery, Entertainment, Finance, Food, Gaming, Health, Hospitality, IT, Insurance, Legal, News, Oil, Politics, Real Estate, Religion, Retail, Spirituality, Sports, Technology, Telecom, Travel, Weather, and Automotive. Audio format: .wav; transcription format: .json.
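A minimal sketch of pairing each .wav file with a same-named .json transcription is shown below; the directory layout and JSON schema are assumptions, since the vendor defines the actual delivery format.

# Sketch: pair each .wav file with its same-named .json transcription.
# Directory layout and JSON schema are assumptions about the delivery format.
import json
from pathlib import Path

for wav_path in sorted(Path("delivery/audio").glob("*.wav")):
    json_path = Path("delivery/transcripts") / (wav_path.stem + ".json")
    with open(json_path) as f:
        transcript = json.load(f)
    print(wav_path.name, list(transcript.keys()))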
Dataset Card for Myrtle/CAIMAN-ASR-BackgroundNoise
This dataset provides background noise audio, suitable for noise augmentation while training Myrtle.ai's CAIMAN-ASR models.
Dataset Details
Dataset Description
Curated by: Myrtle.ai
License: Myrtle.ai's modifications to the source data are licensed under CC BY 4.0. Some of the original data is under the CC BY 3.0 license; the rest is in the public domain. Please see the Source Data section… See the full description on the dataset page: https://huggingface.co/datasets/Myrtle/CAIMAN-ASR-BackgroundNoise.
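As an illustration of noise augmentation (a generic SNR-based mix, not Myrtle.ai's CAIMAN-ASR training pipeline), a sketch is shown below; the file names are placeholders.

# Sketch: mix background noise into a speech clip at a target SNR.
# Generic augmentation example, not Myrtle.ai's CAIMAN-ASR pipeline; paths are placeholders.
# Assumes both files are mono and share the same sample rate.
import numpy as np
import soundfile as sf

def mix_at_snr(speech, noise, snr_db):
    # Loop or trim the noise so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

speech, sr = sf.read("speech.wav")
noise, _ = sf.read("background_noise.wav")
sf.write("augmented.wav", mix_at_snr(speech, noise, snr_db=10), sr)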
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets captured with the use of hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision due to the use of intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned scanned and rigged 3D character complements HUMAN4D to enable joint research on time-varying and high-quality dynamic meshes. We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods. For the former, we apply 2D and 3D pose estimation algorithms both on single- and multi-view data cues. For the latter, we benchmark open-source 3D codecs on volumetric data respecting online volumetric video encoding and steady bit-rates. Furthermore, qualitative and quantitative visual comparison between mesh-based volumetric data reconstructed in different qualities showcases the available options with respect to 4D representations. HUMAN4D is introduced to the computer vision and graphics research communities to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD and audio data cues. The dataset and its code are available online.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Motiv: A Dataset of Latent Space Representations of Musical Phrase Motions

This study introduces a novel approach for analyzing musical motions through the creation of the Motiv dataset. The Motiv dataset was constructed through a four-step process that involved selecting professional saxophonists, defining source materials, establishing parameters for musical motions, and modeling the musical phrases in a latent space. The study involved four highly skilled saxophonists performing mixed music works, particularly on the tenor saxophone. They recorded three musical phrases from "Lamento" by Jesús Villa-Rojo, each representing different emotional and technical characteristics. The saxophonists were guided to record variations of the original phrases, classified into three motion types (parallel, oblique, and contrary) based on specific guidelines that allowed for flexibility in interpretation. These transformations captured nuanced dynamics, articulation, pitch, and rhythm changes while maintaining temporal coherence.

The dataset includes the recorded audio samples and their latent space representations, which were generated using a RAVE model. This model efficiently processes the audio and creates a structured representation of its spectral and temporal characteristics. Each sample in the dataset is annotated with details about the motion transformation and includes musical scores for reference. The data is organized in a comprehensive structure, stored in HDF5 format for easy management, and includes both the waveform and latent vector data. The dataset is intended for further analysis and is made publicly available for research purposes, enabling deeper exploration of musical motion and its interaction with latent space models.

The Motiv dataset lays the groundwork for exploring the role of latent spaces in understanding and synthesizing thematic elaboration, with a specific focus on the geometric relationships between three motion types: parallel, oblique, and contrary. By utilizing a RAVE model to map the recorded audio into latent space, we present a structured representation of musical phrases that enables the analysis of these motion types and their variations.
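Because the data ships as HDF5 with both waveform and latent-vector arrays, a minimal inspection sketch with h5py is given below; the file name and dataset keys are assumptions, as the actual layout is documented with the dataset.

# Sketch: inspect a Motiv HDF5 file with h5py.
# File name and dataset keys are assumptions; check the dataset documentation.
import h5py

with h5py.File("motiv.h5", "r") as f:
    f.visit(print)  # list the group/dataset names actually present
    # Hypothetical keys for one sample's waveform and its RAVE latent vectors:
    # waveform = f["phrase_01/waveform"][:]
    # latents = f["phrase_01/latent"][:]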
Dataset Card for "lex_fridman_podcast"
Dataset Summary
This dataset contains transcripts from the Lex Fridman podcast (Episodes 1 to 325). The transcripts were generated using OpenAI Whisper (large model) and made publicly available at: https://karpathy.ai/lexicap/index.html.
Languages
English
Dataset Structure
The dataset contains around 803K entries, consisting of audio transcripts generated from episodes 1 to 325 of the Lex Fridman… See the full description on the dataset page: https://huggingface.co/datasets/nmac/lex_fridman_podcast.
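A minimal loading sketch with the Hugging Face datasets library is shown below; the split name is an assumption, so check the dataset page for the available splits and columns.

# Sketch: load the transcripts with the Hugging Face `datasets` library.
# The "train" split name is an assumption; see the dataset page for details.
from datasets import load_dataset

ds = load_dataset("nmac/lex_fridman_podcast", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # first transcript entry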
https://www.techsciresearch.com/privacy-policy.aspx
The market was valued at USD 1.76 billion in 2023 and is projected to reach USD 6.33 billion by 2029, registering a compound annual growth rate (CAGR) of 23.59% over the 2024-2029 forecast period.
Pages: 185
Market Size (2023): USD 1.76 billion
Forecast Market Size (2029): USD 6.33 billion
CAGR (2024-2029): 23.59%
Fastest Growing Segment: BFSI
Largest Market: North America
Key Players: Appen Limited; Cogito Tech LLC; Lionbridge Technologies, Inc.; Google, LLC; Microsoft Corporation; Scale AI Inc.; Deep Vision Data; Anthropic, PBC; CloudFactory Limited; Globalme Localization Inc.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
We are sharing the videos corresponding to the audio subset we shared in our other dataset.
https://choosealicense.com/licenses/cc0-1.0/
This is the first public Sagaw Karen language ASR dataset in AI history.
Sagaw Karen ASR
This dataset contains audio recordings and aligned metadata in the Sagaw Karen language (ISO 639-3: ksw), a major Sgaw Karenic language spoken throughout southern and eastern Myanmar. The language is sometimes also referred to as Sgaw Karen or Sakaw Karen in English transliterations. All audio segments in this dataset were sourced from publicly available news broadcasts published by PVTV… See the full description on the dataset page: https://huggingface.co/datasets/freococo/sagaw_karen_asr.
https://choosealicense.com/licenses/pddl/
This is the first public Rohingya language ASR dataset in AI history.
Overview
This dataset contains broadcast audio recordings from the Voice of America (VOA) Rohingya Service. Each file represents a daily news segment, typically 30 minutes in length, automatically segmented into chunks of 5–15 seconds for use in self-supervised ASR, pretraining, language identification, and more. The content was aired publicly as part of VOA’s Rohingya-language radio program and is therefore… See the full description on the dataset page: https://huggingface.co/datasets/freococo/rohingya_asr_audio.
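As a generic illustration of that kind of segmentation (not necessarily the curator's method), the sketch below splits a broadcast on silence and keeps 5-15 second chunks using pydub; the file name and thresholds are assumptions.

# Sketch: silence-based segmentation of a broadcast into 5-15 second chunks.
# Generic illustration, not necessarily how this dataset was built;
# file name and thresholds are assumptions.
from pydub import AudioSegment
from pydub.silence import split_on_silence

broadcast = AudioSegment.from_file("voa_rohingya_broadcast.mp3")

chunks = split_on_silence(
    broadcast,
    min_silence_len=500,                 # pauses of at least 500 ms
    silence_thresh=broadcast.dBFS - 16,  # relative to the clip's average loudness
    keep_silence=200,
)

for i, chunk in enumerate(chunks):
    if 5_000 <= len(chunk) <= 15_000:    # pydub lengths are in milliseconds
        chunk.export(f"chunks/chunk_{i:05d}.wav", format="wav")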
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned. Acoustic models, trained on this data set, are available at icefall and language models, suitable for evaluation can be found at openslr. For more information, see the paper "LibriSpeech: an ASR corpus based on public domain audio… See the full description on the dataset page: https://huggingface.co/datasets/k2-fsa/LibriSpeech.