Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
frugal-ai-challenge/public-leaderboard-audio dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RGTM-PNO Dataset
RGTM-PNO is an open audio dataset featuring a collection of vintage piano songs in the style of ragtime, a genre that flourished around the turn of the 20th century. The dataset contains 262 audio tracks recorded in uncompressed stereo WAV format, synthetically generated using a custom soundfont and MIDI files sourced from public resources online.
Dataset
The primary objective of this dataset is to provide accessible content for machine learning applications in music and audio research. Potential use cases for this dataset include audio classification, automatic music transcription (AMT), music information retrieval (MIR), melody analysis, AI music generation, sound design, and signal processing.
Specifications
262 piano songs (approximately 13.5 hours)
16-bit WAV format
Tempo: 120 bpm (live performance in absolute time)
Variational chorus detuning (vintage piano sound)
Paired audio and MIDI data
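For reference, a minimal loading sketch for one paired audio/MIDI example is shown below, assuming librosa and pretty_midi are available; the file names are placeholders, since the actual layout is documented in the GitHub repository.

# Minimal sketch: load one paired WAV/MIDI example from RGTM-PNO.
# File names are placeholders; see the GitHub repository for the real layout.
import librosa
import pretty_midi

# Load the stereo WAV at its native sample rate (mono=False keeps both channels).
audio, sr = librosa.load("rgtm_pno_0001.wav", sr=None, mono=False)

# Load the paired MIDI file and collect its note events.
midi = pretty_midi.PrettyMIDI("rgtm_pno_0001.mid")
notes = [(n.start, n.end, n.pitch) for inst in midi.instruments for n in inst.notes]

print(f"audio shape: {audio.shape}, sample rate: {sr} Hz, notes: {len(notes)}")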
License
This dataset was compiled by WaivOps, a crowdsourced music project managed by the sound label company Patchbanks. The audio recordings were sonified from MIDI files containing historical musical compositions believed to be in the public domain and copyright free.
The RGTM-PNO dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Additional Info
For audio examples or more information about this dataset, please refer to the GitHub repository.
As respiratory diseases continue to burden societies worldwide, this paper proposes a high-quality and reliable dataset of human sounds for studying respiratory illnesses, including pneumonia and COVID-19. It consists of coughing, mouth breathing, and nose breathing sounds together with metadata on related clinical characteristics. We also develop a proof-of-concept system for establishing baselines and benchmarking against multiple datasets, such as Coswara and COUGHVID. Our comprehensive experiments show that the Sound-Dr dataset has richer features, better performance, and is more robust to dataset shifts in various machine learning tasks. It is promising for a wide range of real-time applications on mobile devices. The proposed dataset and system will serve as practical tools to support healthcare professionals in diagnosing respiratory disorders. The dataset and code are publicly available here: https://github.com/ReML-AI/Sound-Dr/.
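As an illustration only (not the authors' proof-of-concept system), a simple screening baseline on such audio could look like the sketch below, assuming WAV clips and binary labels taken from the dataset's metadata:

# Illustrative baseline only, not the Sound-Dr authors' system:
# MFCC summary statistics + logistic regression for a binary screening task.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def mfcc_stats(path, sr=16000, n_mfcc=20):
    # Summarise a clip as the mean and standard deviation of its MFCCs.
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def run_baseline(paths, labels):
    # `paths` and `labels` are assumed to come from the dataset's metadata.
    X = np.stack([mfcc_stats(p) for p in paths])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])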
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
In this dataset, we scraped the public playlists that can be found through the search tab of Suno. We identified 98 playlists and 4,418 songs. We have also shared the video links in another subset of this dataset; please check that out.
https://dataintelo.com/privacy-and-policy
The global AI training dataset market size was valued at approximately USD 1.2 billion in 2023 and is projected to reach USD 6.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 20.5% from 2024 to 2032. This substantial growth is driven by the increasing adoption of artificial intelligence across various industries, the necessity for large-scale and high-quality datasets to train AI models, and the ongoing advancements in AI and machine learning technologies.
One of the primary growth factors in the AI training dataset market is the exponential increase in data generation across multiple sectors. With the proliferation of internet usage, the expansion of IoT devices, and the digitalization of industries, there is an unprecedented volume of data being generated daily. This data is invaluable for training AI models, enabling them to learn and make more accurate predictions and decisions. Moreover, the need for diverse and comprehensive datasets to improve AI accuracy and reliability is further propelling market growth.
Another significant factor driving the market is the rising investment in AI and machine learning by both public and private sectors. Governments around the world are recognizing the potential of AI to transform economies and improve public services, leading to increased funding for AI research and development. Simultaneously, private enterprises are investing heavily in AI technologies to gain a competitive edge, enhance operational efficiency, and innovate new products and services. These investments necessitate high-quality training datasets, thereby boosting the market.
The proliferation of AI applications in various industries, such as healthcare, automotive, retail, and finance, is also a major contributor to the growth of the AI training dataset market. In healthcare, AI is being used for predictive analytics, personalized medicine, and diagnostic automation, all of which require extensive datasets for training. The automotive industry leverages AI for autonomous driving and vehicle safety systems, while the retail sector uses AI for personalized shopping experiences and inventory management. In finance, AI assists in fraud detection and risk management. The diverse applications across these sectors underline the critical need for robust AI training datasets.
As the demand for AI applications continues to grow, the role of AI data resource services becomes increasingly vital. These services provide the necessary infrastructure and tools to manage, curate, and distribute datasets efficiently. By leveraging AI data resource services, organizations can ensure that their AI models are trained on high-quality and relevant data, which is crucial for achieving accurate and reliable outcomes. Such a service acts as a bridge between raw data and AI applications, streamlining the process of data acquisition, annotation, and validation. This not only enhances the performance of AI systems but also accelerates the development cycle, enabling faster deployment of AI-driven solutions across various sectors.
Regionally, North America currently dominates the AI training dataset market due to the presence of major technology companies and extensive R&D activities in the region. However, Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by rapid technological advancements, increasing investments in AI, and the growing adoption of AI technologies across various industries in countries like China, India, and Japan. Europe and Latin America are also anticipated to experience significant growth, supported by favorable government policies and the increasing use of AI in various sectors.
The data type segment of the AI training dataset market encompasses text, image, audio, video, and others. Each data type plays a crucial role in training different types of AI models, and the demand for specific data types varies based on the application. Text data is extensively used in natural language processing (NLP) applications such as chatbots, sentiment analysis, and language translation. As the use of NLP is becoming more widespread, the demand for high-quality text datasets is continually rising. Companies are investing in curated text datasets that encompass diverse languages and dialects to improve the accuracy and efficiency of NLP models.
Image data is critical for computer vision applications…
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sound datasets from real farming environments are scarce, prompting us to release a portion of 5-second data segments with labels after multiple rounds of data cleaning. We have disclosed 2,000 segments for each of the three categories (Healthy, Sick, None - no chicken sound), totaling 6,000 five-second audio clips. We make this dataset publicly available, to contribute to the advancements in research related to the detection of respiratory diseases based on poultry vocalizations.
In large-scale poultry farming, respiratory diseases affect the health of chickens, leading to a decline in the quality and yield of both meat and eggs. Effective monitoring of these diseases is crucial to reducing their impact and enhancing quality and yield. Currently, most monitoring methods still rely on manual monitoring of chicken vocalizations, which is time-consuming, labor-intensive, and requires specialized personnel, making 24/7 monitoring unfeasible. Existing intelligent methods are often limited to laboratory environments where individual chickens are monitored separately. These approaches do not meet the industrial and commercial requirements of poultry farms, where a diverse set of complex auditory signals may be captured. These signals include not only chicken vocalizations but also complex noises from cages, chicken behaviors, human activities, mechanical ventilation systems, and other background noises.

In this study, we design a deep learning-based intelligent recognition algorithm capable of accurately distinguishing abnormal chicken vocalizations among complex sound signals. Furthermore, we integrate this algorithm into a distributed health monitoring system, SmartEars, enabling continuous collection of various sound signals and real-time recognition, thereby providing round-the-clock monitoring of chicken respiratory diseases in real production environments. We collected 11,686 audio slices from actual farming environments, which were labeled through multiple rounds of annotation by veterinary experts, resulting in a high-quality dataset for model training. Additionally, we used Logfbank features to capture critical audio characteristics and assist model learning, and designed five data augmentation techniques to prevent overfitting and improve model performance. Finally, we compared multiple models on an independent test dataset and selected RegNet as the best model, which achieved the highest accuracy of 96.03%. To validate the effectiveness of our approach, we compared the annotation results of SmartEars with those of seven veterinarians on the same dataset: SmartEars achieved an accuracy of 93%, outperforming the human veterinary experts, whose accuracies ranged from 85% to 93%. SmartEars has been deployed in three large poultry farms in Hebei, China, and has successfully identified a number of disease outbreaks, such as a confirmed event around March 19, 2024, demonstrating its effectiveness.
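Since the description above names Logfbank features as the model input, the sketch below shows one way to extract them from a 5-second clip with the python_speech_features package; the file name and parameter values are assumptions, not the authors' exact configuration.

# Sketch: log Mel filterbank (logfbank) features for one 5-second clip.
# File name and parameter values are illustrative, not the paper's exact setup.
from scipy.io import wavfile
from python_speech_features import logfbank

rate, signal = wavfile.read("chicken_clip_0001.wav")

# 25 ms windows with a 10 ms hop and 40 Mel filters.
feats = logfbank(signal, samplerate=rate, winlen=0.025, winstep=0.01, nfilt=40)
print(feats.shape)  # (num_frames, 40), ready to feed a CNN classifier such as RegNet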
The TAME Pain Dataset contains data collected during a study with 51 individuals. It encompasses a collection of 7,039 annotated utterances derived from 51 participants, totalling approximately 311 minutes of audio recordings. Each utterance within the dataset is labeled with a self-reported pain level on a 1-10 scale. These pain levels are further categorized into three distinct classifications: binary (No Pain vs. Pain), three-class (Mild, Moderate, Severe), and condition-based (Cold vs. Warm), facilitating diverse analytical approaches. By making this dataset publicly available, we aim to advance AI-driven pain assessment technologies by enabling the analysis of audio features to objectively identify pain.
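To illustrate how a 1-10 self-report can be projected onto the binary and three-class schemes, a small mapping sketch is given below; the cut points are assumptions for illustration, so consult the dataset documentation for the thresholds actually used.

# Illustrative mapping from a 1-10 self-reported pain score to label schemes.
# The thresholds below are assumptions, not the dataset's documented cut points.
def binary_label(score: int) -> str:
    return "No Pain" if score <= 1 else "Pain"

def three_class_label(score: int) -> str:
    if score <= 3:
        return "Mild"
    if score <= 6:
        return "Moderate"
    return "Severe"

print(binary_label(2), three_class_label(7))  # Pain Severe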
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator’s Location Sentiment Data for Mauritania
Techsalerator’s Location Sentiment Data for Mauritania provides deep insights into how people perceive different locations across urban, rural, and industrial areas. This dataset is crucial for businesses, researchers, and policymakers aiming to understand sentiment trends across various regions in Mauritania.
For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.
Techsalerator’s Location Sentiment Data for Mauritania offers a structured analysis of public sentiment across cities, towns, and remote areas. This dataset is essential for market research, urban development, AI sentiment analysis, and regional planning.
To obtain Techsalerator’s Location Sentiment Data for Mauritania, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.
For in-depth insights into public sentiment and regional perception in Mauritania, Techsalerator’s dataset is an invaluable resource for businesses, researchers, policymakers, and urban planners.
An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. While in the test set the silence segments are regular 1-second files, in the training set they are provided as long segments under the "background_noise" folder. Here we split this background noise into 1-second clips, and also keep one of the files for the validation set.
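A minimal sketch of that kind of preprocessing (splitting a long background-noise recording into 1-second clips) is shown below; the paths are placeholders and this is not the exact script used to build the split.

# Sketch: split a long background-noise recording into 1-second clips.
# Paths are placeholders; this is not the exact preprocessing script.
import soundfile as sf

audio, sr = sf.read("background_noise/white_noise.wav")
clip_len = sr  # one second of samples

for i in range(len(audio) // clip_len):
    clip = audio[i * clip_len:(i + 1) * clip_len]
    sf.write(f"background_clips/white_noise_{i:04d}.wav", clip, sr)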
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('speech_commands', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CrowdSpeech is a publicly available large-scale dataset of crowdsourced audio transcriptions. It contains annotations for more than 50 hours of English speech transcriptions from more than 1,000 crowd workers.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Off-the-shelf English audio dataset: a total volume of 700 hours of 16 kHz public-domain media and podcast audio/video conversations. Topics include Agriculture, Art, Aviation, Banking, Consumer, Crime, Culture, Delivery, Entertainment, Finance, Food, Gaming, Health, Hospitality, IT, Insurance, Legal, News, Oil, Politics, Real Estate, Religion, Retail, Spirituality, Sports, Technology, Telecom, Travel, Weather, and Automotive. Audio format: .wav; transcription format: .json.
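A minimal sketch of pairing each .wav file with a same-named .json transcription is shown below; the directory layout and JSON schema are assumptions, since the vendor defines the actual delivery format.

# Sketch: pair each .wav file with its same-named .json transcription.
# Directory layout and JSON schema are assumptions about the delivery format.
import json
from pathlib import Path

for wav_path in sorted(Path("delivery/audio").glob("*.wav")):
    json_path = Path("delivery/transcripts") / (wav_path.stem + ".json")
    with open(json_path) as f:
        transcript = json.load(f)
    print(wav_path.name, list(transcript.keys()))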
Dataset Card for Myrtle/CAIMAN-ASR-BackgroundNoise
This dataset provides background noise audio, suitable for noise augmentation while training Myrtle.ai's CAIMAN-ASR models.
Dataset Details
Dataset Description
Curated by: Myrtle.ai
License: Myrtle.ai's modifications to the source data are licensed under CC BY 4.0. Some of the original data is under the CC BY 3.0 license; the rest is in the public domain. Please see the Source Data section… See the full description on the dataset page: https://huggingface.co/datasets/Myrtle/CAIMAN-ASR-BackgroundNoise.
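As an illustration of noise augmentation (a generic SNR-based mix, not Myrtle.ai's CAIMAN-ASR training pipeline), a sketch is shown below; the file names are placeholders.

# Sketch: mix background noise into a speech clip at a target SNR.
# Generic augmentation example, not Myrtle.ai's CAIMAN-ASR pipeline; paths are placeholders.
# Assumes both files are mono and share the same sample rate.
import numpy as np
import soundfile as sf

def mix_at_snr(speech, noise, snr_db):
    # Loop or trim the noise so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

speech, sr = sf.read("speech.wav")
noise, _ = sf.read("background_noise.wav")
sf.write("augmented.wav", mix_at_snr(speech, noise, snr_db=10), sr)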
We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets captured with the use of hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision due to the use of intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned scanned and rigged 3D character complements HUMAN4D to enable joint research on time-varying and high-quality dynamic meshes. We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods. For the former, we apply 2D and 3D pose estimation algorithms both on single- and multi-view data cues. For the latter, we benchmark open-source 3D codecs on volumetric data respecting online volumetric video encoding and steady bit-rates. Furthermore, qualitative and quantitative visual comparison between mesh-based volumetric data reconstructed in different qualities showcases the available options with respect to 4D representations. HUMAN4D is introduced to the computer vision and graphics research communities to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD and audio data cues. The dataset and its code are available online.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Motiv: A Dataset of Latent Space Representations of Musical Phrase Motions

This study introduces a novel approach for analyzing musical motions through the creation of the Motiv dataset. The Motiv dataset was constructed through a four-step process that involved selecting professional saxophonists, defining source materials, establishing parameters for musical motions, and modeling the musical phrases in a latent space. The study involved four highly skilled saxophonists performing mixed music works, particularly on the tenor saxophone. They recorded three musical phrases from "Lamento" by Jesús Villa-Rojo, each representing different emotional and technical characteristics. The saxophonists were guided to record variations of the original phrases, classified into three motion types (parallel, oblique, and contrary) based on specific guidelines that allowed for flexibility in interpretation. These transformations captured nuanced dynamics, articulation, pitch, and rhythm changes while maintaining temporal coherence.

The dataset includes the recorded audio samples and their latent space representations, which were generated using a RAVE model. This model efficiently processes the audio and creates a structured representation of its spectral and temporal characteristics. Each sample in the dataset is annotated with details about the motion transformation and includes musical scores for reference. The data is organized in a comprehensive structure, stored in HDF5 format for easy management, and includes both the waveform and latent vector data. The dataset is intended for further analysis and is made publicly available for research purposes, enabling deeper exploration of musical motion and its interaction with latent space models.

The Motiv dataset lays the groundwork for exploring the role of latent spaces in understanding and synthesizing thematic elaboration, with a specific focus on the geometric relationships between three motion types: parallel, oblique, and contrary. By utilizing a RAVE model to map the recorded audio into latent space, we present a structured representation of musical phrases that enables the analysis of these motion types and their variations.
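Because the data ships as HDF5 with both waveform and latent-vector arrays, a minimal inspection sketch with h5py is given below; the file name and dataset keys are assumptions, as the actual layout is documented with the dataset.

# Sketch: inspect a Motiv HDF5 file with h5py.
# File name and dataset keys are assumptions; check the dataset documentation.
import h5py

with h5py.File("motiv.h5", "r") as f:
    f.visit(print)  # list the group/dataset names actually present
    # Hypothetical keys for one sample's waveform and its RAVE latent vectors:
    # waveform = f["phrase_01/waveform"][:]
    # latents = f["phrase_01/latent"][:]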
Dataset Card for "lex_fridman_podcast"
Dataset Summary
This dataset contains transcripts from the Lex Fridman podcast (Episodes 1 to 325). The transcripts were generated using OpenAI Whisper (large model) and made publicly available at: https://karpathy.ai/lexicap/index.html.
Languages
English
Dataset Structure
The dataset contains around 803K entries, consisting of audio transcripts generated from episodes 1 to 325 of the Lex Fridman… See the full description on the dataset page: https://huggingface.co/datasets/nmac/lex_fridman_podcast.
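A minimal loading sketch with the Hugging Face datasets library is shown below; the split name is an assumption, so check the dataset page for the available splits and columns.

# Sketch: load the transcripts with the Hugging Face `datasets` library.
# The "train" split name is an assumption; see the dataset page for details.
from datasets import load_dataset

ds = load_dataset("nmac/lex_fridman_podcast", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # first transcript entry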
https://www.techsciresearch.com/privacy-policy.aspx
The market was valued at USD 1.76 billion in 2023 and is projected to reach USD 6.33 billion by 2029, registering a compound annual growth rate (CAGR) of 23.59% over the 2024-2029 forecast period.
Pages: 185
Market Size (2023): USD 1.76 billion
Forecast Market Size (2029): USD 6.33 billion
CAGR (2024-2029): 23.59%
Fastest Growing Segment: BFSI
Largest Market: North America
Key Players: Appen Limited; Cogito Tech LLC; Lionbridge Technologies, Inc.; Google, LLC; Microsoft Corporation; Scale AI Inc.; Deep Vision Data; Anthropic, PBC; CloudFactory Limited; Globalme Localization Inc.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
We are sharing the videos corresponding to the audio subset we shared in our other dataset.
https://choosealicense.com/licenses/cc0-1.0/
This is the first public Sagaw Karen language ASR dataset in AI history.
Sagaw Karen ASR
This dataset contains audio recordings and aligned metadata in the Sagaw Karen language (ISO 639-3: ksw), a major Sgaw Karenic language spoken throughout southern and eastern Myanmar. The language is sometimes also referred to as Sgaw Karen or Sakaw Karen in English transliterations. All audio segments in this dataset were sourced from publicly available news broadcasts published by PVTV… See the full description on the dataset page: https://huggingface.co/datasets/freococo/sagaw_karen_asr.
https://choosealicense.com/licenses/pddl/
This is the first public Rohingya language ASR dataset in AI history.
Overview
This dataset contains broadcast audio recordings from the Voice of America (VOA) Rohingya Service. Each file represents a daily news segment, typically 30 minutes in length, automatically segmented into chunks of 5–15 seconds for use in self-supervised ASR, pretraining, language identification, and more. The content was aired publicly as part of VOA’s Rohingya-language radio program and is therefore… See the full description on the dataset page: https://huggingface.co/datasets/freococo/rohingya_asr_audio.
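As a generic illustration of that kind of segmentation (not necessarily the curator's method), the sketch below splits a broadcast on silence and keeps 5-15 second chunks using pydub; the file name and thresholds are assumptions.

# Sketch: silence-based segmentation of a broadcast into 5-15 second chunks.
# Generic illustration, not necessarily how this dataset was built;
# file name and thresholds are assumptions.
from pydub import AudioSegment
from pydub.silence import split_on_silence

broadcast = AudioSegment.from_file("voa_rohingya_broadcast.mp3")

chunks = split_on_silence(
    broadcast,
    min_silence_len=500,                 # pauses of at least 500 ms
    silence_thresh=broadcast.dBFS - 16,  # relative to the clip's average loudness
    keep_silence=200,
)

for i, chunk in enumerate(chunks):
    if 5_000 <= len(chunk) <= 15_000:    # pydub lengths are in milliseconds
        chunk.export(f"chunks/chunk_{i:05d}.wav", format="wav")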
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned. Acoustic models, trained on this data set, are available at icefall and language models, suitable for evaluation can be found at openslr. For more information, see the paper "LibriSpeech: an ASR corpus based on public domain audio… See the full description on the dataset page: https://huggingface.co/datasets/k2-fsa/LibriSpeech.