MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Artificial Analysis Big Bench Audio
Dataset Summary
Big Bench Audio is an audio version of a subset of Big Bench Hard questions. The dataset can be used for evaluating the reasoning capabilities of models that support audio input. The dataset includes 1000 audio recordings for all questions from the following Big Bench Hard categories. Descriptions are taken from Suzgun et al. (2022):
Formal Fallacies Syllogisms Negation (Formal Fallacies) - 250 questions. Given a context… See the full description on the dataset page: https://huggingface.co/datasets/ArtificialAnalysis/big_bench_audio.
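The dataset is hosted on Hugging Face, so it can be loaded with the `datasets` library. A minimal sketch, assuming a `train` split and a `category` field for filtering by Big Bench Hard category (neither is confirmed by the excerpt above):

```python
from datasets import load_dataset

# Load Big Bench Audio from the Hugging Face Hub.
dataset = load_dataset("ArtificialAnalysis/big_bench_audio", split="train")  # split name is an assumption

# Keep only the Formal Fallacies questions; "category" and its values are assumed field names.
formal_fallacies = dataset.filter(lambda ex: ex["category"] == "formal_fallacies")
print(len(formal_fallacies))  # expected 250 if the assumed schema holds
```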
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Test-Audio of CMI-Bench
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following 🔗 Paper (arXiv) 🧪 Evaluation Toolkit 📊 License: CC BY-NC 4.0
Dataset Summary
The CMI-Bench/test-audio dataset provides the complete test split audio files used in the CMI-Bench benchmark. CMI-Bench evaluates the instruction-following capabilities of audio-text large language models (LLMs) on a wide range of Music Information Retrieval (MIR) tasks.… See the full description on the dataset page: https://huggingface.co/datasets/nicolaus625/CMI-bench.
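Since the split ships full audio files, a streaming load avoids downloading everything before inspecting the schema. A sketch under the assumption that the repository exposes a `test` split:

```python
from datasets import load_dataset

# Stream the test split so no full download is required up front.
stream = load_dataset("nicolaus625/CMI-bench", split="test", streaming=True)  # split name assumed

# Peek at the first example to discover the available fields.
first = next(iter(stream))
print(first.keys())
```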
ADU-Bench: Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models
If you use ADU-Bench in your project, please kindly cite:

```bibtex
@article{adubench2025,
  title={Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models},
  author={Anonymous ACL submission},
  journal={Under Review},
  year={2025}
}
```
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BLAB: Brutally Long Audio Bench
Dataset Summary
Brutally Long Audio Bench (BLAB) is a challenging long-form audio benchmark that evaluates audio LMs on localization, duration estimation, emotion, and counting tasks using audio segments averaging 51 minutes in length. BLAB consists of 833+ hours of diverse, full-length YouTube audio clips, each paired with human-annotated, text-based natural language questions and answers. Our audio data were collected from permissively… See the full description on the dataset page: https://huggingface.co/datasets/oreva/blab_long_audio.
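Because BLAB clips average 51 minutes and the corpus spans 833+ hours, streaming is the practical way to sample it. A hedged sketch, assuming a `test` split and an `audio` column decoded into an array with a sampling rate (the actual schema may differ):

```python
from datasets import load_dataset

# Stream rather than download 833+ hours of audio.
blab = load_dataset("oreva/blab_long_audio", split="test", streaming=True)  # split name assumed

example = next(iter(blab))
audio = example["audio"]  # assumed decoded form: {"array": ..., "sampling_rate": ...}
duration_min = len(audio["array"]) / audio["sampling_rate"] / 60
print(f"clip length: {duration_min:.1f} min")
```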
WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation
🤗 Dataset | 🐙 GitHub | 📖 arXiv
This repository contains the evaluation code for the paper "WildSpeech-Bench: Benchmarking Audio LLMs in Natural Speech Conversation".
🔔 Introduction
WildSpeech-Bench is the first end-to-end, systematic benchmark for evaluating the capabilities of audio-to-audio speech dialogue models. The dataset is designed with three key features:
Realistic and Diverse… See the full description on the dataset page: https://huggingface.co/datasets/tencent/WildSpeech-Bench.
https://www.archivemarketresearch.com/privacy-policy
The bench-top psophometer market is experiencing robust growth, driven by increasing demand across various sectors. While precise figures for market size and CAGR were not provided, a reasonable estimation, considering the involvement of established players like Siemens and Keysight Technologies and the consistent need for precision noise measurement in industries like telecommunications and audio engineering, suggests a market size of approximately $250 million in 2025. Considering typical growth trends in specialized testing equipment markets, a conservative Compound Annual Growth Rate (CAGR) of 7% is estimated for the forecast period (2025-2033). This growth is fueled by several key drivers, including the rising adoption of 5G networks necessitating stringent noise level testing, advancements in audio technology demanding high-fidelity measurements, and a growing focus on regulatory compliance for electromagnetic interference (EMI) and noise emissions. The market is segmented by application (telecommunications, audio testing, industrial quality control, etc.) and geography, with North America and Europe currently holding significant market share. However, emerging economies in Asia-Pacific are expected to witness rapid growth owing to increased infrastructure development and industrialization.

The competitive landscape is characterized by the presence of both established industry giants and specialized manufacturers. Key players are focusing on product innovation, strategic partnerships, and expanding their global reach to maintain their market position. Future growth will depend on continuous technological advancements such as improved accuracy, enhanced functionality, and the integration of smart features. Factors like the high initial investment cost of these instruments and the potential for substitute technologies could pose challenges to market expansion. However, the long-term outlook for the bench-top psophometer market remains positive, reflecting the increasing importance of precise noise level measurements in various applications.
We introduce a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises 44 simple multi-instrument classical music pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, we provide the musical score in MIDI format, the audio recordings of the individual tracks, the audio and video recording of the assembled mixture, and ground-truth annotation files including frame-level and note-level transcriptions. We describe our methodology for the creation of the dataset, particularly highlighting our approaches for addressing the challenges involved in maintaining synchronization and expressiveness. We demonstrate the high quality of synchronization achieved with our proposed approach by comparing the dataset against existing widely-used music audio datasets.

We anticipate that the dataset will be useful for the development and evaluation of existing music information retrieval (MIR) tasks, as well as for novel multi-modal tasks. We benchmark two existing MIR tasks (multi-pitch analysis and score-informed source separation) on the dataset and compare against other existing music audio datasets. Additionally, we consider two novel multi-modal MIR tasks (visually informed multi-pitch analysis and polyphonic vibrato analysis) enabled by the dataset and provide evaluation measures and baseline systems for future comparisons (from our recent work). Finally, we propose several emerging research directions that the dataset enables.
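As a concrete illustration of working with the per-piece assets described above, the sketch below pairs each piece's MIDI score with its individual track recordings from a local copy. The directory layout and file extensions are hypothetical; consult the dataset's documentation for the published structure:

```python
from pathlib import Path

root = Path("dataset_root")  # hypothetical local download location

for piece_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    midi_scores = sorted(piece_dir.glob("*.mid"))   # assumed extension for MIDI scores
    track_audio = sorted(piece_dir.glob("*.wav"))   # assumed extension for per-track audio
    print(f"{piece_dir.name}: {len(midi_scores)} score(s), {len(track_audio)} track(s)")
```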
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
MusicBench Dataset
The MusicBench dataset is a music audio-text pair dataset designed for text-to-music generation and released along with the Mustango text-to-music model. MusicBench is based on the MusicCaps dataset, expanding it from 5,521 samples to 52,768 training and 400 test samples!
Dataset Details
MusicBench expands MusicCaps by:
Including music features of chords, beats, tempo, and key that are extracted from the audio. Describing these music… See the full description on the dataset page: https://huggingface.co/datasets/amaai-lab/MusicBench.
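A minimal sketch of loading MusicBench and inspecting one example, assuming `train`/`test` split names matching the 52,768/400 counts above (exact field names for the chord, beat, tempo, and key features are not given in the excerpt):

```python
from datasets import load_dataset

train = load_dataset("amaai-lab/MusicBench", split="train")  # split names assumed
test = load_dataset("amaai-lab/MusicBench", split="test")

print(len(train), len(test))  # expected 52768 and 400 if the assumed splits hold
print(train[0])               # inspect caption text and extracted music-feature fields
```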
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SynSpeech Dataset (Small Version) is an English-language synthetic speech dataset created using OpenVoice and LibriSpeech-100 for benchmarking disentangled speech representation learning methods. It includes 50 unique speakers, each with 500 distinct sentences spoken in a “default” style at a 16 kHz sampling rate. Data is organized by speaker ID, with a synspeech_Small_Metadata.csv file detailing speaker information, gender, speaking style, text, and file paths. This dataset is ideal for tasks in representation learning, speaker and content factorization, and TTS synthesis.
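Since the metadata lives in a single CSV, pandas is enough to explore it. The column names below are hypothetical, inferred from the description; check the CSV header for the actual names:

```python
import pandas as pd

meta = pd.read_csv("synspeech_Small_Metadata.csv")
print(meta.columns.tolist())  # confirm the real column names first

# Sentences per speaker; "speaker_id" is an assumed column name (expected 500 each).
print(meta.groupby("speaker_id").size().head())
```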
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
SAVVY-Bench
This repository contains SAVVY-Bench, the first benchmark for dynamic 3D spatial reasoning in audio-visual environments, introduced in SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing.
SAVVY-Bench Dataset
The benchmark dataset is also available on Hugging Face:

```python
from datasets import load_dataset

dataset = load_dataset("ZijunCui/SAVVY-Bench")
```
This repository provides both the benchmark data and tools to… See the full description on the dataset page: https://huggingface.co/datasets/ZijunCui/SAVVY-Bench.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
TTA-Bench Dataset
🎯 Overview
Welcome to TTA-Bench! This repository contains our comprehensive evaluation framework for text-to-audio (TTA) systems. We've carefully curated 2,999 prompts across six different evaluation dimensions, creating a standardized benchmark for assessing text-to-audio generation capabilities.
📚 Dataset Structure
Each prompt in our dataset contains these essential fields:
id: Unique identifier for each prompt (format: prompt_XXXX)… See the full description on the dataset page: https://huggingface.co/datasets/Hui519/TTA-Bench.
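A hedged sketch of loading the prompts and counting them per evaluation dimension. Only the `id` format is stated above; the split name and a `dimension` field are assumptions:

```python
from collections import Counter
from datasets import load_dataset

prompts = load_dataset("Hui519/TTA-Bench", split="train")  # split name assumed
print(len(prompts))  # expected 2999

# Tally prompts per evaluation dimension; "dimension" is an assumed field name.
print(Counter(ex["dimension"] for ex in prompts))
```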
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
AIR-Bench
arXiv: https://arxiv.org/html/2402.07729v1
This is the AIR-Bench dataset download page. AIR-Bench encompasses two dimensions: foundation and chat benchmarks. The former consists of 19 tasks with approximately 19k single-choice questions; the latter contains 2k instances of open-ended question-and-answer data. For instructions on running AIR-Bench, please refer to the AIR-Bench GitHub page (https://github.com/OFA-Sys/AIR-Bench) (will be public soon).
Data Sources (All come from… See the full description on the dataset page: https://huggingface.co/datasets/qyang1021/AIR-Bench-Dataset.
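For the foundation dimension, scoring single-choice questions reduces to exact-match accuracy. A toy sketch with a hypothetical record structure (`answer`, `prediction`); the official scoring lives in the AIR-Bench GitHub repository linked above:

```python
def accuracy(records):
    """Fraction of records whose predicted choice matches the gold answer."""
    correct = sum(1 for r in records if r["prediction"] == r["answer"])
    return correct / len(records)

# Tiny worked example with made-up records.
print(accuracy([
    {"answer": "A", "prediction": "A"},
    {"answer": "B", "prediction": "C"},
]))  # 0.5
```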
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chinese Speech Emotional Understanding Benchmark (CSEU-Bench)
The benchmark aims to evaluate the ability to understand psycho-linguistic emotion labels in Chinese speech. It contains Chinese speech recordings with diverse syntactic structures, along with 83 psycho-linguistic emotion entities as classification labels.
GitHub: https://github.com/qiuchili/CSEU-Bench
CSEU-Bench Components:
CSEU-Bench.csv: all speech samples
CSEU-monosyllabic.csv: speech samples with… See the full description on the dataset page: https://huggingface.co/datasets/smart9/CSEU-Bench.
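Both component files are plain CSVs, so a pandas sketch is enough to get oriented. File names come from the component list above; the `label` column is an assumption:

```python
import pandas as pd

all_samples = pd.read_csv("CSEU-Bench.csv")   # all speech samples
mono = pd.read_csv("CSEU-monosyllabic.csv")   # monosyllabic subset

# 83 psycho-linguistic emotion entities are expected if the assumed column holds.
print(len(all_samples), "samples,", all_samples["label"].nunique(), "labels")
```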
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
by Ioan-Paul Ciobanu, Andrei-Iulian Hiji, Nicolae-Catalin Ristea, Paul Irofti, Cristian Rusu, Radu Tudor Ionescu
License
The source code and models are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Reference
If you use this dataset or code in your research, please cite the corresponding paper:
Ioan-Paul Ciobanu… See the full description on the dataset page: https://huggingface.co/datasets/unibuc-cs/XMAD-Bench.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Official dataset for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?". 🌟 For more details, please refer to the project page with data examples: https://av-odyssey.github.io/. [🌐 Webpage] [📖 Paper] [🤗 Huggingface AV-Odyssey Dataset] [🤗 Huggingface Deaftest Dataset] [🏆 Leaderboard]
🔥 News
2024.11.24 🌟 We release AV-Odyssey, the first-ever comprehensive evaluation benchmark to explore whether MLLMs really understand audio-visual… See the full description on the dataset page: https://huggingface.co/datasets/AV-Odyssey/AV_Odyssey_Bench.