WenetSpeech is a multi-domain Mandarin corpus consisting of 10,000+ hours of high-quality labeled speech, 2,400+ hours of weakly labeled speech, and about 10,000 hours of unlabeled speech, for 22,400+ hours in total. The authors collected the data from YouTube and podcasts, covering a variety of speaking styles, scenarios, domains, topics, and noise conditions. An optical character recognition (OCR) based method is introduced to generate audio/text segmentation candidates for the YouTube data from its corresponding video captions.
foreveronly12/WenetSpeech dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset contains only test data, which is integrated into the UltraEval-Audio (https://github.com/OpenBMB/UltraEval-Audio) framework.
python audio_evals/main.py --dataset WenetSpeech-test-meeting --model gpt4o_audio
python audio_evals/main.py --dataset WenetSpeech-test-net --model gpt4o_audio
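If the test data itself needs to be inspected outside the evaluation framework, it can also be pulled with the Hugging Face datasets library. The sketch below is only an assumption-based example: the split name is a guess inferred from the commands above, so check the dataset page for the actual configuration and column names.

```python
# Hedged sketch: browsing the hosted test data directly with the
# `datasets` library. The split name "test" is an assumption; the
# TEST_NET / TEST_MEETING partitions may be exposed differently.
from datasets import load_dataset

ds = load_dataset("TwinkStart/WenetSpeech", split="test")
print(ds)             # number of rows and column names
sample = ds[0]
print(sample.keys())  # typically an audio column plus a transcript column
```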
🚀 An exceptional experience, with UltraEval-Audio 🚀
UltraEval-Audio is the world's first open-source framework to support evaluation of both speech understanding and speech generation. Built specifically for evaluating large audio models, it brings together 34 authoritative benchmarks covering four domains (speech, sound, medical, and music), supports ten languages, and spans twelve task types. With UltraEval-Audio, you get unprecedented convenience and efficiency:
One-click benchmark management… See the full description on the dataset page: https://huggingface.co/datasets/TwinkStart/WenetSpeech.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
pengyizhou/wenetspeech-subset-S dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The study uses a state-of-the-art speech embedding method for WD detection in unstructured connected speech (UCS), combining bi-directional semantic dependencies and attention mechanisms. The feature data file covers 110 native Mandarin-speaking participants: 55 WD patients and 55 sex-matched healthy individuals. The four data columns are labels (0 for healthy individuals and 1 for WD patients), the ComParE feature set, the Wav2vec 2.0 embedded feature set, and the HuBERT embedded feature set. To obtain frame-level speech representations that can be compared and fused with the embedding approaches, we use only the low-level descriptors (LLDs) of ComParE (the latest, 2016 version), which yields 65-dimensional features per time step, with the window length and step length set to 30 ms and 20 ms, respectively. The resulting ComParE feature shape for each participant's 60 s audio is 2999 × 65. To adapt to native speech data, we extract embeddings from pre-trained Wav2vec 2.0 (w2v2) and HuBERT models fine-tuned on 10,000 hours of Chinese speech from WenetSpeech. Considering computational resources and time cost, we use the base versions of the pre-trained models and take the final 768-dimensional hidden layer as the embedding representation of the audio; the last hidden state thus yields a 2999 × 768 representation per audio sample.
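As an illustration of the frame-level embedding extraction described above, here is a minimal sketch using PyTorch, torchaudio, and the Hugging Face transformers API. The checkpoint name and file path are placeholders rather than the exact WenetSpeech-based checkpoints used in the study, and the normalization step simply mirrors what the standard wav2vec 2.0/HuBERT feature extractors apply.

```python
# Minimal sketch (assumed checkpoint and file names): extract frame-level
# HuBERT embeddings for a 60 s Mandarin recording. A wav2vec 2.0 model can
# be substituted by swapping HubertModel for Wav2Vec2Model.
import torch
import torchaudio
from transformers import HubertModel

MODEL_ID = "TencentGameMate/chinese-hubert-base"  # placeholder checkpoint

model = HubertModel.from_pretrained(MODEL_ID)
model.eval()

# Load the recording and resample to the 16 kHz rate the model expects.
wav, sr = torchaudio.load("participant_60s.wav")  # placeholder path
wav = torchaudio.functional.resample(wav, sr, 16000).mean(dim=0)

# Zero-mean / unit-variance normalization, as the standard wav2vec 2.0 /
# HuBERT feature extractors apply before the encoder.
wav = (wav - wav.mean()) / (wav.std() + 1e-7)

with torch.no_grad():
    out = model(wav.unsqueeze(0))  # batch of one utterance

# One 768-dimensional vector roughly every 20 ms: about 2999 x 768
# for a 60 s clip, matching the shape quoted in the description.
embeddings = out.last_hidden_state.squeeze(0)
print(embeddings.shape)
```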