5 datasets found
  1. WenetSpeech Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jan 23, 2025
    Cite
    BinBin Zhang; Hang Lv; Pengcheng Guo; Qijie Shao; Chao Yang; Lei Xie; Xin Xu; Hui Bu; Xiaoyu Chen; Chenchen Zeng; Di Wu; Zhendong Peng (2025). WenetSpeech Dataset [Dataset]. https://paperswithcode.com/dataset/wenetspeech
    Explore at:
    236 scholarly articles cite this dataset (View in Google Scholar)
    Dataset updated
    Jan 23, 2025
    Authors
    BinBin Zhang; Hang Lv; Pengcheng Guo; Qijie Shao; Chao Yang; Lei Xie; Xin Xu; Hui Bu; Xiaoyu Chen; Chenchen Zeng; Di Wu; Zhendong Peng
    Description

    WenetSpeech is a multi-domain Mandarin corpus consisting of 10,000+ hours of high-quality labeled speech, 2,400+ hours of weakly labeled speech, and about 10,000 hours of unlabeled speech, for 22,400+ hours in total. The authors collected the data from YouTube and podcasts, covering a variety of speaking styles, scenarios, domains, topics, and noise conditions. An optical character recognition (OCR) based method is introduced to generate audio/text segmentation candidates for the YouTube data from its corresponding video captions.

  2. WenetSpeech

    • huggingface.co
    Updated Jun 11, 2024
    Cite
    jingfei (2024). WenetSpeech [Dataset]. https://huggingface.co/datasets/foreveronly12/WenetSpeech
    Dataset updated
    Jun 11, 2024
    Authors
    jingfei
    Description

    The foreveronly12/WenetSpeech dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
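
    Since this is a standard Hugging Face dataset repository, it should be loadable with the datasets library. A minimal sketch, assuming a "train" split exists (the actual configurations and splits are listed on the dataset card):

    # Hypothetical loading example; the split name is an assumption, not from the card.
    from datasets import load_dataset

    ds = load_dataset("foreveronly12/WenetSpeech", split="train")
    print(ds)      # dataset size and column names
    print(ds[0])   # inspect one example's fields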

  3. WenetSpeech

    • huggingface.co
    Updated Nov 21, 2024
    + more versions
    Cite
    Shi Qundong (2024). WenetSpeech [Dataset]. https://huggingface.co/datasets/TwinkStart/WenetSpeech
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 21, 2024
    Authors
    Shi Qundong
    Description

    This dataset contains only test data, which is integrated into the UltraEval-Audio (https://github.com/OpenBMB/UltraEval-Audio) framework.

    python audio_evals/main.py --dataset WenetSpeech-test-meeting --model gpt4o_audio

    python audio_evals/main.py --dataset WenetSpeech-test-net --model gpt4o_audio

      🚀 An extraordinary experience awaits in UltraEval-Audio 🚀


    UltraEval-Audio is the world's first open-source framework to support evaluation of both speech understanding and speech generation. Built specifically for evaluating large audio models, it brings together 34 authoritative benchmarks covering four domains (speech, sound, medicine, and music), supports ten languages, and spans twelve task categories. With UltraEval-Audio you get unprecedented convenience and efficiency:

    One-click benchmark management… See the full description on the dataset page: https://huggingface.co/datasets/TwinkStart/WenetSpeech.
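
    A minimal sketch of driving the two commands above from Python, assuming UltraEval-Audio has been cloned, its dependencies installed, and the script is run from the repository root (the dataset and model names are taken verbatim from the card):

    import subprocess

    # Evaluate both WenetSpeech test sets with the gpt4o_audio model,
    # exactly as in the commands from the dataset card.
    for name in ("WenetSpeech-test-meeting", "WenetSpeech-test-net"):
        subprocess.run(
            ["python", "audio_evals/main.py", "--dataset", name, "--model", "gpt4o_audio"],
            check=True,
        )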

  4. wenetspeech-subset-S

    • huggingface.co
    Updated Jun 8, 2025
    Cite
    pengyizhou (2025). wenetspeech-subset-S [Dataset]. https://huggingface.co/datasets/pengyizhou/wenetspeech-subset-S
    Dataset updated
    Jun 8, 2025
    Authors
    pengyizhou
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The pengyizhou/wenetspeech-subset-S dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
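
    As with the other Hugging Face entries, the subset should be loadable with the datasets library; streaming avoids downloading it in full. A minimal sketch, assuming a "train" split (check the dataset card for the actual splits):

    # Hypothetical streaming example; the split name is an assumption.
    from datasets import load_dataset

    ds = load_dataset("pengyizhou/wenetspeech-subset-S", split="train", streaming=True)
    for i, example in enumerate(ds):
        print(example.keys())  # inspect the available fields
        if i >= 2:
            break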

  5. The acoustic feature dataset of WD patients and healthy individuals

    • scidb.cn
    Updated Mar 15, 2024
    Cite
    Zhenglin Zhang (2024). The acoustic feature dataset of WD patients and healthy individuals [Dataset]. http://doi.org/10.57760/sciencedb.11299
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Zhenglin Zhang
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    The study uses a state-of-the-art speech embedding method for WD detection in unstructured connected speech (UCS), combining bi-directional semantic dependencies and attention mechanisms. The feature data file covers 110 native Mandarin-speaking participants: 55 WD patients and 55 sex-matched healthy individuals. The four data columns are labels (0 for healthy individuals, 1 for WD patients), the ComParE feature set, Wav2vec 2.0 embeddings, and HuBERT embeddings.

    To obtain frame-level speech representations that can be compared and fused with the embedding approaches, we use only the LLDs of ComParE (the 2016 version, currently the latest), which contain 65-dimensional features per time step, and configure the window length and step length to 30 ms and 20 ms, respectively. The final ComParE feature shape for each participant's 60 s audio is 2999 × 65.

    To adapt to native speech data, we extract embeddings from pre-trained wav2vec 2.0 and HuBERT models fine-tuned on 10,000 hours of Chinese speech data from WenetSpeech. Considering computational resources and time cost, we use the base versions of the pre-trained models, i.e., the final 768-dimensional hidden layer, as the embedding representation of the audio. The last hidden state of the model serves as the embedding representation, with a shape of 2999 × 768 per audio sample.
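
    A minimal sketch of the frame-level embedding extraction described above, using the transformers library. The checkpoint name is a hypothetical example of a Chinese HuBERT base model (the dataset description does not name the exact checkpoints), and the file path is a placeholder:

    import torch
    import torchaudio
    from transformers import AutoFeatureExtractor, AutoModel

    MODEL_NAME = "TencentGameMate/chinese-hubert-base"  # hypothetical checkpoint choice

    feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME).eval()

    # Load a 60-second recording and resample to 16 kHz mono (path is a placeholder).
    waveform, sr = torchaudio.load("participant_60s.wav")
    waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

    inputs = feature_extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, frames, 768)

    # The base models use a ~20 ms frame stride, so 60 s of audio yields ~2999 frames,
    # matching the 2999 × 768 shape reported in the description.
    print(hidden.shape)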
