AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese mandarin speakers and total 88035 utterances. Their auxiliary attributes such as gender, age group and native accents are explicitly marked and provided in the corpus. Accordingly, transcripts in Chinese character-level and pinyin-level are provided along with the recordings. The word & tone transcription accuracy rate is above 98%, through professional speech annotation and strict quality inspection for tone and prosody.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
AISHELL-3
Identifier: SLR93 Summary: Mandarin data, provided by Beijing Shell Shell Technology Co., Ltd.
Category: Speech
License: Apache License v.2.0
Downloads (use a mirror closer to you):
data_aishell3.tgz 19G Mirrors:
[US]
[EU]
[CN]
About this resource:AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus
published by Beijing Shell Shell Technology Co.,Ltd. It can be used to train… See the full description on the dataset page: https://huggingface.co/datasets/shenyunhang/AISHELL-3.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
shenberg1/aishell3 dataset hosted on Hugging Face and contributed by the HF Datasets community
MatrixStudio/AISHELL-3 dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
VoxBox
This dataset is a curated collection of bilingual speech corpora annotated clean transcriptions and rich metadata incluing age, gender, and emotion.
Dataset Structure
. ├── audios/ │ └── aishell-3/ # Audio files (organised by sub-corpus) │ └── ... └── metadata/ ├── aishell-3.jsonl ├── casia.jsonl ├── commonvoice_cn.jsonl ├── ... └── wenetspeech4tts.jsonl # JSONL metadata files
Each JSONL file corresponds to a… See the full description on the dataset page: https://huggingface.co/datasets/SparkAudio/voxbox.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
AISHELL-3 is a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese mandarin speakers and total 88035 utterances. Their auxiliary attributes such as gender, age group and native accents are explicitly marked and provided in the corpus. Accordingly, transcripts in Chinese character-level and pinyin-level are provided along with the recordings. The word & tone transcription accuracy rate is above 98%, through professional speech annotation and strict quality inspection for tone and prosody.