80 datasets found
  1. F

    Mandarin General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Mandarin General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-mandarin-china
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mandarin Chinese General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Mandarin speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mandarin Chinese communication.

    Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Mandarin speech models that understand and respond to authentic Chinese accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mandarin Chinese. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Mandarin Chinese speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of China to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Mandarin speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Mandarin Chinese.
    Voice Assistants: Build smart assistants capable of understanding natural Chinese conversations.
    <div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex; gap: 16px;

  2. a

    Percent Chinese Speakers

    • hub.arcgis.com
    • gis-kingcounty.opendata.arcgis.com
    Updated Aug 10, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    King County (2016). Percent Chinese Speakers [Dataset]. https://hub.arcgis.com/datasets/294c704e32734e8ebcd030e535b841ce
    Explore at:
    Dataset updated
    Aug 10, 2016
    Dataset authored and provided by
    King County
    Area covered
    Description

    Languages:Percent Chinese Speakers: Basic demographics by census tracts in King County based on current American Community Survey 5 Year Average (ACS). Included demographics are: total population; foreign born; median household income; English language proficiency; languages spoken; race and ethnicity; sex; and age. Numbers and derived percentages are estimates based on the current year's ACS. GEO_ID_TRT is the key field and may be used to join to other demographic Census data tables.

  3. E

    Chinese Mandarin (North) database

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated May 31, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2018). Chinese Mandarin (North) database [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-S0398/
    Explore at:
    Dataset updated
    May 31, 2018
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttps://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdfhttps://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    This database contains the recordings of 500 Chinese Mandarin speakers from Northern China (250 males and 250 females), from 18 to 60 years’ old, recorded in quiet studios located in Shenzhen and in Hong Kong Special Administrative Region, People’s Republic of China. Demographics of native speakers from Northern China is as follows:- Beijing: 200 speakers (100 males, 100 females)- North of Beijing: 101 speakers (50 males, 51 females)- Shandong: 149 speakers (75 males, 74 females)- Henan: 50 speakers (25 males, 25 females)Speaker profile includes the following information: unique ID, place of birth, place where speaker lived the longest by the age of 16, and the number of years that the speaker lived there, age, gender, recording place.Recordings were made through microphone headsets (ATM73a / AUDIO TECHNICA) and consist of 172 hours of audio data (about 30 minutes per speaker), stored in .WAV files as sequences of 48 KHz Mono, 16 bits, Linear PCM. Recording script consists of :• Phoneme balance statement: 785 sentences• Travel conversation: 1618 sentences• About 200 sentences per speaker including: 134 sentences of travel conversation, 66 sentences of phoneme balance

  4. E

    Chinese Mandarin (South) database

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated May 31, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2018). Chinese Mandarin (South) database [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-S0397/
    Explore at:
    Dataset updated
    May 31, 2018
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdfhttps://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdfhttps://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    This database contains the recordings of 1000 Chinese Mandarin speakers from Southern China (500 males and 500 females), from 18 to 60 years’ old, recorded in quiet studios located in Shenzhen and in Hong Kong Special Administrative Region, People’s Republic of China. Demographics of native speakers from Southern China is as follows:- Guangdong: 312 speakers (154 males, 158 females)- Fujian: 155 speakers (95 males, 60 females)- Jiangsu: 262 speakers (134 males, 128 females)- Zhejiang: 160 speakers (84 males, 76 females)- Taiwan: 105 speakers (31 males, 74 females)- Other-Southern: 6 speakers (2 males, 4 females)Speaker profile includes the following information: unique ID, place of birth, place where speaker lived the longest by the age of 16, and the number of years that the speaker lived there, age, gender, recording place.Recordings were made through microphone headsets (ATM73a / AUDIO TECHNICA) and consist of 341 hours of audio data (about 30 minutes per speaker), stored in .WAV files as sequences of 48 KHz Mono, 16 bits, Linear PCM. Recording script consists of :• Phoneme balance statement: 785 sentences• Travel conversation: 1618 sentences• About 200 sentences per speaker including: 134 sentences of travel conversation, 66 sentences of phoneme balance

  5. F

    Mandarin Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Mandarin Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-mandarin-china
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mandarin Chinese Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Mandarin -speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native Mandarin Chinese speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native Mandarin Chinese contributors from our verified pool.
    Regions: Covering multiple China provinces to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    High transcription accuracy by dual-layered transcription review ensures word error rate under 5%.

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train Mandarin speech-to-text engines for travel platforms.
    <div style="margin-top:10px; margin-bottom: 10px;

  6. 98.9 Hours - Taiwanese Mandarin(China) Scripted Monologue Smartphone speech...

    • m.nexdata.ai
    Updated Nov 21, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nexdata (2023). 98.9 Hours - Taiwanese Mandarin(China) Scripted Monologue Smartphone speech dataset [Dataset]. https://m.nexdata.ai/datasets/speechrecog/63
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Taiwan
    Variables measured
    Format, Country, Speaker, Language, Accuracy rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    Taiwanese Mandarin(China) Scripted Monologue Smartphone speech dataset, collected from monologue based on given prompts, covering economy, entertainment, news, oral language, numbers, alphabet and other domains. Transcribed with text content. Our dataset was collected from extensive and diversify speakers(204 native speakers), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.

  7. F

    Mandarin Call Center Data for Healthcare AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Mandarin Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-mandarin-china
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mandarin Chinese Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Mandarin speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 Hours of dual-channel call center conversations between native Mandarin Chinese speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native Mandarin Chinese speakers from our contributor community.
    Regions: Diverse provinces across China to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    RecordingDetails:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session ranges between 5 to 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    OutboundCalls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy with word error rate is below 5%, backed by dual-layer QA checks.

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:

  8. 1250 Hours - Taiwanese Accent Mandarin Spontaneous Speech Data

    • m.nexdata.ai
    Updated Jan 26, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nexdata (2024). 1250 Hours - Taiwanese Accent Mandarin Spontaneous Speech Data [Dataset]. https://m.nexdata.ai/datasets/speechrecog/1295?source=Github
    Explore at:
    Dataset updated
    Jan 26, 2024
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Accuracy, Language, Data Content, Features of annotation
    Description

    Taiwanese Accent Mandarin(China) Real-world Casual Conversation and Monologue speech dataset, covers self-media, conversation, live and other generic domains, mirrors real-world interactions. Transcribed with text content, speaker's ID, gender and other attributes. Our dataset was collected from extensive and diversify speakers, geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.

  9. F

    Mandarin Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Mandarin Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-mandarin-china
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mandarin Chinese Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Mandarin-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Mandarin Chinese speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    Speakers: 60 native Mandarin Chinese speakers from our verified contributor pool.
    Regions: Representing multiple provinces across China to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refund Requests and Billing Adjustments
    Emergency Service Access, and others
    Outbound Calls:
    Welcome Calls & Onboarding
    Payment Reminders
    Customer Satisfaction Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, coughs)
    High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.
    <div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex; gap: 16px;

  10. F

    Mandarin Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Mandarin Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-mandarin-china
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mandarin Chinese Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Mandarin -speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native Mandarin Chinese speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics from inquiries to investment advice offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native Mandarin Chinese speakers from our verified contributor community.
    Regions: Representing different provinces across China to ensure accent and dialect variation.
    Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for Mandarin real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:

    <div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex; gap: 16px;

  11. 205 People - Mandarin Chinese(China) Noisy Monologue Smartphone speech...

    • m.nexdata.ai
    Updated Jan 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nexdata (2024). 205 People - Mandarin Chinese(China) Noisy Monologue Smartphone speech dataset_ Guiding [Dataset]. https://m.nexdata.ai/datasets/speechrecog/192?source=Github
    Explore at:
    Dataset updated
    Jan 3, 2024
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    Mandarin Chinese(China) Noisy Monologue Smartphone speech dataset_ Guiding, collected from monologue based on given prompts, covering generic domain, such as in-car, smart home, voice assistant, recorded in noisy condition. Transcribed with text content and other attributes. Our dataset was collected from extensive and diversify speakers(205 people), geographicly speaking, enhancing model performance in real and complex tasks.Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.

  12. f

    Table_1_A Database of Chinese-English Bilingual Speakers: Ratings of the Age...

    • figshare.com
    • frontiersin.figshare.com
    xlsx
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jue Wang; Baoguo Chen (2023). Table_1_A Database of Chinese-English Bilingual Speakers: Ratings of the Age of Acquisition and Familiarity.xlsx [Dataset]. http://doi.org/10.3389/fpsyg.2020.554785.s001
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Jue Wang; Baoguo Chen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently, considerable attention has been given to the effect of the age of acquisition (AoA) on learning a second language (L2); however, the scarcity of L2 AoA ratings has limited advancements in this field. We presented the ratings of L2 AoA in late, unbalanced Chinese-English bilingual speakers and collected the familiarity of the L2 and the corresponding Chinese translations of English words. In addition, to promote the cross-language comparison and motivate the AoA research on Chinese two-character words, data on AoA, familiarity, and concreteness of the first language (L1) were also collected from Chinese native speakers. We first reported the reliability of each rated variable. Then, we described the validity by the following three steps: the distributions of each rated variable were described, the correlations between these variables were calculated, and regression analyses were run. The results showed that AoA, familiarity, and concreteness were all significant predictors of lexical decision times. The word database can be used by researchers who are interested in AoA, familiarity, and concreteness in both the L1 and L2 of late, unbalanced Chinese-English bilingual speakers. The full database is freely available for research purposes.

  13. m

    General conversation speech datasets in Chinese for Social Conversation

    • data.macgence.com
    mp3
    Updated May 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Macgence (2024). General conversation speech datasets in Chinese for Social Conversation [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-chinese-for-social-conversation
    Explore at:
    mp3Available download formats
    Dataset updated
    May 8, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditionshttps://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    The audio dataset includes General Conversation, featuring Chinese speakers from China with detailed metadata.

  14. F

    Chinese Conversation Chat Dataset for Real Estate Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Chinese Conversation Chat Dataset for Real Estate Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/chinese-realestate-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The dataset comprises over 10,000 chat conversations, each focusing on specific Real Estate related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.

    Participants Details: 150+ native Chinese participants from the FutureBeeAI community.
    Word Count & Length: Chats are diverse, averaging 300 to 700 words and 50 to 150 turns across both speakers.

    Topic Diversity

    The chat dataset covers a wide range of conversations on Real Estate topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Real Estate use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.

    Inbound Chats:
    Property Inquiry
    Rental Property Search & Availability
    Renovation Inquiries
    Property Features & Amenities Inquiry
    Investment Property Analysis & Advice
    Property History & Ownership Details, and many more
    Outbound Chats:
    New Property Listing Update
    Post Purchase Follow-ups
    Investment Opportunities & Property Recommendations
    Property Value Updates
    Customer Satisfaction Surveys, and many more

    Language Variety & Nuances

    The conversations in this dataset capture the diverse language styles and expressions prevalent in Chinese Real Estate interactions. This diversity ensures the dataset accurately represents the language used by Chinese speakers in Real Estate contexts.

    The dataset encompasses a wide array of language elements, including:

    Naming Conventions: Chats include a variety of Chinese personal and business names.
    Localized Details: Real-world addresses, emails, phone numbers, and other contact information as according to different Chinese-speaking regions.
    Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Chinese forms, adhering to local conventions.
    Idiomatic Expressions and Slang: It includes local slang, idioms, and informal phrase present in Chinese Real Estate conversations.

    This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Chinese Real Estate interactions.

    Conversational Flow and Interaction Types

    The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Real Estate customer-agent interactions.

    Simple Inquiries
    Detailed Discussions
    Transactional Interactions
    Problem-Solving Dialogues
    Advisory Sessions
    Routine Checks and Follow-Ups

    Each of these conversations contains various aspects of conversation flow like:

    Greetings
    Authentication
    Information gathering
    Resolution identification
    Solution Delivery
    Closing and Follow-ups
    <span

  15. a

    CSSD

    • datasets.activeloop.ai
    deeplake
    Updated Mar 11, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Peking University (2022). CSSD [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/cssd-dataset/
    Explore at:
    deeplakeAvailable download formats
    Dataset updated
    Mar 11, 2022
    Dataset authored and provided by
    Peking University
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2016 - Jan 1, 2020
    Area covered
    Dataset funded by
    National Natural Science Foundation of China
    Description

    The Chinese Speech Separation Dataset (CSSD) is a dataset of audio recordings of people speaking Mandarin Chinese in a variety of noisy environments. The dataset consists of 10,000 audio recordings, each of which is a mixture of two speakers. The dataset is split into a training set of 8,000 recordings and a test set of 2,000 recordings. The audio recordings are in .wav format and have a sampling rate of 16 kHz. The audio recordings are labeled with the identities of the two speakers in the mixture. The CSSD dataset is a valuable resource for training speech separation models.

  16. P

    CAIS Dataset

    • paperswithcode.com
    Updated Sep 15, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yijin Liu; Fandong Meng; Jinchao Zhang; Jie zhou; Yufeng Chen; Jinan Xu (2019). CAIS Dataset [Dataset]. https://paperswithcode.com/dataset/cais
    Explore at:
    Dataset updated
    Sep 15, 2019
    Authors
    Yijin Liu; Fandong Meng; Jinchao Zhang; Jie zhou; Yufeng Chen; Jinan Xu
    Description

    We collect utterances from the Chinese Artificial Intelligence Speakers (CAIS), and annotate them with slot tags and intent labels. The training, validation and test sets are split by the distribution of intents, where detailed statistics are provided in the supplementary material. Since the utterances are collected from speaker systems in the real world, intent labels are partial to the PlayMusic option. We adopt the BIOES tagging scheme for slots instead of the BIO2 used in the ATIS, since previous studies have highlighted meaningful improvements with this scheme (Ratinov and Roth, 2009) in the sequence labeling field

  17. F

    Chinese Conversation Chat Dataset for BFSI Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Chinese Conversation Chat Dataset for BFSI Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/chinese-bfsi-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The dataset comprises over 10,000 chat conversations, each focusing on specific BFSI-related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.

    Participants Details: 150+ native Chinese participants from the FutureBeeAI community.
    Word Count & Length: Chats are diverse, averaging 300 to 700 words and 50 to 150 turns across both speakers.

    Topic Diversity

    The chat dataset covers a wide range of conversations on BFSI topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various BFSI use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.

    Inbound Chats:
    Account Opening
    Account Management
    Transactions
    Loan Inquiries & Applications
    Credit Card Services, and many more
    Outbound Chats:
    Product & Service Promotions
    Cross-selling & Upselling
    Customer Retention & Loyalty Programs
    Loan Application Follow-ups
    Insurance Policy Renewals/Reminders, and many more

    Language Variety & Nuances

    The conversations in this dataset capture the diverse language styles and expressions prevalent in Chinese BFSI interactions. This diversity ensures the dataset accurately represents the language used by Chinese speakers in BFSI contexts.

    The dataset encompasses a wide array of language elements, including:

    Naming Conventions: Chats include a variety of Chinese personal and business names.
    Localized Details: Real-world addresses, emails, phone numbers, and other contact information as according to different Chinese-speaking regions.
    Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Chinese forms, adhering to local conventions.
    Idiomatic Expressions and Slang: It includes local slang, idioms, and informal phrase present in Chinese BFSI conversations.

    This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Chinese BFSI interactions.

    Conversational Flow and Interaction Types

    The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of BFSI customer-agent interactions.

    Simple Inquiries
    Detailed Discussions
    Transactional Interactions
    Problem-Solving Dialogues
    Advisory Sessions
    Routine Checks and Follow-Ups

    Each of these conversations contains various aspects of conversation flow like:

    Greetings
    Authentication
    Information gathering
    Resolution identification
    Solution Delivery
    Closing and Follow-ups
    Feedback, etc

    This structured and varied conversational flow enables the creation of advanced NLP models that can effectively manage and respond to a wide range of customer service scenarios.

    Data Format and Structure

    <p

  18. speech_bundle

    • kaggle.com
    Updated Mar 20, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    _ChrisQ_ (2022). speech_bundle [Dataset]. https://www.kaggle.com/datasets/chrisqiu/speech-bundle/suggestions?status=pending
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 20, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    _ChrisQ_
    Description

    Speech Bundle

    VoxCeleb is a very popular datasets for speaker verification.
    CnCeleb is similar to VoxCeleb, but mainly contains mandarin speakers. These datasets are too large to launch quick experiments, therefore, I create a sampled version of them.
    This dataset contains 100 speakers from voxceleb and 100 speakers from cnceleb, which is enough to test out new thoughts.
    Specifically, the voxceleb part of this dataset is sampled from the dev set of the VoxCeleb2, while the cnceleb is directly sampled from the CnCeleb2.

    References

    [1] Chung, J.S., Nagrani, A. and Zisserman, A., 2018. VoxCceleb2: Deep speaker recognition. arXiv preprint arXiv:1806.05622. [2] Fan, Y., Kang, J.W., Li, L.T., Li, K.C., Chen, H.L., Cheng, S.T., Zhang, P.Y., Zhou, Z.Y., Cai, Y.Q. and Wang, D., 2020, May. Cn-celeb: a challenging chinese speaker recognition dataset. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7604-7608). IEEE.

  19. F

    Chinese Conversation Chat Dataset for Travel Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). Chinese Conversation Chat Dataset for Travel Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/chinese-travel-domain-conversation-text-dataset
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The dataset comprises over 10,000 chat conversations, each focusing on specific Travel related topics. Each conversation provides a detailed interaction between a call center agent and a customer, capturing real-life scenarios and language nuances.

    Participants Details: 150+ native Chinese participants from the FutureBeeAI community.
    Word Count & Length: Chats are diverse, averaging 300 to 700 words and 50 to 150 turns across both speakers.

    Topic Diversity

    The chat dataset covers a wide range of conversations on Travel topics, ensuring that the dataset is comprehensive and relevant for training and fine-tuning models for various Travel use cases. It offers diversity in terms of conversation topics, chat types, and outcomes, including both inbound and outbound chats with positive, neutral, and negative outcomes.

    Inbound Calls:
    Booking Inquiries & Assistance
    Destination Information & Recommendations
    Flight Delays or Cancellation Assistance
    Assistance for Disable Passengers
    Travel-related Health & Safety Inquiry
    Lost or Delayed Baggage Assistance, and many more
    Outbound Calls:
    Promotional Offers & Package Deals
    Customer Satisfaction Surveys
    Booking Confirmations & Updates
    Flight Schedule Changes & Notifications
    Customer Feedback Collection
    Visa Expiration Reminders, and many more

    Language Variety & Nuances

    The conversations in this dataset capture the diverse language styles and expressions prevalent in Chinese Travel interactions. This diversity ensures the dataset accurately represents the language used by Chinese speakers in Travel contexts.

    The dataset encompasses a wide array of language elements, including:

    Naming Conventions: Chats include a variety of Chinese personal and business names.
    Localized Details: Real-world addresses, emails, phone numbers, and other contact information as according to different Chinese-speaking regions.
    Temporal and Numeric Expressions: Dates, times, currencies, and numbers in Chinese forms, adhering to local conventions.
    Idiomatic Expressions and Slang: It includes local slang, idioms, and informal phrase present in Chinese Travel conversations.

    This linguistic authenticity ensures that the dataset equips researchers and developers with a comprehensive understanding of the intricate language patterns, cultural references, and communication styles inherent to Chinese Travel interactions.

    Conversational Flow and Interaction Types

    The dataset includes a broad range of conversations, from simple inquiries to detailed discussions, capturing the dynamic nature of Travel customer-agent interactions.

    Simple Inquiries
    Detailed Discussions
    Transactional Interactions
    Problem-Solving Dialogues
    Advisory Sessions
    Routine Checks and Follow-Ups

    Each of these conversations contains various aspects of conversation flow like:

    Greetings
    Authentication
    Information gathering
    Resolution identification
    Solution Delivery
    <span

  20. S

    Chinese Natural Speech Complex Emotion Dataset

    • scidb.cn
    Updated Feb 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xiaolong Wu; Mingxing Xu; Askar Hamdulla; Thomas Fang Zheng (2025). Chinese Natural Speech Complex Emotion Dataset [Dataset]. http://doi.org/10.57760/sciencedb.20968
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Xiaolong Wu; Mingxing Xu; Askar Hamdulla; Thomas Fang Zheng
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Although Chinese speech affective computing has received increasing attention, existing datasets still have defects such as lack of naturalness, single pronunciation style, and unreliable annotation, which seriously hinder the research in this field. To address these issues, this paper introduces the first Chinese Natural Speech Complex Emotion Dataset (CNSCED) to provide natural data resources for Chinese speech affective computing. CNSCED was collected from publicly broadcasted civil dispute and interview television programs in China, reflecting the authentic emotional characteristics of Chinese people in daily life. The dataset includes 14 hours of speech data from 454 speakers of various ages, totaling 15777 samples. Based on the inherent complexity and ambiguity of natural emotions, this paper proposes an emotion vector annotation method. This method utilizes a vector composed of six meta-emotional dimensions (angry, sad, aroused, happy, surprise, and fear) of different intensities to describe any single or complex emotion. The CNSCED released two subtasks: complex emotion classification and complex emotion intensity regression. In the experimental section, we evaluated the CNSCED dataset using deep neural network models and provided a baseline result. To the best of our knowledge, CNSCED is the first public Chinese natural speech complex emotion dataset, which can be used for scientific research free of charge.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
FutureBee AI (2022). Mandarin General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-mandarin-china

Mandarin General Conversation Speech Dataset for ASR

Mandarin General Conversation Speech Corpus

Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License

https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

Dataset funded by
FutureBeeAI
Description

Introduction

Welcome to the Mandarin Chinese General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Mandarin speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mandarin Chinese communication.

Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Mandarin speech models that understand and respond to authentic Chinese accents and dialects.

Speech Data

The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mandarin Chinese. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

Participant Diversity:
Speakers: 60 verified native Mandarin Chinese speakers from FutureBeeAI’s contributor community.
Regions: Representing various provinces of China to ensure dialectal diversity and demographic balance.
Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
Recording Details:
Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
Duration: Each conversation ranges from 15 to 60 minutes.
Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
Environment: Quiet, echo-free settings with no background noise.

Topic Diversity

The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

Sample Topics Include:
Family & Relationships
Food & Recipes
Education & Career
Healthcare Discussions
Social Issues
Technology & Gadgets
Travel & Local Culture
Shopping & Marketplace Experiences, and many more.

Transcription

Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

Transcription Highlights:
Speaker-segmented dialogues
Time-coded utterances
Non-speech elements (pauses, laughter, etc.)
High transcription accuracy, achieved through double QA pass, average WER < 5%

These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

Metadata

The dataset comes with granular metadata for both speakers and recordings:

Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

Usage and Applications

This dataset is a versatile resource for multiple Mandarin speech and language AI applications:

ASR Development: Train accurate speech-to-text systems for Mandarin Chinese.
Voice Assistants: Build smart assistants capable of understanding natural Chinese conversations.
<div style="margin-top:10px; margin-bottom: 10px; padding-left: 30px; display: flex; gap: 16px;

Search
Clear search
Close search
Google apps
Main menu