26 datasets found
  1. Indian English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Indian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Indian English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Indian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Indian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Indian English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of India to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
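
    The stated audio format can be checked against a file's header before training. The sketch below uses Python's standard `wave` module; the function name `wav_matches_spec` and its defaults (2 channels, 2-byte samples, 16000 Hz) are illustrative assumptions taken from the format line above, not part of any FutureBeeAI tooling.

```python
import wave


def wav_matches_spec(source, channels=2, sample_width_bytes=2, sample_rate=16000):
    """Return True if the WAV header matches the expected layout.

    `source` may be a file path or a binary file-like object.
    The defaults mirror the listing: stereo, 16-bit, 16 kHz (assumed values).
    """
    with wave.open(source, "rb") as wav_file:
        return (
            wav_file.getnchannels() == channels
            and wav_file.getsampwidth() == sample_width_bytes
            and wav_file.getframerate() == sample_rate
        )
```

    Only the header is inspected, so this confirms the container format but says nothing about noise level or recording quality.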

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy (average WER < 5%), achieved through a double QA pass

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
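
    Word error rate (WER) is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The listing does not say how its figure was measured, so the following is only a generic sketch of that standard definition; `word_error_rate` is a hypothetical helper name, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)
```

    For example, comparing "the cat sat on the mat" with "the cat sat on a mat" yields one substitution over six reference words, a WER of about 0.167.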

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Indian English.
    Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.

  2. Indian_English_Speech_Recognition_Corpus_Conversations

    • huggingface.co
    Updated Sep 26, 2024
    Cite
    Dataocean AI (2024). Indian_English_Speech_Recognition_Corpus_Conversations [Dataset]. https://huggingface.co/datasets/DataoceanAI/Indian_English_Speech_Recognition_Corpus_Conversations
    Dataset updated
    Sep 26, 2024
    Authors
    Dataocean AI
    Area covered
    India
    Description

    ID: King-ASR-631
    Language: English
    Duration: 200 hours
    Speakers: 200 people
    Parameters: 16 kHz, 16-bit
    Recording Device: Mobile
    URL: https://dataoceanai.com/datasets/asr/indian-english-speech-recognition-corpus-conversations-mobile/

  3. Indian Emotional Speech Corpora (IESC)

    • kaggle.com
    zip
    Updated Jan 28, 2022
    Cite
    YB Singh (2022). Indian Emotional Speech Corpora (IESC) [Dataset]. https://www.kaggle.com/datasets/ybsingh/indian-emotional-speech-corpora-iesc
    Available download formats: zip (137868059 bytes)
    Dataset updated
    Jan 28, 2022
    Authors
    YB Singh
    Description

    This emotional speech database was recorded by 8 north Indian speakers (5 males and 3 females). It contains 600 emotional audio files and is named Indian Emotional Speech Corpora (IESC). The audio files cover five emotions: neutral, happy, angry, sad, and fearful. All files were recorded with a speech recorder app on a mobile phone in a closed room to avoid outside noise. Headphones with a microphone were used to prevent sound leakage and for noise cancellation during recording. All recordings are saved as .wav files, each with a unique file name. The file name consists of 4 alphanumeric parts for unique identification, for example “H-4-5-1.wav”, where each part is defined as follows:
    The first part represents the emotion (A = angry, F = fear, H = happy, N = neutral, S = sad).
    The second part shows the repetition (1 = 1st repetition, 2 = 2nd repetition, and so on).
    The third part represents the speaker (1 = 1st speaker, 2 = 2nd speaker, and so on).
    The last part represents the sentence (1 = “Kids are talking by the door”, 2 = “Dogs are sitting by the door”).
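
    The four-part file name described above can be split mechanically. This is a minimal sketch assuming every file follows the “H-4-5-1.wav” pattern exactly; `parse_iesc_filename` and `EMOTION_CODES` are hypothetical names introduced here, not part of the dataset.

```python
from pathlib import Path

# Emotion letters as listed in the dataset description.
EMOTION_CODES = {"A": "angry", "F": "fear", "H": "happy", "N": "neutral", "S": "sad"}


def parse_iesc_filename(filename):
    """Split an IESC file name into emotion, repetition, speaker, and sentence."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dash-separated parts in {filename!r}")
    code, repetition, speaker, sentence = parts
    if code not in EMOTION_CODES:
        raise ValueError(f"unknown emotion code {code!r} in {filename!r}")
    return {
        "emotion": EMOTION_CODES[code],
        "repetition": int(repetition),
        "speaker": int(speaker),
        "sentence": int(sentence),
    }
```

    With the example from the description, `parse_iesc_filename("H-4-5-1.wav")` returns the emotion "happy", repetition 4, speaker 5, and sentence 1.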

    If you use this dataset, please cite this paper:

    Singh, Y.B., Goel, S. A lightweight 2D CNN based approach for speaker-independent emotion recognition from speech with new Indian Emotional Speech Corpora. Multimed Tools Appl 82, 23055–23073 (2023). https://doi.org/10.1007/s11042-023-14577-w

  4. Indian English Retail Scripted Monologue Speech Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Retail Scripted Monologue Speech Dataset [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/retail-scripted-speech-monologues-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    India
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Indian English Scripted Monologue Speech Dataset for the Retail & E-commerce domain. This dataset is built to accelerate the development of English language speech technologies, especially for use in retail-focused automatic speech recognition (ASR), natural language processing (NLP), voicebots, and conversational AI applications.

    Speech Data

    This training dataset includes 6,000+ high-quality scripted audio recordings in Indian English, created to reflect real-world scenarios in the Retail & E-commerce sector. These prompts are tailored to improve the accuracy and robustness of customer-facing speech technologies.

    Participant Diversity
    Speakers: 60 native English speakers from across India
    Geographic Coverage: Multiple Indian regions to ensure dialect and accent diversity
    Demographics: Participants aged 18 to 70, with a 60:40 male-to-female distribution
    Recording Details
    Nature of Recording: Scripted monologue-style speech prompts
    Duration: Each recording spans 5 to 30 seconds
    Audio Format: WAV format, mono channel, 16-bit depth, and 8kHz / 16kHz sample rates
    Environment: Recorded in quiet conditions, free from background noise and echo

    Topic Diversity

    This dataset includes a comprehensive set of retail-specific topics to ensure wide linguistic coverage for AI training:

    Customer Service Interactions
    Order Placement and Payment Processes
    Product and Service Inquiries
    Technical Support Queries
    General Information and Guidance
    Promotional and Sales Announcements
    Domain-Specific Service Statements

    Contextual Enrichment

    To increase training utility, prompts include contextual data such as:

    Region-Specific Names: Common Indian male and female names in diverse formats
    Addresses: Localized address variations spoken naturally
    Dates & Times: Realistic phrasing in delivery, promotions, and return policies
    Product References: Real-world product names, brands, and categories
    Numerical Data: Spoken numbers and prices used in transactions and offers
    Order IDs & Tracking Numbers: Common references in customer service calls

    These additions help your models learn to recognize structured and unstructured retail-related speech.

    Transcription

    Every audio file is paired with a verbatim transcription, ensuring consistency and alignment for model training.

    Content: Exact scripted prompts as spoken by the participant
    Format: Provided in plain text (.TXT) format with filenames matching the associated audio
    Quality Assurance: All transcripts are verified for accuracy by native English transcribers

    Metadata

    Detailed metadata is included to support filtering, analysis, and model evaluation:


  5. Indian English General Domain Scripted Monologue Speech Data

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English General Domain Scripted Monologue Speech Data [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/general-scripted-speech-monologues-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    India
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Indian English Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of English language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic English speech data.

    Speech Data

    This dataset features over 6,000 high-quality scripted monologue recordings in Indian English. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.

    Participant Diversity
    Speakers: 60 native Indian English speakers
    Regions: Broad regional coverage ensures diverse accents and dialects
    Demographics: Participants aged 18 to 70, with a 60:40 male-to-female ratio
    Recording Specifications
    Recording Type: Scripted monologues and prompt-based recordings
    Audio Duration: 5 to 30 seconds per file
    Format: WAV, mono channel, 16-bit, 8 kHz & 16 kHz sample rates
    Environment: Clean, noise-free conditions to ensure clarity and usability

    Topic Coverage

    The dataset covers a wide variety of general conversation scenarios, including:

    Daily Conversations
    Topic-Specific Discussions
    General Knowledge and Advice
    Idioms and Sayings

    Contextual Features

    To enhance authenticity, the prompts include:

    Names: Male and female names specific to different Indian regions
    Addresses: Commonly used address formats in daily Indian English speech
    Dates & Times: References used in general scheduling and time expressions
    Organization Names: Names of businesses, institutions, and other entities
    Numbers & Currencies: Mentions of quantities, prices, and monetary values

    Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.

    Transcription

    Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.

    Content: Exact match to the spoken audio
    Format: Plain text (.TXT), named identically to the corresponding audio file
    Quality Control: All transcripts are validated by native English transcribers

    Metadata

    Rich metadata is included for detailed filtering and analysis:

    Speaker Metadata: Unique speaker ID, age, gender, region, and dialect
    Audio Metadata: Prompt transcript, recording setup, device specs, sample rate, bit depth, and format

    Applications & Use Cases

    This dataset can power a variety of English language AI technologies, including:

    Speech Recognition Training: ASR model development and fine-tuning

  6. Indian English Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    India
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native Indian English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native Indian English contributors from our verified pool.
    Regions: Covering multiple Indian provinces to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real-time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    High transcription accuracy: a dual-layered transcription review keeps the word error rate under 5%.

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train English speech-to-text engines for travel platforms.

  7. Indian English Call Center Data for Delivery & Logistics AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Delivery & Logistics AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

    Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Indian English speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions, offering a rich, real-world training base for AI models.

    Participant Diversity:
    Speakers: 60 native Indian English speakers from our verified contributor pool.
    Regions: Multiple provinces of India for accent and dialect diversity.
    Participant Profile: Balanced gender distribution (60% male, 40% female) with ages ranging from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.
    Call Duration: 5 to 15 minutes on average.
    Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in clean, noise-free, echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

    Inbound Calls:
    Order Tracking
    Delivery Complaints
    Undeliverable Addresses
    Return Process Enquiries
    Delivery Method Selection
    Order Modifications, and more
    Outbound Calls:
    Delivery Confirmations
    Subscription Offer Calls
    Incorrect Address Follow-ups
    Missed Delivery Notifications
    Delivery Feedback Surveys
    Out-of-Stock Alerts, and others

    This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

    Transcription

    All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, noise)
    High transcription accuracy with word error rate under 5% via dual-layer quality checks.

    These transcriptions support fast, reliable model development for English voice AI applications in the delivery sector.

    Metadata

    Detailed metadata is included for each participant and conversation:

    Participant Metadata: ID, age, gender, region, accent, dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

    This metadata aids in training specialized models, filtering demographics, and running advanced analytics.

    Usage and Applications


  8. Non-Native Children English Speech (NNCES) Corpus

    • kaggle.com
    zip
    Updated Oct 29, 2022
    Cite
    Kodali Radha (2022). Non-Native Children English Speech (NNCES) Corpus [Dataset]. https://www.kaggle.com/datasets/kodaliradha20phd7093/nonnative-children-english-speech-nnces-corpus/suggestions
    Available download formats: zip (6687814398 bytes)
    Dataset updated
    Oct 29, 2022
    Authors
    Kodali Radha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Non-native children’s English speech (NNCES) corpus: The corpus covers 50 children (25 female, 25 male) aged 8 to 12. All are native speakers of Telugu, an Indian regional language, and are learning English as a second language. All audio clips were captured as .wav files using the open source SurveyLex platform, which supports dual-channel recording at 44.1 kHz with 16 bits per sample. Every questionnaire was conducted 10 times per child to assess the variation in words and sentences. In total, around 20 hours of data were recorded. The corpus includes both read speech (5000 utterances) and spontaneous speech (5000 utterances), with word-level transcription.
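
    As a rough sanity check on the figures above, uncompressed PCM size is duration × sample rate × bytes per sample × channels. The sketch assumes plain PCM audio and ignores WAV header overhead; the description says the platform supports dual channel but does not say whether every clip is stereo, so the channel count is left as a parameter. `pcm_size_bytes` is a hypothetical helper name.

```python
def pcm_size_bytes(hours, sample_rate_hz, bits_per_sample, channels):
    """Estimate uncompressed PCM payload size in bytes (headers ignored)."""
    total_frames = round(hours * 3600 * sample_rate_hz)
    bytes_per_frame = (bits_per_sample // 8) * channels
    return total_frames * bytes_per_frame


# About 20 hours at 44.1 kHz and 16 bits, for both possible channel layouts.
mono_bytes = pcm_size_bytes(20, 44100, 16, channels=1)
stereo_bytes = pcm_size_bytes(20, 44100, 16, channels=2)
```

    For 20 hours this gives 6,350,400,000 bytes for mono and 12,700,800,000 bytes for stereo. Because the duration is stated only as "around 20 hours", these are estimates rather than exact sizes.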

  9. Indian English Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking real estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native Indian English speakers from our verified contributor community.
    Regions: Representing different provinces across India to ensure accent and dialect variation.
    Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for English real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:


  10. Indian English Call Center Data for Retail & E-Commerce AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.

    Participant Diversity:
    Speakers: 60 native Indian English speakers from our verified contributor pool.
    Regions: Representing multiple provinces across India to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.

    Inbound Calls:
    Product Inquiries
    Order Cancellations
    Refund & Exchange Requests
    Subscription Queries, and more
    Outbound Calls:
    Order Confirmations
    Upselling & Promotions
    Account Updates
    Loyalty Program Offers
    Customer Verifications, and others

    Such variety enhances your model’s ability to generalize across retail-specific voice interactions.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, cough)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.

    Usage and Applications

    This dataset is ideal for a range of voice AI and NLP applications:

    Automatic Speech Recognition (ASR): Fine-tune English speech-to-text systems.

  11. AccentDB - Core & Extended

    • kaggle.com
    zip
    Updated Feb 17, 2021
    Cite
    Sparsh Gupta (2021). AccentDB - Core & Extended [Dataset]. https://www.kaggle.com/imsparsh/accentdb-core-extended
    Explore at:
    Available download formats: zip (6738893005 bytes)
    Dataset updated
    Feb 17, 2021
    Authors
    Sparsh Gupta
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A Database of Non-Native English Accents to Assist Neural Speech Recognition

    Context

    AccentDB is a multi-pairwise parallel corpus of structured and labelled accented speech. It contains speech samples from speakers of 4 non-native accents of English (8 speakers, 4 Indian languages); and also has a compilation of 4 native accents of English (4 countries, 13 speakers) and a metropolitan Indian accent (2 speakers). The dataset available here corresponds to release titled accentdb_extended on

  12. EmoFilm - A multilingual emotional speech corpus

    • zenodo.org
    • data.europa.eu
    Updated Jul 12, 2024
    + more versions
    Cite
    Emilia Parada-Cabaleiro; Giovanni Costantini; Anton Batliner; Alice Baird; Bjoern Schuller (2024). EmoFilm - A multilingual emotional speech corpus [Dataset]. http://doi.org/10.5281/zenodo.7665999
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Emilia Parada-Cabaleiro; Giovanni Costantini; Anton Batliner; Alice Baird; Bjoern Schuller
    Description

    EmoFilm is a multilingual emotional speech corpus comprising 1115 audio instances produced in English, Italian, and Spanish languages. The audio clips (with a mean length of 3.5 sec. and std 1.2 sec.) were extracted in wave format (uncompressed, mono, 48 kHz sample rate and 16-bit) from 43 films (original in English and their over-dubbed Italian and Spanish versions). Genres including comedy, drama, horror, and thriller were considered; anger, contempt, happiness, fear, and sadness emotional states were taken into account. EmoFilm has been presented at Interspeech 2018:

    Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird, and Björn Schuller (2018), Categorical vs Dimensional Perception of Italian Emotional Speech, in Proc. of Interspeech, Hyderabad, India, pp. 3638-3642.

    We would like to thank Linda Ratz for her contribution in the generation of the transcriptions.

    How to access EmoFilm

    To get access to the dataset, please send the signed End User License Agreement (EULA) when making the request. The EULA must be signed by somebody from a university holding a permanent position, typically a full professor. Note that requests without an EULA appropriately filled out, as well as those performed from a non-institutional e-mail address, will be automatically rejected. Please download the EULA from the following link:

    https://drive.google.com/file/d/1pFHfsqk7snF_EVqq0WAC0Dz8FcTD3s9_/view?usp=share_link

  13. Indian English Call Center Data for Healthcare AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of English speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 hours of dual-channel call center conversations between native Indian English speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native Indian English speakers from our contributor community.
    Regions: Diverse provinces across India to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session ranges from 5 to 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    Outbound Calls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:


  14. Indian English Scripted Monologue Speech Dataset for BFSI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Scripted Monologue Speech Dataset for BFSI [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/bfsi-scripted-speech-monologues-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    India
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Indian English Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced English speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.

    Speech Data

    This dataset includes over 6,000 scripted prompt recordings in Indian English, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.

    Participant Diversity
    Speakers: 60 native Indian English speakers.
    Regions: Diverse representation from various Indian provinces to ensure dialect and accent coverage.
    Demographics: Age range of 18–70, with a male-to-female ratio of 60:40.
    Recording Details
    Nature: Scripted monologues and domain-specific prompt recordings.
    Duration:
    Audio Format: WAV, mono channel, 16-bit depth, recorded at 8 kHz and 16 kHz sample rates.
    Environment: Clean, echo-free, and noise-free environments.

    Topic & Context Diversity

    This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:

    Customer service interactions
    Financial transactions & balance inquiries
    Banking and insurance product queries
    Loan & credit support
    Regulatory and compliance questions
    Technical help and password resets
    Promotional campaigns and service updates

    Contextual Elements

    To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:

    Names: Region-specific names in multiple formats
    Addresses: Local address structures and pronunciations
    Dates & Times: Typical time expressions used in banking
    Organization Names: Names of banks, financial firms, and institutions
    Currencies & Amounts: Spoken currency formats, prices, and numeric data
    IDs & Transaction Numbers: For authentic service simulation

    Transcription

    Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.

    Content: Exact match of each prompt
    Format: Clean .TXT files, mapped to audio file names
    Accuracy: Reviewed and validated by native Indian English linguists

    Metadata

    Each data point is enriched with detailed metadata for advanced training and analysis:

    Participant Metadata: Unique ID, age, gender, state, country, dialect
    Recording Metadata: Transcript, recording setup, sample rate, bit depth, device, file format

    Applications and Use Cases

    This BFSI-focused dataset is

  15. Indian English Scripted Monologue Speech Data for Healthcare

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Scripted Monologue Speech Data for Healthcare [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/healthcare-scripted-speech-monologues-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Introducing the Indian English Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of English language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.

    Speech Data

    This dataset includes over 6,000 high-quality scripted audio prompts recorded in Indian English, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.

    Participant Diversity
    Speakers: 60 native Indian English speakers.
    Regional Balance: Participants are sourced from multiple regions across India, reflecting diverse dialects and linguistic traits.
    Demographics: Includes a mix of male and female participants (60:40 ratio), aged between 18 and 70 years.
    Recording Specifications
    Nature of Recordings: Scripted monologues based on healthcare-related use cases.
    Duration: Each clip ranges from 5 to 30 seconds, offering short, context-rich speech samples.
    Audio Format: WAV files recorded in mono, with 16-bit depth and sample rates of 8 kHz and 16 kHz.
    Environment: Clean and echo-free spaces ensure clear and noise-free audio capture.
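
    The recording specifications above (mono, 16-bit, 8 kHz or 16 kHz WAV) can be checked against a downloaded file's header before training. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav_format` and its default arguments are assumptions for illustration, not part of any FutureBeeAI tooling.

```python
import wave


def check_wav_format(path, channels=1, sample_width_bytes=2,
                     allowed_rates=(8000, 16000)):
    """Return a list of mismatches between a WAV header and the expected format.

    Defaults mirror the listing: mono, 16-bit (2 bytes per sample), 8 or 16 kHz.
    An empty list means the header matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(
                f"channels: expected {channels}, found {wav.getnchannels()}")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width: expected {sample_width_bytes} bytes, "
                f"found {wav.getsampwidth()}")
        if wav.getframerate() not in allowed_rates:
            problems.append(
                f"sample rate: expected one of {allowed_rates}, "
                f"found {wav.getframerate()}")
    return problems
```

    For the stereo call center recordings listed elsewhere on this page, the same check would be called with `channels=2`.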

    Topic Coverage

    The prompts span a broad range of healthcare-specific interactions, such as:

    Patient check-in and follow-up communication
    Appointment booking and cancellation dialogues
    Insurance and regulatory support queries
    Medication, test results, and consultation discussions
    General health tips and wellness advice
    Emergency and urgent care communication
    Technical support for patient portals and apps
    Domain-specific scripted statements and FAQs

    Contextual Depth

    To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:

    Names: Gender- and region-appropriate Indian names
    Addresses: Varied local address formats spoken naturally
    Dates & Times: References to appointment dates, times, follow-ups, and schedules
    Medical Terminology: Common medical procedures, symptoms, and treatment references
    Numbers & Measurements: Health data like dosages, vitals, and test result values
    Healthcare Institutions: Names of clinics, hospitals, and diagnostic centers

    These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.

    Transcription

    Every audio recording is accompanied by a verbatim, manually verified transcription.

    Content: The transcription mirrors the exact scripted prompt recorded by the speaker.
    Format: Files are delivered in plain text (.TXT) format with consistent naming conventions for seamless integration.

  16. MANGO

    • huggingface.co
    Updated May 13, 2025
    + more versions
    Cite
    AI4Bharat (2025). MANGO [Dataset]. https://huggingface.co/datasets/ai4bharat/MANGO
    Explore at:
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    AI4Bharat
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MANGO: A Corpus of Human Ratings for Speech

    MANGO (MUSHRA Assessment corpus using Native listeners and Guidelines to understand human Opinions at scale) is the first large-scale dataset designed for evaluating Text-to-Speech (TTS) systems in Indian languages.

    Key Features:

    255,150 human ratings of TTS-generated outputs and ground-truth human speech.
    Covers two major Indian languages, Hindi and Tamil, as well as English.
    Based on the MUSHRA (Multiple Stimuli with Hidden Reference… See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/MANGO.

  17. Indian English Scripted Monologue Speech Data for Telecom

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Scripted Monologue Speech Data for Telecom [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/telecom-scripted-speech-monologues-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    India
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Presenting the Indian English Scripted Monologue Speech Dataset for the Telecom Domain, a purpose-built dataset created to accelerate the development of English speech recognition and voice AI models specifically tailored for the telecommunications industry.

    Speech Data

    This dataset includes over 6,000 high-quality scripted prompt recordings in Indian English, representing real-world telecom customer service scenarios. It’s designed to support the training of speech-based AI systems used in call centers, virtual agents, and voice-powered support tools.

    Participant Diversity
    Speakers: 60 native Indian English speakers
    Geographic Distribution: Carefully selected from multiple regions across India to capture a wide spectrum of dialects and speaking styles
    Demographics: Male and female speakers in a 60:40 ratio, aged 18 to 70 years
    Recording Specifications
    Type: Scripted monologue prompts focused on telecom industry use cases
    Duration: Each audio clip ranges from 5 to 30 seconds
    Format: WAV files in mono, 16-bit depth, with sample rates of 8 kHz and 16 kHz
    Environment: Clean, echo-free, and noise-controlled settings to ensure optimal audio clarity

    Topic Coverage

    The dataset reflects a wide variety of common telecom customer interactions, including:

    Customer onboarding and service inquiries
    Billing and payment questions
    Data plans and product information
    Technical support requests
    Network coverage discussions
    Regulatory compliance and policy information
    Upgrades, renewals, and service plan changes
    Domain-specific scripted interactions tailored to real-world telecom use cases

    Contextual Depth

    To maximize contextual richness, prompts include:

    Localized Names: Common Indian names in various formats
    Addresses: Region-specific address structures for realism
    Dates & Times: Spoken date and time references in typical telecom scenarios (e.g., billing cycles, service activation times)
    Telecom Terminology: Keywords related to mobile data, network, SIM, devices, plans, etc.
    Numbers & Rates: Usage statistics, pricing info, recharge values, and billing figures
    Service Providers: References to telecom companies and third-party service entities

    Transcription

    Each audio file is paired with an accurate, verbatim transcription for precise model training:

    Content: Transcriptions are direct representations of each recorded prompt
    Format: Plain text (.TXT), with filenames matching their corresponding audio files
    Verification: Every transcription is manually verified by native Indian English linguists to ensure consistency and accuracy

    Metadata

    Detailed metadata is included to enhance

  18. Indian English Scripted Monologue Speech Data in Real Estate

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Scripted Monologue Speech Data in Real Estate [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/realestate-scripted-speech-monologues-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Introducing the Indian English Scripted Monologue Speech Dataset for the Real Estate Domain, a dataset designed to support the development of English speech recognition and conversational AI technologies tailored for the real estate industry.

    Speech Data

    This dataset includes over 6,000 high-quality scripted prompt recordings in Indian English. The speech content reflects a wide range of real estate interactions to help build intelligent, domain-specific customer support systems and speech-enabled tools.

    Participant Diversity
    Speakers: 60 native English speakers from across India
    Regional Variation: Balanced representation of regional dialects and speaking styles
    Demographics: Ages 18–70, with a 60:40 male-to-female ratio
    Recording Specifications
    Type: Scripted monologue recordings
    Duration: 5–30 seconds per audio clip
    Audio Format: WAV, mono channel, 16-bit, sampled at 8 kHz and 16 kHz
    Recording Environment: Quiet, echo-free settings with no background noise

    Topic and Scenario Coverage

    This dataset captures a broad spectrum of use cases and conversational themes within the real estate sector, such as:

    Property inquiries and viewing appointments
    Price negotiations and financial discussions
    Contractual and legal clarifications
    Relocation coordination and service support
    Real estate agent interactions
    Regulatory information and buyer/seller advisory
    Domain-specific spoken statements and service dialogues

    Contextual Depth

    Each scripted prompt incorporates key elements to simulate realistic real estate conversations:

    Names: Culturally appropriate Indian names in various spoken formats
    Addresses: Detailed location references, including cities, districts, and street names
    Dates & Times: Contextual references to appointments, contract timelines, or move-in dates
    Property Descriptions: Features, measurements, and amenities of real estate listings
    Financial Details: Prices, rental amounts, down payments, deposits, and loan-related figures
    Legal Terms: Frequently used terms in property contracts and documentation

    Transcription

    To ensure precision in model training, each audio recording is paired with a verbatim text transcription:

    Content: Exact scripted text for each corresponding audio prompt
    Format: Plain text (.TXT) files named to match their associated audio recordings
    Quality Control: All transcriptions are manually reviewed by native Indian English linguists for consistency and correctness

    Metadata

    Each data sample is enriched with detailed metadata to enhance usability:

    Participant Metadata:

  19. IndicTTS

    • opendatalab.com
    zip
    Updated Apr 27, 2023
    Cite
    Indian Institute of Technology Madras (2023). IndicTTS [Dataset]. https://opendatalab.com/OpenDataLab/IndicTTS
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 27, 2023
    Dataset provided by
    Indian Institute of Technology Madras
    License

    Attribution-ShareAlike 2.0 (CC BY-SA 2.0) https://creativecommons.org/licenses/by-sa/2.0/
    License information was derived automatically

    Description

    A special corpus of Indian languages covering 13 major languages of India. It comprises 10,000+ spoken sentences/utterances each of mono and English, recorded by both male and female native speakers. Speech waveform files are available in .wav format along with the corresponding text. We hope that these recordings will be useful for researchers and speech technologists working on synthesis and recognition. You can request zip archives of the entire database here.

  20. Indian English Wake Words & Voice Commands Speech Data

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Indian English Wake Words & Voice Commands Speech Data [Dataset]. https://www.futurebeeai.com/dataset/wake-words-and-commands-dataset/wake-words-and-commands-english-india
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Indian English Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.

    Speech Data

    This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:

    Wake words alone
    Wake words followed by command phrases

    Participant Diversity

    Speakers: 50 native Indian English speakers from the FutureBeeAI community
    Regions: Participants from various Indian provinces, ensuring broad coverage of accents and dialects
    Demographics: Ages 18–70; 60% male and 40% female participants

    Recording Details

    Type: Scripted wake words and command phrases
    Duration: 1 to 15 seconds per clip
    Format: WAV, stereo, 16-bit, with sample rates ranging from 16 kHz to 48 kHz

    Dataset Diversity

    Wake Word Types
    Automobile Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Ok Ford, etc.
    Voice Assistant Wake Words: Hey Siri, Ok Google, Alexa, Hey Cortana, Hi Bixby, Hey Celia, etc.
    Home Appliance Wake Words: Hi LG, Ok LG, Hello Lloyd, and more
    Command Types by Use Case
    Automobile: Play music, check directions, voice search, provide feedback, and more
    Voice Assistant: Ask general questions, make calls, control devices, shopping, manage calendars, and more
    Home Appliances: Control appliances, check status, set reminders/alarms, manage shopping lists, etc.
    Recording Environments
    No background noise
    Background traffic noise
    People talking in the background
    Speaking Pace
    Normal speed
    Fast speed

    This diversity ensures robust training for real-world voice assistant applications.

    Metadata

    Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.

    Participant Metadata: Unique ID, age, gender, region, accent, dialect
    Recording Metadata: Transcript, environment, pace, device used, sample rate, bit depth, file format

    Use Cases & Applications

    Voice Assistant Activation: Train models to accurately detect and trigger based on wake words
    Smart Home Devices: Enable responsive voice control in smart appliances

Cite
FutureBee AI (2022). Indian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-india

Indian English General Conversation Speech Dataset for ASR

Indian English General Conversation Speech Corpus

Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License

https://www.futurebeeai.com/policies/ai-data-license-agreement

Dataset funded by
FutureBeeAI
Description

Introduction

Welcome to the Indian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Indian English communication.

Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Indian accents and dialects.

Speech Data

The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Indian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

Participant Diversity:
Speakers: 60 verified native Indian English speakers from FutureBeeAI’s contributor community.
Regions: Representing various provinces of India to ensure dialectal diversity and demographic balance.
Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
Recording Details:
Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
Duration: Each conversation ranges from 15 to 60 minutes.
Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
Environment: Quiet, echo-free settings with no background noise.

Topic Diversity

The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

Sample Topics Include:
Family & Relationships
Food & Recipes
Education & Career
Healthcare Discussions
Social Issues
Technology & Gadgets
Travel & Local Culture
Shopping & Marketplace Experiences, and many more.

Transcription

Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

Transcription Highlights:
Speaker-segmented dialogues
Time-coded utterances
Non-speech elements (pauses, laughter, etc.)
High transcription accuracy, achieved through a double QA pass (average WER < 5%)

These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

Metadata

The dataset comes with granular metadata for both speakers and recordings:

Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

Usage and Applications

This dataset is a versatile resource for multiple English speech and language AI applications:

ASR Development: Train accurate speech-to-text systems for Indian English.
Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.