License: https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Indian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Indian English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Indian accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Indian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
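As a rough illustration of how these audio-plus-JSON pairs might feed an ASR pipeline, the sketch below builds a simple training manifest. The one-JSON-per-audio layout and the `transcript` field name are assumptions for illustration only; the actual schema should be checked against the dataset documentation.

```python
import json
from pathlib import Path

# Assumed layout: each recording ships as <name>.wav plus <name>.json
# containing the verbatim transcript under a "transcript" key (hypothetical field).
def build_manifest(data_dir):
    manifest = []
    for wav_path in sorted(Path(data_dir).glob("*.wav")):
        json_path = wav_path.with_suffix(".json")
        if not json_path.exists():
            continue  # skip recordings without a transcription file
        with open(json_path, encoding="utf-8") as f:
            annotation = json.load(f)
        manifest.append({
            "audio_filepath": str(wav_path),
            "text": annotation.get("transcript", ""),  # hypothetical key
        })
    return manifest

if __name__ == "__main__":
    entries = build_manifest("indian_english_conversations/")  # placeholder path
    print(f"Loaded {len(entries)} audio-transcript pairs")
```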
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple English speech and language AI applications:
ID: King-ASR-631
Language: English
Duration: 200 hours
Speakers: 200 people
Parameters: 16 kHz, 16-bit
Recording Device: Mobile
URL: https://dataoceanai.com/datasets/asr/indian-english-speech-recognition-corpus-conversations-mobile/
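As a quick sanity check when ingesting this corpus, one might verify that each file actually matches the stated 16 kHz / 16-bit parameters. The sketch below uses the soundfile library; the directory path is a placeholder, not part of the distribution.

```python
import soundfile as sf
from pathlib import Path

# Placeholder path; point this at the extracted corpus directory.
for wav_path in Path("king_asr_631/").rglob("*.wav"):
    info = sf.info(str(wav_path))
    # Expect 16 kHz sample rate and 16-bit PCM, per the listed parameters.
    if info.samplerate != 16000 or info.subtype != "PCM_16":
        print(f"Unexpected format: {wav_path} ({info.samplerate} Hz, {info.subtype})")
```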
This emotional speech database was created by 8 north Indian speakers (5 males and 3 females) and contains 600 emotional audio files; it is named the Indian Emotional Speech Corpora (IESC). The IESC audio files are recorded in five emotions, i.e. neutral, happy, angry, sad, and fearful. All audio files were recorded with a speech recorder app on a mobile phone in a closed room to avoid other noises; headphones with a microphone were also used to prevent sound leakage and provide noise cancellation during recording. All recorded audio files are saved with the .wav extension, and each audio file has a unique file name consisting of 4 alphanumeric parts, for example "H-4-5-1.wav", where:
- The first part represents the emotion (A = angry, F = fear, H = happy, N = neutral, S = sad).
- The second part shows the repetition (1 = 1st repetition, 2 = 2nd repetition, and so on).
- The third part represents the speaker (1 = 1st speaker, 2 = 2nd speaker, and so on).
- The last part represents the sentence (1 = "Kids are talking by the door", 2 = "Dogs are sitting by the door").
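Following that naming scheme, a small helper like the one below can decode a file name such as "H-4-5-1.wav" into its emotion, repetition, speaker, and sentence fields. This is an illustrative sketch based on the scheme described above, not code shipped with the corpus.

```python
EMOTIONS = {"A": "angry", "F": "fear", "H": "happy", "N": "neutral", "S": "sad"}
SENTENCES = {
    1: "Kids are talking by the door",
    2: "Dogs are sitting by the door",
}

def parse_iesc_filename(filename):
    """Decode an IESC file name like 'H-4-5-1.wav' into its labelled parts."""
    stem = filename.rsplit(".", 1)[0]
    emotion_code, repetition, speaker, sentence = stem.split("-")
    return {
        "emotion": EMOTIONS[emotion_code],
        "repetition": int(repetition),
        "speaker": int(speaker),
        "sentence": SENTENCES[int(sentence)],
    }

print(parse_iesc_filename("H-4-5-1.wav"))
# {'emotion': 'happy', 'repetition': 4, 'speaker': 5, 'sentence': 'Kids are talking by the door'}
```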
If you use this dataset, kindly cite this paper:
Singh, Y.B., Goel, S. A lightweight 2D CNN based approach for speaker-independent emotion recognition from speech with new Indian Emotional Speech Corpora. Multimed Tools Appl 82, 23055–23073 (2023). https://doi.org/10.1007/s11042-023-14577-w
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Indian English Scripted Monologue Speech Dataset for the Retail & E-commerce domain. This dataset is built to accelerate the development of English language speech technologies, especially for use in retail-focused automatic speech recognition (ASR), natural language processing (NLP), voicebots, and conversational AI applications.
This training dataset includes 6,000+ high-quality scripted audio recordings in Indian English, created to reflect real-world scenarios in the Retail & E-commerce sector. These prompts are tailored to improve the accuracy and robustness of customer-facing speech technologies.
This dataset includes a comprehensive set of retail-specific topics to ensure wide linguistic coverage for AI training:
To increase training utility, prompts include contextual data such as:
These additions help your models learn to recognize structured and unstructured retail-related speech.
Every audio file is paired with a verbatim transcription, ensuring consistency and alignment for model training.
Detailed metadata is included to support filtering, analysis, and model evaluation:
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
The Indian English Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of English language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic English speech data.
This dataset features over 6,000 high-quality scripted monologue recordings in Indian English. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.
The dataset covers a wide variety of general conversation scenarios, including:
To enhance authenticity, the prompts include:
Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.
Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.
Rich metadata is included for detailed filtering and analysis:
This dataset can power a variety of English language AI technologies, including:
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
This Indian English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native Indian English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
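Because the calls are delivered as dual-channel recordings, a common first step is separating the two channels (typically one per call participant) before transcription or diarization. The sketch below does this with the soundfile library; the file name is a placeholder, and which channel corresponds to the agent versus the customer is an assumption to verify against the dataset documentation.

```python
import soundfile as sf

def split_call_channels(wav_path):
    """Split a dual-channel call recording into two mono arrays."""
    audio, sample_rate = sf.read(wav_path)  # shape: (num_samples, 2) for stereo files
    assert audio.ndim == 2 and audio.shape[1] == 2, "expected a dual-channel file"
    # Channel assignment (agent vs. customer) is assumed; confirm with the docs.
    agent, customer = audio[:, 0], audio[:, 1]
    return agent, customer, sample_rate

agent, customer, sr = split_call_channels("travel_call_0001.wav")  # placeholder name
sf.write("agent.wav", agent, sr)
sf.write("customer.wav", customer, sr)
```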
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real-time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
This Indian English Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.
Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.
The dataset contains 30 hours of dual-channel call center recordings between native Indian English speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed-delivery resolutions, offering a rich, real-world training base for AI models.
This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.
This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.
All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.
These transcriptions support fast, reliable model development for English voice AI applications in the delivery sector.
Detailed metadata is included for each participant and conversation:
This metadata aids in training specialized models, filtering demographics, and running advanced analytics.
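As an example of how such metadata can drive filtering, the snippet below selects calls by speaker attributes with pandas. The file name and column names (`call_outcome`, `speaker_age`) are illustrative assumptions, not the dataset's documented schema.

```python
import pandas as pd

# Hypothetical metadata export; file and column names are assumptions for illustration.
metadata = pd.read_csv("delivery_calls_metadata.csv")

# Example: keep negative-outcome calls from speakers aged 18-30 for targeted fine-tuning.
subset = metadata[
    (metadata["call_outcome"] == "negative")
    & metadata["speaker_age"].between(18, 30)
]
print(f"Selected {len(subset)} calls out of {len(metadata)}")
```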
License: https://creativecommons.org/publicdomain/zero/1.0/
Non-native children's English speech (NNCES) corpus: The corpus covers a total of 50 children, 25 females and 25 males, ranging in age from 8 to 12. All of the children are native speakers of Telugu, an Indian regional language, who are learning English as a second language. All of the audio clips were acquired as .wav files using the open-source SurveyLex platform, which records dual-channel audio at 44.1 kHz with 16 bits per sample. Each questionnaire was administered 10 times per child to capture variation in words and sentences, and around 20 hours of data were recorded in total. The corpus incorporates both read speech (5,000 utterances) and spontaneous speech (5,000 utterances), with word-level transcription.
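Since most ASR front ends expect 16 kHz mono input, recordings like these (44.1 kHz, dual channel) are typically downmixed and resampled before training. A minimal sketch with librosa, assuming the files sit in a local directory with placeholder paths:

```python
import librosa
import soundfile as sf
from pathlib import Path

SRC_DIR = Path("nnces_wav/")       # placeholder location of the original 44.1 kHz files
DST_DIR = Path("nnces_16k_mono/")  # output directory for ASR-ready audio
DST_DIR.mkdir(exist_ok=True)

for wav_path in SRC_DIR.glob("*.wav"):
    # librosa downmixes to mono and resamples to the target rate in a single call.
    audio, sr = librosa.load(wav_path, sr=16000, mono=True)
    sf.write(DST_DIR / wav_path.name, audio, sr)
```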
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
This Indian English Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking real estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
The dataset features 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
Such domain-rich variety ensures model generalization across common real estate support conversations.
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
These transcriptions streamline ASR and NLP development for English real estate voice applications.
Detailed metadata accompanies each participant and conversation:
This enables smart filtering, dialect-focused model training, and structured dataset exploration.
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
This Indian English Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
The dataset contains 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
These transcriptions are production-ready, making model training faster and more accurate.
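As a rough sketch of how such time-coded transcriptions might be consumed, the snippet below slices a call into segment-level clips for ASR training. The segment structure (a top-level `segments` list with `start`, `end`, and `text` fields in seconds) is an assumed schema for illustration; the actual JSON layout should be checked against the dataset documentation.

```python
import json
import soundfile as sf

def extract_segments(wav_path, json_path, out_prefix):
    """Cut a call recording into per-segment clips using time-coded transcripts."""
    audio, sr = sf.read(wav_path)
    with open(json_path, encoding="utf-8") as f:
        segments = json.load(f)["segments"]  # assumed top-level key

    for i, seg in enumerate(segments):
        start = int(seg["start"] * sr)  # assumed: start/end given in seconds
        end = int(seg["end"] * sr)
        sf.write(f"{out_prefix}_{i:04d}.wav", audio[start:end], sr)
        print(seg["text"])  # assumed field holding the verbatim transcript

extract_segments("retail_call_0001.wav", "retail_call_0001.json", "retail_seg")  # placeholder names
```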
Rich metadata is available for each participant and conversation:
This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
This dataset is ideal for a range of voice AI and NLP applications:
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
AccentDB is a multi-pairwise parallel corpus of structured and labelled accented speech. It contains speech samples from speakers of 4 non-native accents of English (8 speakers, 4 Indian languages), and also includes a compilation of 4 native accents of English (4 countries, 13 speakers) and a metropolitan Indian accent (2 speakers). The dataset available here corresponds to the release titled accentdb_extended.
EmoFilm is a multilingual emotional speech corpus comprising 1,115 audio instances produced in English, Italian, and Spanish. The audio clips (mean length 3.5 s, standard deviation 1.2 s) were extracted in wave format (uncompressed, mono, 48 kHz sample rate, 16-bit) from 43 films (originals in English plus their over-dubbed Italian and Spanish versions). Genres include comedy, drama, horror, and thriller; the emotional states covered are anger, contempt, happiness, fear, and sadness. EmoFilm was presented at Interspeech 2018:
Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird, and Björn Schuller (2018), Categorical vs Dimensional Perception of Italian Emotional Speech, in Proc. of Interspeech, Hyderabad, India, pp. 3638-3642.
We would like to thank Linda Ratz for her contribution in the generation of the transcriptions.
How to access EmoFilm
To get access to the dataset, please send the signed End User License Agreement (EULA) when making the request. The EULA must be signed by somebody from a university holding a permanent position, typically a full professor. Note that requests without an EULA appropriately filled out, as well as those performed from a non-institutional e-mail address, will be automatically rejected. Please download the EULA from the following link:
https://drive.google.com/file/d/1pFHfsqk7snF_EVqq0WAC0Dz8FcTD3s9_/view?usp=share_link
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
This Indian English Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of English speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 hours of dual-channel call center conversations between native Indian English speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Indian English Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced English speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Indian English, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is well suited to a range of voice AI and NLP applications across banking, financial services, and insurance:
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Indian English Scripted Monologue Speech Dataset for the Healthcare Domain, a voice dataset built to accelerate the development and deployment of English language automatic speech recognition (ASR) systems, with a sharp focus on real-world healthcare interactions.
This dataset includes over 6,000 high-quality scripted audio prompts recorded in Indian English, representing typical voice interactions found in the healthcare industry. The data is tailored for use in voice technology systems that power virtual assistants, patient-facing AI tools, and intelligent customer service platforms.
The prompts span a broad range of healthcare-specific interactions, such as:
To maximize authenticity, the prompts integrate linguistic elements and healthcare-specific terms such as:
These elements make the dataset exceptionally suited for training AI systems to understand and respond to natural healthcare-related speech patterns.
Every audio recording is accompanied by a verbatim, manually verified transcription.
License: Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MANGO: A Corpus of Human Ratings for Speech
MANGO (MUSHRA Assessment corpus using Native listeners and Guidelines to understand human Opinions at scale) is the first large-scale dataset designed for evaluating Text-to-Speech (TTS) systems in Indian languages.
Key Features:
- 255,150 human ratings of TTS-generated outputs and ground-truth human speech.
- Covers two major Indian languages, Hindi and Tamil, as well as English.
- Based on the MUSHRA (Multiple Stimuli with Hidden Reference and Anchor) methodology…

See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/MANGO
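For reference, the dataset page above is hosted on the Hugging Face Hub, so it can typically be pulled with the datasets library as sketched below. The split name (and whether a configuration name is required) is an assumption to check against the dataset card.

```python
from datasets import load_dataset

# Hub path taken from the dataset page above; the split name is an assumption.
mango = load_dataset("ai4bharat/MANGO", split="train")

print(mango)     # inspect the available columns
print(mango[0])  # look at a single human rating record
```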
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
Presenting the Indian English Scripted Monologue Speech Dataset for the Telecom Domain, a purpose-built dataset created to accelerate the development of English speech recognition and voice AI models specifically tailored for the telecommunications industry.
This dataset includes over 6,000 high-quality scripted prompt recordings in Indian English, representing real-world telecom customer service scenarios. It’s designed to support the training of speech-based AI systems used in call centers, virtual agents, and voice-powered support tools.
The dataset reflects a wide variety of common telecom customer interactions, including:
To maximize contextual richness, prompts include:
Each audio file is paired with an accurate, verbatim transcription for precise model training:
Detailed metadata is included to enhance filtering, analysis, and model evaluation:
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
Introducing the Indian English Scripted Monologue Speech Dataset for the Real Estate Domain, a dataset designed to support the development of English speech recognition and conversational AI technologies tailored for the real estate industry.
This dataset includes over 6,000 high-quality scripted prompt recordings in Indian English. The speech content reflects a wide range of real estate interactions to help build intelligent, domain-specific customer support systems and speech-enabled tools.
This dataset captures a broad spectrum of use cases and conversational themes within the real estate sector, such as:
Each scripted prompt incorporates key elements to simulate realistic real estate conversations:
To ensure precision in model training, each audio recording is paired with a verbatim text transcription:
Each data sample is enriched with detailed metadata to enhance usability:
License: Attribution-ShareAlike 2.0 (CC BY-SA 2.0): https://creativecommons.org/licenses/by-sa/2.0/
License information was derived automatically
A special corpus of Indian languages covering 13 major languages of India. It comprises 10,000+ spoken sentences/utterances each of monolingual (native-language) and English text, recorded by both male and female native speakers. Speech waveform files are available in .wav format along with the corresponding text. We hope that these recordings will be useful for researchers and speech technologists working on synthesis and recognition. You can request zip archives of the entire database here.
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
The Indian English Wake Word & Voice Command Dataset is expertly curated to support the training and development of voice-activated systems. This dataset includes a large collection of wake words and command phrases, essential for enabling seamless user interaction with voice assistants and other speech-enabled technologies. It’s designed to ensure accurate wake word detection and voice command recognition, enhancing overall system performance and user experience.
This dataset includes 20,000+ audio recordings of wake words and command phrases. Each participant contributed 400 recordings, captured under varied environmental conditions and speaking speeds. The data covers:
This diversity ensures robust training for real-world voice assistant applications.
Each audio file is accompanied by detailed metadata to support advanced filtering and training needs.