https://www.futurebeeai.com/policies/ai-data-license-agreement
The Hindi Telecom Chat Dataset is a comprehensive collection of over 12,000 text-based conversations between telecom customers and call center agents. This dataset captures real-world service interactions and domain-specific language in Hindi, enabling the development of intelligent conversational AI and NLP systems tailored for the telecommunications sector.
Participant & Chat Overview
This dataset spans a wide range of telecom customer service scenarios:
The conversations reflect real-life telecom interactions in Hindi, incorporating:
Conversations follow the natural flow of telecom customer service exchanges, including:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Hindi General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world Hindi usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level Hindi conversations covering a broad spectrum of everyday topics.
This dataset includes over 15,000 chat transcripts, each featuring free-flowing dialogue between two native Hindi speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.
Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:
This diversity ensures the dataset is useful across multiple NLP and language understanding applications.
Chats reflect informal, native-level Hindi usage with:
Every chat instance is accompanied by structured metadata, which includes:
This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
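As a rough illustration of the metadata-based filtering described above, here is a minimal Python sketch. The record layout and the field names (`age_group`, `gender`, `topic`) are assumptions made for this example; they are not the dataset's documented schema, which should be checked against the actual files.

```python
# Hypothetical chat records. The "metadata" keys below are illustrative
# assumptions, not the dataset's documented field names.
chats = [
    {"id": "c1", "metadata": {"age_group": "18-30", "gender": "female", "topic": "travel"}},
    {"id": "c2", "metadata": {"age_group": "31-45", "gender": "male", "topic": "food"}},
    {"id": "c3", "metadata": {"age_group": "18-30", "gender": "male", "topic": "food"}},
]


def filter_chats(records, **criteria):
    """Return the records whose metadata matches every key/value in criteria."""
    return [
        record
        for record in records
        if all(record["metadata"].get(key) == value for key, value in criteria.items())
    ]


# Select a demographic slice for evaluation, then narrow it by topic.
young_adults = filter_chats(chats, age_group="18-30")
young_adults_food = filter_chats(chats, age_group="18-30", topic="food")
```

The same helper can be reused to build held-out evaluation slices per demographic group before fine-tuning.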
All chat records pass through a rigorous QA process to maintain consistency and accuracy:
This ensures a clean, reliable dataset ready for high-performance AI model training.
This dataset is ideal for training and evaluating a wide range of text-based AI systems:
https://data.macgence.com/terms-and-conditions
Explore high-quality Hindi general conversation speech datasets for AI, NLP, and speech recognition research. Download and enhance your projects today!
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Hindi Dataset (हिंदी डेटासेट): high-quality Hindi TTS, general conversation, and podcast datasets for AI and ASR models.
https://data.macgence.com/terms-and-conditions
A comprehensive Hindi general conversation dataset tailored for birthday party scenarios, ideal for speech recognition and conversational AI applications.
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Hindi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Hindi-speaking regions.
The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:
This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.
This dataset reflects the natural flow of Hindi healthcare communication and includes:
These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.
Conversations range from simple inquiries to complex advisory sessions, including:
Each conversation typically includes these structural components:
This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.
Available in JSON, CSV, and TXT formats, each conversation includes:
https://data.macgence.com/terms-and-conditions
The audio dataset includes call center conversations from the retail domain, featuring Hindi speakers from India, with detailed metadata.
https://choosealicense.com/licenses/other/
About This OTS Dataset
Unlock the potential of AI development with the Hindi General Utterances Conversation Dataset, tailored for general topics. This specialized collection of voice data is meticulously curated to enhance the understanding and analysis of general conversational topics in Hindi.
Metadata Availability: Insights into Participant Details
While transcripts are not included, comprehensive metadata accompanies each recording, providing insights into:… See the full description on the dataset page: https://huggingface.co/datasets/Macgence/general-utterances-speech-datasets-in-hindi.
Hindi (India) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (1,022 native speakers), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Hindi General Conversation Speech Dataset, a rich, linguistically diverse corpus purpose-built to accelerate the development of Hindi speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Hindi communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Hindi speech models that understand and respond to authentic Indian accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Hindi. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple Hindi speech and language AI applications:
Boost AI capabilities with our real-world call center audio data. Consented recordings in Hindi, covering industries like e-commerce, banking, insurance and medicine.
https://data.macgence.com/terms-and-conditions
Explore Hindi speech datasets for collaboration, ideal for AI, NLP, and research projects. Access high-quality conversational data for your needs.
Hindi (India) children's real-world casual conversation and monologue speech dataset, covering self-media, conversation, live, lecture, variety show, and other generic domains, and mirroring real-world interactions. Transcribed with text content, speaker ID, gender, age, accent, and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (children aged 12 and younger), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
https://data.macgence.com/terms-and-conditions
Explore high-quality Hindi speech datasets for Power House. Ideal for conversational AI, NLP, and speech recognition applications. Download now!
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Hinglish Everyday Conversations Dataset
A synthetically created Hinglish-based dataset of 2 columns where every row represents a unique conversation between 2 people in Hinglish about Everyday Life Topics.
Use Model
Access the model made using this dataset: Tiny-Hinglish-Chat-21M. For more information about this model, its training process, or related resources, you can check the GitHub repository Tiny-Hinglish-Chat-21M-Scripts.
Dataset Details… See the full description on the dataset page: https://huggingface.co/datasets/Abhishekcr448/Hinglish-Everyday-Conversations-1M.
Indic Instruct Data v0.1
A collection of different instruction datasets spanning English and Hindi languages. The collection consists of:
Anudesh, wikiHow, Flan v2 (67k sample subset), Dolly, Anthropic-HHH (5k sample subset), OpenAssistant v1, LymSys-Chat (50k sample subset)
We translate the English subset of specific datasets using IndicTrans2 (Gala et al., 2023). The chrF++ scores of the back-translated example and the corresponding example are provided for quality assessment of the… See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/indic-instruct-data-v0.1.
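One way such per-example chrF++ scores could be used is to drop translated examples whose back-translation diverges too far from the original. The sketch below assumes each example carries its score in a field called `chrf_score` on a 0 to 100 scale, and it uses an arbitrary threshold of 50; both the field name and the threshold are assumptions for illustration, not documented properties of the collection.

```python
# Hypothetical translated examples. The "chrf_score" field name and the
# 0-100 scale are assumptions; check the dataset card for the real schema.
examples = [
    {"id": "ex1", "text": "example one", "chrf_score": 72.4},
    {"id": "ex2", "text": "example two", "chrf_score": 38.9},
    {"id": "ex3", "text": "example three", "chrf_score": 55.0},
    {"id": "ex4", "text": "example four"},  # score missing
]


def keep_high_quality(records, min_chrf=50.0):
    """Keep records whose back-translation chrF++ score is at least min_chrf.

    Records without a score are treated as failing the check.
    """
    return [
        record
        for record in records
        if record.get("chrf_score") is not None and record["chrf_score"] >= min_chrf
    ]


filtered = keep_high_quality(examples)
strict = keep_high_quality(examples, min_chrf=60.0)
```

A suitable threshold depends on the language pair and the task, so it is worth inspecting a sample of examples on either side of the cut before committing to one.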
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Hindi Delivery & Logistics Chat Dataset is a comprehensive collection of over 12,000 text-based conversations between customers and call center agents. Focused on real-world delivery and logistics interactions, this dataset captures the language, tone, and service patterns essential for developing robust Hindi-language conversational AI, chatbots, and NLP systems across the delivery ecosystem.
The dataset spans a wide range of delivery and logistics scenarios, ensuring strong coverage across customer service and operational workflows.
This topical spread ensures wide applicability in both customer support automation and logistics optimization use cases.
The conversations reflect the authentic language and interaction style of Hindi-speaking customers and delivery agents, incorporating:
This linguistic realism enables the development of context-aware and naturally responsive AI systems.
The dataset captures a diverse range of interaction types and delivery workflows:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Hindi Real Estate Chat Dataset is a high-quality collection of over 12,000 text-based conversations between customers and call center agents. These conversations reflect real-world scenarios within the Real Estate sector, offering rich linguistic data for training conversational AI, chatbots, and NLP systems focused on property-related interactions in Hindi-speaking regions.
The dataset spans a broad range of Real Estate service conversations, covering various customer intents and agent support tasks:
This topic variety enables realistic model training for both lead generation and post-sale engagement scenarios.
Conversations are reflective of natural Hindi used in the Real Estate domain, incorporating:
This level of linguistic realism supports model generalization across dialects and user demographics.
Conversations include a mix of short inquiries and detailed advisory sessions, capturing full customer journeys:
https://data.macgence.com/terms-and-conditions
Optimize banking services with Macgence's Hindi call center dataset. Perfect for AI, linguistics, and fintech, offering precision and actionable insights!
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Hindi/Hinglish Conversation Dataset
This repository contains a dataset of conversational text in conversational Hindi and Hinglish (a mix of Hindi and English). The dataset contains multi-turn conversations on multiple topics, usually revolving around daily real-life experiences. A small number of reasoning tasks have also been added (specifically CoT-style reasoning and coding), with about 1k samples from OpenHermes 2.5.
Caution
This dataset was… See the full description on the dataset page: https://huggingface.co/datasets/Tensoic/gooftagoo.