91 datasets found
  1. Hindi Agent-Customer Chat Dataset for Telecom

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Telecom [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-telecom-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Telecom Chat Dataset is a comprehensive collection of over 12,000 text-based conversations between telecom customers and call center agents. This dataset captures real-world service interactions and domain-specific language in Hindi, enabling the development of intelligent conversational AI and NLP systems tailored for the telecommunications sector.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: A mix of positive, neutral, and negative interactions
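
The stated length ranges can be turned into a quick sanity check on a delivered chat. The sketch below is illustrative only: the 300–700 word and 50–150 turn figures come from the listing, while the `Turn` record, the speaker labels, and whitespace tokenization are assumptions, not the vendor's actual schema.

```python
from dataclasses import dataclass

# Ranges quoted in the listing above.
WORDS_RANGE = (300, 700)
TURNS_RANGE = (50, 150)


@dataclass
class Turn:
    """One dialogue turn (hypothetical layout, not the vendor's schema)."""
    speaker: str  # assumed labels such as "agent" or "customer"
    text: str


def chat_stats(turns):
    """Return (total_words, total_turns), counting whitespace-separated tokens."""
    total_words = sum(len(turn.text.split()) for turn in turns)
    return total_words, len(turns)


def within_stated_ranges(turns):
    """True if the chat falls inside both ranges quoted in the listing."""
    words, n_turns = chat_stats(turns)
    return (WORDS_RANGE[0] <= words <= WORDS_RANGE[1]
            and TURNS_RANGE[0] <= n_turns <= TURNS_RANGE[1])
```

Taken together, the two ranges imply short turns: between 2 and 14 words per turn on average.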

    Topic Diversity

    This dataset spans a wide range of telecom customer service scenarios:

    Inbound Chats (Customer-Initiated)
    Phone number porting
    Network connectivity issues
    Billing inquiries and adjustments
    Technical support requests
    Service activations and upgrades
    International roaming inquiries
    Refunds and complaint resolution
    Emergency service access
    Outbound Chats (Agent-Initiated)
    Welcome and onboarding calls
    Payment reminders and due alerts
    Customer satisfaction surveys
    Technical issue follow-ups
    Usage reviews and service feedback
    Promotions and service offers

    Language Nuance & Realism

    The conversations reflect real-life telecom interactions in Hindi, incorporating:

    Naming Patterns: Realistic Hindi personal, business, and telecom brand names
    Localized Content: Phone numbers, email addresses, and locations consistent with regional norms
    Time & Number Formats: Hindi representations of dates, times, currencies, and service numbers
    Informal Language & Slang: Common Hindi expressions, idioms, and conversational shortcuts found in telecom discussions

    Conversational Flow & Structure

    Conversations follow the natural flow of telecom customer service exchanges, including:

    Dialogue Types:
    Simple service inquiries
    Detailed problem-solving discussions
    Plan explanations and upgrades
    Feedback collection and status updates
    Interaction Stages:
    Initial greetings and verification
    Data or issue collection
    Clarification and troubleshooting
    Resolution and

  2. Hindi Human-Human Chat Dataset for Conversational AI & NLP

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-general-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world Hindi usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level Hindi conversations covering a broad spectrum of everyday topics.

    Conversational Text Data

    This dataset includes over 15,000 chat transcripts, each featuring free-flowing dialogue between two native Hindi speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.

    Words per Chat: 300–700
    Turns per Chat: Up to 50 dialogue turns
    Contributors: 200 native Hindi speakers from the FutureBeeAI Crowd Community
    Format: TXT, DOCS, JSON or CSV (customizable)
    Structure: Each record contains the full chat, topic tag, and metadata block

    Diversity and Domain Coverage

    Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:

    Music, books, and movies
    Health and wellness
    Children and parenting
    Family life and relationships
    Food and cooking
    Education and studying
    Festivals and traditions
    Environment and daily life
    Internet and tech usage
    Childhood memories and casual chatting

    This diversity ensures the dataset is useful across multiple NLP and language understanding applications.

    Linguistic Authenticity

    Chats reflect informal, native-level Hindi usage with:

    Colloquial expressions and local dialect influence
    Domain-relevant terminology
    Language-specific grammar, phrasing, and sentence flow
    Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
    Representation of different writing styles and input quirks to ensure training data realism

    Metadata

    Every chat instance is accompanied by structured metadata, which includes:

    Participant Age
    Gender
    Country/Region
    Chat Domain
    Chat Topic
    Dialect

    This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
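
As a rough illustration of metadata-based filtering, the sketch below selects chats whose metadata matches given values. The record layout and the key names (`gender`, `dialect`, `chat_topic`) are hypothetical. The listing names the metadata fields but does not show the actual schema.

```python
def filter_chats(records, **criteria):
    """Return the records whose metadata matches every key=value criterion.

    `records` is assumed to be a list of dicts with a "metadata" dict;
    this layout is an illustration, not the vendor's documented format.
    """
    return [
        record for record in records
        if all(record.get("metadata", {}).get(key) == value
               for key, value in criteria.items())
    ]


# Hypothetical sample records for demonstration.
sample_records = [
    {"chat_id": 1, "metadata": {"gender": "female", "dialect": "Awadhi", "chat_topic": "Food and cooking"}},
    {"chat_id": 2, "metadata": {"gender": "male", "dialect": "Awadhi", "chat_topic": "Health and wellness"}},
    {"chat_id": 3, "metadata": {"gender": "female", "dialect": "Bhojpuri", "chat_topic": "Food and cooking"}},
]

awadhi_chats = filter_chats(sample_records, dialect="Awadhi")
female_food_chats = filter_chats(sample_records, gender="female", chat_topic="Food and cooking")
```

Passing several keyword arguments narrows the selection, since a record must match all of them.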

    Data Quality Assurance

    All chat records pass through a rigorous QA process to maintain consistency and accuracy:

    Manual review for content completeness
    Format checks for chat turns and metadata
    Linguistic verification by native speakers
    Removal of inappropriate or unusable samples

    This ensures a clean, reliable dataset ready for high-performance AI model training.

    Applications

    This dataset is ideal for training and evaluating a wide range of text-based AI systems:

    Conversational AI / Chatbots
    Smart assistants and voicebots

  3. Hindi Data (Daneyên Hindi)

    • ku.shaip.com
    Updated Jun 1, 2025
    Cite
    Shaip (2025). Daneyên Hindi [Dataset]. https://ku.shaip.com/offerings/speech-data-catalog/hindi-dataset/
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    High-Quality Hindi Call-Center, General Conversation, and Podcast Dataset for AI & ASR Models

    Title (Language): Hindi Language Dataset
    Dataset Types: Call Center, General Conversation, Media (Podcast), Scripted Monologue
    Country: India
    Description: Unscripted…

  4. Hindi Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-healthcare-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Hindi-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Hindi healthcare communication and includes:

    Authentic Naming Patterns: Hindi personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Hindi formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Hindi-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines

    Applications


  5. General conversation speech datasets in Hindi for General

    • data.macgence.com
    mp3
    Updated Aug 4, 2024
    Cite
    Macgence (2024). General conversation speech datasets in Hindi for General [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-hindi-for-general
    Available download formats: mp3
    Dataset updated
    Aug 4, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Explore high-quality Hindi general conversation speech datasets for AI, NLP, and speech recognition research. Download and enhance your projects today!

  6. Hindi Children Speech Dataset – 34 Hours (Real-world Conversation &...

    • nexdata.ai
    Updated Sep 12, 2025
    Cite
    Nexdata (2025). Hindi Children Speech Dataset – 34 Hours (Real-world Conversation & Monologue) [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1377
    Dataset updated
    Sep 12, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    World
    Variables measured
    Age, Format, Country, Accuracy, Language, Content category, Language(Region) Code, Recording environment, Features of annotation
    Description

    This dataset contains 34 hours of Hindi children’s speech. The recordings cover self-media, conversations, live talk, lectures, variety shows, and other generic domains, mirroring real-world interactions. Each utterance is transcribed with text content, speaker ID, gender, age, accent, and other attributes. Our dataset was collected from a large, geographically diverse pool of speakers (children aged 12 and younger), enhancing model performance on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; our datasets all comply with GDPR, CCPA, and PIPL.

  7. Hindi General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-hindi-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Hindi General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Hindi speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Hindi communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Hindi speech models that understand and respond to authentic Indian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Hindi. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Hindi speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of India to ensure dialectal diversity and demographic balance.
    Demographics: A gender split of 60% male and 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
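
The stated audio format (stereo, 16-bit, 16 kHz WAV) can be checked with Python's standard `wave` module. This is a minimal sketch, assuming uncompressed PCM WAV files. The `EXPECTED` values are taken from the listing, and `make_silent_wav` is a hypothetical helper that only builds an in-memory test file.

```python
import io
import wave

# Format stated in the listing: stereo, 16-bit (2 bytes per sample), 16 kHz.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def describe_wav(source):
    """Read header fields from a WAV file given a path string or a binary file object."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_listing(source):
    """True if the file header matches the format quoted in the listing."""
    return describe_wav(source) == EXPECTED


def make_silent_wav(channels=2, sample_width=2, rate=16000, seconds=1):
    """Build a silent in-memory WAV file for demonstration (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (channels * sample_width * rate * seconds))
    buffer.seek(0)
    return buffer
```

With real data, `matches_listing("recording.wav")` would be called on each delivered file, where the file name is a placeholder.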

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
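
For context on the WER figure, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. The sketch below implements that standard definition with a word-level edit distance. It is not the vendor's scoring procedure, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            ))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)
```

An average WER below 5% therefore corresponds to fewer than one word-level error per 20 reference words.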

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Hindi speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Hindi.
    Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.

  8. 760 Hours Hindi Speech Dataset (Telephony Recordings)

    • nexdata.ai
    Updated Oct 14, 2023
    Cite
    Nexdata (2023). 760 Hours Hindi Speech Dataset (Telephony Recordings) [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1206
    Dataset updated
    Oct 14, 2023
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    This dataset contains 760 hours of spontaneous Hindi dialogue speech, collected from dialogues on given topics and covering 20+ domains. It is transcribed with text content, speaker ID, gender, age, and other attributes. Our dataset was collected from a large, geographically diverse pool of speakers (1,004 native speakers), enhancing model performance on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; our datasets all comply with GDPR, CCPA, and PIPL.

  9. hindi-speech-recognition-dataset

    • huggingface.co
    Updated Aug 1, 2025
    Cite
    Unidata NLP (2025). hindi-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/hindi-speech-recognition-dataset
    Dataset updated
    Aug 1, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Hindi Telephone Dialogues Dataset - 760 Hours

    The dataset comprises 760 hours of high-quality audio recordings from 1,000+ native Hindi speakers, featuring telephone dialogues across diverse topics and domains. With a 95% sentence accuracy rate, it is suited to training and evaluating Hindi speech recognition systems.

    Dataset characteristics:

    Description: Audio of telephone dialogues in Hindi for training…

    See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/hindi-speech-recognition-dataset.

  10. General conversation speech datasets in Hindi for Virtual Reality

    • data.macgence.com
    mp3
    Updated Apr 24, 2024
    Cite
    Macgence (2024). General conversation speech datasets in Hindi for Virtual Reality [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-hindi-for-virtual-reality
    Available download formats: mp3
    Dataset updated
    Apr 24, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    The audio dataset includes General Conversation, featuring Hindi speakers from India with detailed metadata.

  11. Live Hindi Call Center Conversations

    • defined.ai
    Updated May 17, 2025
    Cite
    Defined.ai (2025). Live Hindi Call Center Conversations [Dataset]. https://defined.ai/datasets/live-hindi-call-center-conversations
    Dataset updated
    May 17, 2025
    Dataset provided by
    Defined.ai
    Description

    Boost AI capabilities with our real-world call center audio data. Consented recordings in Hindi, covering industries like e-commerce, banking, insurance and medicine.

  12. 797 Hours Hindi Speech Dataset – 1,022 Native Indian Speakers

    • nexdata.ai
    • m.nexdata.ai
    Updated Apr 13, 2024
    Cite
    Nexdata (2024). 797 Hours Hindi Speech Dataset – 1,022 Native Indian Speakers [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1156
    Dataset updated
    Apr 13, 2024
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    This dataset contains 797 hours of spontaneous Hindi dialogue speech, collected from dialogues on given topics and covering 20+ domains. It is transcribed with text content, speaker ID, gender, age, and other attributes. Our dataset was collected from a large, geographically diverse pool of speakers (1,022 native speakers), enhancing model performance on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; our datasets all comply with GDPR, CCPA, and PIPL.

  13. Hindi Agent-Customer Chat Dataset for Delivery & Logistics

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Delivery & Logistics [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-delivery-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Delivery & Logistics Chat Dataset is a comprehensive collection of over 12,000 text-based conversations between customers and call center agents. Focused on real-world delivery and logistics interactions, this dataset captures the language, tone, and service patterns essential for developing robust Hindi-language conversational AI, chatbots, and NLP systems across the delivery ecosystem.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns between customer and agent
    Chat Types: Inbound (customer-initiated) and outbound (agent-initiated)
    Sentiment Coverage: Includes positive, neutral, and negative interaction outcomes

    Topic Diversity

    The dataset spans a wide range of delivery and logistics scenarios, ensuring strong coverage across customer service and operational workflows.

    Inbound Chats (Customer-Initiated)
    Order tracking and delivery status inquiries
    Complaints about late or missing deliveries
    Undeliverable or incorrect address resolution
    Return process and pickup scheduling
    Order modifications and change requests
    Enquiries about delivery method options
    Outbound Chats (Agent-Initiated)
    Delivery confirmations and dispatch updates
    Subscription renewal or delivery reminders
    Notification of delivery issues or missed attempts
    Out-of-stock or product unavailability alerts
    Satisfaction surveys and service feedback collection
    Address verification for upcoming deliveries

    This topical spread ensures wide applicability in both customer support automation and logistics optimization use cases.

    Language Diversity & Realism

    The conversations reflect the authentic language and interaction style of Hindi-speaking customers and delivery agents, incorporating:

    Naming Patterns: Personal names, business names, and logistics company references
    Localized Details: Hindi-format emails, phone numbers, regional addresses, and delivery zones
    Temporal and Numeric Expressions: Dates, delivery windows, prices, and tracking IDs in Hindi formats
    Slang and Informal Speech: Everyday expressions and delivery-specific idioms used across Hindi dialects

    This linguistic realism enables the development of context-aware and naturally responsive AI systems.

    Conversational Structure & Flow

    The dataset captures a diverse range of interaction types and delivery workflows:

    Dialogue Types:
    Quick status checks and confirmations
    Multi-turn issue resolution
    Process walkthroughs and guidance
    Feedback and escalation handling
    Common Flow Elements:
    Greetings and caller verification
    Request or complaint initiation

  14. Hindi Agent-Customer Chat Dataset for Real Estate

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Real Estate [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-realestate-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Real Estate Chat Dataset is a high-quality collection of over 12,000 text-based conversations between customers and call center agents. These conversations reflect real-world scenarios within the Real Estate sector, offering rich linguistic data for training conversational AI, chatbots, and NLP systems focused on property-related interactions in Hindi-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both speakers
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative interactions included

    Topic Diversity

    The dataset spans a broad range of Real Estate service conversations, covering various customer intents and agent support tasks:

    Inbound Chats (Customer-Initiated)
    Property inquiries (buy/rent)
    Rental property availability
    Renovation and maintenance inquiries
    Property features and amenities
    Investment advice and ROI analysis
    Property ownership and legal history
    Outbound Chats (Agent-Initiated)
    New property listing announcements
    Post-purchase follow-ups
    Investment opportunity alerts
    Property valuation updates
    Customer satisfaction and feedback surveys

    This topic variety enables realistic model training for both lead generation and post-sale engagement scenarios.

    Language Nuance & Authenticity

    Conversations are reflective of natural Hindi used in the Real Estate domain, incorporating:

    Cultural Naming Patterns: Personal names, agency names, and developer brands
    Localized Contact Info: Phone numbers, email addresses, and geographic locations across Hindi-speaking regions
    Numeric and Temporal Language: Dates, prices, unit sizes, and time references formatted in Hindi conventions
    Informal and Domain-Specific Language: Real estate slang, idioms, and casual tone used in property discussions

    This level of linguistic realism supports model generalization across dialects and user demographics.

    Conversational Structure & Flow

    Conversations include a mix of short inquiries and detailed advisory sessions, capturing full customer journeys:

    Dialogue Types
    General inquiries
    Sales consultations
    Investment advisory
    Follow-up coordination
    Complaint handling and support
    Flow Components
    Greetings and identity verification
    Intent identification and context gathering

  15. english-hindi-colloquial-dataset

    • huggingface.co
    Updated Feb 21, 2025
    Cite
    deeksha bajpai (2025). english-hindi-colloquial-dataset [Dataset]. https://huggingface.co/datasets/bajpaideeksha/english-hindi-colloquial-dataset
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 21, 2025
    Authors
    deeksha bajpai
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A curated dataset of colloquial English phrases and their corresponding Hindi translations. This dataset focuses on informal language, including slang, idioms, and everyday expressions, making it ideal for training models that handle casual conversations.

    Dataset Details:
    Size: [e.g., 500+ phrase pairs]
    Source: Collected from publicly available conversational datasets, social media, and crowdsourced contributions.
    Language Pair: English → Hindi
    Annotations: Each phrase pair is manually verified… See the full description on the dataset page: https://huggingface.co/datasets/bajpaideeksha/english-hindi-colloquial-dataset.

  16. m

    Call Center Conversations Speech Dataset of BFSI Sector in Hindi

    • data.macgence.com
    mp3
    Updated Jun 8, 2025
    Cite
    Macgence (2025). Call Center Conversations Speech Dataset of BFSI Sector in Hindi [Dataset]. https://data.macgence.com/dataset/call-center-conversations-speech-dataset-of-bfsi-sector-in-hindi
    Explore at:
    mp3 (available download format)
    Dataset updated
    Jun 8, 2025
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Download the comprehensive Call Center Conversations Speech Dataset in Hindi, focused on the BFSI sector. Ideal for AI training, speech recognition, and customer service analytics.

  17. h

    indic-instruct-data-v0.1

    • huggingface.co
    Updated Jan 26, 2024
    Cite
    AI4Bharat (2024). indic-instruct-data-v0.1 [Dataset]. https://huggingface.co/datasets/ai4bharat/indic-instruct-data-v0.1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 26, 2024
    Dataset authored and provided by
    AI4Bharat
    Description

    Indic Instruct Data v0.1

    A collection of different instruction datasets spanning English and Hindi languages. The collection consists of:

    Anudesh
    wikiHow
    Flan v2 (67k sample subset)
    Dolly
    Anthropic-HHH (5k sample subset)
    OpenAssistant v1
    LMSys-Chat (50k sample subset)

    We translate the English subset of specific datasets using IndicTrans2 (Gala et al., 2023). The chrF++ scores of the back-translated example and the corresponding example are provided for quality assessment of the… See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/indic-instruct-data-v0.1.

  18. h

    hindi-colloquial-dataset

    • huggingface.co
    Updated Feb 18, 2025
    Cite
    Sirisha D (2025). hindi-colloquial-dataset [Dataset]. https://huggingface.co/datasets/SirirshaD/hindi-colloquial-dataset
    Explore at:
    Dataset updated
    Feb 18, 2025
    Authors
    Sirisha D
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Hindi Colloquial Dataset

    This dataset contains pairs of English Text and Hindi Colloquial Text, designed for training machine learning models for translation. The dataset was created as part of a hackathon organized by Swati.

    Dataset Details

    Size: 90 pairs of English and colloquial Hindi sentences
    Languages: English, Hindi
    Task: Translation, Text Generation
    Content: Contains colloquial translations for everyday conversational texts in Hindi.

      Example… See the full description on the dataset page: https://huggingface.co/datasets/SirirshaD/hindi-colloquial-dataset.
    
  19. F

    Hindi Agent-Customer Chat Dataset for Retail & E-Commerce

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Retail & E-Commerce [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-retail-domain-conversation-text-dataset
    Explore at:
    wav (available download format)
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Retail & E-Commerce Chat Dataset is a large-scale, high-quality collection of over 12,000 chat conversations between customers and call center agents, focused exclusively on Retail and E-Commerce domains. Designed to reflect real-world service interactions, this dataset supports the development of robust conversational AI and NLP models tailored for Hindi-speaking audiences.

    Participant & Chat Overview

    Contributors: 200 native Hindi speakers from the FutureBeeAI Crowd Community
    Chat Length: 300–700 words per conversation
    Turn Count: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative interaction outcomes

    Topic Diversity

    This dataset spans a wide range of Retail and E-Commerce conversation types:

    Inbound Chats (Customer-Initiated)
    Product inquiries
    Return or exchange requests
    Order cancellations
    Refunds and payment issues
    Membership or subscription queries
    Shipping, delivery, and more
    Outbound Chats (Agent-Initiated)
    Order confirmation and verification
    Cross-selling and upselling
    Loyalty program promotions
    Account updates
    Special offers and discounts
    Customer feedback and verification

    This diversity enables training of models that handle varied intents, scenarios, and outcomes within customer service workflows.

    Language Nuance & Realism

    The dataset is rich in linguistic diversity and mirrors real conversational tone and structure used in Hindi-speaking regions:

    Personal & Brand Names: Culturally accurate naming conventions
    Local Elements: Realistic addresses, phone numbers, emails, currency references, and time/date formats
    Slang & Idioms: Local expressions, informal phrases, and customer service jargon
    Cultural Specificity: Region-aware vocabulary and tone

    This linguistic authenticity ensures the development of culturally fluent AI models for Hindi Retail & E-Commerce use cases.

    Conversational Structure & Flow

    The conversations reflect natural dialogue dynamics and are organized into various types of interaction styles:

    Simple inquiries
    Detailed problem-solving discussions
    Transactional exchanges
    Follow-ups and status updates
    Advisory and assistance sessions

    Each conversation includes common dialogue stages such as:

    Greetings
    Customer authentication
    Information gathering

  20. s

    Hindi Dataset

    • ht.shaip.com
    Updated Dec 27, 2024
    Cite
    Shaip (2024). Hindi Dataset [Dataset]. https://ht.shaip.com/offerings/speech-data-catalog/hindi-dataset/
    Explore at:
    Dataset updated
    Dec 27, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Hindi Dataset (हिंदी डेटासेट): high-quality Hindi call center, general conversation, and podcast datasets for AI and ASR models.

    Overview
    Title (Language): Hindi Language Dataset
    Dataset Type: Call center, general conversation, media (podcast), monologue, and scripted
    Country: India
    Description: Unscripted…

Cite
FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Telecom [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-telecom-domain-conversation-text-dataset

Hindi Agent-Customer Chat Dataset for Telecom

Explore at:
wav (available download format)
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License

https://www.futurebeeai.com/policies/ai-data-license-agreement

Dataset funded by
FutureBeeAI
Description

Introduction

The Hindi Telecom Chat Dataset is a comprehensive collection of over 12,000 text-based conversations between telecom customers and call center agents. This dataset captures real-world service interactions and domain-specific language in Hindi, enabling the development of intelligent conversational AI and NLP systems tailored for the telecommunications sector.

Participant & Chat Overview

Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
Conversation Length: 300–700 words per chat
Turns per Chat: 50–150 dialogue turns across both participants
Chat Types: Inbound and outbound
Sentiment Coverage: A mix of positive, neutral, and negative interactions

Topic Diversity

This dataset spans a wide range of telecom customer service scenarios:

Inbound Chats (Customer-Initiated)
Phone number porting
Network connectivity issues
Billing inquiries and adjustments
Technical support requests
Service activations and upgrades
International roaming inquiries
Refunds and complaint resolution
Emergency service access
Outbound Chats (Agent-Initiated)
Welcome and onboarding calls
Payment reminders and due alerts
Customer satisfaction surveys
Technical issue follow-ups
Usage reviews and service feedback
Promotions and service offers

Language Nuance & Realism

The conversations reflect real-life telecom interactions in Hindi, incorporating:

Naming Patterns: Realistic Hindi personal, business, and telecom brand names
Localized Content: Phone numbers, email addresses, and locations consistent with regional norms
Time & Number Formats: Hindi representations of dates, times, currencies, and service numbers
Informal Language & Slang: Common Hindi expressions, idioms, and conversational shortcuts found in telecom discussions

Conversational Flow & Structure

Conversations follow the natural flow of telecom customer service exchanges, including:

Dialogue Types:
Simple service inquiries
Detailed problem-solving discussions
Plan explanations and upgrades
Feedback collection and status updates
Interaction Stages:
Initial greetings and verification
Data or issue collection
Clarification and troubleshooting
Resolution and
