97 datasets found
  1. Hindi Agent-Customer Chat Dataset for Telecom

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Telecom [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-telecom-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Telecom Chat Dataset is a comprehensive collection of over 12,000 text-based conversations between telecom customers and call center agents. This dataset captures real-world service interactions and domain-specific language in Hindi, enabling the development of intelligent conversational AI and NLP systems tailored for the telecommunications sector.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: A mix of positive, neutral, and negative interactions

    Topic Diversity

    This dataset spans a wide range of telecom customer service scenarios:

    Inbound Chats (Customer-Initiated)
    Phone number porting
    Network connectivity issues
    Billing inquiries and adjustments
    Technical support requests
    Service activations and upgrades
    International roaming inquiries
    Refunds and complaint resolution
    Emergency service access
    Outbound Chats (Agent-Initiated)
    Welcome and onboarding calls
    Payment reminders and due alerts
    Customer satisfaction surveys
    Technical issue follow-ups
    Usage reviews and service feedback
    Promotions and service offers

    Language Nuance & Realism

    The conversations reflect real-life telecom interactions in Hindi, incorporating:

    Naming Patterns: Realistic Hindi personal, business, and telecom brand names
    Localized Content: Phone numbers, email addresses, and locations consistent with regional norms
    Time & Number Formats: Hindi representations of dates, times, currencies, and service numbers
    Informal Language & Slang: Common Hindi expressions, idioms, and conversational shortcuts found in telecom discussions

    Conversational Flow & Structure

    Conversations follow the natural flow of telecom customer service exchanges, including:

    Dialogue Types:
    Simple service inquiries
    Detailed problem-solving discussions
    Plan explanations and upgrades
    Feedback collection and status updates
    Interaction Stages:
    Initial greetings and verification
    Data or issue collection
    Clarification and troubleshooting
    Resolution and

  2. Hindi Human-Human Chat Dataset for Conversational AI & NLP

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Hindi Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-general-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world Hindi usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level Hindi conversations covering a broad spectrum of everyday topics.

    Conversational Text Data

    This dataset includes over 15,000 chat transcripts, each featuring free-flowing dialogue between two native Hindi speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.

    Words per Chat: 300–700
    Turns per Chat: Up to 50 dialogue turns
    Contributors: 200 native Hindi speakers from the FutureBeeAI Crowd Community
    Format: TXT, DOCS, JSON or CSV (customizable)
    Structure: Each record contains the full chat, topic tag, and metadata block

    Diversity and Domain Coverage

    Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:

    Music, books, and movies
    Health and wellness
    Children and parenting
    Family life and relationships
    Food and cooking
    Education and studying
    Festivals and traditions
    Environment and daily life
    Internet and tech usage
    Childhood memories and casual chatting

    This diversity ensures the dataset is useful across multiple NLP and language understanding applications.

    Linguistic Authenticity

    Chats reflect informal, native-level Hindi usage with:

    Colloquial expressions and local dialect influence
    Domain-relevant terminology
    Language-specific grammar, phrasing, and sentence flow
    Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
    Representation of different writing styles and input quirks to ensure training data realism

    Metadata

    Every chat instance is accompanied by structured metadata, which includes:

    Participant Age
    Gender
    Country/Region
    Chat Domain
    Chat Topic
    Dialect

    This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
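
As a rough illustration of the metadata-based filtering mentioned above, here is a minimal Python sketch. The record layout, key names (`chat_id`, `age`, `gender`, `region`, `domain`, `topic`, `dialect`), and sample values are assumptions made up for this example; the listing does not publish the actual schema.

```python
from collections import Counter

# Hypothetical records: key names and values are illustrative assumptions,
# not the vendor's published schema.
chats = [
    {"chat_id": "c1", "age": 24, "gender": "female", "region": "Uttar Pradesh",
     "domain": "general", "topic": "Food and cooking", "dialect": "Awadhi"},
    {"chat_id": "c2", "age": 41, "gender": "male", "region": "Delhi",
     "domain": "general", "topic": "Food and cooking", "dialect": "Khariboli"},
    {"chat_id": "c3", "age": 33, "gender": "female", "region": "Delhi",
     "domain": "general", "topic": "Festivals and traditions", "dialect": "Khariboli"},
]


def filter_chats(records, **criteria):
    """Return the records whose metadata equals every given key/value pair."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


def topic_counts(records):
    """Count how many chats fall under each topic tag."""
    return Counter(r["topic"] for r in records)


# Example: select a demographic slice, then inspect topic balance.
female_chats = filter_chats(chats, gender="female")
print([c["chat_id"] for c in female_chats])
print(topic_counts(chats))
```

The same pattern extends to demographic-specific evaluation: build one slice per metadata value and score a model on each slice separately.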

    Data Quality Assurance

    All chat records pass through a rigorous QA process to maintain consistency and accuracy:

    Manual review for content completeness
    Format checks for chat turns and metadata
    Linguistic verification by native speakers
    Removal of inappropriate or unusable samples

    This ensures a clean, reliable dataset ready for high-performance AI model training.

    Applications

    This dataset is ideal for training and evaluating a wide range of text-based AI systems:

    Conversational AI / Chatbots
    Smart assistants and voicebots

  3. General conversation speech datasets in Hindi for General

    • data.macgence.com
    mp3
    Updated Aug 4, 2024
    + more versions
    Cite
    Macgence (2024). General conversation speech datasets in Hindi for General [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-hindi-for-general
    Available download formats: mp3
    Dataset updated
    Aug 4, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Explore high-quality Hindi general conversation speech datasets for AI, NLP, and speech recognition research. Download and enhance your projects today!

  4. Hindi Dataset

    • tl.shaip.com
    Updated Dec 6, 2024
    + more versions
    Cite
    Shaip (2024). Hindi Dataset [Dataset]. https://tl.shaip.com/offerings/speech-data-catalog/hindi-dataset/
    Dataset updated
    Dec 6, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Hindi Dataset (हिंदी डेटासेट): High-Quality Hindi TTS, General Conversation, and Podcast Dataset for AI and ASR Models.…

  5. General conversation speech datasets in Hindi for Birthday Party

    • data.macgence.com
    mp3
    Updated Jul 27, 2024
    Cite
    Macgence (2024). General conversation speech datasets in Hindi for Birthday Party [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-hindi-for-birthday-party
    Available download formats: mp3
    Dataset updated
    Jul 27, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    A comprehensive Hindi general conversation dataset tailored for birthday party scenarios, ideal for speech recognition and conversational AI applications.

  6. Hindi Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-healthcare-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Hindi-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Hindi healthcare communication and includes:

    Authentic Naming Patterns: Hindi personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Hindi formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Hindi-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines
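
To make the structure described above concrete, the following Python sketch parses one conversation from JSON and summarizes it. The key names (`conversation_id`, `metadata`, `messages`, `speaker`, `text`) and the sample dialogue are hypothetical, chosen only to mirror the listed components (speaker-labelled message history, identifiers, and metadata); the real files may be laid out differently.

```python
import json

# Hypothetical record: key names and content are illustrative assumptions.
raw = """
{
  "conversation_id": "demo-001",
  "metadata": {"topic": "Appointment scheduling", "region": "IN", "sentiment": "positive"},
  "messages": [
    {"speaker": "customer", "text": "नमस्ते, मुझे अपॉइंटमेंट बुक करना है।"},
    {"speaker": "agent", "text": "जी ज़रूर, किस दिन के लिए?"},
    {"speaker": "customer", "text": "कल सुबह के लिए।"}
  ]
}
"""


def summarize(record):
    """Return turn count, distinct speaker labels, and whitespace-split word count."""
    messages = record["messages"]
    speakers = sorted({m["speaker"] for m in messages})
    words = sum(len(m["text"].split()) for m in messages)
    return {"turns": len(messages), "speakers": speakers, "words": words}


record = json.loads(raw)
summary = summarize(record)
print(record["metadata"]["topic"], summary)
```

A summary like this can be compared against the stated ranges (300–700 words and 50–150 turns per chat) as a quick sanity check after download.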

    Applications


  7. Call Center conversation speech datasets in Hindi for Retail

    • data.macgence.com
    mp3
    Updated Mar 27, 2024
    Cite
    Macgence (2024). Call Center conversation speech datasets in Hindi for Retail [Dataset]. https://data.macgence.com/dataset/call-center-conversation-speech-datasets-in-hindi--for-retail
    Available download formats: mp3
    Dataset updated
    Mar 27, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    The audio dataset includes call center conversations from the retail domain, featuring Hindi speakers from India, with detailed metadata.

  8. general-utterances-speech-datasets-in-hindi

    • huggingface.co
    Updated Jul 17, 2025
    Cite
    Macgence (2025). general-utterances-speech-datasets-in-hindi [Dataset]. https://huggingface.co/datasets/Macgence/general-utterances-speech-datasets-in-hindi
    Dataset updated
    Jul 17, 2025
    Authors
    Macgence
    License

    https://choosealicense.com/licenses/other/

    Description

    About This OTS Dataset

    Unlock the potential of AI development with the Hindi General Utterances Conversation Dataset, tailored for general topics. This specialized collection of voice data is meticulously curated to enhance the understanding and analysis of general conversational topics in Hindi.

    Metadata Availability: Insights into Participant Details

    While transcripts are not included, comprehensive metadata accompanies each recording, providing insights into:… See the full description on the dataset page: https://huggingface.co/datasets/Macgence/general-utterances-speech-datasets-in-hindi.

  9. 797 Hours - Hindi(India) Spontaneous Dialogue Smartphone speech dataset

    • m.nexdata.ai
    • nexdata.ai
    Updated Jun 1, 2025
    + more versions
    Cite
    Nexdata (2025). 797 Hours - Hindi(India) Spontaneous Dialogue Smartphone speech dataset [Dataset]. https://m.nexdata.ai/datasets/speechrecog/1156?source=Huggingface
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    India
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    Hindi (India) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of 1,022 native speakers, which is intended to improve model performance on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR, CCPA, and PIPL compliant.

  10. Hindi General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Hindi General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-hindi-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Hindi General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Hindi speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Hindi communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Hindi speech models that understand and respond to authentic Indian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Hindi. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Hindi speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of India to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
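
The stated audio format (stereo, 16-bit, 16 kHz WAV) can be checked from a file header with Python's standard `wave` module. The sketch below is an illustration only: the helper names `wav_params` and `matches_spec` are made up for this example, and it builds a second of silence in memory so that it runs without any of the actual recordings.

```python
import io
import wave


def wav_params(source):
    """Read channel count, bit depth, and sample rate from a WAV header."""
    with wave.open(source, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "bit_depth": wf.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate": wf.getframerate(),
        }


def matches_spec(params, channels=2, bit_depth=16, sample_rate=16000):
    """Check the parameters against the format stated in the listing."""
    actual = (params["channels"], params["bit_depth"], params["sample_rate"])
    return actual == (channels, bit_depth, sample_rate)


# Build one second of silent stereo audio in memory as a stand-in for a real file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)        # 2 bytes per sample = 16 bits
    wf.setframerate(16000)
    wf.writeframes(b"\x00" * (16000 * 2 * 2))  # frames * channels * bytes
buffer.seek(0)

params = wav_params(buffer)
print(params, matches_spec(params))
```

`wav_params` also accepts a file path, so the same check could be run over a downloaded directory of recordings.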

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass (average WER < 5%)

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
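
For context on the WER figure quoted above: word error rate is the number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. The Python sketch below computes it with a standard word-level edit distance. It illustrates the metric in general and is not the vendor's QA procedure; the function name and whitespace tokenization are assumptions of this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Whitespace tokenization is a simplification; real evaluations usually normalize punctuation and spelling variants before scoring.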

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Hindi speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Hindi.
    Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.

  11. Live Hindi Call Center Conversations

    • defined.ai
    Updated May 17, 2025
    Cite
    Defined.ai (2025). Live Hindi Call Center Conversations [Dataset]. https://defined.ai/datasets/live-hindi-call-center-conversations
    Dataset updated
    May 17, 2025
    Dataset provided by
    Defined.ai
    Description

    Boost AI capabilities with our real-world call center audio data. Consented recordings in Hindi, covering industries like e-commerce, banking, insurance and medicine.

  12. General conversation speech datasets in Hindi for Collaboration

    • data.macgence.com
    mp3
    Updated May 21, 2025
    Cite
    Macgence (2025). General conversation speech datasets in Hindi for Collaboration [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-hindi-for-collaboration
    Available download formats: mp3
    Dataset updated
    May 21, 2025
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Explore Hindi speech datasets for collaboration, ideal for AI, NLP, and research projects. Access high-quality conversational data for your needs.

  13. 34 Hours - Hindi(India) Children Real-world Casual Conversation and...

    • nexdata.ai
    Updated Nov 16, 2023
    Cite
    Nexdata (2023). 34 Hours - Hindi(India) Children Real-world Casual Conversation and Monologue speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1377
    Dataset updated
    Nov 16, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    World, India
    Variables measured
    Age, Format, Country, Accuracy, Language, Content category, Language(Region) Code, Recording environment, Features of annotation
    Description

    Hindi (India) Children Real-world Casual Conversation and Monologue speech dataset, covering self-media, conversation, live, lecture, variety show, and other generic domains, and mirroring real-world interactions. Transcribed with text content, speaker ID, gender, age, accent, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (children aged 12 and younger), which is intended to improve model performance on real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR, CCPA, and PIPL compliant.

  14. General conversation speech datasets in Hindi for Power house

    • data.macgence.com
    mp3
    Updated May 12, 2025
    + more versions
    Cite
    Macgence (2025). General conversation speech datasets in Hindi for Power house [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-hindi-for-power-house
    Available download formats: mp3
    Dataset updated
    May 12, 2025
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Explore high-quality Hindi speech datasets for Power House. Ideal for conversational AI, NLP, and speech recognition applications. Download now!

  15. Hinglish-Everyday-Conversations-1M

    • huggingface.co
    Updated Jan 13, 2025
    Cite
    Abhishek Khatri (2025). Hinglish-Everyday-Conversations-1M [Dataset]. https://huggingface.co/datasets/Abhishekcr448/Hinglish-Everyday-Conversations-1M
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 13, 2025
    Authors
    Abhishek Khatri
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Hinglish Everyday Conversations Dataset

    A synthetically created Hinglish-based dataset of 2 columns where every row represents a unique conversation between 2 people in Hinglish about Everyday Life Topics.

    Use Model

    Access the model made using this dataset: Tiny-Hinglish-Chat-21M. For more information about this model, its training process, or related resources, you can check the GitHub repository Tiny-Hinglish-Chat-21M-Scripts.

    Dataset Details… See the full description on the dataset page: https://huggingface.co/datasets/Abhishekcr448/Hinglish-Everyday-Conversations-1M.
    
  16. indic-instruct-data-v0.1

    • huggingface.co
    Updated Jan 26, 2024
    Cite
    AI4Bharat (2024). indic-instruct-data-v0.1 [Dataset]. https://huggingface.co/datasets/ai4bharat/indic-instruct-data-v0.1
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 26, 2024
    Dataset authored and provided by
    AI4Bharat
    Description

    Indic Instruct Data v0.1

    A collection of different instruction datasets spanning English and Hindi languages. The collection consists of:

    Anudesh
    wikiHow
    Flan v2 (67k sample subset)
    Dolly
    Anthropic-HHH (5k sample subset)
    OpenAssistant v1
    LymSys-Chat (50k sample subset)

    We translate the English subset of specific datasets using IndicTrans2 (Gala et al., 2023). The chrF++ scores of the back-translated example and the corresponding example are provided for quality assessment of the… See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/indic-instruct-data-v0.1.

  17. Hindi Agent-Customer Chat Dataset for Delivery & Logistics

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Delivery & Logistics [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-delivery-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Delivery & Logistics Chat Dataset is a comprehensive collection of over 12,000 text-based conversations between customers and call center agents. Focused on real-world delivery and logistics interactions, this dataset captures the language, tone, and service patterns essential for developing robust Hindi-language conversational AI, chatbots, and NLP systems across the delivery ecosystem.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns between customer and agent
    Chat Types: Inbound (customer-initiated) and outbound (agent-initiated)
    Sentiment Coverage: Includes positive, neutral, and negative interaction outcomes

    Topic Diversity

    The dataset spans a wide range of delivery and logistics scenarios, ensuring strong coverage across customer service and operational workflows.

    Inbound Chats (Customer-Initiated)
    Order tracking and delivery status inquiries
    Complaints about late or missing deliveries
    Undeliverable or incorrect address resolution
    Return process and pickup scheduling
    Order modifications and change requests
    Enquiries about delivery method options
    Outbound Chats (Agent-Initiated)
    Delivery confirmations and dispatch updates
    Subscription renewal or delivery reminders
    Notification of delivery issues or missed attempts
    Out-of-stock or product unavailability alerts
    Satisfaction surveys and service feedback collection
    Address verification for upcoming deliveries

    This topical spread ensures wide applicability in both customer support automation and logistics optimization use cases.

    Language Diversity & Realism

    The conversations reflect the authentic language and interaction style of Hindi-speaking customers and delivery agents, incorporating:

    Naming Patterns: Personal names, business names, and logistics company references
    Localized Details: Hindi-format emails, phone numbers, regional addresses, and delivery zones
    Temporal and Numeric Expressions: Dates, delivery windows, prices, and tracking IDs in Hindi formats
    Slang and Informal Speech: Everyday expressions and delivery-specific idioms used across Hindi dialects

    This linguistic realism enables the development of context-aware and naturally responsive AI systems.

    Conversational Structure & Flow

    The dataset captures a diverse range of interaction types and delivery workflows:

    Dialogue Types:
    Quick status checks and confirmations
    Multi-turn issue resolution
    Process walkthroughs and guidance
    Feedback and escalation handling
    Common Flow Elements:
    Greetings and caller verification
    Request or complaint initiation

  18. Hindi Agent-Customer Chat Dataset for Real Estate

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Real Estate [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-realestate-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Real Estate Chat Dataset is a high-quality collection of over 12,000 text-based conversations between customers and call center agents. These conversations reflect real-world scenarios within the Real Estate sector, offering rich linguistic data for training conversational AI, chatbots, and NLP systems focused on property-related interactions in Hindi-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both speakers
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative interactions included

    Topic Diversity

    The dataset spans a broad range of Real Estate service conversations, covering various customer intents and agent support tasks:

    Inbound Chats (Customer-Initiated)
    Property inquiries (buy/rent)
    Rental property availability
    Renovation and maintenance inquiries
    Property features and amenities
    Investment advice and ROI analysis
    Property ownership and legal history
    Outbound Chats (Agent-Initiated)
    New property listing announcements
    Post-purchase follow-ups
    Investment opportunity alerts
    Property valuation updates
    Customer satisfaction and feedback surveys

    This topic variety enables realistic model training for both lead generation and post-sale engagement scenarios.

    Language Nuance & Authenticity

    Conversations reflect natural Hindi as used in the Real Estate domain, incorporating:

    Cultural Naming Patterns: Personal names, agency names, and developer brands
    Localized Contact Info: Phone numbers, email addresses, and geographic locations across Hindi-speaking regions
    Numeric and Temporal Language: Dates, prices, unit sizes, and time references formatted in Hindi conventions
    Informal and Domain-Specific Language: Real estate slang, idioms, and casual tone used in property discussions

    This level of linguistic realism supports model generalization across dialects and user demographics.

    Conversational Structure & Flow

    Conversations include a mix of short inquiries and detailed advisory sessions, capturing full customer journeys:

    Dialogue Types
    General inquiries
    Sales consultations
    Investment advisory
    Follow-up coordination
    Complaint handling and support
    Flow Components
    Greetings and identity verification
    Intent identification and context gathering

  19. m

    Call Center Conversation Speech Datasets in Hindi for Banking

    • data.macgence.com
    mp3
    Updated Aug 11, 2024
    Macgence (2024). Call Center Conversation Speech Datasets in Hindi for Banking [Dataset]. https://data.macgence.com/dataset/call-center-conversation-speech-datasets-in-hindi-for-banking
    Explore at:
    Available download formats: mp3
    Dataset updated
    Aug 11, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Optimize banking services with Macgence's Hindi call center dataset. Perfect for AI, linguistics, and fintech, offering precision and actionable insights!

  20. h

    gooftagoo

    • huggingface.co
    Updated Mar 17, 2024
    + more versions
    Tensoic AI (2024). gooftagoo [Dataset]. https://huggingface.co/datasets/Tensoic/gooftagoo
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 17, 2024
    Dataset authored and provided by
    Tensoic AI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Hindi/Hinglish Conversation Dataset

    This repository contains a dataset of conversational text in Hindi and Hinglish (a mix of Hindi and English). The dataset contains multi-turn conversations on multiple topics, usually revolving around daily real-life experiences. A small number of reasoning tasks (specifically CoT-style reasoning and coding) have also been added, with about 1k samples from OpenHermes 2.5.

      Caution
    

    This dataset was… See the full description on the dataset page: https://huggingface.co/datasets/Tensoic/gooftagoo.

