https://www.futurebeeai.com/policies/ai-data-license-agreement
This Indian English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native Indian English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
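As a rough illustration of what working with dual-channel recordings can involve, the sketch below separates a two-channel file into two mono streams using only the Python standard library. It assumes 16-bit PCM WAV input, which the listing does not state, and it runs on a tiny synthetic file rather than on dataset audio. Which channel carries the agent and which the customer is also not specified here.

```python
import io
import struct
import wave


def split_stereo(wav_bytes):
    """Return (channel_0, channel_1) as raw mono PCM bytes from a stereo WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getnchannels() != 2:
            raise ValueError("expected a two-channel recording")
        width = wf.getsampwidth()
        frames = wf.readframes(wf.getnframes())

    first, second = bytearray(), bytearray()
    step = 2 * width  # one frame holds one sample per channel, interleaved
    for i in range(0, len(frames), step):
        first += frames[i:i + width]
        second += frames[i + width:i + step]
    return bytes(first), bytes(second)


# Build a tiny synthetic stereo file so the sketch runs without dataset files.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)     # 16-bit samples (an assumption)
    wf.setframerate(8000)  # sample rate chosen arbitrarily for the demo
    # Channel 0 carries the value 1, channel 1 carries the value -1.
    wf.writeframes(struct.pack("<6h", 1, -1, 1, -1, 1, -1))

left, right = split_stereo(buf.getvalue())
print(struct.unpack("<3h", left), struct.unpack("<3h", right))
```

Keeping the two speakers in separate streams like this is what makes per-speaker transcription alignment straightforward; the actual file format should be checked against the dataset's own documentation.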
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
https://creativecommons.org/publicdomain/zero/1.0/
This news dataset is a persistent historical archive of notable events in the Indian subcontinent from the start of 2001 to Q2 2023, recorded in real time by the journalists of India. It contains approximately 3.8 million events published by the Times of India.
Most of the data covers Indian local news, including national, city-level, and entertainment stories.
Prepared by Rohit Kulkarni
Time Range: Start Date: 2001-01-01; End Date: 2023-06-30
CSV Rows: 3,876,557
Columns:
1. publish_date: Date of the article being published online, in yyyyMMdd format
2. headline_category: Category of the headline; ASCII, dot-delimited, lowercase values
3. headline_text: Text of the headline in English; only ASCII characters
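The three columns described above could be parsed along the following lines. This is a minimal sketch: it assumes the CSV has a header row using these exact column names, and the two sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from datetime import datetime

# Two made-up rows in the documented three-column layout (not real data).
SAMPLE = """publish_date,headline_category,headline_text
20010102,city.bengaluru,Example headline one
20230630,entertainment.hindi.bollywood,Example headline two
"""


def parse_row(row):
    """Convert one raw CSV row into typed fields."""
    return {
        # yyyyMMdd corresponds to the strptime pattern %Y%m%d.
        "publish_date": datetime.strptime(row["publish_date"], "%Y%m%d").date(),
        # Categories are dot delimited, so split them into a hierarchy.
        "category_path": row["headline_category"].split("."),
        "headline_text": row["headline_text"],
    }


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for r in rows:
    print(r["publish_date"].isoformat(), r["category_path"], r["headline_text"])
```

Splitting the category into a path keeps the hierarchy available for grouping at any level, for example by top-level section only.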
Times Group, as a news agency, reaches a very wide audience across Asia and dwarfs every other agency in the quantity of English articles published per day. Due to the heavy daily volume (avg. 600 articles) over multiple years, this data offers deep insight into Indian society, its priorities, events, issues, and talking points, and how they have unfolded over time.
The dataset can be sliced into smaller subsets based on one or more facets.
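One way to cut a smaller piece out of the data is to filter on publication year and on a category prefix. The sketch below is illustrative only: the records are invented placeholders, and the helper name `slice_records` and its parameters are hypothetical.

```python
from datetime import date

# Hypothetical, already-parsed records (made-up values, not real data).
records = [
    {"publish_date": date(2005, 3, 14), "headline_category": "city.mumbai", "headline_text": "Example A"},
    {"publish_date": date(2012, 7, 1), "headline_category": "india", "headline_text": "Example B"},
    {"publish_date": date(2012, 11, 9), "headline_category": "city.delhi", "headline_text": "Example C"},
]


def slice_records(records, year=None, category_prefix=None):
    """Keep records that match every facet that is not None."""
    out = []
    for rec in records:
        if year is not None and rec["publish_date"].year != year:
            continue
        if category_prefix is not None:
            cat = rec["headline_category"]
            # Match the category itself or any dot-delimited child of it.
            if cat != category_prefix and not cat.startswith(category_prefix + "."):
                continue
        out.append(rec)
    return out


city_2012 = slice_records(records, year=2012, category_prefix="city")
print([r["headline_text"] for r in city_2012])
```

Matching on the prefix plus a trailing dot avoids accidentally selecting unrelated categories that merely begin with the same letters.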
Similar news datasets exploring other attributes, countries and topics can be seen on my profile.
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.
Industries most vulnerable to data breaches
Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.
Largest data exposures worldwide
In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. The case is unusual, though, because cybersecurity researchers found the vulnerability before cybercriminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records and later, in 2017, revised the figure to three billion. The third-biggest data breach occurred in March 2018 and involved India’s national identification database, Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Algerian Arabic Scripted Monologue Speech Dataset for the Travel domain, a carefully constructed resource created to support the development of Arabic speech recognition technologies, particularly for applications in travel, tourism, and customer service automation.
This training dataset features 6,000+ high-quality scripted prompt recordings in Algerian Arabic, crafted to simulate real-world Travel industry conversations. It’s ideal for building robust ASR systems, virtual assistants, and customer interaction tools.
The dataset includes a wide spectrum of travel-related interactions to reflect diverse real-world scenarios:
To boost contextual realism, the scripted prompts integrate frequently encountered travel terms and variables:
Every audio file is paired with a verbatim transcription in .TXT format.
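Pairing each recording with its transcription might look like the sketch below. It assumes that an audio file and its transcript share a base name, that audio files use a `.wav` extension, and that transcripts use a lowercase `.txt` extension; none of these details is confirmed by the listing. The files are placeholders created in a temporary directory, and the helper name is hypothetical.

```python
import tempfile
from pathlib import Path


def pair_audio_with_transcripts(root, audio_ext=".wav"):
    """Map each audio file name to the text of the .txt file sharing its base name."""
    pairs = {}
    for audio in sorted(Path(root).glob(f"*{audio_ext}")):
        transcript = audio.with_suffix(".txt")
        if transcript.exists():
            pairs[audio.name] = transcript.read_text(encoding="utf-8").strip()
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "prompt_0001.wav").write_bytes(b"")  # placeholder audio
    (root / "prompt_0001.txt").write_text("placeholder transcript\n", encoding="utf-8")
    (root / "prompt_0002.wav").write_bytes(b"")  # deliberately has no transcript
    pairs = pair_audio_with_transcripts(root)

print(pairs)
```

Recordings without a matching transcript are skipped rather than raising an error, which makes gaps easy to spot by comparing the number of pairs with the number of audio files.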
Each audio file is enriched with detailed metadata to support advanced analytics and filtering:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Gujarati Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Gujarati speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Gujarati, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is ideal for:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Indian English Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced English speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Indian English, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Tamil Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Tamil-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native Tamil speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Tamil Scripted Monologue Speech Dataset for the General Domain is a carefully curated resource designed to support the development of Tamil language speech recognition systems. This dataset focuses on general-purpose conversational topics and is ideal for a wide range of AI applications requiring natural, domain-agnostic Tamil speech data.
This dataset features over 6,000 high-quality scripted monologue recordings in Tamil. The prompts span diverse real-life topics commonly encountered in general conversations and are intended to help train robust and accurate speech-enabled technologies.
The dataset covers a wide variety of general conversation scenarios, including:
To enhance authenticity, the prompts include:
Each prompt is designed to reflect everyday use cases, making it suitable for developing generalized NLP and ASR solutions.
Every audio file in the dataset is accompanied by a verbatim text transcription, ensuring accurate training and evaluation of speech models.
Rich metadata is included for detailed filtering and analysis:
This dataset can power a variety of Tamil language AI technologies, including:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Presenting the Tamil Scripted Monologue Speech Dataset for the Telecom Domain, a purpose-built dataset created to accelerate the development of Tamil speech recognition and voice AI models specifically tailored for the telecommunications industry.
This dataset includes over 6,000 high-quality scripted prompt recordings in Tamil, representing real-world telecom customer service scenarios. It’s designed to support the training of speech-based AI systems used in call centers, virtual agents, and voice-powered support tools.
The dataset reflects a wide variety of common telecom customer interactions, including:
To maximize contextual richness, prompts include:
Each audio file is paired with an accurate, verbatim transcription for precise model training:
Detailed metadata is included to enhance dataset usability and
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Punjabi Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Punjabi speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Punjabi, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is ideal for:
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Odia Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 40 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Odia-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 40 hours of dual-channel audio recordings between native Odia speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Tamil Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Tamil speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Tamil, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is ideal for:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Presenting the Malayalam Scripted Monologue Speech Dataset for the Telecom Domain, a purpose-built dataset created to accelerate the development of Malayalam speech recognition and voice AI models specifically tailored for the telecommunications industry.
This dataset includes over 6,000 high-quality scripted prompt recordings in Malayalam, representing real-world telecom customer service scenarios. It’s designed to support the training of speech-based AI systems used in call centers, virtual agents, and voice-powered support tools.
The dataset reflects a wide variety of common telecom customer interactions, including:
To maximize contextual richness, prompts include:
Each audio file is paired with an accurate, verbatim transcription for precise model training:
Detailed metadata is included to enhance dataset
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Malayalam Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Malayalam speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Malayalam, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is ideal
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Telugu Scripted Monologue Speech Dataset tailored for the BFSI (Banking, Financial Services, and Insurance) domain. This dataset empowers the development of advanced Telugu speech recognition systems, natural language understanding models, and conversational AI solutions focused on the BFSI sector.
This dataset includes over 6,000 scripted prompt recordings in Telugu, covering a wide range of realistic banking and finance-related scenarios to support robust ASR and voice AI systems.
This dataset spans multiple BFSI-related themes to simulate practical customer interaction scenarios:
To make the dataset as context-rich as possible, each prompt integrates commonly encountered real-world BFSI elements:
Every audio file is paired with verbatim transcription to streamline ASR and NLP model development.
Each data point is enriched with detailed metadata for advanced training and analysis:
This BFSI-focused dataset is ideal
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Bengali Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Bengali-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native Bengali speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space:
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Malayalam Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for Malayalam-speaking travelers.
Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.
The dataset includes 30 hours of dual-channel audio recordings between native Malayalam speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.
Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).
These scenarios help models understand and respond to diverse traveler needs in real time.
Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.
Extensive metadata enriches each call and speaker for better filtering and AI training:
This dataset is ideal for a variety of AI use cases in the travel and tourism space: