27 datasets found
  1. Indian English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Indian English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Indian English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic Indian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Indian English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Indian English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of India to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass (average WER < 5%)

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
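
    As a minimal sketch of how such a transcription file might be consumed, the snippet below assumes a simple speaker-segmented, time-coded JSON layout; the field names (segments, speaker, start, end, text, non_speech) are illustrative assumptions, not FutureBeeAI's published schema.

      import json
      from pathlib import Path

      # Field names below are assumptions for illustration, not the vendor's actual schema.
      def load_utterances(transcript_path):
          """Yield (speaker, start, end, text) tuples from one transcript file."""
          with open(transcript_path, encoding="utf-8") as f:
              doc = json.load(f)
          for seg in doc.get("segments", []):
              if seg.get("non_speech"):          # skip pauses, laughter, etc.
                  continue
              yield seg["speaker"], seg["start"], seg["end"], seg["text"]

      if __name__ == "__main__":
          for path in sorted(Path("transcripts").glob("*.json")):
              for speaker, start, end, text in load_utterances(path):
                  print(f"{path.stem} [{speaker}] {start}-{end}s: {text}")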

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
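
    As a sketch of metadata-driven filtering, the snippet below assumes the speaker metadata has been exported to a CSV with hypothetical column names (speaker_id, age, gender, accent, state); adapt the names to the files actually shipped with the dataset.

      import pandas as pd

      # Hypothetical metadata export; column names are assumptions for illustration.
      speakers = pd.read_csv("speaker_metadata.csv")

      # Example: female speakers aged 18-30 from one state, e.g. for accent-focused
      # fine-tuning or demographic analysis.
      subset = speakers[
          (speakers["gender"] == "female")
          & (speakers["age"].between(18, 30))
          & (speakers["state"] == "Kerala")
      ]
      print(subset[["speaker_id", "age", "accent"]].head())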

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Indian English.
    Voice Assistants: Build smart assistants capable of understanding natural Indian conversations.

  2. Speech Accent Archive

    • marketplace.sshopencloud.eu
    • kaggle.com
    Updated Apr 24, 2020
    Cite
    (2020). Speech Accent Archive [Dataset]. https://marketplace.sshopencloud.eu/dataset/jnNNLE
    Dataset updated
    Apr 24, 2020
    Description

    Everyone who speaks a language, speaks it with an accent. A particular accent essentially reflects a person's linguistic background. When people listen to someone speak with a different accent from their own, they notice the difference, and they may even make certain biased social judgments about the speaker. The speech accent archive is established to uniformly exhibit a large set of speech accents from a variety of language backgrounds. Native and non-native speakers of English all read the same English paragraph and are carefully recorded. The archive is constructed as a teaching tool and as a research tool. It is meant to be used by linguists as well as other people who simply wish to listen to and compare the accents of different English speakers. This dataset allows you to compare the demographic and linguistic backgrounds of the speakers in order to determine which variables are key predictors of each accent. The speech accent archive demonstrates that accents are systematic rather than merely mistaken speech. All of the linguistic analyses of the accents are available for public scrutiny. We welcome comments on the accuracy of our transcriptions and analyses.
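
    A minimal sketch of the kind of comparison described above, assuming the Kaggle mirror of the archive ships a speaker table (commonly named speakers_all.csv) with columns such as native_language, age, and sex; treat the file and column names as assumptions to verify against the download.

      import pandas as pd

      # Assumed file and column names from the Kaggle mirror; verify before use.
      speakers = pd.read_csv("speakers_all.csv")

      # Recordings per native language and the speakers' median age.
      summary = (
          speakers.groupby("native_language")
          .agg(n_speakers=("native_language", "size"), median_age=("age", "median"))
          .sort_values("n_speakers", ascending=False)
      )
      print(summary.head(10))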

  3. Audio Visual Speech Dataset: Indian English

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Audio Visual Speech Dataset: Indian English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/indian-english-visual-speech-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Indian English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.

    Dataset Content

    This visual speech dataset contains 1,000 videos in Indian English, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.

    Participant Diversity:
    Speakers: The dataset includes visual speech data from more than 200 participants from different states/provinces of India.
    Regions: Ensures a balanced representation of accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Video Data

    Extensive guidelines were followed while recording each video to maintain quality and diversity.

    Recording Details:
    File Duration: Average duration of 30 seconds to 3 minutes per video.
    Formats: Videos are available in MP4 or MOV format.
    Resolution: Videos are recorded in ultra-high-definition resolution with 30 fps or above.
    Device: Both the latest Android and iOS devices are used in this collection.
    Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
    Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
    Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
    Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
    Face Orientation: Contains straight face and tilted face angles.
    Participant Positions: Records participants in both standing and seated positions.
    Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
    Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
    Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
    Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of the participants. The dataset contains videos expressing the following human emotions:
    Happy
    Sad
    Excited
    Angry
    Annoyed
    Normal
    Question Diversity: For each emotion, participants answered a specific question designed to elicit that particular emotion.

    Metadata

    The dataset provides comprehensive metadata for each video recording and participant:

  4. CSLU: Foreign Accented English Release 1.2

    • borealisdata.ca
    • dataone.org
    Updated Apr 17, 2023
    Cite
    T Lander (2023). CSLU: Foreign Accented English Release 1.2 [Dataset]. http://doi.org/10.5683/SP2/K7EQTE
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Borealis
    Authors
    T Lander
    License

    https://borealisdata.ca/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.5683/SP2/K7EQTE

    Description

    Introduction: This file contains documentation on CSLU: Foreign Accented English Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2006S38 and ISBN 1-58563-392-5. CSLU: Foreign Accented English Release 1.2 consists of continuous speech in English by native speakers of 22 different languages: Arabic, Cantonese, Czech, Farsi, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Mandarin Chinese, Malay, Polish, Portuguese (Brazilian and Iberian), Russian, Swedish, Spanish, Swahili, Tamil and Vietnamese. The corpus contains 4925 telephone-quality utterances, information about the speakers' linguistic backgrounds and perceptual judgments about the accents in the utterances. The speakers were asked to speak about themselves in English for 20 seconds. Three native speakers of American English independently listened to each utterance and judged the speakers' accents on a 4-point scale: negligible/no accent, mild accent, strong accent and very strong accent. This corpus is intended to support the study of the underlying characteristics of foreign accent and to enable research, development and evaluation of algorithms for the identification and understanding of accented speech. Some of the files in this corpus are also contained in CSLU: 22 Languages Corpus, LDC2005S26.

    Samples: For an example of the data in this corpus, please listen to the audio sample provided on the dataset page.

    Copyright: Portions © 2000-2002 Center for Spoken Language Understanding, Oregon Health & Science University; © 2007 Trustees of the University of Pennsylvania.
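
    As a sketch of how the perceptual accent judgments could be summarized, the snippet below assumes the ratings have been exported to a CSV with one row per utterance and one column per judge; this layout is illustrative, not the LDC release format.

      import pandas as pd

      # Illustrative layout: utterance_id, language, judge1, judge2, judge3,
      # where each judge score uses the 4-point scale (1 = negligible ... 4 = very strong).
      ratings = pd.read_csv("accent_ratings.csv")

      ratings["mean_accent"] = ratings[["judge1", "judge2", "judge3"]].mean(axis=1)

      # Mean perceived accent strength per native language.
      print(
          ratings.groupby("language")["mean_accent"]
          .mean()
          .sort_values(ascending=False)
      )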

  5. Non-Native Children Speech Mini Corpus

    • kaggle.com
    zip
    Updated Oct 26, 2022
    Cite
    Kodali Radha (2022). Non-Native Children Speech Mini Corpus [Dataset]. https://www.kaggle.com/datasets/kodaliradha20phd7093/nonnative-children-speech-mini-corpus/discussion
    Available download formats: zip (1214699262 bytes)
    Dataset updated
    Oct 26, 2022
    Authors
    Kodali Radha
    Description

    There were a total of 20 children, 11 females and 9 males, ranging in age from 7 to 12. All of the children are native speakers of Telugu, an Indian regional language, who are learning English as a second language. All audio clips were acquired as .wav files using the open-source SurveyLex platform, which records dual-channel audio at 44.1 kHz with 16 bits per sample. Each questionnaire was administered as many times as the child could manage, up to a maximum of 10 times per child, to assess variation in words and sentences.
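
    Many ASR pipelines expect 16 kHz mono input, so a typical preprocessing step for these dual-channel 44.1 kHz clips is downmixing and resampling; the sketch below uses librosa and soundfile, and the folder names are assumptions.

      from pathlib import Path

      import librosa
      import soundfile as sf

      SRC = Path("nonnative_children_wav")   # assumed folder of the original 44.1 kHz clips
      DST = Path("wav_16k_mono")
      DST.mkdir(exist_ok=True)

      for wav_path in SRC.glob("*.wav"):
          # librosa downmixes to mono and resamples in a single call.
          audio, sr = librosa.load(wav_path, sr=16000, mono=True)
          sf.write(DST / wav_path.name, audio, sr, subtype="PCM_16")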

  6. Indian English Call Center Data for Travel AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Travel AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    India
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Travel industry is purpose-built to power the next generation of voice AI applications for travel booking, customer support, and itinerary assistance. With over 30 hours of unscripted, real-world conversations, the dataset enables the development of highly accurate speech recognition and natural language understanding models tailored for English-speaking travelers.

    Created by FutureBeeAI, this dataset supports researchers, data scientists, and conversational AI teams in building voice technologies for airlines, travel portals, and hospitality platforms.

    Speech Data

    The dataset includes 30 hours of dual-channel audio recordings between native Indian English speakers engaged in real travel-related customer service conversations. These audio files reflect a wide variety of topics, accents, and scenarios found across the travel and tourism industry.

    Participant Diversity:
    Speakers: 60 native Indian English contributors from our verified pool.
    Regions: Covering multiple Indian provinces to capture accent and dialectal variation.
    Participant Profile: Balanced representation of age (18–70) and gender (60% male, 40% female).
    Recording Details:
    Conversation Nature: Naturally flowing, spontaneous customer-agent calls.
    Call Duration: Between 5 and 15 minutes per session.
    Audio Format: Stereo WAV, 16-bit depth, at 8kHz and 16kHz.
    Recording Environment: Captured in controlled, noise-free, echo-free settings.

    Topic Diversity

    Inbound and outbound conversations span a wide range of real-world travel support situations with varied outcomes (positive, neutral, negative).

    Inbound Calls:
    Booking Assistance
    Destination Information
    Flight Delays or Cancellations
    Support for Disabled Passengers
    Health and Safety Travel Inquiries
    Lost or Delayed Luggage, and more
    Outbound Calls:
    Promotional Travel Offers
    Customer Feedback Surveys
    Booking Confirmations
    Flight Rescheduling Alerts
    Visa Expiry Notifications, and others

    These scenarios help models understand and respond to diverse traveler needs in real time.

    Transcription

    Each call is accompanied by manually curated, high-accuracy transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-Stamped Segments
    Non-speech Markers (e.g., pauses, coughs)
    High transcription accuracy: dual-layered transcription review keeps the word error rate under 5% (see the WER sketch below).
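
    To benchmark a model trained on this corpus, or to spot-check a sub-5% word error rate claim on a held-out sample, the jiwer package is a common choice; the reference and hypothesis strings below are placeholders.

      import jiwer

      # Placeholder reference transcript and ASR hypothesis; in practice these come
      # from the JSON transcriptions and your model's output.
      reference = "i would like to reschedule my flight to mumbai next friday"
      hypothesis = "i would like to reschedule my flight to mumbai on friday"

      wer = jiwer.wer(reference, hypothesis)
      print(f"WER: {wer:.2%}")   # fraction of word-level errors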

    Metadata

    Extensive metadata enriches each call and speaker for better filtering and AI training:

    Participant Metadata: ID, age, gender, region, accent, and dialect.
    Conversation Metadata: Topic, domain, call type, sentiment, and audio specs.

    Usage and Applications

    This dataset is ideal for a variety of AI use cases in the travel and tourism space:

    ASR Systems: Train English speech-to-text engines for travel platforms.

  7. American and Alaska Native population that speaks English and Native language 2010-2014

    • catalog.epscor.alaska.edu
    Updated Dec 17, 2019
    Cite
    (2019). American and Alaska Native population that speaks English and Native language 2010-2014 [Dataset]. https://catalog.epscor.alaska.edu/dataset/american-and-alaska-native-population-that-speaks-english-and-native-language-2010-2014
    Dataset updated
    Dec 17, 2019
    Area covered
    United States, Alaska
    Description

    This data was made as part of the Alaska Experimental Program to Stimulate Competitive Research (EPSCoR) Northern Test Case. The data can be used to look at language skills and retention over time. This data is the percent of the American and Alaska Native population that speaks only "Other" languages, which include Navajo, other Native American languages, Hungarian, Arabic, Hebrew, African languages, and all other languages.
    Source: American Community Survey (ACS).
    Extent: Data is for all communities in Alaska.
    Notes: We chose only Natives because our interest is Alaska Natives. However, data for places like Anchorage might have a large other-Native presence, which should be examined.

  8. Ability To Speak English By Nativity

    • opendata.hawaii.gov
    • data.wu.ac.at
    csv, json, rdf, xml
    Updated Dec 12, 2019
    Cite
    Labor & Industrial Relations (2019). Ability To Speak English By Nativity [Dataset]. https://opendata.hawaii.gov/dataset/ability-to-speak-english-by-nativity
    Available download formats: rdf, csv, json, xml
    Dataset updated
    Dec 12, 2019
    Dataset authored and provided by
    Labor & Industrial Relations
    Description

    (Excluding those less than 5 years old or who speak only English)

    Hawaii’s Limited English Proficient (LEP) Population: A Demographic and Socio-Economic Profile

  9. english_dialects

    • huggingface.co
    Cite
    Yoach Lacombe, english_dialects [Dataset]. https://huggingface.co/datasets/ylacombe/english_dialects
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Yoach Lacombe
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for "english_dialects"

      Dataset Summary
    

    This dataset consists of 31 hours of transcribed high-quality audio of English sentences recorded by 120 volunteers speaking with different accents of the British Isles. The dataset is intended for linguistic analysis as well as use for speech technologies. The speakers self-identified as native speakers of Southern England, Midlands, Northern England, Welsh, Scottish and Irish varieties of English. The recording scripts… See the full description on the dataset page: https://huggingface.co/datasets/ylacombe/english_dialects.
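
    For a quick first look, the dataset can be loaded from the Hugging Face Hub; in the sketch below the configuration name follows the region_gender pattern this repository uses (e.g. "irish_male"), and the column names (text, audio) are what such audio datasets typically expose, so verify both on the dataset page.

      from datasets import load_dataset

      # Config name follows a region_gender pattern (e.g. "irish_male"); check the
      # dataset page for the exact list of available configurations.
      ds = load_dataset("ylacombe/english_dialects", "irish_male", split="train")

      sample = ds[0]
      print(sample["text"])                    # the recorded sentence
      print(sample["audio"]["sampling_rate"])  # audio is decoded on access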

  10. 1,012 Hours - Indian English Speech Data by Mobile Phone

    • nexdata.ai
    Updated Oct 5, 2023
    Cite
    Nexdata (2023). 1,012 Hours - Indian English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/940?source=Huggingface
    Dataset updated
    Oct 5, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    India
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    The English (India) Scripted Monologue Smartphone speech dataset was collected from monologues read from given scripts, covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, and other domains. Recordings are transcribed with text content and other attributes. The data was collected from an extensive and geographically diverse pool of 2,100 native Indian speakers, enhancing model performance in real and complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are GDPR, CCPA, and PIPL compliant.

  11. Indian English Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking real estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native Indian English speakers from our verified contributor community.
    Regions: Representing different provinces across India to ensure accent and dialect variation.
    Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for English real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:


  12. Indian English Call Center Data for Delivery & Logistics AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Delivery & Logistics AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

    Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Indian English speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed-delivery resolutions, offering a rich, real-world training base for AI models.

    Participant Diversity:
    Speakers: 60 native Indian English speakers from our verified contributor pool.
    Regions: Multiple provinces of India for accent and dialect diversity.
    Participant Profile: Balanced gender distribution (60% male, 40% female) with ages ranging from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.
    Call Duration: 5 to 15 minutes on average.
    Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in clean, noise-free, echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

    Inbound Calls:
    Order Tracking
    Delivery Complaints
    Undeliverable Addresses
    Return Process Enquiries
    Delivery Method Selection
    Order Modifications, and more
    Outbound Calls:
    Delivery Confirmations
    Subscription Offer Calls
    Incorrect Address Follow-ups
    Missed Delivery Notifications
    Delivery Feedback Surveys
    Out-of-Stock Alerts, and others

    This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

    Transcription

    All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, noise)
    High transcription accuracy with word error rate under 5% via dual-layer quality checks.

    These transcriptions support fast, reliable model development for English voice AI applications in the delivery sector.

    Metadata

    Detailed metadata is included for each participant and conversation:

    Participant Metadata: ID, age, gender, region, accent, dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

    This metadata aids in training specialized models, filtering demographics, and running advanced analytics.

    Usage and Applications


  13. The ORBIT (Object Recognition for Blind Image Training)-India Dataset

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    Updated Apr 24, 2025
    Cite
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones (2025). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. http://doi.org/10.5281/zenodo.12608444
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gesu India; Martin Grayson; Daniela Massiceti; Cecily Morrison; Simon Robinson; Jennifer Pearson; Matt Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The ORBIT (Object Recognition for Blind Image Training)-India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

    Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

    The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is true if the object is not present in the image, and the ‘pii_present_issue’ key is true if there is personally identifiable information (PII) present in the image. Note: all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
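
    Based on the annotation layout described above, a minimal sketch for selecting the frames of one video that show the object and contain no PII (the paths are assumptions matching the described folder structure):

      import json
      from pathlib import Path

      # One annotation file per video, keyed by frame filename, as described above.
      ann_path = Path("Annotations") / "P1--coffee mug--clean--231220_084852_coffee mug_224.json"

      with open(ann_path, encoding="utf-8") as f:
          frames = json.load(f)

      usable = [
          name
          for name, flags in frames.items()
          if not flags["object_not_present_issue"] and not flags["pii_present_issue"]
      ]
      print(f"{len(usable)} usable frames out of {len(frames)}")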

    This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

    REFERENCES:

    1. Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

    2. microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

    3. Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641

  14. Replication Data for: Listening to Accents: Comprehensibility, accentedness and intelligibility of native and non-native English speech

    • dataverse.no
    • dataverse.azure.uit.no
    • +1more
    pdf +3
    Updated Sep 28, 2023
    Cite
    Gil Verbeke; Ellen Simon (2023). Replication Data for: Listening to Accents: Comprehensibility, accentedness and intelligibility of native and non-native English speech [Dataset]. http://doi.org/10.18710/8F0Q0L
    Available download formats: txt (40079), pdf (221820), pdf (189915), pdf (113103), text/x-r-notebook (14258), text/comma-separated-values (6445, 4247, 7463, 5153, 2991, 627655)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Gil Verbeke; Ellen Simon
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Nov 2021 - Jan 2022
    Area covered
    Belgium, Flanders
    Dataset funded by
    Ghent University
    Description

    Dataset abstract: This dataset contains the results from 33 Flemish English as a Foreign Language (EFL) learners, who were exposed to eight native and non-native accents of English. These participants completed (i) a comprehensibility and accentedness rating task, followed by (ii) an orthographic transcription task. In the first task, listeners were asked to rate eight speakers of English on comprehensibility and accentedness on a nine-point scale (1 = easy to understand/no accent; 9 = hard to understand/strong accent). How Accentedness ratings and listeners' Familiarity with the different accents impacted on their Comprehensibility judgements was measured using a linear mixed-effects model. The orthographic transcription task, then, was used to verify how well listeners actually understood the different accents of English (i.e. intelligibility). To that end, participants' transcription Accuracy was measured as the number of correctly transcribed words and was estimated using a logistic mixed-effects model. Finally, the relation between listeners' self-reported ease of understanding the different speakers (comprehensibility) and their actual understanding of the speakers (intelligibility) was assessed using a linear mixed-effects regression. R code for the data analysis is provided.

    Article abstract: This study investigates how well English as a Foreign Language (EFL) learners report understanding (i.e. comprehensibility) and actually understand (i.e. intelligibility) native and non-native accents of English, and how EFL learners’ self-reported ease of understanding and actual understanding of these accents are aligned. Thirty-three Dutch-speaking EFL learners performed a comprehensibility and accentedness judgement task, followed by an orthographic transcription task. The judgement task elicited listeners’ scalar ratings of authentic speech from eight speakers with traditional Inner, Outer and Expanding Circle accents. The transcription task assessed listeners’ actual understanding of 40 sentences produced by the same eight speakers. Speakers with Inner Circle accents were reported to be more comprehensible than speakers with non-Inner Circle accents, with Expanding Circle speakers being easier to understand than Outer Circle speakers. The strength of a speaker’s accent significantly affected listeners’ comprehensibility ratings. Most speakers were highly intelligible, with intelligibility scores ranging between 79% and 95%. Listeners’ self-reported ease of understanding the speakers in our study generally matched their actual understanding of those speakers, but no correlation between comprehensibility and intelligibility was detected. The study foregrounds the effect of native and non-native accents on comprehensibility and intelligibility, and highlights the importance of multidialectal listening skills.
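
    The replication package ships the authors' R code; as a rough Python analogue (not the authors' analysis), a linear mixed-effects model of comprehensibility ratings with a random intercept per listener could be sketched with statsmodels, using illustrative column names:

      import pandas as pd
      import statsmodels.formula.api as smf

      # Illustrative column names; the replication data uses the authors' own schema.
      df = pd.read_csv("ratings.csv")  # columns: listener, comprehensibility, accentedness, familiarity

      model = smf.mixedlm(
          "comprehensibility ~ accentedness + familiarity",
          data=df,
          groups=df["listener"],       # random intercept per listener
      )
      result = model.fit()
      print(result.summary())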

  15. Replication Data for: Phonetic reduction in native and non-native English speech: Assessing the intelligibility for L2 listeners

    • search.dataone.org
    • dataverse.no
    Updated Jan 10, 2025
    Cite
    Verbeke, Gil; Mitterer, Holger; Simon, Ellen (2025). Replication Data for: Phonetic reduction in native and non-native English speech: Assessing the intelligibility for L2 listeners [Dataset]. http://doi.org/10.18710/OHP3O3
    Dataset updated
    Jan 10, 2025
    Dataset provided by
    DataverseNO
    Authors
    Verbeke, Gil; Mitterer, Holger; Simon, Ellen
    Time period covered
    Jan 1, 2023 - Sep 30, 2023
    Description

    Dataset abstract: This dataset contains the results from 40 L1 British English, 80 Belgian Dutch and 80 European Spanish listeners, who were exposed to English speakers with a General British English, Newcastle and French accent. In the first experiment, participants completed (i) a demographic and linguistic background questionnaire, (ii) an orthographic transcription task and (iii) a vocabulary/general proficiency test (LexTALE; cf. Lemhöfer & Broersma, 2012). In the transcription task, participants listened to 120 stimulus sentences and were asked to write down what the speakers said. Crucially, each sentence contained one target word that was either phonetically unreduced or phonetically reduced. How well the different groups of listeners understood the speakers (i.e. Intelligibility), and more importantly the unreduced and reduced words, was measured as the number of correctly transcribed target words and was assessed using a linear mixed-effects regression model. In the second experiment, participants completed (i) a demographic and linguistic background questionnaire, (ii) an auditory lexical decision task and (iii) a vocabulary/general proficiency test (LexTALE; cf. Lemhöfer & Broersma, 2012). In the lexical decision task, participants were asked to decide whether a particular target word was a real word in English or a nonword. Participants' lexical decision responses (word vs. nonword) were analyzed using a mixed-effects logistic regression model, and their response times (i.e. time interval between stimulus offset and keypress) were analysed using a linear mixed-effects regression model. R code for the data analysis is provided.

    Article abstract: This study examines to what extent phonetic reduction in different accents affects intelligibility for non-native (L2) listeners, and if similar reduction processes in listeners’ first language (L1) facilitate the recognition and processing of reduced word forms in the target language. In two experiments, 80 Dutch-speaking and 80 Spanish-speaking learners of English were presented with unreduced and reduced pronunciation variants in native and non-native English speech. Results showed that unreduced words are recognized more accurately and more quickly than reduced words, regardless of whether these variants occur in non-regionally, regionally or non-native accented speech. No differential effect of phonetic reduction on intelligibility and spoken word recognition was observed between Dutch-speaking and Spanish-speaking participants, despite the absence of strong vowel reduction in Spanish. These findings suggest that similar speech processes in listeners’ L1 and L2 do not invariably lead to an intelligibility benefit or a cross-linguistic facilitation effect in lexical access.

  16. American English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various states of the United States of America to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through a double QA pass (average WER < 5%)

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for US English.
    Voice Assistants: Build smart assistants capable of understanding natural American conversations.

  17. Indian English Call Center Data for Telecom AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Indian English Call Center Data for Telecom AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/telecom-call-center-conversation-english-india
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Indian English Call Center Speech Dataset for the Telecom industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking telecom customers. Featuring over 30 hours of real-world, unscripted audio, it delivers authentic customer-agent interactions across key telecom support scenarios to help train robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI engineers, telecom automation teams, and NLP researchers to build high-accuracy, production-ready models for telecom-specific use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native Indian English speakers. Captured in realistic customer support settings, these conversations span a wide range of telecom topics, from network complaints to billing issues, offering a strong foundation for training and evaluating telecom voice AI solutions.

    Participant Diversity:
    Speakers: 60 native Indian English speakers from our verified contributor pool.
    Regions: Representing multiple provinces across India to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes (positive, negative, and neutral), ensuring broad scenario coverage for telecom AI development.

    Inbound Calls:
    Phone Number Porting
    Network Connectivity Issues
    Billing and Payments
    Technical Support
    Service Activation
    International Roaming Enquiry
    Refund Requests and Billing Adjustments
    Emergency Service Access, and others
    Outbound Calls:
    Welcome Calls & Onboarding
    Payment Reminders
    Customer Satisfaction Surveys
    Technical Updates
    Service Usage Reviews
    Network Complaint Status Calls, and more

    This variety helps train telecom-specific models to manage real-world customer interactions and understand context-specific voice patterns.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, coughs)
    High transcription accuracy with word error rate < 5% thanks to dual-layered quality checks.

    These transcriptions are production-ready, allowing for faster development of ASR and conversational AI systems in the Telecom domain.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.

  18. 2011. Population in Private Households by Age Groups, Sex, First Official Language Spoken showing Aboriginal Identity and Registered or Treaty Indian Status for selected geography

    • desq.quescren.ca
    Updated Mar 8, 2023
    Cite
    (2023). 2011. Population in Private Households by Age Groups, Sex, First Official Language Spoken showing Aboriginal Identity and Registered or Treaty Indian Status for selected geography - Dataset - Data Portal on English-Speaking Quebec [Dataset]. https://desq.quescren.ca/dataset/chssn-2011-co-1594-table6
    Dataset updated
    Mar 8, 2023
    Area covered
    Quebec
    Description

    This ZIP file contains an IVT file.

  19. MOBIO

    • data.europa.eu
    • data.niaid.nih.gov
    • +1more
    unknown
    Updated Nov 30, 2010
    Cite
    Zenodo (2010). MOBIO [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-4269551?locale=da
    Available download formats: unknown
    Dataset updated
    Nov 30, 2010
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    MOBIO is a dataset for mobile face and speaker recognition. The dataset consists of bi-modal (audio and video) data taken from 150 people. The dataset has a female-male ratio of nearly 1:2 (99 males and 51 females) and was collected from August 2008 until July 2010 in six different sites from five different countries. This led to a diverse bi-modal dataset with both native and non-native English speakers. In total 12 sessions were captured for each client: 6 sessions for Phase I and 6 sessions for Phase II. The Phase I data consists of 21 questions with the question types ranging from: Short Response Questions, Short Response Free Speech, Set Speech, and Free Speech. The Phase II data consists of 11 questions with the question types ranging from: Short Response Questions, Set Speech, and Free Speech. A more detailed description of the questions asked of the clients is provided below.

    The database was recorded using two mobile devices: a mobile phone and a laptop computer. The mobile phone used to capture the database was a NOKIA N93i, while the laptop computer was a standard 2008 MacBook. The laptop was only used to capture part of the first session; this first session consists of data captured on both the laptop and the mobile phone.

    Detailed Description of Questions

    Please note that the answers to the Short Response Free Speech and Free Speech questions DO NOT necessarily relate to the question, as the sole purpose is to have the subject produce free speech; therefore, the answers to ALL of these questions are assumed to be false.

    1. Short Response Questions: The short response questions consisted of five pre-defined questions, which were:
    What is your name? – the user supplies their fake name
    What is your address? – the user supplies their fake address
    What is your birthdate? – the user supplies their fake birthdate
    What is your license number? – the user supplies their fake ID card number (the same for each person)
    What is your credit card number? – the user supplies their fake card number

    2. Short Response Free Speech: There were five random questions taken from a list of 30-40 questions. The user had to answer these questions by speaking for approximately 5 seconds of recording (sometimes more and sometimes less).

    3. Set Speech: The users were asked to read pre-defined text out aloud. This text was designed to take longer than 10 seconds to utter and the participants were allowed to correct themselves while reading these paragraphs. The text that was read was: "I have signed the MOBIO consent form and I understand that my biometric data is being captured for a database that might be made publicly available for research purposes. I understand that I am solely responsible for the content of my statements and my behaviour. I will ensure that when answering a question I do not provide any personal information in response to any question."

    4. Free Speech: The free speech session consisted of 10 random questions from a list of approximately 30 questions. The answers to each of these questions took approximately 10 seconds (sometimes less and sometimes more).

    Acknowledgements: Elie Khoury, Laurent El-Shafey, Christopher McCool, Manuel Günther, Sébastien Marcel, "Bi-modal biometric authentication on mobile phones in challenging conditions", Image and Vision Computing, Volume 32, Issue 12, 2014. doi:10.1016/j.imavis.2013.10.001. https://publications.idiap.ch/index.php/publications/show/2689

  20. Fourteen-channel EEG with Imagined Speech (FEIS) dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite
    Scott Wellington; Jonathan Clayton (2020). Fourteen-channel EEG with Imagined Speech (FEIS) dataset [Dataset]. http://doi.org/10.5281/zenodo.3554128
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Scott Wellington; Jonathan Clayton
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description
    ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
    
    Welcome to the FEIS (Fourteen-channel EEG with Imagined Speech) dataset.
    
    <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
    
    The FEIS dataset comprises Emotiv EPOC+ [1] EEG recordings of:
    
    * 21 participants listening to, imagining speaking, and then actually speaking
     16 English phonemes (see supplementary, below)
    
    * 2 participants listening to, imagining speaking, and then actually speaking
     16 Chinese syllables (see supplementary, below)
    
    For replicability and for the benefit of further research, this dataset
    includes the complete experiment set-up, including participants' recorded
    audio and 'flashcard' screens for audio-visual prompts, Lua script and .mxs
    scenario for the OpenVibe [2] environment, as well as all Python scripts
    for the preparation and processing of data as used in the supporting
     studies (submitted toward completion of the MSc in Speech and Language
     Processing at the University of Edinburgh):
    
    * J. Clayton, "Towards phone classification from imagined speech using
     a lightweight EEG brain-computer interface," M.Sc. dissertation,
     University of Edinburgh, Edinburgh, UK, 2019.
    
    * S. Wellington, "An investigation into the possibilities and limitations
     of decoding heard, imagined and spoken phonemes using a low-density,
     mobile EEG headset," M.Sc. dissertation, University of Edinburgh,
     Edinburgh, UK, 2019.
    
    Each participant's data comprise 5 .csv files -- these are the 'raw'
     (unprocessed) EEG recordings for the 'stimuli', 'articulators' (see
     supplementary, below), 'thinking', 'speaking' and 'resting' phases per epoch
    for each trial -- alongside a 'full' .csv file with the end-to-end
    experiment recording (for the benefit of calculating deltas).
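     As a minimal loading sketch (assuming pandas and an illustrative
     directory/file layout; the exact file names should be checked against
     the archive), one participant's per-phase recordings might be read as:

         # Load one participant's raw per-phase EEG CSVs with pandas.
         # Directory and file names here are assumptions for illustration;
         # consult the archive for the actual naming scheme.
         from pathlib import Path
         import pandas as pd

         PHASES = ["stimuli", "articulators", "thinking", "speaking", "resting"]

         def load_participant(root: str, ref: str = "01") -> dict:
             """Map each experimental phase to its raw EEG DataFrame."""
             pdir = Path(root) / ref
             return {phase: pd.read_csv(pdir / f"{phase}.csv") for phase in PHASES}

         # The end-to-end 'full' recording can be read the same way, e.g. to
         # compute deltas across the whole session:
         # full = pd.read_csv(Path(root) / ref / "full.csv")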
    
     To guard against software deprecation or inaccessibility, the full repository
     of open-source software used in the above studies is also included.
    
     We hope the FEIS dataset will be of use to future researchers, given the
     scarcity of similar open-access databases. As such, this dataset is made
     freely available for all academic and research purposes (non-profit).
    
    ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
    
    REFERENCING
    
    <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
    
    If you use the FEIS dataset, please reference:
    
    * S. Wellington, J. Clayton, "Fourteen-channel EEG with Imagined Speech
     (FEIS) dataset," v1.0, University of Edinburgh, Edinburgh, UK, 2019.
     doi:10.5281/zenodo.3369178
    
    ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
    
    LEGAL
    
    <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
    
    The research supporting the distribution of this dataset has been approved by
    the PPLS Research Ethics Committee, School of Philosophy, Psychology and
    Language Sciences, University of Edinburgh (reference number: 435-1819/2).
    
    This dataset is made available under the Open Data Commons Attribution License
    (ODC-BY): http://opendatacommons.org/licenses/by/1.0
    
    ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
    
    ACKNOWLEDGEMENTS
    
    <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
    
    The FEIS database was compiled by:
    
    Scott Wellington (MSc Speech and Language Processing, University of Edinburgh)
    Jonathan Clayton (MSc Speech and Language Processing, University of Edinburgh)
    
    Principal Investigators:
    
    Oliver Watts (Senior Researcher, CSTR, University of Edinburgh)
    Cassia Valentini-Botinhao (Senior Researcher, CSTR, University of Edinburgh)
    
    <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
    
    METADATA
    
    ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
    
    For participants, dataset refs 01 to 21:
    
    01 - NNS
    02 - NNS
    03 - NNS, Left-handed
    04 - E
     05 - E, Voice heard as part of 'stimuli' portions of trials belongs to
        participant 04, as the microphone became damaged and unusable prior to
        recording
    06 - E
    07 - E
    08 - E, Ambidextrous
    09 - NNS, Left-handed
    10 - E
    11 - NNS
     12 - NNS, Only sessions one and two recorded (out of three total), as the
        participant had to leave the recording session early
    13 - E
    14 - NNS
    15 - NNS
    16 - NNS
    17 - E
    18 - NNS
    19 - E
    20 - E
    21 - E
    
    E = native speaker of English
    NNS = non-native speaker of English (>= C1 level)
    
    For participants, dataset refs chinese-1 and chinese-2:
    
    chinese-1 - C
    chinese-2 - C, Voice heard as part of 'stimuli' portions of trials belongs to
          participant chinese-1
    
    C = native speaker of Chinese
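     If it helps downstream filtering, the notes above can be transcribed into a
     small lookup table; a minimal sketch follows (the dictionary layout and
     field names are ours, not part of the dataset itself):

         # Participant notes above (E = native English speaker, NNS = non-native,
         # C = native Chinese speaker); handedness and special cases noted where given.
         PARTICIPANTS = {
             "01": {"speaker": "NNS"},
             "02": {"speaker": "NNS"},
             "03": {"speaker": "NNS", "handedness": "left"},
             "04": {"speaker": "E"},
             "05": {"speaker": "E", "stimuli_voice": "04"},
             "06": {"speaker": "E"},
             "07": {"speaker": "E"},
             "08": {"speaker": "E", "handedness": "ambidextrous"},
             "09": {"speaker": "NNS", "handedness": "left"},
             "10": {"speaker": "E"},
             "11": {"speaker": "NNS"},
             "12": {"speaker": "NNS", "sessions_recorded": 2},
             "13": {"speaker": "E"},
             "14": {"speaker": "NNS"},
             "15": {"speaker": "NNS"},
             "16": {"speaker": "NNS"},
             "17": {"speaker": "E"},
             "18": {"speaker": "NNS"},
             "19": {"speaker": "E"},
             "20": {"speaker": "E"},
             "21": {"speaker": "E"},
             "chinese-1": {"speaker": "C"},
             "chinese-2": {"speaker": "C", "stimuli_voice": "chinese-1"},
         }

         # Example: keep only native English speakers.
         native_english = [ref for ref, meta in PARTICIPANTS.items()
                           if meta["speaker"] == "E"]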
    
    <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
    
    SUPPLEMENTARY
    
    ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
    
     Under the international 10-20 system, the 14 channels of the Emotiv EPOC+ headset are:
    
    F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4
    
    The 16 English phonemes investigated in dataset refs 01 to 21:
    
     /i/ /u:/ /æ/ /ɔ:/ /m/ /n/ /ŋ/ /f/ /s/ /ʃ/ /v/ /z/ /ʒ/ /p/ /t/ /k/
    
    The 16 Chinese syllables investigated in dataset refs chinese-1 and chinese-2:
    
    mā má mǎ mà mēng méng měng mèng duō duó duǒ duò tuī tuí tuǐ tuì
    
    All references to 'articulators' (e.g. as part of filenames) refer to the
     1-second 'fixation point' portion of trials. The name is a holdover from
    preliminary trials which were modelled on the KARA ONE database
    (http://www.cs.toronto.edu/~complingweb/data/karaOne/karaOne.html) [3].
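     For analysis code, the channel and stimulus inventories above can be encoded
     directly as constants; a minimal sketch (the constant names are ours):

         # Channel order and stimulus inventories transcribed from this section.
         CHANNELS = ["F3", "FC5", "AF3", "F7", "T7", "P7", "O1",
                     "O2", "P8", "T8", "F8", "AF4", "FC6", "F4"]

         ENGLISH_PHONEMES = ["i", "u:", "æ", "ɔ:", "m", "n", "ŋ", "f",
                             "s", "ʃ", "v", "z", "ʒ", "p", "t", "k"]

         CHINESE_SYLLABLES = ["mā", "má", "mǎ", "mà",
                              "mēng", "méng", "měng", "mèng",
                              "duō", "duó", "duǒ", "duò",
                              "tuī", "tuí", "tuǐ", "tuì"]

         assert len(CHANNELS) == 14
         assert len(ENGLISH_PHONEMES) == len(CHINESE_SYLLABLES) == 16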
    
    <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <>< <><
    ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><> ><>
    
    [1] Emotiv EPOC+. https://emotiv.com/epoc. Accessed online 14/08/2019.
    
    [2] Y. Renard, F. Lotte, G. Gibert, M. Congedo, E. Maby, V. Delannoy,
      O. Bertrand, A. Lécuyer. “OpenViBE: An Open-Source Software Platform
      to Design, Test and Use Brain-Computer Interfaces in Real and Virtual
      Environments”, Presence: teleoperators and virtual environments,
      vol. 19, no 1, 2010.
    
    [3] S. Zhao, F. Rudzicz. "Classifying phonological categories in imagined
      and articulated speech." In Proceedings of ICASSP 2015, Brisbane
      Australia, 2015.