100+ datasets found
  1. Spanish Spontaneous Dialogue speech dataset

    • kaggle.com
    zip
    Updated Jun 7, 2024
    Cite
    Frank Wong (2024). Spanish Spontaneous Dialogue speech dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/spanish-spontaneous-dialogue-speech-dataset
    Available download formats: zip (93236 bytes)
    Dataset updated
    Jun 7, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Spanish(Spain) Spontaneous Dialogue Telephony speech dataset

    Description

    Spanish (Spain) spontaneous dialogue telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (600 native speakers), enhancing model performance in real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1234?source=Kaggle

    Format

    8kHz 8bit, a-law/u-law pcm, mono channel

    Content category

    Dialogue based on given topics

    Recording condition

    Low background noise (indoor)

    Recording device

    Telephony

    Country

    Spain(ESP)

    Language(Region) Code

    es-ES

    Language

    Spanish

    Speaker

    600 people in total, 49% male and 51% female

    Features of annotation

    Transcription text, timestamp, speaker ID, gender

    Accuracy rate

    Word accuracy rate(WAR) 98%

    Licensing Information

    Commercial License
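The Format field above lists 8 kHz, 8-bit a-law/u-law PCM. As a rough illustration of what that encoding means, here is a minimal sketch of decoding G.711 mu-law (u-law) bytes to 16-bit linear samples. It assumes raw, headerless mu-law bytes, which the listing does not confirm, and it does not cover a-law. The function names are hypothetical.

```python
def ulaw_to_pcm16(byte_value: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    BIAS = 0x84
    u = ~byte_value & 0xFF         # mu-law bytes are stored bit-inverted
    t = ((u & 0x0F) << 3) + BIAS   # 4-bit mantissa plus bias
    t <<= (u & 0x70) >> 4          # shift by the 3-bit segment (exponent)
    return BIAS - t if u & 0x80 else t - BIAS  # top bit carries the sign


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes (assumed raw, with no file header)."""
    return [ulaw_to_pcm16(b) for b in data]


if __name__ == "__main__":
    # 0xFF and 0x7F both decode to silence; 0x00 and 0x80 are the extremes.
    print(decode_ulaw(bytes([0xFF, 0x7F, 0x00, 0x80])))
```

Each 8-bit code expands to a wider linear sample, which is why telephony audio in this format is usually decoded before feature extraction.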

  2. Mexican Spanish General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mexican Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mexican Spanish communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Mexican accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mexican Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Mexican Spanish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Mexico to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
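The stated audio format (stereo, 16-bit, 16 kHz WAV) can be checked before training. The following is a minimal sketch using Python's standard `wave` module. The function names are hypothetical, and the in-memory silent file only stands in for a real recording, since the dataset's file layout is not described here.

```python
import io
import wave


def wav_matches(source, channels=2, sample_width_bytes=2, sample_rate=16000):
    """Return True if a WAV file (path or file object) has the expected format."""
    with wave.open(source, "rb") as wf:
        return (
            wf.getnchannels() == channels
            and wf.getsampwidth() == sample_width_bytes  # 2 bytes = 16-bit
            and wf.getframerate() == sample_rate
        )


def make_silent_wav(channels, sample_width_bytes, sample_rate, n_frames=160):
    """Build a short silent WAV in memory, used here only as a stand-in file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width_bytes)
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00" * (channels * sample_width_bytes * n_frames))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(wav_matches(make_silent_wav(2, 2, 16000)))  # stereo, 16-bit, 16 kHz
    print(wav_matches(make_silent_wav(1, 1, 8000)))   # mono, 8-bit, 8 kHz
```

A check like this is useful when mixing corpora, because other entries in this listing use different sample rates and channel counts.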

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
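The transcription highlights quote an average WER below 5%. The listing does not say how that figure was computed, so the following is only a sketch of the standard word error rate definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("hola como estas hoy", "hola que estas"))
```

In practice, text is usually normalized (case, punctuation, number formatting) before scoring, and those choices can shift the reported figure.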

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Spanish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Mexican Spanish.
    Voice Assistants: Build smart assistants capable of understanding natural Mexican conversations.

  3. Spanish Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2025
    Cite
    Unidata (2025). Spanish Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/spanish-speech-recognition-dataset
    Available download formats: zip (93217 bytes)
    Dataset updated
    Jun 25, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Speech Dataset for recognition task

    Dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. This dataset boasts an impressive 98% word accuracy rate, making it a valuable resource for advancing speech recognition technology.

    By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, transcribing audio, and natural language processing (NLP).

    The dataset includes high-quality audio recordings with text transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset

    Audio files: High-quality recordings in WAV format
    Text transcriptions: Accurate and detailed transcripts for each audio segment
    Speaker information: Metadata on native speakers, including gender, etc.
    Topics: Diverse domains such as general conversations, business, etc.

    This dataset is a valuable resource for researchers and developers working on speech recognition, language models, and speech technology.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  4. English Speech Dataset (Spanish Speakers) – 388 Hours Scripted Monologue by...

    • nexdata.ai
    Updated Oct 31, 2023
    Cite
    Nexdata (2023). English Speech Dataset (Spanish Speakers) – 388 Hours Scripted Monologue by Smartphone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/990
    Dataset updated
    Oct 31, 2023
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
    Description

    This dataset contains 388 hours of English speech from Spanish speakers, collected from monologues based on given scripts and covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, and other domains. Transcribed with text content and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (891 people in total), enhancing model performance in real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

  5. Colombian Spanish General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Colombian Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-colombia
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Colombia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Colombian Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Colombian Spanish communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Colombian accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Colombian Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Colombian Spanish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Colombia to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Spanish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Colombian Spanish.
    Voice Assistants: Build smart assistants capable of understanding natural Colombian conversations.

  6. Percent Spanish Speakers

    • gis-kingcounty.opendata.arcgis.com
    Updated Aug 10, 2016
    Cite
    King County (2016). Percent Spanish Speakers [Dataset]. https://gis-kingcounty.opendata.arcgis.com/datasets/kingcounty::percent-spanish-speakers
    Dataset updated
    Aug 10, 2016
    Dataset authored and provided by
    King County
    Area covered
    Description

    Languages: Percent Spanish Speakers. Basic demographics by census tract in King County, based on the current American Community Survey (ACS) 5-year average. Included demographics are: total population; foreign born; median household income; English language proficiency; languages spoken; race and ethnicity; sex; and age. Numbers and derived percentages are estimates based on the current year's ACS. GEO_ID_TRT is the key field and may be used to join to other demographic Census data tables.

  7. spanish-speech-recognition-dataset

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    Unidata NLP (2025). spanish-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset
    Dataset updated
    Jul 30, 2025
    Authors
    Unidata NLP
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Spanish Telephone Dialogues Dataset - 488 Hours

    Dataset comprises 488 hours of high-quality telephone audio recordings in Spanish, featuring 600 native speakers and achieving a 95% sentence accuracy rate. Designed for advancing speech recognition models and language processing, this extensive speech data corpus covers diverse topics and domains, making it ideal for training robust automatic speech recognition (ASR) systems.

      Dataset characteristics:… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset.
    
  8. Spanish(Mexico) Real-world Casual Conversation and Monologue speech dataset

    • nexdata.ai
    Updated Jul 1, 2025
    Cite
    Nexdata (2025). Spanish(Mexico) Real-world Casual Conversation and Monologue speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1715
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    World, Mexico
    Variables measured
    Format, Country, Language, Accuracy Rate, Content category, Recording condition, Language(Region) Code, Features of annotation
    Description

    Spanish (Mexico) real-world casual conversation and monologue speech dataset, covering self-media, conversation, variety shows, and other generic domains, and mirroring real-world interactions. Transcribed with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers, enhancing model performance in real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

  9. Language as a Barrier to Local Government Access: Spanish Language Access to...

    • dataverse.harvard.edu
    Updated Dec 31, 2015
    Cite
    J. Scott McDonald (2015). Language as a Barrier to Local Government Access: Spanish Language Access to Local Government Websites [Dataset]. http://doi.org/10.7910/DVN/USCRQN
    Dataset updated
    Dec 31, 2015
    Dataset provided by
    Harvard Dataverse
    Authors
    J. Scott McDonald
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This data set scores Spanish language access to local and county government websites. Few data exist to support measuring the language accessibility of government websites by persons with limited English proficiency (LEP). The World Wide Web is asserted as the great leveler, bringing citizens into closer contact with their governments and the services those governments provide. This is certainly the case with English speakers. However, for individuals with limited English proficiency, the web has left many behind. The data is organized into two datasets: 1) cities and 2) counties. The city dataset comprises the 100 largest U.S. cities for 2012 (http://www.citymayors.com/gratis/uscities_100.html). Counties were sampled on two criteria: a) percentage of the population that speaks Spanish or Spanish Creole at home and b) region. To obtain a regional distribution of counties, those with the highest percentages of population speaking Spanish or Spanish Creole at home were sampled within each of four Census regions: Northeast, Midwest, South, and West.

  10. 189 Hours - Spanish(Latin America) Children Real-world Casual Conversation...

    • nexdata.ai
    Updated Nov 1, 2023
    Cite
    Nexdata (2023). 189 Hours - Spanish(Latin America) Children Real-world Casual Conversation and Monologue speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1250
    Dataset updated
    Nov 1, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Latin America, World
    Variables measured
    Age, Format, Country, Accuracy, Language, Content category, Language(Region) Code, Recording environment, Features of annotation
    Description

    Spanish (Latin America) children's real-world casual conversation and monologue speech dataset, covering self-media, conversation, live streams, lectures, variety shows, and other generic domains, and mirroring real-world interactions. Transcribed with text content, speaker ID, gender, age, accent, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (children aged 12 and younger), enhancing model performance in real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

  11. 2013 American Community Survey - Table Packages: Detailed Language Spoken in...

    • catalog.data.gov
    Updated Jul 19, 2023
    Cite
    U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    United States
    Description

    This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.

  12. Spanish(Spain) General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Spanish(Spain) General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-spain
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Spain
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Spanish communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Spanish accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Spanish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of Spain to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Spanish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Spanish.
    Voice Assistants: Build smart assistants capable of understanding natural Spanish conversations.

  13. 346 Hours - Spanish(Mexico) Spontaneous Dialogue Smartphone speech dataset

    • nexdata.ai
    Updated Jan 3, 2024
    Cite
    Nexdata (2024). 346 Hours - Spanish(Mexico) Spontaneous Dialogue Smartphone speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1303
    Dataset updated
    Jan 3, 2024
    Dataset authored and provided by
    Nexdata
    Area covered
    Mexico
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    Spanish (Mexico) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (338 native speakers), enhancing model performance in real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

  14. Spanish (Spain) Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Spanish (Spain) Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-spanish-spain
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Spanish Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native Spanish speakers from our verified contributor community.
    Regions: Representing different provinces across Spain to ensure accent and dialect variation.
    Participant Profile: Gender mix of 60% male to 40% female, and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:

  15. General conversation speech datasets in Spanish for Social Conversation

    • data.macgence.com
    mp3
    Updated Mar 22, 2024
    Cite
    Macgence (2024). General conversation speech datasets in Spanish for Social Conversation [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-spanish-for-social-conversation
    Available download formats: mp3
    Dataset updated
    Mar 22, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    The audio dataset includes General Conversation, featuring Spanish speakers from Spain with detailed metadata.

  16. EpaDB

    • huggingface.co
    Updated Jul 11, 2025
    Cite
    Koel Labs (2025). EpaDB [Dataset]. https://huggingface.co/datasets/KoelLabs/EpaDB
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Koel Labs
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    EpaDB

    EpaDB is a speech database of 50 native Spanish speakers (25 male, 25 female) from Argentina speaking English. It contains phonemic annotations that mainly use the sounds supported by ARPABet, with a few extensions to model Spanish-influenced dialects of English. It was developed by Jazmin Vidal, Luciana Ferrer, and Leonardo Brambilla at the Speech Lab. Read more on their official GitHub and paper.

    This Processed Version

    We have processed the dataset into an easily… See the full description on the dataset page: https://huggingface.co/datasets/KoelLabs/EpaDB.

  17. Spoken Language Identification

    • kaggle.com
    zip
    Updated Jul 5, 2018
    Cite
    Tomasz (2018). Spoken Language Identification [Dataset]. https://www.kaggle.com/toponowicz/spoken-language-identification
    Available download formats: zip (16022179692 bytes)
    Dataset updated
    Jul 5, 2018
    Authors
    Tomasz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains speech samples of English, German and Spanish languages. Samples are equally balanced between languages, genders and speakers.

    More information at the spoken-language-dataset repository.

    Background

    The project was inspired by the TopCoder contest, Spoken Languages 2. The dataset given there contains 10 seconds of speech recorded in 1 of 176 languages and is based entirely on Bible readings. Unfortunately, in many cases there is a single speaker per language (male in most cases). Even worse, the same single speaker also appears in the test set. This cannot lead to a good generic solution.

    There are two approaches we can take:

    • The first approach is to use a big dataset in which all voice or language properties (e.g. gender, age, accent) are equally likely. A good example is Common Voice from Mozilla. Most likely this leads to the best performance. However, processing such a huge dataset is expensive and adding new languages is challenging.
    • The second approach is to use a small handcrafted dataset and boost it with data augmentation. The advantage is that we can add new languages quickly. Last but not least, the dataset is small, so it can be processed quickly.

    The second approach has been taken.

    LibriVox recordings were used to prepare the dataset. Particular attention was paid to including a wide variety of unique speakers. High variance forces the model to concentrate on language properties rather than on a specific voice. Samples are equally balanced between languages, genders and speakers so as not to favour any subgroup. Finally, the dataset is divided into a train set and a test set. Speakers present in the test set are not present in the train set, which helps estimate the generalization error.
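
    A speaker-disjoint split of this kind can be sketched as follows. This is an illustration of the idea, not the author's preparation script: the `speaker_id` labels, the `split_by_speaker` helper and the 20% test fraction are assumptions, since the source does not say how speakers were identified or which proportion was used.

```python
import random
from collections import defaultdict


def split_by_speaker(samples, test_fraction=0.2, seed=0):
    """Split (speaker_id, path) pairs so that no speaker is in both sets."""
    by_speaker = defaultdict(list)
    for speaker_id, path in samples:
        by_speaker[speaker_id].append(path)

    # Shuffle speakers, not samples, so a speaker lands wholly on one side.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [p for s in speakers[n_test:] for p in by_speaker[s]]
    test = [p for s in speakers[:n_test] for p in by_speaker[s]]
    return train, test, test_speakers


# Hypothetical input: 10 speakers with 3 samples each.
samples = [(f"spk{s}", f"spk{s}_clip{c}.flac") for s in range(10) for c in range(3)]
train, test, test_speakers = split_by_speaker(samples)
```

    Splitting at the speaker level rather than the sample level is what keeps a held-out voice from leaking into training.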

    The core of the train set is based on 420 minutes (2520 samples) of original recordings. After applying several audio transformations (pitch, speed and noise) the train set was extended to 12180 minutes (73080 samples). The test set contains 90 minutes (540 samples) of original recordings. No data augmentation has been applied.

    Original recordings contain 90 unique speakers. The number of unique speakers was increased by adjusting pitch (8 different levels) and speed (8 different levels). After applying audio transformations there are 1530 unique speakers.
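
    The figures quoted above are mutually consistent, which the following arithmetic check makes explicit. It assumes that every original sample yields one variant per transformation level, and it takes the 12 noise levels from the filename convention described further down; neither assumption is stated outright in the description.

```python
SAMPLE_SECONDS = 10
PITCH_LEVELS = 8
SPEED_LEVELS = 8
NOISE_LEVELS = 12  # taken from the filename convention (noise index 1-12)

# 420 minutes of original train audio cut into 10-second samples.
original_train_samples = 420 * 60 // SAMPLE_SECONDS

# Assumption: each original sample is kept and also emitted once per level.
variants_per_sample = 1 + PITCH_LEVELS + SPEED_LEVELS + NOISE_LEVELS
augmented_train_samples = original_train_samples * variants_per_sample
augmented_train_minutes = augmented_train_samples * SAMPLE_SECONDS // 60

# Only pitch and speed change the perceived voice, so noise is excluded here.
unique_speakers = 90 * (1 + PITCH_LEVELS + SPEED_LEVELS)

# 90 minutes of test audio, no augmentation.
test_samples = 90 * 60 // SAMPLE_SECONDS
```

    Under these assumptions the values come out to 2520 original samples, 73080 augmented samples (12180 minutes), 1530 unique speakers and 540 test samples, matching the description.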

    Data structure

    The dataset is divided into 2 directories:

    • train (73080 samples)
    • test (540 samples)

    Each sample is a FLAC audio file with:

    • sample rate: 22050
    • bit depth: 16
    • channels: 1
    • duration: exactly 10 seconds

    The original recordings are MP3 files, but they are converted to FLAC up front to avoid repeated re-encoding (and the resulting loss of quality) during transformations.

    The filename of each sample has the following syntax:

    (language)_(gender)_(recording ID).fragment(index)[.(transformation)(index)].flac
    

    ...and variables:

    • language: en, de, or es
    • gender: m or f
    • recording ID: a hash of the URL
    • fragment index: 1-30
    • transformation: speed, pitch or noise
    • transformation index:
      • if speed: 1-8
      • if pitch: 1-8
      • if noise: 1-12

    For example:

    es_m_f7d959494477e5e7e33d4666f15311c9.fragment9.speed8.flac
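
    The naming scheme above can be parsed with a regular expression. The sketch below is one possible reading of the syntax, not code from the dataset repository; the `parse_sample_name` helper is hypothetical, and the pattern assumes the recording ID is a lowercase hexadecimal hash, as in the single example shown.

```python
import re

# Assumption: recording IDs are lowercase hex strings, as in the example.
_SAMPLE_NAME = re.compile(
    r"^(?P<language>en|de|es)_"
    r"(?P<gender>[mf])_"
    r"(?P<recording_id>[0-9a-f]+)"
    r"\.fragment(?P<fragment>\d+)"
    r"(?:\.(?P<transformation>speed|pitch|noise)(?P<index>\d+))?"
    r"\.flac$"
)


def parse_sample_name(filename):
    """Return the filename fields as a dict; untransformed samples get None."""
    match = _SAMPLE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected sample filename: {filename!r}")
    fields = match.groupdict()
    fields["fragment"] = int(fields["fragment"])
    if fields["index"] is not None:
        fields["index"] = int(fields["index"])
    return fields


info = parse_sample_name(
    "es_m_f7d959494477e5e7e33d4666f15311c9.fragment9.speed8.flac"
)
```

    For the example filename, `info["language"]` is `"es"`, `info["gender"]` is `"m"`, `info["fragment"]` is `9`, and the transformation is `speed` with index `8`.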
    

    Sample Model

    The dataset was used to train a spoken language identification model. The trained model scores 97% (F1 metric) on the test set. It also generalizes well, which was confirmed against real-life content. The fact that the samples are perfectly stratified was one of the reasons for such high performance.

    Feel free to create your own model and share results!

  18. Nexdata | Spanish Speech Data by Mobile Phone | 435 Hours

    • datarade.ai
    • data.nexdata.ai
    Updated Nov 11, 2025
    Cite
    Nexdata (2025). Nexdata | Spanish Speech Data by Mobile Phone | 435 Hours [Dataset]. https://datarade.ai/data-products/nexdata-spanish-speech-data-by-mobile-phone-435-hours-nexdata
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Nov 11, 2025
    Dataset authored and provided by
    Nexdata
    Area covered
    Spain
    Description

    Spanish (Spain) scripted monologue smartphone speech dataset, collected from monologues based on given scripts, covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, news and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (989 people in total), which is intended to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; our datasets are all GDPR, CCPA, and PIPL compliant.

    Format

    16kHz, 16bit, uncompressed wav, mono channel;

    Recording condition

    Low background noise(indoor), without echo;

    Content category

    Generic domain; news; human-machine interaction; smart home command and control; in-car command and control; numbers

    Recording device

    Android Smartphone, iPhone;

    Speaker

    989 speakers in total, 49% male and 51% female; 57% of speakers are aged 17-25, 39% are aged 26-45, and 4% are aged 46-60;

    Country

    Spain(ESP);

    Language(Region) Code

    es-ES;

    Language

    Spanish;

    Features of annotation

    Transcription text;

    Accuracy Rate

    Sentence Accuracy Rate (SAR) 95%

  19. 478 Hours - Spanish Conversational Speech Data by Mobile Phone

    • nexdata.ai
    • m.nexdata.ai
    Updated Dec 5, 2023
    Cite
    Nexdata (2023). 478 Hours - Spanish Conversational Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1147
    Dataset updated
    Dec 5, 2023
    Dataset authored and provided by
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
    Description

    Spanish (Spain) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (596 native speakers), which is intended to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; our datasets are all GDPR, CCPA, and PIPL compliant.

  20. Spanish Spontaneous Dialogue Telephony speech

    • kaggle.com
    zip
    Updated Jun 11, 2024
    Cite
    Frank Wong (2024). Spanish Spontaneous Dialogue Telephony speech [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/spanish-spontaneous-dialogue-telephony-speech/code
    Available download formats: zip (215338 bytes)
    Dataset updated
    Jun 11, 2024
    Authors
    Frank Wong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    88-Hours-Mexican-Spanish-Conversational-Speech-Data-by-Telephone

    Description

    Spanish (Mexico) spontaneous dialogue telephony speech dataset, collected from dialogues based on given topics. Transcribed with text content, timestamps, speaker ID, gender and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (122 native speakers), which is intended to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; our datasets are all GDPR, CCPA, and PIPL compliant. For more details, please refer to the link: https://www.nexdata.ai/datasets/speechrecog/1352?source=Kaggle

    Format

    8kHz 8bit, a-law/u-law pcm, mono channel

    Content category

    Dialogue based on given topics

    Recording condition

    Low background noise (indoor)

    Recording device

    Telephony

    Country

    Mexico(MEX)

    Language(Region) Code

    es-MX

    Language

    Spanish

    Speaker

    122 people in total, 53% male and 47% female

    Features of annotation

    Transcription text, timestamp, speaker ID, gender, noise

    Accuracy rate

    Word accuracy rate(WAR) 98%

    Licensing Information

    Commercial License
