100+ datasets found

Mexican Spanish General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Mexican Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-mexico
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Mexico
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Mexican Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mexican Spanish communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Mexican accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mexican Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•Speakers: 60 verified native Mexican Spanish speakers from FutureBeeAI’s contributor community.
•Regions: Representing various states of Mexico to ensure dialectal diversity and demographic balance.
•Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
•Duration: Each conversation ranges from 15 to 60 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
•Environment: Quiet, echo-free settings with no background noise.
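The recording details above (stereo, 16-bit, 16 kHz WAV) can be checked on a downloaded file before training. The following is a minimal sketch using Python's standard `wave` module; the function names and the idea of validating files this way are illustrative assumptions, not part of the dataset's tooling.

```python
import wave


def matches_spec(path, channels=2, sample_width_bytes=2, sample_rate=16000):
    """Return True if the WAV header says stereo, 16-bit, 16 kHz (the defaults)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width_bytes  # 2 bytes = 16 bits
            and wav.getframerate() == sample_rate
        )


def duration_seconds(path):
    """Return the audio duration computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

A file that fails `matches_spec` may need resampling or channel conversion before it is mixed with the rest of the corpus.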

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
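The "WER < 5%" figure above refers to word error rate. The listing does not say how the provider measures it, so the sketch below only shows the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of four reference words gives a WER of 0.25.
print(word_error_rate("hola como estas hoy", "hola como estas"))
```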
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
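As an illustration of the metadata-based filtering mentioned above, here is a small sketch. The listing does not document the metadata schema, so the `Recording` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    # Hypothetical record combining speaker and recording metadata fields.
    participant_id: str
    age: int
    gender: str
    state: str
    topic: str
    duration_min: float


def select(recordings, *, gender=None, min_age=None, max_age=None, topic=None):
    """Return the recordings that satisfy every filter that was supplied."""
    chosen = []
    for rec in recordings:
        if gender is not None and rec.gender != gender:
            continue
        if min_age is not None and rec.age < min_age:
            continue
        if max_age is not None and rec.age > max_age:
            continue
        if topic is not None and rec.topic != topic:
            continue
        chosen.append(rec)
    return chosen


# Made-up sample rows, for illustration only.
SAMPLE = [
    Recording("P001", 24, "female", "Jalisco", "Food & Recipes", 22.0),
    Recording("P002", 51, "male", "Puebla", "Travel & Local Culture", 40.5),
    Recording("P003", 33, "female", "Jalisco", "Travel & Local Culture", 18.0),
]

subset = select(SAMPLE, gender="female", min_age=30)
print([rec.participant_id for rec in subset])
```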
Usage and Applications
This dataset is a versatile resource for multiple Spanish speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for Mexican Spanish.
•Voice Assistants: Build smart assistants capable of understanding natural Mexican conversations.
Percent Spanish Speakers
king-snocoplanning.opendata.arcgis.com
hub.arcgis.com
Updated Aug 10, 2016
Cite
King County (2016). Percent Spanish Speakers [Dataset]. https://king-snocoplanning.opendata.arcgis.com/datasets/kingcounty::percent-spanish-speakers
Dataset updated
Aug 10, 2016
Dataset authored and provided by
King County
Description
Languages: Percent Spanish Speakers. Basic demographics by census tract in King County, based on the current American Community Survey (ACS) 5-year average. Included demographics are: total population; foreign born; median household income; English language proficiency; languages spoken; race and ethnicity; sex; and age. Numbers and derived percentages are estimates based on the current year's ACS. GEO_ID_TRT is the key field and may be used to join to other demographic Census data tables.
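Joining on the GEO_ID_TRT key, as the description suggests, could look like the sketch below. Only the key name comes from the description; the other column names (`PCT_SPANISH`, `MEDIAN_INCOME`), the placeholder tract IDs, and the values are invented for illustration.

```python
def join_on_tract(left_rows, right_rows, key="GEO_ID_TRT"):
    """Inner-join two lists of row dicts on the tract key."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # On a column name clash, the left-hand value wins.
            joined.append({**match, **row})
    return joined


# Placeholder rows; the IDs are not real census tract identifiers.
spanish = [
    {"GEO_ID_TRT": "TRACT_A", "PCT_SPANISH": 12.5},
    {"GEO_ID_TRT": "TRACT_B", "PCT_SPANISH": 3.0},
]
income = [{"GEO_ID_TRT": "TRACT_A", "MEDIAN_INCOME": 70000}]

print(join_on_tract(spanish, income))
```

Tracts without a match in the second table are dropped, which is the usual inner-join behavior; a left join would keep them with missing values instead.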
2013 American Community Survey - Table Packages: Detailed Language Spoken in...
catalog.data.gov
Updated Jul 19, 2023
Cite
U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
United States
Description
This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
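The geography rule in the description can be expressed as a small predicate. This is a sketch of the stated thresholds only; the geography labels (`"nation"`, `"state"`, `"county"`, `"cbsa"`) and the function name are assumptions, and states here stand in for the 50 states plus Washington, D.C. and Puerto Rico.

```python
ALWAYS_TABULATED = {"nation", "state"}
THRESHOLDED = {"county", "cbsa"}  # cbsa = core-based statistical area


def is_tabulated(geo_type, total_population, other_language_speakers):
    """Apply the stated cut-offs for counties and core-based statistical areas.

    other_language_speakers counts speakers of languages other than
    English and Spanish.
    """
    if geo_type in ALWAYS_TABULATED:
        return True
    if geo_type in THRESHOLDED:
        return total_population >= 100_000 and other_language_speakers >= 25_000
    return False


print(is_tabulated("county", 250_000, 30_000))  # meets both thresholds
print(is_tabulated("county", 250_000, 10_000))  # too few qualifying speakers
```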
English Speech Dataset (Spanish Speakers) – 388 Hours Scripted Monologue by...
nexdata.ai
Updated Sep 22, 2025
Cite
Nexdata (2025). English Speech Dataset (Spanish Speakers) – 388 Hours Scripted Monologue by Smartphone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/990?source=Huggingface
Dataset updated
Sep 22, 2025
Dataset authored and provided by
Nexdata
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
This dataset contains 388 hours of English speech from Spanish speakers, recorded as monologues read from given scripts. It covers the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, and other domains, and is transcribed with text content and other attributes. The recordings come from a large, geographically diverse pool of speakers (891 people in total), which is intended to improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
Spanish Speaking English Speech Data by Mobile Phone - 388 Hours
catalog.elra.info
Updated Oct 6, 2022
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2022). Spanish Speaking English Speech Data by Mobile Phone - 388 Hours [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-S0440/
Dataset updated
Oct 6, 2022
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
891 native Spanish speakers with authentic accents participated in the recording. The recording script was designed by linguists and covers a wide range of topics, including generic, interactive, on-board, and home scenarios. The text was manually proofread to high accuracy. The data matches mainstream Android and Apple phones and can be applied to automatic speech recognition and machine translation.
Format: 16 kHz, 16-bit, uncompressed WAV, mono channel
Recording environment: quiet indoor environment, low background noise, no echo
Recording content (read speech): generic category; human-machine interaction category; smart home command and control category; in-car command and control category; numbers
Demographics: 891 speakers in total, 40% male and 60% female; 41% of speakers are aged 16-25, 53% are aged 26-45, and 6% are aged 46-60
Device: iPhone, Android mobile phone
Language: English
Application scenarios: speech recognition; voiceprint recognition
762 Hours - Spanish(Latin America) Scripted Monologue Smartphone speech...
nexdata.ai
Updated Jan 2, 2024
Cite
Nexdata (2024). 762 Hours - Spanish(Latin America) Scripted Monologue Smartphone speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/970
Dataset updated
Jan 2, 2024
Dataset authored and provided by
Nexdata
Area covered
Latin America
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
Description
Spanish (Latin America) scripted monologue smartphone speech dataset, recorded as monologues read from given scripts and covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, news, and other domains. It is transcribed with text content and other attributes. The recordings come from a large, geographically diverse pool of speakers (1,630 people in total, including Mexicans, Colombians, and others), which is intended to improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
145 Hours - Spanish(spain) Children Real-world Casual Conversation and...
nexdata.ai
Updated Dec 2, 2023
Cite
Nexdata (2023). 145 Hours - Spanish(spain) Children Real-world Casual Conversation and Monologue speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1251
Dataset updated
Dec 2, 2023
Dataset authored and provided by
Nexdata
Area covered
World, Spain
Variables measured
Age, Format, Country, Accuracy, Language, Content category, Language(Region) Code, Recording environment, Features of annotation
Description
Spanish (Spain) children's real-world casual conversation and monologue speech dataset, covering self-media, conversation, live, lecture, variety show, and other generic domains that mirror real-world interactions. It is transcribed with text content, speaker ID, gender, age, accent, and other attributes. The recordings come from a large, geographically diverse pool of speakers (children aged 12 and younger), which is intended to improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
Spanish (Spain) Call Center Data for Realestate AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Spanish (Spain) Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-spanish-spain
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Spain
Dataset funded by
FutureBeeAI
Description
Introduction
This Spanish Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
Speech Data
The dataset features 30 hours of dual-channel call center recordings between native Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.
•Participant Diversity:
•Speakers: 60 native Spanish speakers from our verified contributor community.
•Regions: Representing different provinces across Spain to ensure accent and dialect variation.
•Participant Profile: A gender mix of 60% male and 40% female, with ages ranging from 18 to 70.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
•Call Duration: Average 5–15 minutes per call.
•Audio Format: Stereo WAV, 16-bit, recorded at 8 kHz and 16 kHz.
•Recording Environment: Captured in noise-free and echo-free conditions.

Topic Diversity
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
•Inbound Calls:
•Property Inquiries
•Rental Availability
•Renovation Consultation
•Property Features & Amenities
•Investment Property Evaluation
•Ownership History & Legal Info, and more
•Outbound Calls:
•New Listing Notifications
•Post-Purchase Follow-ups
•Property Recommendations
•Value Updates
•Customer Satisfaction Surveys, and others
Such domain-rich variety ensures model generalization across common real estate support conversations.
Transcription
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., background noise, pauses)
•High transcription accuracy with word error rate below 5% via dual-layer human review.
These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.
Metadata
Detailed metadata accompanies each participant and conversation:
•Participant Metadata: ID, age, gender, location, accent, and dialect.
•Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

This enables smart filtering, dialect-focused model training, and structured dataset exploration.
Usage and Applications
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
Dataset of books about Spanish language-Text-books for foreign speakers
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books about Spanish language-Text-books for foreign speakers [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Spanish+language-Text-books+for+foreign+speakers&j=1&j0=book_subjects
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 24 rows and is filtered to rows where the book subject is "Spanish language-Text-books for foreign speakers". It features 9 columns, including author, publication date, language, and book publisher.
spanish-speech-recognition-dataset
huggingface.co
Updated Feb 20, 2025
Cite
Unidata (2025). spanish-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/spanish-speech-recognition-dataset
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Feb 20, 2025
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Spanish Speech Dataset for recognition task

Dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. This dataset boasts an impressive 98% word accuracy rate, making it a valuable resource for advancing speech recognition technology. By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, transcribing audio, and… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/spanish-speech-recognition-dataset.
Colombian Spanish General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Colombian Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-colombia
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Colombian Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Colombian Spanish communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Colombian accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Colombian Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•Speakers: 60 verified native Colombian Spanish speakers from FutureBeeAI’s contributor community.
•Regions: Representing various departments of Colombia to ensure dialectal diversity and demographic balance.
•Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
•Duration: Each conversation ranges from 15 to 60 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
•Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple Spanish speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for Colombian Spanish.
•Voice Assistants: Build smart assistants capable of understanding natural Colombian conversations.
Argentinian Spanish Speech
kaggle.com
Updated Aug 2, 2023
Cite
Santiago Torres Busquets (2023). Argentinian Spanish Speech [Dataset]. https://www.kaggle.com/datasets/storresbusquets/argentinian-spanish-speech
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Aug 2, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Santiago Torres Busquets
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Argentina
Description
This data set contains transcribed high-quality audio of random Spanish sentences recorded by volunteers in Buenos Aires, Argentina. The data set consists of wave files and a TSV file (line_index.tsv). The file line_index.tsv contains an anonymized FileID and the transcription of the audio in each file.
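Reading the index file described above could look like the following sketch. It assumes that line_index.tsv has two tab-separated columns (FileID, then transcription) and no header row; the sample rows are invented, not taken from the data set.

```python
import csv
import io


def read_line_index(handle):
    """Map each FileID to its transcription from a two-column TSV stream."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Invented sample content standing in for line_index.tsv.
sample = io.StringIO("file_0001\tbuenos días\nfile_0002\t¿cómo estás?\n")
index = read_line_index(sample)
print(index["file_0001"])
```

For the real file, the same function can be called on `open("line_index.tsv", encoding="utf-8", newline="")`.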

Obtained from the Open Speech and Language Resources website: https://www.openslr.org/index.html
Identifier: SLR61
Summary: Data set which contains 5739 recordings of native speakers of Spanish
Category: Speech
License: Attribution-ShareAlike 4.0 International

Dataset available on: https://www.openslr.org/61/
General conversation speech datasets in Spanish for Virtual Reality
data.macgence.com
mp3
Updated Apr 22, 2024
Cite
Macgence (2024). General conversation speech datasets in Spanish for Virtual Reality [Dataset]. https://data.macgence.com/dataset/general-conversation-speech-datasets-in-spanish-for-virtual-reality
Available download formats: mp3
Dataset updated
Apr 22, 2024
Dataset authored and provided by
Macgence
License
https://data.macgence.com/terms-and-conditions
Time period covered
2025
Area covered
Worldwide
Variables measured
Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
Description
The audio dataset includes General Conversation, featuring Spanish speakers from Spain with detailed metadata.
Spanish(Mexico) Real-world Casual Conversation and Monologue speech dataset
nexdata.ai
Updated Jul 1, 2025
Cite
Nexdata (2025). Spanish(Mexico) Real-world Casual Conversation and Monologue speech dataset [Dataset]. https://www.nexdata.ai/datasets/speechrecog/1715
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Nexdata
Area covered
World, Mexico
Variables measured
Format, Country, Language, Accuracy Rate, Content category, Recording condition, Language(Region) Code, Features of annotation
Description
Spanish (Mexico) real-world casual conversation and monologue speech dataset, covering self-media, conversation, variety show, and other generic domains that mirror real-world interactions. It is transcribed with text content, speaker ID, gender, and other attributes. The recordings come from a large, geographically diverse pool of speakers, which is intended to improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
spanish-speech-recognition-dataset
huggingface.co
Updated Jul 30, 2025
Cite
Unidata NLP (2025). spanish-speech-recognition-dataset [Dataset]. https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset
Dataset updated
Jul 30, 2025
Authors
Unidata NLP
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Spanish Telephone Dialogues Dataset - 488 Hours

The dataset comprises 488 hours of high-quality telephone audio recordings in Spanish, featuring 600 native speakers and achieving a 95% sentence accuracy rate. Designed for advancing speech recognition models and language processing, this extensive speech data corpus covers diverse topics and domains, making it ideal for training robust automatic speech recognition (ASR) systems.

Dataset characteristics:… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/spanish-speech-recognition-dataset.
478 Hours - Spanish Conversational Speech Data by Mobile Phone
m.nexdata.ai
nexdata.ai
Updated Jun 3, 2025
Cite
Nexdata (2025). 478 Hours - Spanish Conversational Speech Data by Mobile Phone [Dataset]. https://m.nexdata.ai/datasets/speechrecog/1147?source=Huggingface
Dataset updated
Jun 3, 2025
Dataset authored and provided by
Nexdata
Variables measured
Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Language(Region) Code, Features of annotation
Description
Spanish (Spain) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. It is transcribed with text content, speaker ID, gender, age, and other attributes. The recordings come from a large, geographically diverse pool of speakers (596 native speakers), which is intended to improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets comply with GDPR, CCPA, and PIPL.
Hamburg Corpus of Argentinean Spanish (HaCASpa) - Dataset - B2FIND
b2find.eudat.eu
Updated Apr 30, 2023
Cite
(2023). Hamburg Corpus of Argentinean Spanish (HaCASpa) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/ab6502cb-48a6-530f-9d7d-172737124ff2
Dataset updated
Apr 30, 2023
Area covered
Hamburg
Description
Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two geographic areas. For the intonational experiments, there are audio recordings only, whereas some of the free interviews and map tasks feature video recordings. The material used as stimuli in the experiments is available with references encoded in the transcriptions. The Hamburg Corpus of Argentinean Spanish (HaCASpa) was compiled in December 2008 and November/December 2009 within the context of the research project The intonation of Spanish in Argentina (H9, director: Christoph Gabriel), part of the Collaborative Research Centre "Multilingalism", funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) and hosted by the University of Hamburg. It comprises data from two varieties of Argentinean Spanish, i.e. a) the dialect spoken in the capital of Buenos Aires (also called Porteño, derived from puerto 'harbor') and b) the variety of the Neuquén/Comahue area (Northern Patagonia). The seven parts of HaCASpa correspond to the seven tasks described below in more detail: Five experiments were carried out in order to elicit specific data for research in prosody, with a main focus on (Task 1–5); in addition, several speakers took part in a free interview (Task 6) and a map task experiment (Task 7). The Task is encoded as a metadata attribute for each communication. HaCASpa comprises three different types of spoken data, depending on the Task, i.e. spontaneous, semi-spontaneous, and scripted speech. This information corresponds to the metadata attribute Speech type. The regional dimension of the corpus is represented through the attribute Area (i.e. Buenos Aires or Neuquén/Comahue), its diachronic dimension through the attribute Age group (i.e. Under 25/Over 25). The subjects are 60 native speakers of the relevant variety of Argentinean Spanish, i.e. 
Buenos Aires (Porteño) or Nequén/Comahue Spanish. For each speaker, the following information is available: Age, Education, Occupation, Year of school enrollment, Year of school graduation and Parents' mother tongue. The current version 0.2 contains mainly orthographic transcriptions of verbal behaviour (141,000 transcribed words) and codes that relate utterances to the materials used for the experimental tasks. Experimental design: Task (1) consists of two subparts: reading a story (1a) and retelling it (1b). For (1a), the subjects were asked to read the short story "The North Wind and the Sun", which was presented on a computer screen, two times. The fable is well known for its use of phonetic descriptions of different languages (see Handbook of the International Phonetic Association, International Phonetic Association. Cambridge: Cambridge University Press, 2005); the Latin American version we used in our data stems from the Dialectoteca del español, (coordination: C.-E. Piñeros). For (1b), the speakers were instructed to retell the story in their own words without being able to consult the text. With the help of these two parts, data of scripted (part 1a) as well as of semi-spontaneous speech (part 1b) could be collected. Task (2) was designed to collect data of semi-spontaneous speech by asking the subjects to answer questions pertaining to a given picture story. In a first step, the speakers were familiarized with the story, which was presented as two pictures displayed on a computer screen. In a second step, they were asked to answer specific questions about the story. The questions were also presented on the computer screen and varied in their design in order to elicit answers with different information-structural readings (such as broad vs. narrow focus or different focus types). In general, the speakers were free to answer as they wished. However, in order to avoid single word answers, they were asked to utter complete sentences. 
Task (3) consisted of reading question-answer pairs, the content of which was based on the picture stories already familiar from task (2). The answers were given together with the questions on the computer screen (i.e. one question / one answer) and the speakers simply had to read both the question and the answer.
Task (4) was a reading task in which the subjects were asked to utter 10 simple subject-verb-object (SVO) sentences, presented on a computer screen. The speakers were instructed to read them at both normal and fast speech rates. Along the lines proposed in D'Imperio et al. 2005 ("Intonational Phrasing in Romance: The Role of Syntactic and Prosodic Structure", in: Prosodies: With Special Reference to Iberian Languages, ed. by Frota, S. et al., Berlin: Mouton de Gruyter, 59-97), the subject and object constituents differed in their syntactic and prosodic complexity (e.g. determiner plus noun or determiner plus noun plus adjective and one or three prosodic words, respectively). The participants were instructed to read the sentences as if they contained new information. The complete experiment design is described in Gabriel, C. et al. 2011 ("Prosodic phrasing in Porteño Spanish", in: Intonational Phrasing in Romance and Germanic: Cross-Linguistic and Bilingual Studies, ed. by Gabriel, C. & Lleó, C., Amsterdam: Benjamins, 153-182).
Task (5), the so-called intonation survey, consisted of 48 situations designed to elicit various intonational contours with specific pragmatic meanings. In this inductive method, the researcher confronts the speaker with a series of hypothetical situations to which he or she is supposed to react verbally. In the Argentinean version of the questionnaire, the hypothetical situations were illustrated by appropriate pictures. The experimental design is described in more detail in Prieto, P. & Roseano, P. 2010 (eds). Transcription of Intonation of the Spanish Language. Munich: Lincom; see also the Interactive atlas of Spanish intonation (coordination: P. Prieto & P. Roseano).
Task (6) collected spontaneous speech data through free interviews. In this task, the subjects were asked to tell the interviewer something about a past experience, be it a vacation or memories of Argentina as it was decades ago. Even though the interviewer was still part of the conversation, it was mainly the subjects who spoke during the recordings.
Task (7) consists of Map Task dialogs. Map Task is a technique employed to collect data of spontaneous speech in which two subjects cooperate to complete a specified task. It is designed to lead the subjects to produce particular interrogative patterns. Each of the two subjects receives a map of an imaginary town marked with buildings and other specific elements. A route is marked on the map of one of the two participants, who assumes the role of the instruction-giver. The version of the same map given to the other participant, who assumes the role of the instruction-follower, differs from that of the instruction-giver in that it does not show the route to be followed. The instruction-follower therefore must ask the instruction-giver questions in order to be able to reproduce the same route on his or her own map (see also the Interactive atlas of Spanish intonation).
CLARIN Metadata summary for Hamburg Corpus of Argentinean Spanish (HaCASpa) (CMDI-based)
Title: Hamburg Corpus of Argentinean Spanish (HaCASpa)
Description: Audio and video recordings of experimental/read and spontaneous speech from adult speakers of Porteño Spanish in Argentina. Speakers are 18-69 years old and from two geographic areas. For the intonational experiments, there are audio recordings only, whereas some of the free interviews and map tasks feature video recordings. The material used as stimuli in the experiments is available with references encoded in the transcriptions.
Publication date: 2011-06-30
Data owner: Christoph Gabriel, Institut für Romanistik / Von-Melle-Park 6 / D-20146 Hamburg, christoph.gabriel@uni-hamburg.de
Contributors: Christoph Gabriel, Institut für Romanistik / Von-Melle-Park 6 / D-20146 Hamburg, christoph.gabriel@uni-hamburg.de (compiler)
Project: H9 "The intonation of Spanish in Argentina", German Research Foundation (DFG)
Keywords: contact variety, cross-sectional data, regional variety, language contact, EXMARaLDA
Language: Spanish (spa)
Size: 63 speakers (39 female, 24 male), 259 communications, 261 recordings, 1119 minutes, 261 transcriptions, 141321 words
Annotation types: transcription (manual): mainly orthographic, project-specific conventions, code: reference to underlying prompts
Temporal Coverage: 2008-11-01/2009-12-01
Spatial Coverage: Buenos Aires, AR; Neuquén/Comahue, AR
Genre: discourse
Modality: spoken
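The Size figures in the metadata summary can be turned into a few rough per-item averages. The short Python sketch below only divides the numbers quoted in the listing; the variable names are illustrative, and the ratios are back-of-the-envelope estimates rather than values published with the corpus.

```python
# Figures copied from the "Size" field of the CLARIN metadata summary.
SPEAKERS = {"female": 39, "male": 24}
RECORDINGS = 261
TRANSCRIPTIONS = 261
MINUTES = 1119
WORDS = 141_321

total_speakers = sum(SPEAKERS.values())
minutes_per_recording = MINUTES / RECORDINGS
words_per_transcription = WORDS / TRANSCRIPTIONS
# Rough rate only: the word count covers everything that was transcribed.
words_per_minute = WORDS / MINUTES

print(f"speakers in total:        {total_speakers}")
print(f"minutes per recording:    {minutes_per_recording:.2f}")
print(f"words per transcription:  {words_per_transcription:.2f}")
print(f"words per recorded minute: {words_per_minute:.2f}")
```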
SALA Spanish Mexican Database
catalogue.elra.info
live.european-language-grid.eu
Updated Feb 22, 2007
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2007). SALA Spanish Mexican Database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0173/
Dataset updated
Feb 22, 2007
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Area covered
Mexico
Description
The SALA Spanish Mexican Database comprises 1260 Mexican speakers (554 males, 706 females) recorded over the Mexican fixed telephone network. This database is partitioned into 7 CD-ROMs. The speech databases made within the SALA project were validated by SPEX, the Netherlands, to assess their compliance with the SALA format and content specifications.
The speech files are stored as sequences of 8-bit, 8kHz A-law samples and are not compressed, according to the specifications of SALA. Each prompt utterance is stored within a separate file and has an accompanying ASCII SAM label file.
Each speaker uttered the following items:
* 6 application words;
* 1 sequence of 10 isolated digits;
* 4 connected digits: 1 sheet number (6 digits), 1 telephone number (9-11 digits), 1 credit card number (14-16 digits), 1 PIN code (6 digits);
* 3 dates: 1 spontaneous date (e.g. birthday), 1 prompted date (word style), 1 relative and general date expression;
* 1 spotting phrase using an application word (embedded);
* 1 isolated digit;
* 3 spelled-out words (letter sequences): 1 spelling of surname; 1 spelling of directory assistance city name; 1 real/artificial name for coverage;
* 1 currency money amount;
* 1 natural number;
* 5 directory assistance names: 1 surname (out of 500); 1 city of birth / growing up (spontaneous); 1 most frequent city (out of 500); 1 most frequent company/agency (out of 500); 1 "forename surname" (set of 150);
* 2 questions, including "fuzzy" yes/no: 1 predominantly "yes" question, 1 predominantly "no" question;
* 9 phonetically rich sentences;
* 9 additional spontaneous items;
* 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style);
* 4 phonetically rich words.
The following age distribution has been obtained: 20 speakers are under 16 years old, 801 speakers are between 16 and 30, 291 speakers are between 31 and 45, 124 speakers are between 46 and 60, and 24 speakers are over 60.
A phonetic lexicon with canonical transcriptions in SAMPA is also provided.
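Since the speech is stored as 8-bit, 8kHz A-law, most toolchains need it expanded to linear PCM before use. The Python sketch below follows the standard ITU-T G.711 A-law expansion; it assumes the speech files are headerless byte streams with one sample per byte, which the listing does not state, and the function names are illustrative only.

```python
def alaw_to_pcm16(code: int) -> int:
    """Expand one 8-bit A-law code to a signed linear sample (ITU-T G.711)."""
    code ^= 0x55                      # undo the even-bit inversion applied by the encoder
    magnitude = (code & 0x0F) << 4    # 4-bit mantissa
    segment = (code & 0x70) >> 4      # 3-bit segment number
    if segment == 0:
        magnitude += 8
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    # After the XOR, a set sign bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a raw A-law byte stream (assumed headerless) to linear samples."""
    return [alaw_to_pcm16(byte) for byte in data]


def duration_seconds(data: bytes, sample_rate: int = 8000) -> float:
    """One byte per sample, so the duration is the byte count over the sample rate."""
    return len(data) / sample_rate


samples = decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A]))
print(samples)
print(duration_seconds(bytes(8000)))
```

At one byte per sample and 8000 samples per second, each second of speech occupies 8000 bytes before any label data is counted.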
Spanish(Spain) General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). Spanish(Spain) General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-spain
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Spain
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Spanish communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Spanish accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•Speakers: 60 verified native Spanish speakers from FutureBeeAI’s contributor community.
•Regions: Representing various provinces of Spain to ensure dialectal diversity and demographic balance.
•Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
•Recording Details:
•Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
•Duration: Each conversation ranges from 15 to 60 minutes.
•Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
•Environment: Quiet, echo-free settings with no background noise.
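A quick way to confirm that a delivered file matches the stated audio format (stereo, 16-bit, 16kHz) is to read its header with Python's standard wave module. The sketch below is a minimal example under that assumption; the helper names and the demo file are hypothetical, and the demo writes one second of silence so that the code runs without access to the dataset.

```python
import os
import tempfile
import wave

# Format stated in the listing: stereo, 16-bit (2 bytes per sample), 16 kHz.
EXPECTED = {"channels": 2, "sample_width_bytes": 2, "sample_rate_hz": 16_000}


def read_format(path: str) -> dict:
    """Read the basic format fields and the duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_listing(fmt: dict) -> bool:
    """True when every expected field has the value given in the listing."""
    return all(fmt[key] == value for key, value in EXPECTED.items())


# Demo file (hypothetical): one second of stereo silence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(16_000)
    out.writeframes(b"\x00" * (16_000 * 2 * 2))  # frames * channels * bytes per sample

demo_format = read_format(demo_path)
print(demo_format, matches_listing(demo_format))
```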

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
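For readers unfamiliar with the WER figure quoted above: word error rate is the number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. The Python sketch below is a generic implementation of that definition; how FutureBeeAI computes its own figure is not described in the listing, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the processed reference prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("vamos a la playa mañana", "vamos a playa mañana"))
```

A WER below 5% therefore means fewer than one word-level edit per twenty reference words on average.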
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
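As an illustration of the filtering mentioned above, the Python sketch below selects recordings by speaker attributes. The listing names the metadata fields but not their file layout, so the class, the field names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RecordingInfo:
    # Hypothetical field names modelled on the metadata listed above.
    participant_id: str
    age: int
    gender: str
    state_province: str
    topic: str
    duration_min: float


def filter_recordings(
    records: Iterable[RecordingInfo],
    *,
    gender: Optional[str] = None,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    topic: Optional[str] = None,
) -> list:
    """Keep the records that satisfy every criterion that was supplied."""
    selected = []
    for rec in records:
        if gender is not None and rec.gender != gender:
            continue
        if min_age is not None and rec.age < min_age:
            continue
        if max_age is not None and rec.age > max_age:
            continue
        if topic is not None and rec.topic != topic:
            continue
        selected.append(rec)
    return selected


# Invented sample rows, for illustration only.
SAMPLE = [
    RecordingInfo("P001", 24, "female", "Madrid", "Food & Recipes", 32.0),
    RecordingInfo("P002", 57, "male", "Sevilla", "Travel & Local Culture", 45.5),
    RecordingInfo("P003", 68, "female", "Valencia", "Healthcare Discussions", 18.0),
]

older_women = filter_recordings(SAMPLE, gender="female", min_age=60)
print([rec.participant_id for rec in older_women])
```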
Usage and Applications
This dataset is a versatile resource for multiple Spanish speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for Spanish.
•Voice Assistants: Build smart assistants capable of understanding natural Spanish conversations.
Spanish Human-Human Chat Dataset for Conversational AI & NLP
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). Spanish Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/spanish-general-domain-conversation-text-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The Spanish General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world Spanish usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level Spanish conversations covering a broad spectrum of everyday topics.
Conversational Text Data
This dataset includes over 15000 chat transcripts, each featuring free-flowing dialogue between two native Spanish speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.
•Words per Chat: 300–700
•Turns per Chat: Up to 50 dialogue turns
•Contributors: 200 native Spanish speakers from the FutureBeeAI Crowd Community
•Format: TXT, DOCS, JSON or CSV (customizable)
•Structure: Each record contains the full chat, topic tag, and metadata block
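The stated limits (300–700 words, up to 50 turns, and a topic tag plus metadata block per record) lend themselves to a simple sanity check on delivered data. The Python sketch below assumes a hypothetical JSON-like record with "turns", "topic", and "metadata" keys; the actual schema is not given in the listing.

```python
MIN_WORDS, MAX_WORDS, MAX_TURNS = 300, 700, 50


def check_chat(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the listing."""
    problems = []
    turns = record.get("turns", [])
    word_count = sum(len(turn["text"].split()) for turn in turns)

    if len(turns) > MAX_TURNS:
        problems.append(f"too many turns: {len(turns)}")
    if not MIN_WORDS <= word_count <= MAX_WORDS:
        problems.append(f"word count out of range: {word_count}")
    for key in ("topic", "metadata"):
        if key not in record:
            problems.append(f"missing {key}")
    return problems


# Invented records, for illustration only.
ten_words = "uno dos tres cuatro cinco seis siete ocho nueve diez"
good = {
    "turns": [{"speaker": "A" if i % 2 == 0 else "B", "text": ten_words} for i in range(40)],
    "topic": "Food and cooking",
    "metadata": {"dialect": "es-ES"},
}
bad = {"turns": [{"speaker": "A", "text": "hola amigo"} for _ in range(60)], "topic": "Music"}

print(check_chat(good))
print(check_chat(bad))
```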

Diversity and Domain Coverage
Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:
•Music, books, and movies
•Health and wellness
•Children and parenting
•Family life and relationships
•Food and cooking
•Education and studying
•Festivals and traditions
•Environment and daily life
•Internet and tech usage
•Childhood memories and casual chatting
This diversity ensures the dataset is useful across multiple NLP and language understanding applications.
Linguistic Authenticity
Chats reflect informal, native-level Spanish usage with:
•Colloquial expressions and local dialect influence
•Domain-relevant terminology
•Language-specific grammar, phrasing, and sentence flow
•Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
•Representation of different writing styles and input quirks to ensure training data realism
Metadata
Every chat instance is accompanied by structured metadata, which includes:
•Participant Age
•Gender
•Country/Region
•Chat Domain
•Chat Topic
•Dialect
This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
Data Quality Assurance
All chat records pass through a rigorous QA process to maintain consistency and accuracy:
•Manual review for content completeness
•Format checks for chat turns and metadata
•Linguistic verification by native speakers
•Removal of inappropriate or unusable samples
This ensures a clean, reliable dataset ready for high-performance AI model training.
Applications
This dataset is ideal for training and evaluating a wide range of text-based AI systems:
•Conversational AI / Chatbots
•Smart assistants and voicebots