47 datasets found

Languages in Mexico 2020
statista.com
Updated Apr 15, 2025
Cite
Statista (2025). Languages in Mexico 2020 [Dataset]. https://www.statista.com/statistics/275440/languages-in-mexico/
4 scholarly articles cite this dataset.
Dataset updated
Apr 15, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
Mexico
Description
In 2020, about 93.8 percent of the Mexican population was monolingual in Spanish. Around five percent spoke both Spanish and an indigenous language. Spanish is the third-most spoken native language worldwide, after Mandarin Chinese and Hindi.

Mexican Spanish

Spanish was first used in Mexico in the 16th century, during the Spanish colonization and conquest campaigns in what is now Mexico and the Caribbean. As of 2018, Mexico was the country with the largest number of native Spanish speakers worldwide. Mexican Spanish, influenced by English and Nahuatl, has about 120 million speakers. The Mexican government conducts most of its proceedings in Spanish; however, it recognizes 68 national languages, 63 of which are indigenous.

Indigenous languages spoken

Of the indigenous languages spoken, two of the most widely used are Nahuatl and Maya. Because indigenous groups have historically been marginalized, most indigenous languages are endangered, and many linguists warn that they could fall out of use within a few decades. Efforts such as the San Andrés Accords have been made to protect indigenous groups, who make up about 25 million of Mexico’s 125 million inhabitants, though the effectiveness of such measures remains to be seen.
Speakers of indigenous languages in Mexico 2020, by language
statista.com
ai-chatbox.pro
Updated Mar 16, 2022
Cite
Statista (2022). Speakers of indigenous languages in Mexico 2020, by language [Dataset]. https://www.statista.com/statistics/1323032/indigenous-language-speakers-by-language-mexico/
Dataset updated
Mar 16, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2, 2021 - Mar 27, 2021
Area covered
Mexico
Description
There were more than ************* speakers of indigenous languages in Mexico as of 2020. Nahuatl, which is also considered a group of languages, was the most spoken indigenous language, with more than **** million speakers. The Mayan languages Tseltal and Tsotsil were each spoken by over ******* people. Furthermore, about ******* of all indigenous language speakers lived in just two states: Chiapas and Oaxaca.
Speakers of indigenous languages Mexico 2020, by region
ai-chatbox.pro
statista.com
Updated Sep 10, 2024
Cite
Statista (2024). Speakers of indigenous languages Mexico 2020, by region [Dataset]. https://www.ai-chatbox.pro/?_=%2Fstatistics%2F1323026%2Findigenous-language-speakers-by-state-mexico%2F%23XgboD02vawLbpWJjSPEePEUG%2FVFd%2Bik%3D
Dataset updated
Sep 10, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Mar 2, 2020 - Mar 27, 2020
Area covered
Mexico
Description
There were more than seven million speakers of indigenous languages in Mexico as of 2020. Chiapas and Oaxaca were the federal entities with the largest populations aged three years and older who speak an indigenous language, with 1.5 million and 1.2 million people respectively. Nahuatl was the most spoken indigenous language or group of languages.
Number of indigenous language speakers in Mexico State 2020
statista.com
Updated Aug 7, 2024
Cite
Statista (2024). Number of indigenous language speakers in Mexico State 2020 [Dataset]. https://www.statista.com/statistics/1386885/number-indigenous-language-speakers-mexico-state/
Dataset updated
Aug 7, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
Mexico
Description
In 2020, Mazahua was the most widely spoken of the prominent indigenous languages in the State of Mexico, with over 111,000 speakers. Otomi followed closely with 102,600 speakers.
Data from: Spanish (Mexico) Dataset
hmn.shaip.com
shaip.com
Updated Dec 24, 2024
Cite
Shaip (2024). Spanish (Mexico) Dataset [Dataset]. https://hmn.shaip.com/offerings/speech-data-catalog/spanish-mexico-dataset/
Dataset updated
Dec 24, 2024
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Mexico
Description
High-quality Spanish (Mexico) TTS dataset for AI and speech models.
Title: Spanish (Mexico) Language Dataset
Dataset type: TTS
Description: Single-utterance recordings, which tend to fall in…
Number of indigenous language speakers in Guerrero 2020
statista.com
Updated Aug 7, 2024
Cite
Statista (2024). Number of indigenous language speakers in Guerrero 2020 [Dataset]. https://www.statista.com/statistics/1389720/number-indigenous-language-speakers-guerrero-mexico/
Dataset updated
Aug 7, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
Mexico
Description
In 2020, the Mexican state of Guerrero had a rich variety of indigenous languages. Nahuatl was the predominant one, spoken by an estimated 157,740 people, and Mixtec and Tlapanec also had significant numbers of speakers.
Chinese Language Schools in State of Mexico, Mexico - 2 Verified Listings...
poidata.io
csv, excel, json
Updated Jul 2, 2025
Cite
Poidata.io (2025). Chinese Language Schools in State of Mexico, Mexico - 2 Verified Listings Database [Dataset]. https://www.poidata.io/report/chinese-language-school/mexico/state-of-mexico
Available download formats: excel, csv, json
Dataset updated
Jul 2, 2025
Dataset provided by
Poidata.io
Area covered
State of Mexico, Mexico
Description
Comprehensive dataset of 2 Chinese language schools in State of Mexico, Mexico, as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Number of indigenous language speakers in Nuevo Leon 2020
statista.com
Updated Jul 5, 2024
Cite
Statista (2024). Number of indigenous language speakers in Nuevo Leon 2020 [Dataset]. https://www.statista.com/statistics/1385793/number-indigenous-language-speakers-nuevo-leon-mexico/
Dataset updated
Jul 5, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
Nuevo Leon, Mexico
Description
In 2020, Nahuatl was the most widely spoken of the prominent indigenous languages in the Mexican state of Nuevo Leon, with 54,110 speakers, followed by Huasteco with 19,460 speakers.
SALA II Spanish from Mexico database
catalogue.elra.info
live.european-language-grid.eu
Updated Aug 28, 2007
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2007). SALA II Spanish from Mexico database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0171/
Dataset updated
Aug 28, 2007
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
ELRA End User license: https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
ELRA VAR license: https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Area covered
Mexico
Description
The SALA II Spanish from Mexico database was recorded in Mexico within the scope of the SALA II project. It contains recordings of 1,075 Mexican speakers (539 male and 536 female) made over the Mexican mobile telephone network.

The following acoustic conditions were selected as representative of a mobile user's environment:
* Passenger in a moving car, railway, bus, etc. (155 speakers)
* Public place (279 speakers)
* Stationary pedestrian by the roadside (223 speakers)
* Home/office environment (364 speakers)
* Passenger in a moving car using a hands-free kit (54 speakers)

The database is distributed on one DVD-ROM. The speech files are stored as uncompressed sequences of 8-bit, 8 kHz A-law samples, according to the SALA II specifications. Each prompted utterance is stored in a separate file with an accompanying ASCII SAM label file. The database was validated by SPEX (the Netherlands) for compliance with the SALA II format and content specifications.

Each speaker uttered the following items:
* 6 application words
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number of 6 digits, 1 telephone number of 9/11 digits, 1 credit card number of 14/16 digits, 1 PIN code of 6 digits)
* 3 dates (1 spontaneous date, e.g. a birthday; 1 word-style prompted date; 1 relative and general date expression)
* 2 spotting phrases using an embedded application word
* 2 isolated digits
* 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
* 1 currency money amount
* 1 natural number
* 5 directory assistance names (1 surname out of a set of 500, 1 city of birth/growing up, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname" out of a set of 150)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 9 phonetically rich sentences
* 2 time phrases (1 spontaneous time of day, 1 word-style time phrase)
* 4 phonetically rich words

The age distribution is: 7 speakers under 16, 643 between 16 and 30, 248 between 31 and 45, 169 between 46 and 60, and 8 over 60. A pronunciation lexicon with phonemic transcriptions in SAMPA is also included.
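The description above specifies 8-bit, 8 kHz A-law samples, while many current speech toolkits expect 16-bit linear PCM. The following is a minimal Python sketch of G.711 A-law expansion that converts one such file to a mono WAV file. It assumes each speech file holds raw A-law bytes with no header (the description does not say either way), and the function names and paths are illustrative, not part of the SALA II distribution. The decoder is written out by hand because the standard-library `audioop` module, which offered `alaw2lin`, was removed in Python 3.13.

```python
import sys
import wave
from array import array


def alaw_to_linear(a_val: int) -> int:
    """Expand one G.711 A-law byte (0-255) to a signed 16-bit PCM value."""
    a_val ^= 0x55                  # A-law inverts the even bits; undo that
    t = (a_val & 0x0F) << 4        # 4-bit mantissa (step within the segment)
    seg = (a_val & 0x70) >> 4      # 3-bit segment number (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if a_val & 0x80 else -t  # bit 7 set means a positive sample


# All 256 code words decoded once, so conversion becomes a table lookup.
ALAW_TABLE = [alaw_to_linear(code) for code in range(256)]


def decode_alaw_bytes(data: bytes) -> list[int]:
    """Decode a buffer of raw A-law bytes into 16-bit PCM sample values."""
    return [ALAW_TABLE[b] for b in data]


def alaw_file_to_wav(src_path: str, dst_path: str, rate: int = 8000) -> None:
    """Convert a headerless A-law file (assumed format) to mono 16-bit WAV."""
    with open(src_path, "rb") as f:
        samples = array("h", decode_alaw_bytes(f.read()))
    if sys.byteorder == "big":
        samples.byteswap()         # WAV sample data is little-endian
    with wave.open(dst_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)        # 2 bytes = 16-bit samples
        out.setframerate(rate)
        out.writeframes(samples.tobytes())
```

For example, `alaw_file_to_wav("utterance.raw", "utterance.wav")`, where both file names are hypothetical. If the files turn out to carry a header, it would need to be skipped before decoding.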
Mexican Spanish General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Mexican Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-mexico
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Mexico
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Mexican Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mexican Spanish communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Mexican accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mexican Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
• Speakers: 60 verified native Mexican Spanish speakers from FutureBeeAI’s contributor community.
• Regions: Representing various states of Mexico to ensure dialectal diversity and demographic balance.
• Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.

•Recording Details:
• Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
• Duration: Each conversation ranges from 15 to 60 minutes.
• Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
• Environment: Quiet, echo-free settings with no background noise.
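The audio format listed above can be checked before files enter a training pipeline. Below is a minimal Python sketch using the standard-library `wave` module; the expected values (two channels, 16-bit samples, 16 kHz) are taken from the description, while the function names are hypothetical. It also reports duration, which can be compared against the stated 15 to 60 minutes per conversation.

```python
import wave
from typing import NamedTuple


class WavFormat(NamedTuple):
    channels: int
    sample_width_bytes: int
    sample_rate: int


# Expected format per the dataset description: stereo, 16-bit, 16 kHz.
EXPECTED = WavFormat(channels=2, sample_width_bytes=2, sample_rate=16000)


def read_format(path: str) -> WavFormat:
    """Read channel count, sample width and sample rate from a WAV header."""
    with wave.open(path, "rb") as w:
        return WavFormat(w.getnchannels(), w.getsampwidth(), w.getframerate())


def check_format(path: str, expected: WavFormat = EXPECTED) -> list[str]:
    """Return human-readable mismatches; an empty list means the file matches."""
    actual = read_format(path)
    problems = []
    for field, want, got in zip(WavFormat._fields, expected, actual):
        if want != got:
            problems.append(f"{field}: expected {want}, got {got}")
    return problems


def duration_seconds(path: str) -> float:
    """Length of the recording in seconds (frames divided by frame rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()
```

Returning a list of mismatches rather than raising lets a batch job log every problem file in one pass.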

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through a double QA pass, with an average WER below 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
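The accuracy figure above is a word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. A minimal Python sketch using word-level edit distance follows; it assumes simple lowercasing and whitespace tokenization and makes no claim about how FutureBeeAI normalizes text in its own QA.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed with word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of row i, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Under this naive normalization, `word_error_rate("hola cómo estás hoy", "hola como estás")` is 0.5: the missing accent counts as one substitution and the dropped word as one deletion, out of four reference words.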
Metadata
The dataset comes with granular metadata for both speakers and recordings:
• Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
• Recording Metadata: Topic, duration, audio format, device type, and sample rate.
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
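As an illustration of that kind of filtering, the following Python sketch selects speakers by age range, gender and state, and summarizes the gender split. The record type is hypothetical: its fields mirror the speaker metadata listed above, but the actual file layout, field names and value spellings in the delivered metadata are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SpeakerMeta:
    # Hypothetical record; field names are assumptions based on the listed metadata.
    participant_id: str
    age: int
    gender: str
    accent: str
    dialect: str
    state: str


def select_speakers(
    speakers: Iterable[SpeakerMeta],
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    gender: Optional[str] = None,
    states: Optional[set[str]] = None,
) -> list[SpeakerMeta]:
    """Keep speakers that satisfy every filter that is not None."""
    selected = []
    for s in speakers:
        if min_age is not None and s.age < min_age:
            continue
        if max_age is not None and s.age > max_age:
            continue
        if gender is not None and s.gender != gender:
            continue
        if states is not None and s.state not in states:
            continue
        selected.append(s)
    return selected


def gender_split(speakers: Iterable[SpeakerMeta]) -> dict[str, int]:
    """Count speakers per gender value."""
    return dict(Counter(s.gender for s in speakers))
```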
Usage and Applications
This dataset is a versatile resource for multiple Spanish speech and language AI applications:
• ASR Development: Train accurate speech-to-text systems for Mexican Spanish.
• Voice Assistants: Build smart assistants capable of understanding natural Mexican conversations.
Data_Sheet_3_The Role of Language in Structuring Social Networks Following...
figshare.com
txt
Updated Jun 4, 2023
Cite
Cecilia Padilla-Iglesias; Karen L. Kramer (2023). Data_Sheet_3_The Role of Language in Structuring Social Networks Following Market Integration in a Yucatec Maya Population.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2021.656963.s003
Available download formats: txt
Unique identifier
https://doi.org/10.3389/fpsyg.2021.656963.s003
Dataset updated
Jun 4, 2023
Dataset provided by
Frontiers
Authors
Cecilia Padilla-Iglesias; Karen L. Kramer
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Language is the human universal mode of communication, and is dynamic and constantly in flux accommodating user needs as individuals interface with a changing world. However, we know surprisingly little about how language responds to market integration, a pressing force affecting indigenous communities worldwide today. While models of culture change often emphasize the replacement of one language, trait, or phenomenon with another following socioeconomic transitions, we present a more nuanced framework. We use demographic, economic, linguistic, and social network data from a rural Maya community that spans a 27-year period and the transition to market integration. By adopting this multivariate approach for the acquisition and use of languages, we find that while the number of bilingual speakers has significantly increased over time, bilingualism appears stable rather than transitionary. We provide evidence that when indigenous and majority languages provide complementary social and economic payoffs, both can be maintained. Our results predict the circumstances under which indigenous language use may be sustained or at risk. More broadly, the results point to the evolutionary dynamics that shaped the current distribution of the world’s linguistic diversity.
Data_Sheet_5_The Role of Language in Structuring Social Networks Following...
figshare.com
txt
Updated Jun 8, 2023
Cite
Cecilia Padilla-Iglesias; Karen L. Kramer (2023). Data_Sheet_5_The Role of Language in Structuring Social Networks Following Market Integration in a Yucatec Maya Population.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2021.656963.s005
Available download formats: txt
Unique identifier
https://doi.org/10.3389/fpsyg.2021.656963.s005
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers
Authors
Cecilia Padilla-Iglesias; Karen L. Kramer
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Language is the human universal mode of communication, and is dynamic and constantly in flux accommodating user needs as individuals interface with a changing world. However, we know surprisingly little about how language responds to market integration, a pressing force affecting indigenous communities worldwide today. While models of culture change often emphasize the replacement of one language, trait, or phenomenon with another following socioeconomic transitions, we present a more nuanced framework. We use demographic, economic, linguistic, and social network data from a rural Maya community that spans a 27-year period and the transition to market integration. By adopting this multivariate approach for the acquisition and use of languages, we find that while the number of bilingual speakers has significantly increased over time, bilingualism appears stable rather than transitionary. We provide evidence that when indigenous and majority languages provide complementary social and economic payoffs, both can be maintained. Our results predict the circumstances under which indigenous language use may be sustained or at risk. More broadly, the results point to the evolutionary dynamics that shaped the current distribution of the world’s linguistic diversity.
English Language Schools in Mexico - 2,719 Verified Listings Database
poidata.io
csv, excel, json
Updated Jul 4, 2025
Cite
Poidata.io (2025). English Language Schools in Mexico - 2,719 Verified Listings Database [Dataset]. https://www.poidata.io/report/english-language-school/mexico
Available download formats: json, excel, csv
Dataset updated
Jul 4, 2025
Dataset provided by
Poidata.io
Area covered
Mexico
Description
Comprehensive dataset of 2,719 English language schools in Mexico as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Number of indigenous language speakers in Sonora 2020
statista.com
Updated Jul 10, 2025
Cite
Statista (2025). Number of indigenous language speakers in Sonora 2020 [Dataset]. https://www.statista.com/statistics/1388253/number-indigenous-language-speakers-sonora-mexico/
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
Mexico
Description
In 2020, Mayo was the primary indigenous language in the Mexican state of Sonora, spoken by approximately ****** people. Yaqui was not far behind, with ****** speakers.
English Language Camps in Mexico - 32 Verified Listings Database
poidata.io
csv, excel, json
Updated Jun 28, 2025
Cite
Poidata.io (2025). English Language Camps in Mexico - 32 Verified Listings Database [Dataset]. https://www.poidata.io/report/english-language-camp/mexico
Available download formats: json, excel, csv
Dataset updated
Jun 28, 2025
Dataset provided by
Poidata.io
Area covered
Mexico
Description
Comprehensive dataset of 32 English language camps in Mexico as of June 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Trends in Reading and Language Arts Proficiency (2011-2022): Mexico...
publicschoolreview.com
Updated Mar 20, 2024
Cite
Public School Review (2024). Trends in Reading and Language Arts Proficiency (2011-2022): Mexico Elementary School vs. New York vs. Mexico Central School District [Dataset]. https://www.publicschoolreview.com/mexico-elementary-school-profile
Dataset updated
Mar 20, 2024
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Mexico Central School District
Description
This dataset tracks annual reading and language arts proficiency from 2011 to 2022 for Mexico Elementary School vs. New York and Mexico Central School District.
Dataset of books called An areal-typological study of American Indian...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called An areal-typological study of American Indian languages north of Mexico [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=An+areal-typological+study+of+American+Indian+languages+north+of+Mexico
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Mexico
Description
This dataset is about books. It has 1 row and is filtered where the book is An areal-typological study of American Indian languages north of Mexico. It features 7 columns including author, publication date, language, and book publisher.
English Language Schools in Tamaulipas, Mexico - 61 Available (Free Sample)
poidata.io
csv
Updated Jun 24, 2025
Cite
Poidata.io (2025). English Language Schools in Tamaulipas, Mexico - 61 Available (Free Sample) [Dataset]. https://www.poidata.io/report/english-language-school/mexico/tamaulipas
Available download formats: csv
Dataset updated
Jun 24, 2025
Dataset provided by
Poidata.io
Area covered
Tamaulipas, Mexico
Description
This dataset provides information on 61 English language schools in Tamaulipas, Mexico, as of June 2025. It includes details such as email addresses and phone numbers (where publicly available) and geocoded addresses. Explore market trends, identify potential business partners, and gain valuable insights into the industry. Download a complimentary sample of 10 records to see what's included.
Spanish Language Datasets | 1.8M+ Sentences | NLP | TTS | Dictionary Display...
datarade.ai
Updated Jul 11, 2025
Cite
Oxford Languages (2025). Spanish Language Datasets | 1.8M+ Sentences | NLP | TTS | Dictionary Display | Game | Translations | European & Latin Amer. Coverage [Dataset]. https://datarade.ai/data-products/spanish-language-datasets-1-8m-sentences-nlp-tts-dic-oxford-languages
Available download formats: .csv, .json, .mp3, .txt, .wav, .xls, .xml
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Oxford Languages (https://www.lexico.com/)
Area covered
Ecuador, Paraguay, Panama, Bolivia (Plurinational State of), Chile, Cuba, Nicaragua, Colombia, Honduras, Costa Rica
Description
Our Spanish language datasets are carefully compiled and annotated by language and linguistic experts; you can find them available for licensing:

Spanish Monolingual Dictionary Data

Spanish Bilingual Dictionary Data

Spanish Sentences Data

Synonyms and Antonyms Data

Audio Data

Word list Data

Key Features (approximate numbers):

Spanish Monolingual Dictionary Data

Our Spanish monolingual dictionary data offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.

Headwords: 73,000

Senses: 123,000

Sentence examples: 104,000

Format: XML and JSON formats

Delivery: Email (link-based file sharing) and REST API

Update frequency: annually

Spanish Bilingual Dictionary Data

The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is reviewed and updated annually by our in-house team of language experts and offers significant coverage of the language, with a large volume of high-quality translations.

Translations: 221,300

Senses: 103,500

Example sentences: 74,500

Example translations: 83,800

Format: XML and JSON formats

Delivery: Email (link-based file sharing) and REST API

Update frequency: annually

Spanish Sentences Data

Spanish sentences retrieved from the corpus are ideal for NLP model training, totaling approximately 20 million words. The sentences cover a wide range of Spanish-speaking countries and are tagged by country or dialect.

Sentences volume: 1,840,000

Format: XML and JSON formats

Delivery: Email (link-based file sharing) and REST API

Spanish Synonyms and Antonyms Data

This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.

Synonyms: 127,700

Antonyms: 9,500

Format: XML format

Delivery: Email (link-based file sharing)

Update frequency: annually

Spanish Audio Data (word-level)

Curated word-level audio data for the Spanish language, which covers all varieties of world Spanish, providing rich dialectal diversity in the Spanish language.

Audio files: 20,900

Format: XLSX (for index), MP3 and WAV (audio files)

Spanish Word List Data

This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.

Wordforms: 450,000

Format: CSV and TXT formats

Delivery: Email (link-based file sharing)

Use Cases:

We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).

If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Oxford.Languages@oup.com to start the conversation.

Pricing:

Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.
Travel Scripted Monologue Speech Data: Spanish (Mexico)
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Travel Scripted Monologue Speech Data: Spanish (Mexico) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/travel-scripted-speech-monologues-spanish-mexico
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
Mexico
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Mexican Spanish Scripted Monologue Speech Dataset for the Travel Domain. This meticulously curated dataset is designed to advance the development of Spanish language speech recognition models, particularly for the Travel industry.
Speech Data
This training dataset comprises over 6,000 high-quality scripted prompt recordings in Mexican Spanish. These recordings cover various topics and scenarios relevant to the Travel domain, designed to build robust and accurate customer service speech technology.
•Participant Diversity:
• Speakers: 60 native Spanish speakers from different regions of Mexico.
• Regions: Ensures a balanced representation of Mexican Spanish accents, dialects, and demographics.
• Participant Profile: Participants range from 18 to 70 years old, with males and females in a 60:40 ratio.

•Recording Details:
• Recording Nature: Audio recordings of scripted prompts/monologues.
• Audio Duration: Typically 5 to 30 seconds per recording.
• Formats: Mono WAV files with 16-bit depth, at sample rates of 8 kHz and 16 kHz.
• Environment: Recordings are made in quiet settings without background noise or echo.

• Topic Diversity: The dataset covers a wide range of topics and conversational scenarios to ensure comprehensive coverage of the Travel sector. Topics include:
•Customer Service Interactions
•Booking and Reservations
•Travel Inquiries
•Technical Support
•General Information and Advice
•Promotional and Sales Events
•Domain Specific Statements
• Other Elements: To enhance realism and utility, the scripted prompts incorporate elements commonly encountered in Travel interactions:
• Names: Region-specific male and female names in various formats.
• Addresses: Region-specific addresses in different spoken formats.
• Dates & Times: Dates and times in various travel contexts, such as booking dates and departure and arrival times.
• Destinations: Names of cities, countries, and tourist attractions relevant to the travel sector.
• Numbers & Prices: Numbers and prices related to ticket costs, hotel rates, and transaction amounts.
• Booking IDs and Confirmation Numbers: Booking identification and confirmation details for realistic customer service scenarios.

Each scripted prompt is crafted to reflect real-life scenarios encountered in the Travel domain, ensuring applicability in training robust natural language processing and speech recognition models.
Transcription Data
In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.
• Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
• Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.
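Because each transcript file is named to match its audio file, the two can be paired by file name. Below is a minimal Python sketch, assuming the WAV and TXT files share the same stem (name without extension), that stems are unique across the delivery, and that transcripts are UTF-8 encoded; the directory layout and function names are hypothetical. Extensions are compared case-insensitively because the description writes `.TXT` in capitals.

```python
from pathlib import Path


def pair_audio_with_transcripts(root: str) -> tuple[list[tuple[Path, Path]], list[Path]]:
    """Match every .wav file under root to a .txt file with the same stem.

    Returns (pairs, unmatched_audio). Assumes stems are unique under root.
    """
    root_path = Path(root)
    files = [p for p in root_path.rglob("*") if p.is_file()]
    # Index transcripts by stem; the extension match is case-insensitive.
    transcripts = {p.stem: p for p in files if p.suffix.lower() == ".txt"}
    pairs: list[tuple[Path, Path]] = []
    unmatched: list[Path] = []
    for audio in sorted(p for p in files if p.suffix.lower() == ".wav"):
        txt = transcripts.get(audio.stem)
        if txt is None:
            unmatched.append(audio)
        else:
            pairs.append((audio, txt))
    return pairs, unmatched


def load_prompt(txt_path: Path) -> str:
    """Read one transcript, assuming UTF-8 text."""
    return txt_path.read_text(encoding="utf-8").strip()
```

Returning unmatched audio separately makes missing transcripts visible instead of silently dropping those recordings from training.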
