47 datasets found
  1. Languages in Mexico 2020

    • statista.com
    Updated Apr 15, 2025
    Cite
    Statista (2025). Languages in Mexico 2020 [Dataset]. https://www.statista.com/statistics/275440/languages-in-mexico/
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Mexico
    Description

    In 2020, about 93.8 percent of the Mexican population was monolingual in Spanish. Around five percent spoke a combination of Spanish and indigenous languages. Spanish is the third-most spoken native language worldwide, after Mandarin Chinese and Hindi.

    Mexican Spanish

    Spanish was first used in Mexico in the 16th century, during the Spanish conquest and colonization of what is now Mexico and the Caribbean. As of 2018, Mexico was the country with the largest number of native Spanish speakers worldwide. Mexican Spanish has been influenced by English and Nahuatl and has about 120 million speakers. The Mexican government uses Spanish in most of its proceedings; however, it recognizes 68 national languages, 63 of which are indigenous.

    Indigenous languages spoken

    Of the indigenous languages spoken, two of the most widely used are Nahuatl and Maya. Owing to a history of marginalization of indigenous groups, most indigenous languages are endangered, and many linguists warn that they could fall out of use within just a few decades. In recent years, attempts such as the San Andrés Accords have been made to protect indigenous groups, who make up about 25 million of Mexico's 125 million inhabitants, though the efficacy of such measures remains to be seen.

  2. Speakers of indigenous languages in Mexico 2020, by language

    • statista.com
    • ai-chatbox.pro
    Updated Mar 16, 2022
    Cite
    Statista (2022). Speakers of indigenous languages in Mexico 2020, by language [Dataset]. https://www.statista.com/statistics/1323032/indigenous-language-speakers-by-language-mexico/
    Dataset updated
    Mar 16, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2, 2021 - Mar 27, 2021
    Area covered
    Mexico
    Description

    There were more than ************* speakers of indigenous languages in Mexico as of 2020. Nahuatl was the most spoken indigenous language (although it is also considered a group of languages), with more than **** million speakers. Both the Mayan languages Tseltal and Tsotsil were spoken by over ******* persons. Furthermore, about ******* of all the indigenous language speakers were located in just two states: Chiapas and Oaxaca.

  3. Speakers of indigenous languages Mexico 2020, by region

    • ai-chatbox.pro
    • statista.com
    Updated Sep 10, 2024
    Cite
    Statista (2024). Speakers of indigenous languages Mexico 2020, by region [Dataset]. https://www.ai-chatbox.pro/?_=%2Fstatistics%2F1323026%2Findigenous-language-speakers-by-state-mexico%2F%23XgboD02vawLbpWJjSPEePEUG%2FVFd%2Bik%3D
    Dataset updated
    Sep 10, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2, 2020 - Mar 27, 2020
    Area covered
    Mexico
    Description

    There were more than seven million speakers of indigenous languages in Mexico as of 2020. Chiapas and Oaxaca were the federal entities with the largest populations aged three years and older who speak an indigenous language, with 1.5 million and 1.2 million people, respectively. Nahuatl was the most spoken indigenous language (or group of languages).

  4. Number of indigenous language speakers in Mexico State 2020

    • statista.com
    Updated Aug 7, 2024
    Cite
    Statista (2024). Number of indigenous language speakers in Mexico State 2020 [Dataset]. https://www.statista.com/statistics/1386885/number-indigenous-language-speakers-mexico-state/
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Mexico
    Description

    In 2020, Mazahua was the most widely spoken of the main indigenous languages in Mexico State, with over 111,000 speakers. Otomi followed closely, with 102,600 speakers.

  5. Spanish (Mexico) Dataset

    • hmn.shaip.com
    • shaip.com
    Updated Dec 24, 2024
    Cite
    Shaip (2024). Spanish (Mexico) Dataset [Dataset]. https://hmn.shaip.com/offerings/speech-data-catalog/spanish-mexico-dataset/
    Dataset updated
    Dec 24, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Mexico
    Description

    High-Quality Spanish Mexico TTS Dataset for AI & Speech Models

    Title: Spanish (Mexico) Language Dataset
    Dataset type: TTS
    Description: Single-utterance recordings, which tend to fall in…

  6. Number of indigenous language speakers in Guerrero 2020

    • statista.com
    Updated Aug 7, 2024
    Cite
    Statista (2024). Number of indigenous language speakers in Guerrero 2020 [Dataset]. https://www.statista.com/statistics/1389720/number-indigenous-language-speakers-guerrero-mexico/
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Mexico
    Description

    In 2020, the Mexican state of Guerrero was home to a wide variety of indigenous languages. Nahuatl was the most widely spoken, with an estimated 157,740 speakers, and Mixtec and Tlapanec also had substantial speaker populations.

  7. Chinese Language Schools in State of Mexico, Mexico - 2 Verified Listings...

    • poidata.io
    csv, excel, json
    Updated Jul 2, 2025
    Cite
    Poidata.io (2025). Chinese Language Schools in State of Mexico, Mexico - 2 Verified Listings Database [Dataset]. https://www.poidata.io/report/chinese-language-school/mexico/state-of-mexico
    Available download formats: excel, csv, json
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Poidata.io
    Area covered
    State of Mexico, Mexico
    Description

    Comprehensive dataset of 2 Chinese language schools in State of Mexico, Mexico, as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.

  8. Number of indigenous language speakers in Nuevo Leon 2020

    • statista.com
    Updated Jul 5, 2024
    Cite
    Statista (2024). Number of indigenous language speakers in Nuevo Leon 2020 [Dataset]. https://www.statista.com/statistics/1385793/number-indigenous-language-speakers-nuevo-leon-mexico/
    Dataset updated
    Jul 5, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Nuevo Leon, Mexico
    Description

    In 2020, Nahuatl was the most widely spoken of the main indigenous languages in the Mexican state of Nuevo Leon, with 54,110 speakers, followed by Huasteco with 19,460 speakers.

  9. SALA II Spanish from Mexico database

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Aug 28, 2007
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2007). SALA II Spanish from Mexico database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0171/
    Dataset updated
    Aug 28, 2007
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Area covered
    Mexico
    Description

    The SALA II Spanish from Mexico database was recorded in Mexico within the scope of the SALA II project. It contains recordings of 1,075 Mexican speakers (539 males and 536 females) made over the Mexican mobile telephone network. The following acoustic conditions were selected as representative of a mobile user's environment:

    • Passenger in a moving car, railway, bus, etc. (155 speakers)
    • Public place (279 speakers)
    • Stationary pedestrian by the roadside (223 speakers)
    • Home/office environment (364 speakers)
    • Passenger in a moving car using a hands-free kit (54 speakers)

    The database is distributed on 1 DVD-ROM. The speech files are stored as sequences of 8-bit, 8 kHz A-law samples and are not compressed, in accordance with the SALA II specifications. Each prompted utterance is stored in a separate file with an accompanying ASCII SAM label file. The database was validated by SPEX (the Netherlands) for compliance with the SALA II format and content specifications.

    Each speaker uttered the following items:

    • 6 application words
    • 1 sequence of 10 isolated digits
    • 4 connected digit strings (1 sheet number of 6 digits, 1 telephone number of 9/11 digits, 1 credit card number of 14/16 digits, 1 PIN code of 6 digits)
    • 3 dates (1 spontaneous date, e.g. birthday; 1 word-style prompted date; 1 relative and general date expression)
    • 2 spotting phrases using an embedded application word
    • 2 isolated digits
    • 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
    • 1 currency money amount
    • 1 natural number
    • 5 directory assistance names (1 surname out of a set of 500, 1 city of birth/growing up, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname" out of a set of 150)
    • 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
    • 9 phonetically rich sentences
    • 2 time phrases (1 spontaneous time of day, 1 word-style time phrase)
    • 4 phonetically rich words

    The age distribution is: 7 speakers under 16, 643 between 16 and 30, 248 between 31 and 45, 169 between 46 and 60, and 8 over 60. A pronunciation lexicon with phonemic transcriptions in SAMPA is also included.
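
    The description above specifies 8-bit, 8 kHz A-law samples. As a minimal sketch (not part of the ELRA distribution), the Python below expands A-law bytes to signed 16-bit linear PCM using the ITU-T G.711 A-law expansion rule, in the form found in the widely circulated `g711.c` reference code. It assumes each file holds raw A-law bytes with no header, which the description does not state; the function names are illustrative. Python's `audioop.alaw2lin` offered similar functionality but was removed from the standard library in Python 3.13.

```python
from typing import List

SAMPLE_RATE_HZ = 8_000  # SALA II files: 8 kHz, one byte per A-law sample


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code to a signed linear PCM sample (range +/-32256)."""
    code ^= 0x55                   # A-law inverts alternate (even) bits; undo that
    mantissa = (code & 0x0F) << 4  # 4-bit step within the segment, scaled up
    segment = (code & 0x70) >> 4   # 3-bit segment number (the exponent)
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample in A-law.
    return magnitude if code & 0x80 else -magnitude


# Precompute all 256 expansions once; decoding is then one table lookup per byte.
ALAW_TABLE: List[int] = [alaw_to_linear(c) for c in range(256)]


def decode_alaw(raw: bytes) -> List[int]:
    """Decode a buffer of raw (headerless) A-law bytes to linear samples."""
    return [ALAW_TABLE[b] for b in raw]


def duration_seconds(raw: bytes, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Each A-law sample occupies one byte, so duration is bytes / sample rate."""
    return len(raw) / sample_rate


if __name__ == "__main__":
    print(decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A])))  # [8, -8, 32256, -32256]
    print(duration_seconds(bytes(16_000)))  # 2.0
```

    For a file on disk, `decode_alaw(open(path, "rb").read())` would yield its samples; the accompanying SAM label file would still need to be read separately.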

  10. Mexican Spanish General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mexican Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world Mexican Spanish communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide range of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic Mexican accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of Mexican Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native Mexican Spanish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various states of Mexico to ensure dialectal diversity and demographic balance.
    Demographics: A gender ratio of 60% male to 40% female, with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
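
    The recording details above fix the uncompressed data rate: 16,000 samples/s × 2 bytes × 2 channels = 64,000 bytes per second, or 230.4 MB per hour, so 30 hours comes to roughly 6.9 GB of PCM audio before WAV headers. The sketch below assumes the files are standard uncompressed PCM WAV as described; it computes that estimate and checks a file's header against the stated format with Python's built-in `wave` module. The function names and defaults are illustrative, not part of the FutureBeeAI release.

```python
import io
import wave


def pcm_bytes(hours: float, sample_rate: int = 16_000, bit_depth: int = 16, channels: int = 2) -> int:
    """Size of uncompressed PCM audio (excluding WAV headers) for a given duration."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return int(hours * 3600 * bytes_per_second)


def matches_spec(wav_file, sample_rate: int = 16_000, bit_depth: int = 16, channels: int = 2) -> bool:
    """Check a WAV file (path or binary file object) against the stated recording format."""
    with wave.open(wav_file, "rb") as w:
        return (
            w.getframerate() == sample_rate
            and w.getsampwidth() * 8 == bit_depth
            and w.getnchannels() == channels
        )


if __name__ == "__main__":
    print(pcm_bytes(1))   # 230400000
    print(pcm_bytes(30))  # 6912000000

    # Build a tiny in-memory stereo, 16-bit, 16 kHz WAV to exercise the checker.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(16_000)
        w.writeframes(bytes(2 * 2 * 160))  # 160 frames (10 ms) of silence
    buf.seek(0)
    print(matches_spec(buf))  # True
```

    Checking the header rather than trusting file names is a cheap way to catch mixed sample rates before training.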

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%
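
    Word error rate (WER), the accuracy measure quoted above, is the minimum number of word substitutions, deletions, and insertions needed to turn a hypothesis into the reference, divided by the number of reference words. The source does not say how FutureBeeAI normalizes text (case, punctuation, accents) or averages the figure across files, so the following is only a minimal Python sketch of the standard definition, computed with a word-level Levenshtein distance over whitespace-separated tokens.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("esta" for "está") out of four reference words.
    print(word_error_rate("la comida está lista", "la comida esta lista"))  # 0.25
```

    Under this definition, a WER below 5% means fewer than one word error per twenty reference words; averaging per file and pooling errors over the whole corpus can give different figures.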

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Spanish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for Mexican Spanish.
    Voice Assistants: Build smart assistants capable of understanding natural Mexican conversations.

  11. Data_Sheet_3_The Role of Language in Structuring Social Networks Following...

    • figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Cecilia Padilla-Iglesias; Karen L. Kramer (2023). Data_Sheet_3_The Role of Language in Structuring Social Networks Following Market Integration in a Yucatec Maya Population.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2021.656963.s003
    Available download formats: txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Cecilia Padilla-Iglesias; Karen L. Kramer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Language is the human universal mode of communication, and is dynamic and constantly in flux accommodating user needs as individuals interface with a changing world. However, we know surprisingly little about how language responds to market integration, a pressing force affecting indigenous communities worldwide today. While models of culture change often emphasize the replacement of one language, trait, or phenomenon with another following socioeconomic transitions, we present a more nuanced framework. We use demographic, economic, linguistic, and social network data from a rural Maya community that spans a 27-year period and the transition to market integration. By adopting this multivariate approach for the acquisition and use of languages, we find that while the number of bilingual speakers has significantly increased over time, bilingualism appears stable rather than transitionary. We provide evidence that when indigenous and majority languages provide complementary social and economic payoffs, both can be maintained. Our results predict the circumstances under which indigenous language use may be sustained or at risk. More broadly, the results point to the evolutionary dynamics that shaped the current distribution of the world’s linguistic diversity.

  12. Data_Sheet_5_The Role of Language in Structuring Social Networks Following...

    • figshare.com
    txt
    Updated Jun 8, 2023
    Cite
    Cecilia Padilla-Iglesias; Karen L. Kramer (2023). Data_Sheet_5_The Role of Language in Structuring Social Networks Following Market Integration in a Yucatec Maya Population.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2021.656963.s005
    Available download formats: txt
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers
    Authors
    Cecilia Padilla-Iglesias; Karen L. Kramer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Language is the human universal mode of communication, and is dynamic and constantly in flux accommodating user needs as individuals interface with a changing world. However, we know surprisingly little about how language responds to market integration, a pressing force affecting indigenous communities worldwide today. While models of culture change often emphasize the replacement of one language, trait, or phenomenon with another following socioeconomic transitions, we present a more nuanced framework. We use demographic, economic, linguistic, and social network data from a rural Maya community that spans a 27-year period and the transition to market integration. By adopting this multivariate approach for the acquisition and use of languages, we find that while the number of bilingual speakers has significantly increased over time, bilingualism appears stable rather than transitionary. We provide evidence that when indigenous and majority languages provide complementary social and economic payoffs, both can be maintained. Our results predict the circumstances under which indigenous language use may be sustained or at risk. More broadly, the results point to the evolutionary dynamics that shaped the current distribution of the world’s linguistic diversity.

  13. English Language Schools in Mexico - 2,719 Verified Listings Database

    • poidata.io
    csv, excel, json
    Updated Jul 4, 2025
    Cite
    Poidata.io (2025). English Language Schools in Mexico - 2,719 Verified Listings Database [Dataset]. https://www.poidata.io/report/english-language-school/mexico
    Available download formats: json, excel, csv
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Poidata.io
    Area covered
    Mexico
    Description

    Comprehensive dataset of 2,719 English language schools in Mexico as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.

  14. Number of indigenous language speakers in Sonora 2020

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Number of indigenous language speakers in Sonora 2020 [Dataset]. https://www.statista.com/statistics/1388253/number-indigenous-language-speakers-sonora-mexico/
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Mexico
    Description

    In 2020, Mayo was the most widely spoken indigenous language in the Mexican state of Sonora, with approximately ****** speakers. Yaqui followed, with ****** speakers.

  15. English Language Camps in Mexico - 32 Verified Listings Database

    • poidata.io
    csv, excel, json
    Updated Jun 28, 2025
    Cite
    Poidata.io (2025). English Language Camps in Mexico - 32 Verified Listings Database [Dataset]. https://www.poidata.io/report/english-language-camp/mexico
    Available download formats: json, excel, csv
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    Poidata.io
    Area covered
    Mexico
    Description

    Comprehensive dataset of 32 English language camps in Mexico as of June 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.

  16. Trends in Reading and Language Arts Proficiency (2011-2022): Mexico...

    • publicschoolreview.com
    Updated Mar 20, 2024
    Cite
    Public School Review (2024). Trends in Reading and Language Arts Proficiency (2011-2022): Mexico Elementary School vs. New York vs. Mexico Central School District [Dataset]. https://www.publicschoolreview.com/mexico-elementary-school-profile
    Dataset updated
    Mar 20, 2024
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mexico Central School District
    Description

    This dataset tracks annual reading and language arts proficiency from 2011 to 2022 for Mexico Elementary School vs. New York and the Mexico Central School District.

  17. Dataset of books called An areal-typological study of American Indian...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called An areal-typological study of American Indian languages north of Mexico [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=An+areal-typological+study+of+American+Indian+languages+north+of+Mexico
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mexico
    Description

    This dataset is about books. It has 1 row and is filtered where the book is An areal-typological study of American Indian languages north of Mexico. It features 7 columns including author, publication date, language, and book publisher.

  18. English Language Schools in Tamaulipas, Mexico - 61 Available (Free Sample)

    • poidata.io
    csv
    Updated Jun 24, 2025
    Cite
    Poidata.io (2025). English Language Schools in Tamaulipas, Mexico - 61 Available (Free Sample) [Dataset]. https://www.poidata.io/report/english-language-school/mexico/tamaulipas
    Available download formats: csv
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    Poidata.io
    Area covered
    Tamaulipas, Mexico
    Description

    This dataset provides information on 61 English language schools in Tamaulipas, Mexico, as of June 2025. It includes details such as email addresses (where publicly available), phone numbers (where publicly available), and geocoded addresses. Explore market trends, identify potential business partners, and gain valuable insights into the industry. Download a complimentary sample of 10 records to see what's included.

  19. Spanish Language Datasets | 1.8M+ Sentences | NLP | TTS | Dictionary Display...

    • datarade.ai
    Updated Jul 11, 2025
    Cite
    Oxford Languages (2025). Spanish Language Datasets | 1.8M+ Sentences | NLP | TTS | Dictionary Display | Game | Translations | European & Latin Amer. Coverage [Dataset]. https://datarade.ai/data-products/spanish-language-datasets-1-8m-sentences-nlp-tts-dic-oxford-languages
    Available download formats: .csv, .json, .mp3, .txt, .wav, .xls, .xml
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Oxford Languages (https://www.lexico.com/)
    Area covered
    Ecuador, Paraguay, Panama, Bolivia (Plurinational State of), Chile, Cuba, Nicaragua, Colombia, Honduras, Costa Rica
    Description

    Our Spanish language datasets are carefully compiled and annotated by language and linguistic experts; you can find them available for licensing:

    1. Spanish Monolingual Dictionary Data
    2. Spanish Bilingual Dictionary Data
    3. Spanish Sentences Data
    4. Synonyms and Antonyms Data
    5. Audio Data
    6. Word list Data

    Key Features (approximate numbers):

    1. Spanish Monolingual Dictionary Data

    Our Spanish monolingual dictionary data offers clear definitions and examples, a large number of headwords, and comprehensive coverage of the Spanish language.

    • Headwords: 73,000
    • Senses: 123,000
    • Sentence examples: 104,000
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    2. Spanish Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is reviewed and updated annually by our in-house team of language experts and offers significant coverage of the language, with a large volume of high-quality translations.

    • Translations: 221,300
    • Senses: 103,500
    • Example sentences: 74,500
    • Example translations: 83,800
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    3. Spanish Sentences Data

    Spanish sentences retrieved from the corpus, totaling approximately 20 million words, are well suited to NLP model training. The sentences broadly cover Spanish-speaking countries and are tagged with the relevant country or dialect.

    • Sentences volume: 1,840,000
    • Format: XML and JSON format
    • Delivery: Email (link-based file sharing) and REST API
    4. Spanish Synonyms and Antonyms Data

    This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.

    • Synonyms: 127,700
    • Antonyms: 9,500
    • Format: XML format
    • Delivery: Email (link-based file sharing)
    • Update frequency: annually
    5. Spanish Audio Data (word-level)

    Curated word-level audio data for the Spanish language, which covers all varieties of world Spanish, providing rich dialectal diversity in the Spanish language.

    • Audio files: 20,900
    • Format: XLSX (for index), MP3 and WAV (audio files)
    6. Spanish Word List Data

    This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.

    • Wordforms: 450,000
    • Format: CSV and TXT formats
    • Delivery: Email (link-based file sharing)

    Use Cases:

    We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).

    If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Oxford.Languages@oup.com to start the conversation.

    Pricing:

    Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

    Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.

  20. Travel Scripted Monologue Speech Data: Spanish (Mexico)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Travel Scripted Monologue Speech Data: Spanish (Mexico) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/travel-scripted-speech-monologues-spanish-mexico
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Mexico
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Mexican Spanish Scripted Monologue Speech Dataset for the Travel Domain. This meticulously curated dataset is designed to advance the development of Spanish-language speech recognition models, particularly for the Travel industry.

    Speech Data

    This training dataset comprises over 6,000 high-quality scripted prompt recordings in Mexican Spanish. The recordings cover various topics and scenarios relevant to the Travel domain and are designed to support robust, accurate customer service speech technology.

    Participant Diversity:
    • Speakers: 60 native Spanish speakers from different regions of Mexico.
    • Regions: Ensures a balanced representation of Mexican Spanish accents, dialects, and demographics.
    • Participant Profile: Participants range from 18 to 70 years old, with a male-to-female ratio of 60:40.
    Recording Details:
    • Recording Nature: Audio recordings of scripted prompts/monologues.
    • Audio Duration: Typically 5 to 30 seconds per recording.
    • Formats: WAV, mono channel, 16-bit depth, at sample rates of 8 kHz and 16 kHz.
    • Environment: Recordings were made in quiet settings, free of background noise and echo.
    • Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios to ensure comprehensive coverage of the Travel sector. Topics include:
        • Customer Service Interactions
        • Booking and Reservations
        • Travel Inquiries
        • Technical Support
        • General Information and Advice
        • Promotional and Sales Events
        • Domain-Specific Statements
    • Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in Travel interactions:
        • Names: Region-specific male and female names in various formats.
        • Addresses: Region-specific addresses in different spoken formats.
        • Dates & Times: Dates and times in various travel contexts, such as booking dates and departure and arrival times.
        • Destinations: Specific names of cities, countries, and tourist attractions relevant to the travel sector.
        • Numbers & Prices: Various numbers and prices related to ticket costs, hotel rates, and transaction amounts.
        • Booking IDs and Confirmation Numbers: Booking identification and confirmation details for realistic customer service scenarios.

    Each scripted prompt is crafted to reflect real-life scenarios encountered in the Travel domain, ensuring applicability in training robust natural language processing and speech recognition models.
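    The stated recording format (WAV, mono, 16-bit, 8 kHz or 16 kHz) can be verified from each file's header before training. The following is a minimal sketch using Python's standard `wave` module; the names `WavInfo`, `read_wav_info`, and `write_silence` are hypothetical, and `write_silence` exists only to generate demo files, not as part of the dataset.

```python
import tempfile
import wave
from dataclasses import dataclass
from pathlib import Path

# Format stated in the dataset description: WAV, mono, 16-bit, 8 kHz or 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2          # bytes per sample, i.e. 16-bit depth
EXPECTED_RATES_HZ = {8000, 16000}


@dataclass
class WavInfo:
    channels: int
    sample_width: int
    rate_hz: int
    duration_s: float

    def matches_spec(self) -> bool:
        """True if the header agrees with the stated mono/16-bit/8-or-16 kHz format."""
        return (
            self.channels == EXPECTED_CHANNELS
            and self.sample_width == EXPECTED_SAMPLE_WIDTH
            and self.rate_hz in EXPECTED_RATES_HZ
        )


def read_wav_info(path: Path) -> WavInfo:
    """Read only the WAV header fields needed for the format check."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return WavInfo(
            channels=wf.getnchannels(),
            sample_width=wf.getsampwidth(),
            rate_hz=rate,
            duration_s=wf.getnframes() / rate,
        )


def write_silence(path: Path, rate_hz: int, seconds: float, channels: int = 1) -> None:
    """Write a silent 16-bit WAV file (demo helper, hypothetical)."""
    n_frames = int(rate_hz * seconds)
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate_hz)
        wf.writeframes(b"\x00\x00" * channels * n_frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.wav"
        stereo = Path(tmp) / "stereo.wav"
        write_silence(good, 16000, 5.0)
        write_silence(stereo, 16000, 5.0, channels=2)
        print(read_wav_info(good).matches_spec())    # True
        print(read_wav_info(stereo).matches_spec())  # False
```

    The reported `duration_s` can also be compared against the typical 5 to 30 second range to flag unusually short or long clips.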

    Transcription Data

    In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.

    Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
    Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.
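    Because each transcription file is named to match its audio file, a loader can pair them by filename. This is a minimal sketch that assumes each `.wav` and its `.txt` (extension in any letter case) share the same stem and relative folder under one root directory, and that transcripts are UTF-8; the actual delivered directory layout is not described here, and `pair_by_stem` is a hypothetical helper.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def pair_by_stem(root: Path) -> tuple[dict[str, tuple[Path, Path]], list[str]]:
    """Match each .wav to the .txt sharing its stem; return pairs and unmatched keys."""
    audio: dict[str, Path] = {}
    text: dict[str, Path] = {}
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        suffix = p.suffix.lower()  # accept .TXT as well as .txt
        # Key on the path relative to root, minus the extension, so that
        # identically named files in different folders do not collide.
        key = str(p.relative_to(root).with_suffix(""))
        if suffix == ".wav":
            audio[key] = p
        elif suffix == ".txt":
            text[key] = p
    pairs = {k: (audio[k], text[k]) for k in sorted(audio.keys() & text.keys())}
    unmatched = sorted(audio.keys() ^ text.keys())  # audio without text, or vice versa
    return pairs, unmatched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "rec_001.wav").write_bytes(b"")  # placeholder, not real audio
        (root / "rec_001.TXT").write_text("¿A qué hora sale el vuelo?", encoding="utf-8")
        (root / "rec_002.wav").write_bytes(b"")
        pairs, unmatched = pair_by_stem(root)
        print(list(pairs))  # ['rec_001']
        print(unmatched)    # ['rec_002']
```

    Any unmatched keys are worth inspecting before training, since an audio file without its verbatim transcription cannot be used for supervised speech recognition.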
Statista (2025). Languages in Mexico 2020 [Dataset]. https://www.statista.com/statistics/275440/languages-in-mexico/

Languages in Mexico 2020

4 scholarly articles cite this dataset (per Google Scholar)
Dataset updated
Apr 15, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
Mexico
Description

In 2020, about 93.8 percent of the Mexican population was monolingual in Spanish. Around five percent spoke a combination of Spanish and indigenous languages. Spanish is the third-most spoken native language worldwide, after Mandarin Chinese and Hindi.

Mexican Spanish

Spanish was first used in Mexico in the 16th century, during the Spanish conquest and colonization of what is now Mexico and the Caribbean. As of 2018, Mexico had the largest number of native Spanish speakers of any country in the world. Mexican Spanish is influenced by English and Nahuatl and has about 120 million speakers. The Mexican government conducts most of its proceedings in Spanish; however, it recognizes 68 national languages, 63 of which are indigenous.

Indigenous languages spoken

Of the indigenous languages spoken, two of the most widely used are Nahuatl and Maya. Due to a history of marginalization of indigenous groups, most indigenous languages are endangered, and many linguists warn that they could fall out of use within just a few decades. In recent years, legislative efforts such as the San Andrés Accords have sought to protect indigenous groups, who make up about 25 million of Mexico’s 125 million inhabitants, though the efficacy of such measures remains to be seen.
