84 datasets found

Number of native Spanish speakers worldwide 2024, by country
statista.com
boostndoto.org
Statista, Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
World
Description
Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million, followed by Spain with 48 million and Argentina with 46 million.

Spanish, a world language
As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is an official language of over 20 countries, most of them in the Americas, and it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which appear on the list of non-Hispanic countries with the highest number of Spanish speakers.

The second most spoken language in the U.S.
According to the most recent data, Spanish is the language other than English with the most speakers in the U.S., with 12 times as many speakers as the language in second place. This is unsurprising given the long history of migration from Latin American countries to the United States. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of people naturalized in the U.S. were Spanish-speaking countries.
Spanish speakers in countries where Spanish is not an official language 2024...
statista.com
Statista, Spanish speakers in countries where Spanish is not an official language 2024 [Dataset]. https://www.statista.com/statistics/1276290/number-spanish-speakers-non-hispanic-countries-worldwide/
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
World
Description
The United States is the non-Hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people with a native command of the language in 2024. However, the European Union had the largest group of non-native speakers with limited proficiency in Spanish, at around 28 million people. Mexico, for its part, is the country with the largest number of native Spanish speakers overall as of 2024.
Number of students learning Spanish worldwide 2024, by country
statista.com
Updated Jan 22, 2025
Statista (2025). Number of students learning Spanish worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/1276319/number-spanish-language-students-country-worldwide/
Dataset updated
Jan 22, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide, Spain
Description
The United States is the country with the largest number of Spanish language students, at approximately 8.59 million people in 2024. Brazil ranks second, with around 4.05 million students of Spanish. The United States is also the non-Hispanic country with the largest number of native Spanish speakers in the world.
Spanish Language Datasets | 1.8M+ Sentences | Translation Data | TTS |...
datarade.ai
Updated Jul 11, 2025
Oxford Languages (2025). Spanish Language Datasets | 1.8M+ Sentences | Translation Data | TTS | Dictionary Display | Translations | EU & LATAM Coverage [Dataset]. https://datarade.ai/data-products/spanish-language-datasets-1-8m-sentences-nlp-tts-dic-oxford-languages
Available download formats: .json, .xml, .csv, .xls, .txt, .mp3, .wav
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Oxford Languages (https://lexico.com/es)
Area covered
Honduras, Costa Rica, Bolivia (Plurinational State of), Colombia, Ecuador, Paraguay, Chile, Nicaragua, Panama, Cuba
Description
Linguistically annotated Spanish language datasets with headwords, definitions, senses, examples, POS tags, semantic metadata, and usage info. Ideal for dictionary tools, NLP, and TTS model training or fine-tuning.

Our Spanish language datasets are carefully compiled and annotated by linguistic experts. The following datasets are available for licensing:

Spanish Monolingual Dictionary Data

Spanish Bilingual Dictionary Data

Spanish Sentences Data

Synonyms and Antonyms Data

Audio Data

Spanish Word List Data

Key Features (approximate numbers):

Spanish Monolingual Dictionary Data

Our Spanish monolingual dictionary data offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.

Words: 73,000

Senses: 123,000

Example sentences: 104,000

Format: XML and JSON

Delivery: Email (link-based file sharing) and REST API

Update frequency: annually

Spanish Bilingual Dictionary Data

The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is reviewed and updated annually by our in-house team of language experts and offers significant coverage of the language, with a large volume of high-quality translated words.

Translations: 221,300

Senses: 103,500

Example sentences: 74,500

Example translations: 83,800

Format: XML and JSON

Delivery: Email (link-based file sharing) and REST API

Update frequency: annually

Spanish Sentences Data

Spanish sentences retrieved from the corpus, totalling approximately 20 million words, are well suited to NLP model training. The sentences offer broad coverage of Spanish-speaking countries and are tagged with the corresponding country or dialect.

Sentences volume: 1,840,000

Format: XML and JSON

Delivery: Email (link-based file sharing) and REST API

Spanish Synonyms and Antonyms Data

This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.

Synonyms: 127,700

Antonyms: 9,500

Format: XML

Delivery: Email (link-based file sharing)

Update frequency: annually

Spanish Audio Data (word-level)

Curated word-level audio data covering all varieties of world Spanish, providing rich dialectal diversity.

Audio files: 20,900

Format: XLSX (for index), MP3 and WAV (audio files)

Spanish Word List Data

This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.

Wordforms: 450,000

Format: CSV and TXT

Delivery: Email (link-based file sharing)

Use Cases:

We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).

If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Oxford.Languages@oup.com to start the conversation.

Pricing:

Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.

About the sample:

The samples give a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our datasets, we provide a sample in CSV format for preview purposes only.

If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
Spanish Spontaneous Dialogue speech dataset
kaggle.com
zip
Updated Jun 7, 2024
Frank Wong (2024). Spanish Spontaneous Dialogue speech dataset [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/spanish-spontaneous-dialogue-speech-dataset
Available download formats: zip (93,236 bytes)
Dataset updated
Jun 7, 2024
Authors
Frank Wong
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Spanish (Spain) Spontaneous Dialogue Telephony speech dataset

Description

Spanish (Spain) Spontaneous Dialogue Telephony speech dataset, collected from dialogues on given topics and covering 20+ domains. Transcriptions include the text content, speaker ID, gender, age, and other attributes. The data was collected from a large, geographically diverse pool of speakers (600 native speakers) to improve model performance on real and complex tasks, and has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, maintaining user privacy and legal rights throughout data collection, storage, and usage; all of our datasets are GDPR, CCPA, and PIPL compliant. For more details, please refer to: https://www.nexdata.ai/datasets/speechrecog/1234?source=Kaggle

Format

8 kHz, 8-bit, A-law/μ-law PCM, mono channel

Content category

Dialogue based on given topics

Recording condition

Low background noise (indoor)

Recording device

Telephony

Country

Spain (ESP)

Language(Region) Code

es-ES

Language

Spanish

Speaker

600 people in total, 49% male and 51% female

Features of annotation

Transcription text, timestamp, speaker ID, gender

Accuracy rate

Word accuracy rate (WAR): 98%

Licensing Information

Commercial License
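The card reports a word accuracy rate (WAR) of 98% but does not say how it is computed. A common convention, assumed here rather than taken from the vendor, is WAR = 1 - WER, where the word error rate (WER) is the word-level Levenshtein distance between a reference and a hypothesis transcript divided by the number of reference words. The sketch below implements that convention; the function names are illustrative, and it splits on whitespace only, with no case or punctuation normalization.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `reference` into `hypothesis` (Levenshtein distance)."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words; only two rows are kept in memory.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref_word
                curr[j - 1] + 1,     # insert hyp_word
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """WAR assumed to be 1 - WER; it can drop below 0 when the
    hypothesis contains many inserted words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    wer = word_edit_distance(ref_words, hyp_words) / len(ref_words)
    return 1.0 - wer


if __name__ == "__main__":
    # One substituted word ("dias" for "días") out of five reference words.
    print(word_accuracy_rate("hola buenos días a todos", "hola buenos dias a todos"))
```

Under this convention, a 98% WAR corresponds to roughly 2 word errors per 100 reference words.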
messirve
huggingface.co
Spanish Info Retrieval, messirve [Dataset]. https://huggingface.co/datasets/spanish-ir/messirve
Available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
Dataset authored and provided by
Spanish Info Retrieval
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
July 2025 UPDATE: We released version 1.1, adding almost 200k new queries 🎉🎉🎉. Use with:

    import datasets

    country = "full"  # "ar", "bo", ...
    version = "1.1"
    dataset = datasets.load_dataset("spanish-ir/messirve", country, revision=version)
    print(dataset)

Dataset Card for MessIRve

MessIRve is a large-scale dataset for Spanish IR, designed to better capture the information needs of Spanish speakers across different countries. Queries are obtained from Google's autocomplete API… See the full description on the dataset page: https://huggingface.co/datasets/spanish-ir/messirve.
Table_1_Parental Burnout Assessment (PBA) in Different Hispanic Countries:...
figshare.com
frontiersin.figshare.com
docx
Updated Jun 14, 2023
Denisse Manrique-Millones; Georgy M. Vasin; Sergio Dominguez-Lara; Rosa Millones-Rivalles; Ricardo T. Ricci; Milagros Abregu Rey; María Josefina Escobar; Daniela Oyarce; Pablo Pérez-Díaz; María Pía Santelices; Claudia Pineda-Marín; Javier Tapia; Mariana Artavia; Maday Valdés Pacheco; María Isabel Miranda; Raquel Sánchez Rodríguez; Clara Isabel Morgades-Bamba; Ainize Peña-Sarrionandia; Fernando Salinas-Quiroz; Paola Silva Cabrera; Moïra Mikolajczak; Isabelle Roskam (2023). Table_1_Parental Burnout Assessment (PBA) in Different Hispanic Countries: An Exploratory Structural Equation Modeling Approach.DOCX [Dataset]. http://doi.org/10.3389/fpsyg.2022.827014.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpsyg.2022.827014.s001
Dataset updated
Jun 14, 2023
Dataset provided by
Frontiers
Authors
Denisse Manrique-Millones; Georgy M. Vasin; Sergio Dominguez-Lara; Rosa Millones-Rivalles; Ricardo T. Ricci; Milagros Abregu Rey; María Josefina Escobar; Daniela Oyarce; Pablo Pérez-Díaz; María Pía Santelices; Claudia Pineda-Marín; Javier Tapia; Mariana Artavia; Maday Valdés Pacheco; María Isabel Miranda; Raquel Sánchez Rodríguez; Clara Isabel Morgades-Bamba; Ainize Peña-Sarrionandia; Fernando Salinas-Quiroz; Paola Silva Cabrera; Moïra Mikolajczak; Isabelle Roskam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Parental burnout is a unique and context-specific syndrome resulting from a chronic imbalance of risks over resources in the parenting domain. The current research aims to evaluate the psychometric properties of the Spanish version of the Parental Burnout Assessment (PBA) across Spanish-speaking countries with two consecutive studies. In Study 1, we analyzed the data through a bifactor model within an Exploratory Structural Equation Modeling (ESEM) framework on the pooled sample of participants (N = 1,979), obtaining good fit indices. We then attained measurement invariance across both gender and country in a set of nested models with gradually increasing parameter constraints. Latent mean comparisons across countries showed that Chile had the highest parental burnout score among the participants’ countries; likewise, comparisons across gender showed that mothers had higher scores than fathers, consistent with previous studies. Reliability coefficients were high. In Study 2 (N = 1,171), we tested the relations between parental burnout and three specific consequences, i.e., escape and suicidal ideations, parental neglect, and parental violence toward one’s children. The medium to large associations found provided support for the PBA’s predictive validity. Overall, we concluded that the Spanish version of the PBA has good psychometric properties. The results support its relevance for the assessment of parental burnout among Spanish-speaking parents, offering new opportunities for cross-cultural research in the parenting domain.
HISPANIC OR LATINO AND RACE - DP05_PIN_T - Dataset - CKAN
portal.tad3.org
Updated Nov 17, 2024
(2024). HISPANIC OR LATINO AND RACE - DP05_PIN_T - Dataset - CKAN [Dataset]. https://portal.tad3.org/dataset/hispanic-or-latino-and-race-dp05_pin_t
Dataset updated
Nov 17, 2024
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
ACS DEMOGRAPHIC AND HOUSING ESTIMATES: HISPANIC OR LATINO AND RACE - DP05
Universe: Total population
Survey/Program: American Community Survey 5-year estimates
Years: 2020, 2021, 2022

The terms “Hispanic,” “Latino,” and “Spanish” are used interchangeably. Some respondents identify with all three terms, while others may identify with only one of these three specific terms. People who identify with the terms “Hispanic,” “Latino,” or “Spanish” are those who classify themselves in one of the specific Hispanic, Latino, or Spanish categories listed on the questionnaire (“Mexican, Mexican Am., or Chicano,” “Puerto Rican,” or “Cuban”) as well as those who indicate that they are “another Hispanic, Latino, or Spanish origin.” People who do not identify with one of the specific origins listed on the questionnaire but indicate that they are “another Hispanic, Latino, or Spanish origin” are those whose origins are from Spain, the Spanish-speaking countries of Central or South America, or another Spanish culture or origin. Origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before their arrival in the United States. People who identify their origin as Hispanic, Latino, or Spanish may be of any race.
Spanish-language e-book price 2018-2023, by country
statista.com
Updated Nov 27, 2025
Statista (2025). Spanish-language e-book price 2018-2023, by country [Dataset]. https://www.statista.com/statistics/1032412/spanish-language-ebook-price-worldwide-country/
Dataset updated
Nov 27, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
In 2023, a Spanish-language e-book cost on average ***** euros in Spain, where such e-books were the most expensive in comparison to other Spanish-speaking countries. Mexico and Peru followed, where Spanish-language e-books cost an average of *** euros and *** euros respectively.
Nexdata | Spanish Speech Data by Mobile Phone | 435 Hours
datarade.ai
data.nexdata.ai
Updated Nov 11, 2025
Nexdata (2025). Nexdata | Spanish Speech Data by Mobile Phone | 435 Hours [Dataset]. https://datarade.ai/data-products/nexdata-spanish-speech-data-by-mobile-phone-435-hours-nexdata
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Nov 11, 2025
Dataset authored and provided by
Nexdata
Area covered
Spain
Description
Spanish (Spain) Scripted Monologue Smartphone speech dataset, collected from scripted monologues covering the generic domain, human-machine interaction, smart home and in-car commands, numbers, news, and other domains. Transcriptions include the text content and other attributes. The data was collected from a large, geographically diverse pool of speakers (989 people in total) to improve model performance on real and complex tasks, and has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, maintaining user privacy and legal rights throughout data collection, storage, and usage; all of our datasets are GDPR, CCPA, and PIPL compliant.

Format

16 kHz, 16-bit, uncompressed WAV, mono channel

Recording condition

Low background noise (indoor), without echo

Content category

Generic domain; news; human-machine interaction; smart home command and control; in-car command and control; numbers

Recording device

Android smartphone, iPhone

Speaker

989 speakers in total: 49% male and 51% female; 57% aged 17-25, 39% aged 26-45, and 4% aged 46-60

Country

Spain (ESP)

Language(Region) Code

es-ES

Language

Spanish

Features of annotation

Transcription text

Accuracy Rate

Sentence Accuracy Rate (SAR) 95%
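The card reports a sentence accuracy rate (SAR) of 95% without defining it. Assuming SAR is the share of sentences whose transcription exactly matches the reference after light normalization, it could be computed as sketched below. The normalization (lower-casing and collapsing whitespace) and the function names are illustrative assumptions, not Nexdata's documented procedure.

```python
def normalize(sentence: str) -> str:
    # Hypothetical normalization: lower-case and collapse runs of whitespace.
    return " ".join(sentence.lower().split())


def sentence_accuracy_rate(references: list[str], hypotheses: list[str]) -> float:
    """Fraction of sentences whose hypothesis matches the reference exactly
    (after normalization)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned one-to-one")
    if not references:
        raise ValueError("no sentences to score")
    correct = sum(
        normalize(ref) == normalize(hyp)
        for ref, hyp in zip(references, hypotheses)
    )
    return correct / len(references)
```

Because a single wrong word marks the whole sentence as incorrect, a sentence-level rate like this is a stricter measure than a word-level one computed on the same transcripts.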
Spanish Spontaneous Dialogue Telephony speech
kaggle.com
zip
Updated Jun 11, 2024
Frank Wong (2024). Spanish Spontaneous Dialogue Telephony speech [Dataset]. https://www.kaggle.com/datasets/nexdatafrank/spanish-spontaneous-dialogue-telephony-speech/code
Available download formats: zip (215,338 bytes)
Dataset updated
Jun 11, 2024
Authors
Frank Wong
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
88-Hours-Mexican-Spanish-Conversational-Speech-Data-by-Telephone

Description

Spanish (Mexico) Spontaneous Dialogue Telephony speech dataset, collected from dialogues on given topics. Transcriptions include the text content, timestamp, speaker ID, gender, and other attributes. The data was collected from a large, geographically diverse pool of speakers (122 native speakers) to improve model performance on real and complex tasks, and has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, maintaining user privacy and legal rights throughout data collection, storage, and usage; all of our datasets are GDPR, CCPA, and PIPL compliant. For more details, please refer to: https://www.nexdata.ai/datasets/speechrecog/1352?source=Kaggle

Format

8 kHz, 8-bit, A-law/μ-law PCM, mono channel

Content category

Dialogue based on given topics

Recording condition

Low background noise (indoor)

Recording device

Telephony

Country

Mexico (MEX)

Language(Region) Code

es-MX

Language

Spanish

Speaker

122 people in total, 53% male and 47% female

Features of annotation

Transcription text, timestamp, speaker ID, gender, noise

Accuracy rate

Word accuracy rate (WAR): 98%

Licensing Information

Commercial License
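The format line lists the audio as 8 kHz, 8-bit A-law/μ-law PCM, i.e. G.711 companded samples. The sketch below expands such bytes to signed 16-bit linear PCM using the standard G.711 decoding rules. It assumes headerless raw sample bytes and that you already know which law a given file uses (the card does not say); the function and table names are illustrative.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~code & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    # Add the bias (0x84 = 132), shift by the segment, then remove the bias.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def alaw_to_linear(code: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit sample."""
    a = code ^ 0x55                   # A-law bytes have their even bits inverted
    mantissa = (a & 0x0F) << 4
    segment = (a >> 4) & 0x07
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # In A-law, a set sign bit (after the XOR) means a positive sample.
    return magnitude if a & 0x80 else -magnitude


# Precomputed lookup tables: one entry per possible byte value.
ULAW_TABLE = [ulaw_to_linear(b) for b in range(256)]
ALAW_TABLE = [alaw_to_linear(b) for b in range(256)]


def decode_g711(raw: bytes, law: str) -> list[int]:
    """Decode headerless G.711 bytes; `law` must be "ulaw" or "alaw"."""
    if law == "ulaw":
        table = ULAW_TABLE
    elif law == "alaw":
        table = ALAW_TABLE
    else:
        raise ValueError(f"unknown law: {law!r}")
    return [table[b] for b in raw]
```

Python's standard `audioop` module (`ulaw2lin`, `alaw2lin`) used to cover this, but it was deprecated in Python 3.11 and removed in 3.13, so a small lookup table avoids depending on it.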

488h Spanish phone calls dataset

kaggle.com

zip

Updated Jul 30, 2025

simon graves (2025). 488h Spanish phone calls dataset [Dataset]. https://www.kaggle.com/datasets/simongraves/spanish-speech-recognition-dataset

Available download formats: zip (93,217 bytes)

Dataset updated

Jul 30, 2025

Authors

simon graves

License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

Spanish Telephone Dialogues Dataset - 488 Hours

The dataset comprises 488 hours of high-quality telephone audio recordings in Spanish, featuring 600 native speakers, with a 95% sentence accuracy rate. Designed for advancing speech recognition models and language processing, this speech corpus covers diverse topics and domains, making it well suited to training robust automatic speech recognition (ASR) systems.

Dataset characteristics:

Characteristic	Data
Description	Audio of telephone dialogues in Spanish for training NLP models in real-world conversational scenarios.
Data types	Audio
Tasks	Speech recognition, NLP
Country	Spain (ESP)
Hours of telephone dialogue	488
Number of speakers	600
Labeling	Annotation (text content, speaker's ID, gender, age and other attributes)
Gender	Male (49%), Female (51%)
Recording device	Telephone

This is a sample of the dataset; full access is available from the provider.

Dataset structure

audio - audio file
text - text transcription
Spanish Speech Recognition.csv - metadata for the data
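The structure list names an `audio` component, a `text` component, and a metadata CSV, but does not say how they are linked. Assuming `audio` and `text` are folders whose files share a base name (for example `audio/0001.wav` with `text/0001.txt`), a minimal pairing step could look like this. The folder layout, the `.wav` and `.txt` extensions, and the function name are all hypothetical; the metadata CSV is the place to confirm the actual linkage.

```python
from pathlib import Path


def pair_audio_with_text(root: Path, audio_ext: str = ".wav") -> list[tuple[Path, str]]:
    """Return (audio path, transcript) pairs for files sharing a base name.

    Hypothetical layout: root/audio/<name><audio_ext> and root/text/<name>.txt.
    Audio files without a matching transcript are skipped.
    """
    pairs = []
    for audio_path in sorted((root / "audio").glob(f"*{audio_ext}")):
        text_path = root / "text" / f"{audio_path.stem}.txt"
        if text_path.is_file():
            transcript = text_path.read_text(encoding="utf-8").strip()
            pairs.append((audio_path, transcript))
    return pairs
```

Skipping unmatched audio rather than raising keeps the sketch usable on a partial sample download.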


The most spoken languages worldwide 2025
statista.com
Statista, The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description
In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, ahead of the 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widespread languages that year.

Languages in the United States
The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

Different languages at home
The percentage of people in the United States speaking a language other than English at home varies from state to state. California has the highest share: about 45 percent of its population spoke a language other than English at home in 2023.
Spanish_Visitors_Analysis
kaggle.com
zip
Updated Mar 4, 2024
R. H. Amezqueta (2024). Spanish_Visitors_Analysis [Dataset]. https://www.kaggle.com/datasets/rudyhernndez/spanish-visitors-analysis
Available download formats: zip (20,312,200 bytes)
Dataset updated
Mar 4, 2024
Authors
R. H. Amezqueta
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Data sources:

Travelers in Spain by tourist spots and country of residence (selection of 154 municipalities). Data from the 2017 hotel occupancy survey / Publication date: 09/10/2018 Portal de Datos Abiertos de Esri España. https://opendata.esri.es/datasets/ComunidadSIG::viajeros-entrados-por-puntos-turisticos-y-pais-de-residencia-/explore?location=28.382780%2C-15.044915%2C8.00

Municipal, provincial and regional limits. Centro Nacional de Información Geográfica (CNIG) - National Center for Geographic Information https://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie=LILIM

CartoBase ANE
Centro Nacional de Información Geográfica (CNIG) - National Center for Geographic Information https://centrodedescargas.cnig.es/CentroDescargas/catalogo.do?Serie=LILIM

Files:

Visitors_Turist_Sites: This GeoDataFrame is based on a selection of 154 municipalities for which the Hotel Occupancy Survey (Encuesta de Ocupación Hotelera), conducted by the National Statistics Institute (Spain), distinguishes between visitors' nationalities.

Spanish_Provinces_Peninsula: Provincial limits of Spain (Iberian Peninsula and Balearic Islands)

Spanish_Provinces_CanaryIslands: Provincial limits of Spain (Canary Islands)

Geo_world: The cartographic bases of the National Atlas of Spain (ANE) - World Map.

License: All of this data is licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/deed.es
Longitudinal Study of the Second Generation in Spain, Waves 1, 2, & 3
openicpsr.org
Updated Nov 19, 2021
Alejandro Portes; Rosa Aparicio (2021). Longitudinal Study of the Second Generation in Spain, Waves 1, 2, & 3 [Dataset]. http://doi.org/10.3886/E155023V1
Unique identifier
https://doi.org/10.3886/E155023V1
Dataset updated
Nov 19, 2021
Dataset provided by
University of Miami, Princeton University
Ortega y Gasset and Gregorio Marañón Foundation (FOM: La Fundación Ortega-Marañón)
Authors
Alejandro Portes; Rosa Aparicio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Spain
Description
Combined Longitudinal Study of the Second Generation in Spain data set, Waves 1, 2, and 3. This is the publicly available version of the ILSEG data (ILSEG is the Spanish acronym for Investigación Longitudinal de la Segunda Generación, Longitudinal Study of the Second Generation). Questions address the situations and future plans of young Spaniards who are children of immigrants to Spain, who were living in Madrid and Barcelona and attending secondary school in 2007-2008, with follow-ups in 2011-2012 and 2015-2016. The Longitudinal Study of the Second Generation (ILSEG in its Spanish initials) represents the first attempt to conduct a large-scale study of the adaptation of children of immigrants to Spanish society over time. To that end, a large and statistically representative sample of children born to foreign parents in Spain, or brought to the country at an early age, was identified and interviewed in metropolitan Madrid and Barcelona for wave 1. In total, almost 7,000 children of immigrants attending basic secondary school in close to 200 educational centers in both cities took part in the study. Because of sample attrition, wave 2 introduced a replacement sample. Additionally, a sample of native-born children of Spaniards was included to enable comparisons between native and immigrant-origin populations of the same age cohort. Topics include basic demographics, national origins, Spanish language acquisition, foreign language knowledge and retention, parents' education and employment, respondents' education and aspirations, religion, household arrangements, life experiences, and attitudes about Spanish society. Demographic variables include age, sex, birth country, language proficiency (Spanish and Catalan), language spoken in the home, number of siblings, mother's and father's birth country, religion, national identity, parent's sex, parent's marital status, parent's birth year, and the year the parent arrived in Spain.
Refugee requests in Spain
kaggle.com
zip
Updated Jan 23, 2024
Maria Blanco Gonzalez (2024). Refugee requests in Spain [Dataset]. https://www.kaggle.com/datasets/mariablancogonzalez/refugee-requests-in-spain
Available download formats: zip (11,685 bytes)
Dataset updated
Jan 23, 2024
Authors
Maria Blanco Gonzalez
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Spain
Description
Datasets on refugee claims in Spain between 2013 and 2021. The dataset is composed of two data frames, each broken down into male and female requests.

AsiloCA: requests broken down by Spanish autonomous community. Useful features:

CA -> Spanish autonomous community

Solicitantes -> number of asylum requests

Año -> year of the request

Pais -> country

AsiloEspaña: requests broken down by country of origin. Useful features:

Nacionalidad -> applicant's nationality

Hombres -> number of requests made by men

Mujeres -> number of requests made by women

Total -> total number of requests by country and year

Admitidas -> total number of admitted requests

Año -> year
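The AsiloEspaña fields suggest two simple checks: whether `Hombres + Mujeres` equals `Total` on each row, and what share of requests was admitted each year (`Admitidas / Total`). The sketch below assumes the data frame has been exported to a UTF-8 CSV with exactly these column headers and plain integer values; the file name in the usage line and the function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable, Mapping


def load_rows(path: str) -> list[dict[str, str]]:
    # Hypothetical CSV export; UTF-8 is assumed so that the "Año" header parses.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize_by_year(rows: Iterable[Mapping[str, str]]):
    """Return (admitted share per year, rows where Hombres + Mujeres != Total)."""
    per_year = defaultdict(lambda: [0, 0])  # year -> [total requests, admitted]
    mismatches = []
    for row in rows:
        men, women = int(row["Hombres"]), int(row["Mujeres"])
        total, admitted = int(row["Total"]), int(row["Admitidas"])
        if men + women != total:
            mismatches.append(row)
        per_year[row["Año"]][0] += total
        per_year[row["Año"]][1] += admitted
    # Years with zero requests are left out to avoid dividing by zero.
    shares = {
        year: admitted / requests
        for year, (requests, admitted) in per_year.items()
        if requests > 0
    }
    return shares, mismatches
```

Usage, with a hypothetical file name: `shares, mismatches = summarize_by_year(load_rows("asilo_espana.csv"))`. Mismatched rows are returned rather than dropped so they can be inspected by hand.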
Flow of emigration abroad of people aged 25 and over by year, sex, country...
ine.es
csv, html, json +4
Updated Dec 16, 2021
INE - Instituto Nacional de Estadística (2021). Flow of emigration abroad of people aged 25 and over by year, sex, country of birth (Spanish/foreign) and level of studies (grouping of levels) [Dataset]. https://www.ine.es/jaxiT3/Tabla.htm?t=49983&L=1
Available download formats: json, html, xls, text/pc-axis, csv, txt, xlsx
Dataset updated
Dec 16, 2021
Dataset provided by
National Statistics Institute (http://www.ine.es/)
Authors
INE - Instituto Nacional de Estadística
License
https://www.ine.es/aviso_legal
Time period covered
Jan 1, 2019 - Jan 1, 2021
Variables measured
Sex, National Total, Country of birth, Level of education, Demographic Concepts
Description
Migration Statistic: Flow of emigration abroad of people aged 25 and over by year, sex, country of birth (Spanish/foreign) and level of studies (grouping of levels). Annual. National.
Hispanic population U.S. 2023, by state
statista.com
Statista, Hispanic population U.S. 2023, by state [Dataset]. https://www.statista.com/statistics/259850/hispanic-population-of-the-us-by-state/
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description
In 2023, California had the largest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents that year.

History of Hispanic people
Hispanic people are those whose heritage stems from a former Spanish colony. The Spanish Empire colonized most of Central and South America from the late 15th century onward, beginning with Christopher Columbus's arrival in the Americas in 1492. Although the Spanish Empire expanded its territory throughout Central and South America, its colonization of what is now the United States did not extend to the Northeast. While the number of Hispanic people living in the United States has grown, the median income of Hispanic households has fluctuated only slightly since 1990.

Hispanic population in the United States
Hispanic people are the second-largest ethnic group in the United States, and Spanish is the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States earned between 50,000 and 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990 but has been declining since 2010, except in 2020 and 2021 due to the impact of the coronavirus (COVID-19) pandemic.
Nexdata | Spanish(Spain) Unscripted Call Center Telephony speech dataset |...
datarade.ai
data.nexdata.ai
Updated Nov 9, 2025
Cite
Nexdata (2025). Nexdata | Spanish(Spain) Unscripted Call Center Telephony speech dataset | 81 Hours [Dataset]. https://datarade.ai/data-products/nexdata-spanish-spain-unscripted-call-center-telephony-spe-nexdata
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Nov 9, 2025
Dataset authored and provided by
Nexdata
Area covered
Spain
Description
Spanish (Spain) Unscripted Call Center Telephony speech dataset covering the telecom domain. It includes domain terms and emotions typical of call center scenarios, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, and other attributes. The data was collected from a diverse range of speakers across different regions, which is intended to improve model performance on real and complex tasks. Quality was tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, maintaining user privacy and legal rights throughout data collection, storage, and usage; our datasets are GDPR, CCPA, and PIPL compliant.

Format

8 kHz, 16-bit, WAV, mono channel

Recording condition

Phone recording system, with low background noise (call center scenario)

Recording content

Spontaneous inbound and outbound calls in typical domains, such as telecom

Country

Spain (ESP), etc.

Language(Region) Code

es-ES, etc.

Language

Spanish

Features of annotation

Transcription text, timestamps, speaker ID, noise symbols, sensitive information

Accuracy

Word Accuracy Rate (WAR) 98% (punctuation, sentence symbols, accent and other non-speech labeling are not included in accuracy statistics due to subjectivity)
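The listing's audio specification (8 kHz, 16-bit, mono WAV) and its 98% Word Accuracy Rate can be spot-checked on delivered files. The sketch below uses only Python's standard library; all function names are hypothetical, and the WAR formula (1 minus word error rate, computed after lowercasing and stripping punctuation) is an assumption, since the listing does not define how WAR is calculated.

```python
import io
import re
import wave

# Format stated in the listing: 8 kHz, 16-bit, mono WAV.
EXPECTED_RATE_HZ = 8000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1


def check_wav_format(source) -> bool:
    """Return True if a WAV file (path or binary file object) is 8 kHz, 16-bit, mono."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def make_silent_wav(rate_hz: int, channels: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short in-memory 16-bit WAV of silence, handy for testing the checker."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)
        out.setframerate(rate_hz)
        out.writeframes(b"\x00\x00" * channels * int(rate_hz * seconds))
    buf.seek(0)
    return buf


def raw_pcm_bytes(hours: float) -> int:
    """Uncompressed audio size for the stated format, excluding WAV headers."""
    bytes_per_second = EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    return round(hours * 3600 * bytes_per_second)


def _normalize(text: str) -> list[str]:
    # Assumption: lowercase and drop punctuation, since the listing excludes
    # punctuation from its accuracy statistics. \w keeps accented letters like "í".
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed WAR = 1 - WER, where WER = (S + D + I) / N from a word-level edit distance."""
    ref = _normalize(reference)
    hyp = _normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return 1.0 - prev[len(hyp)] / len(ref)
```

For the 81-hour listing, `raw_pcm_bytes(81)` gives 4,665,600,000 bytes (about 4.7 GB) of raw audio before headers or compression. Because insertions count as errors, this accuracy measure can drop below zero when a hypothesis contains many extra words.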
hispanic-people-liveness-detection-video-dataset
huggingface.co
Updated Apr 24, 2024
Cite
Unique Data (2024). hispanic-people-liveness-detection-video-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/hispanic-people-liveness-detection-video-dataset
Dataset updated
Apr 24, 2024
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Biometric Attack Dataset, Hispanic People

A similar dataset that includes all ethnicities: Anti Spoofing Real Dataset

This dataset for face anti-spoofing and face recognition includes images and videos of Hispanic people: 32,600+ photos and videos of 16,300 people from 20 countries. It aims to improve model performance by providing a wider range of data for a specific ethnic group. The videos were gathered by capturing faces of genuine individuals… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/hispanic-people-liveness-detection-video-dataset.
