Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.
Spanish, a world language
As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, most of them in the Americas; it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which are included in the list of non-Hispanic countries with the highest number of Spanish speakers.
The second most spoken language in the U.S.
In the most recent data, Spanish ranked as the language other than English with the highest number of speakers in the United States, with 12 times as many speakers as the second-ranked language. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of naturalized people in the U.S. were Spanish-speaking countries.
The United States is the non-Hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people with a native command of the language in 2024. However, the European Union had the largest group of non-native speakers with limited proficiency in Spanish, at around 28 million people. Furthermore, Mexico is the country with the largest number of native Spanish speakers in the world as of 2024.
In 2023, California had the highest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents in that year.
History of Hispanic people
Hispanic people are those whose heritage stems from a former Spanish colony. The Spanish Empire's colonization of most of Latin America began at the end of the 15th century, when Christopher Columbus arrived in the Americas in 1492. The Spanish Empire expanded its territory throughout Central America and South America, but Spanish colonization of what is now the United States did not include the Northeast. Although the number of Hispanic people living in the United States has increased, the median income of Hispanic households has fluctuated only slightly since 1990.
Hispanic population in the United States
Hispanic people are the second-largest ethnic group in the United States, making Spanish the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States made between 50,000 and 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990, but has been on the decline since 2010, with the exception of 2020 and 2021, due to the impact of the coronavirus (COVID-19) pandemic.
In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people were speaking Russian at home during the same year.
This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
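The county and core-based statistical area thresholds above can be read as a simple two-part filter. The following is a minimal sketch of that rule; the `Geography` class and the function name are hypothetical and are not part of any Census Bureau tool.

```python
from dataclasses import dataclass


@dataclass
class Geography:
    """Hypothetical record for a county or core-based statistical area."""
    name: str
    total_population: int
    # Speakers of languages other than English and Spanish.
    other_language_speakers: int


def meets_tabulation_threshold(
    geo: Geography,
    min_population: int = 100_000,
    min_speakers: int = 25_000,
) -> bool:
    """Return True when both thresholds stated in the description are met."""
    return (
        geo.total_population >= min_population
        and geo.other_language_speakers >= min_speakers
    )
```

Both conditions must hold, so a populous area with few speakers of other languages would not be tabulated under this reading. Nation and state tabulations are listed without thresholds in the description, so the filter would only apply to counties and core-based statistical areas.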
Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.
As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
The US Spanish Speecon database is divided into 2 sets:
1) The first set comprises the recordings of 550 adult Spanish speakers in the US (255 males, 295 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place), and consisting of about 208 hours of audio data.
2) The second set comprises the recordings of 50 child Spanish speakers in the US (28 boys, 22 girls), recorded over 4 microphone channels in 1 recording environment (children's room), and consisting of about 14.7 hours of audio data.
This database is partitioned into 22 DVDs (first set) and 3 DVDs (second set).
The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications.
Each of the four speech channels is recorded at 16 kHz, 16 bit, uncompressed unsigned integers in Intel format (lo-hi byte order). To each signal file corresponds an ASCII SAM label file which contains the relevant descriptive information.
Each speaker uttered the following items:
Calibration data:
- 6 noise recordings
- The “silence word” recording
Free spontaneous items (adults only):
- 2 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
17 elicited spontaneous items (adults only):
- 3 dates, 2 times, 3 proper names, 2 city names, 1 letter sequence, 2 answers to questions, 3 telephone numbers, 1 language
Read speech:
- 30 phonetically rich sentences uttered by adults and 60 uttered by children
- 5 phonetically rich words (adults only)
- 4 isolated digits
- 1 isolated digit sequence
- 4 connected digit sequences
- 1 telephone number
- 3 natural numbers
- 1 money amount
- 2 time phrases (T1: analogue, T2: digital)
- 3 dates (D1: analogue, D2: relative and general date, D3: digital)
- 3 letter sequences
- 1 proper name
- 2 city or street names
- 2 questions
- 2 special keyboard characters
- 1 Web address
- 1 email address
- 208 application-specific words and phrases per session (adults)
- 74 toy commands, 14 phone commands and 34 general commands (children)
The following age distribution has been obtained:
Adults: 223 speakers are between 15 and 30, 191 speakers are between 31 and 45, and 136 speakers are over 46.
Children: 15 speakers are between 8 and 10, 35 speakers are between 11 and 14.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
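As an illustration of the channel format described above (16 kHz, 16-bit unsigned integers, lo-hi byte order), the following sketch decodes a headerless sample buffer with the Python standard library. It assumes each signal file holds raw samples for a single channel with no header; the description does not state this, and the function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # stated in the database description
BYTES_PER_SAMPLE = 2      # 16-bit samples


def decode_samples(raw: bytes) -> tuple:
    """Decode raw bytes as little-endian (lo-hi) unsigned 16-bit samples.

    Assumption: the buffer is headerless and contains one channel only.
    """
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("buffer length is not a multiple of 2 bytes")
    count = len(raw) // BYTES_PER_SAMPLE
    # "<" forces little-endian with standard sizes; "H" is unsigned 16-bit.
    return struct.unpack(f"<{count}H", raw)


def duration_seconds(sample_count: int) -> float:
    """Duration of a single-channel recording at the stated sample rate."""
    return sample_count / SAMPLE_RATE_HZ
```

Using an explicit `<` in the format string keeps the decoding independent of the byte order of the machine running the code. The SAM label file that accompanies each signal file would be the place to confirm the actual sample layout before relying on this sketch.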
Language spoken at home and the ability to speak English for the population age 5 and over, as reported in the U.S. Census Bureau's American Community Survey (ACS) 5-year estimates, table C16001.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020, the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section.
Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.
Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.
For information on definitions of the OMB-defined racial classifications, see the "Race" and "Race Concepts" sections of the American Community Survey and Puerto Rico Community Survey 2019 Subject Definitions document at https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2019_ACSSubjectDefinitions.pdf.
The Hispanic origin and race codes were updated in 2020. For more information on the Hispanic origin and race code changes, please visit the American Community Survey Technical Documentation website.
The 2016-2020 American Community Survey (ACS) data generally reflect the September 2018 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances, the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineation lists due to differences in the effective dates of the geographic entities.
Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.
Explanation of Symbols:
"-": The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution.
"N": The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
"(X)": The estimate or margin of error is not applicable or not available.
"median-": The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
"median+": The median falls in the highest interval of an open-ended distribution (for example "250,000+").
"**": The margin of error could not be computed because there were an insufficient number of sample observations.
"***": The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
"*****": A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
2013-2023 Virginia Population by Age by Language Spoken at Home by Ability to Speak English for the Population 5 years and over by Census Block Group. Contains estimates and margins of error.
U.S. Census Bureau; American Community Survey, American Community Survey 5-Year Estimates, Table B16004 Data accessed from: Census Bureau's API for American Community Survey (https://www.census.gov/data/developers/data-sets.html)
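For readers who want to retrieve the same table, the sketch below builds a request URL for the Census Bureau's ACS 5-year API endpoint. It is an illustration, not the publisher's actual download script. The helper name and the chosen variables are assumptions: `B16004_001E` and `B16004_001M` follow the usual ACS naming for a table's total estimate and its margin of error, and `51` is the state FIPS code for Virginia. No network request is made.

```python
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://api.census.gov/data"


def acs5_url(year: int, variables: list, for_clause: str,
             in_clause: Optional[str] = None) -> str:
    """Build an ACS 5-year API URL (hypothetical helper)."""
    params = {"get": ",".join(["NAME", *variables]), "for": for_clause}
    if in_clause is not None:
        params["in"] = in_clause
    # Keep ",", ":" and "*" readable; spaces are encoded as "+".
    return f"{API_ROOT}/{year}/acs/acs5?{urlencode(params, safe=',:*')}"


# State-level totals for Virginia from the 2023 5-year release.
state_url = acs5_url(2023, ["B16004_001E", "B16004_001M"], "state:51")

# Block groups in one county; "013" is only an example county code.
block_group_url = acs5_url(
    2023, ["B16004_001E"], "block group:*", "state:51 county:013"
)
```

Block-group requests need the enclosing geography in the `in` clause, which is why the second call supplies a state and county. Depending on the release year, the API may also expect a tract in that clause, so the geography hierarchy for the chosen year is worth checking before use.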
The United States Census Bureau's American Community Survey (ACS):
- What is the American Community Survey? (https://www.census.gov/programs-surveys/acs/about.html)
- Geography & ACS (https://www.census.gov/programs-surveys/acs/geography-acs.html)
- Technical Documentation (https://www.census.gov/programs-surveys/acs/technical-documentation.html)
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. (https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html)
Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section. (https://www.census.gov/acs/www/methodology/sample_size_and_data_quality/)
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation https://www.census.gov/programs-surveys/acs/technical-documentation.html). The effect of nonsampling error is not represented in these tables.
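The margin-of-error interpretation above amounts to simple arithmetic, sketched here. The confidence bounds follow directly from the text. The standard-error conversion (dividing by 1.645) and the root-sum-of-squares approximation for a summed estimate are not stated in this description; they are included as assumptions based on commonly cited Census Bureau guidance for working with ACS estimates.

```python
import math

Z_90 = 1.645  # z-value assumed for a 90 percent margin of error


def confidence_bounds(estimate: float, moe: float) -> tuple:
    """Lower and upper 90 percent bounds: estimate -/+ margin of error."""
    return estimate - moe, estimate + moe


def standard_error(moe: float) -> float:
    """Convert a 90 percent margin of error to a standard error (assumed)."""
    return moe / Z_90


def moe_of_sum(moes: list) -> float:
    """Approximate margin of error for a sum of estimates (assumed rule)."""
    return math.sqrt(sum(m * m for m in moes))
```

For example, an estimate of 1,000 with a margin of error of 150 gives bounds of 850 and 1,150. When block-group estimates are added together, the combined margin of error grows more slowly than the sum of the individual margins, which is what the root-sum-of-squares rule captures. Nonsampling error is not reflected in any of these figures.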
Table from the American Community Survey (ACS) B16003 of age by language spoken at home for the population 5 years and over in limited English-speaking households. These are multiple, nonoverlapping vintages of the 5-year ACS estimates of population and housing attributes starting in 2010 shown by the corresponding census tract vintage. Also includes the most recent release annually.
King County, Washington census tracts with nonoverlapping vintages of the 5-year American Community Survey (ACS) estimates starting in 2010. Vintage identified in the "ACS Vintage" field.
The census tract boundaries match the vintage of the ACS data (currently 2010 and 2020) so please note the geographic changes between the decades. Tracts have been coded as being within the City of Seattle as well as assigned to neighborhood groups called "Community Reporting Areas". These areas were created after the 2000 census to provide geographically consistent neighborhoods through time for reporting U.S. Census Bureau data. This is not an attempt to identify neighborhood boundaries as defined by neighborhoods themselves.
Vintages: 2010, 2015, 2020, 2021, 2022, 2023
ACS Table(s): B16003
Data downloaded from: <a href='https://data.c
In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widespread languages that year.
Languages in the United States
The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.
Different languages at home
The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of its population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.
In 2022, around 48.59 percent of New Mexico's population was of Hispanic origin, compared to the national percentage of 19.45. California, Texas, and Arizona also registered shares over 30 percent.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Wisconsin Card Sorting Test (WCST) is among the most commonly used tests of executive functioning. We aimed to generate normative data on the 64-item version of this test (WCST-64) for Spanish-speakers living in the U.S.–Mexico Border region. Participants included 189 native Spanish-speakers (Age: 19–60; Education: 0–20; 59.3% female) from the Neuropsychological Norms for the U.S.–Mexico Border Region in Spanish (NP-NUMBRS) project who completed the WCST-64. Univariable and interactive associations between demographic variables and raw scores were examined via Spearman correlations, Wilcoxon Rank-sum tests and linear regressions. T-scores for various WCST-64 measures (Total Errors, Perseverative Responses, Perseverative Errors, Conceptual Level Responses and Number of Categories) were obtained using fractional polynomial equations with weights for age, education, and gender. Percentile scores were reported for Failures to Maintain Set. Rates of impairment (T-score < 40) were calculated by applying the newly developed norms and published norms for non-Hispanic English-speaking Whites and Blacks. Older age was associated with worse performance and education was linked to better performance on most WCST-64 raw scores, with stronger education effects among females than males. The norms developed here resulted in expected rates of impairment (14–16% across measures). Applying published norms for non-Hispanic Blacks resulted in generally comparable impairment rates. In contrast, applying previously published norms for non-Hispanic Whites overestimated impairment (38–52% across measures). These data will enhance interpretation of performance on the WCST-64 for Spanish-speakers living in the U.S.–Mexico Border region. Future work will need to examine the generalizability of these norms to other Hispanic/Latino groups.
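The impairment cutoff used above can be related to the "expected" impairment rate with a short calculation. The sketch below uses the conventional T-score scale (mean 50, standard deviation 10) and a normal distribution. It does not reproduce the study's fractional polynomial equations or its demographic weights, which are not given in this description, and the function names are hypothetical.

```python
from statistics import NormalDist

T_MEAN = 50.0
T_SD = 10.0
IMPAIRMENT_CUTOFF = 40.0  # T-score < 40, as stated in the abstract


def t_score_from_z(z: float) -> float:
    """Convert a demographically adjusted z-score to a T-score."""
    return T_MEAN + T_SD * z


def is_impaired(t_score: float) -> bool:
    """Apply the impairment rule from the abstract."""
    return t_score < IMPAIRMENT_CUTOFF


# Share of a normally distributed population expected to fall below the cutoff.
expected_rate = NormalDist(mu=T_MEAN, sigma=T_SD).cdf(IMPAIRMENT_CUTOFF)
```

A cutoff of 40 sits one standard deviation below the mean, so about 16% of a well-normed sample would be expected to fall under it. This is consistent with the 14–16% reported for the new norms and helps explain why the 38–52% obtained with norms for non-Hispanic Whites is read as overestimation.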
https://www.futurebeeai.com/policies/ai-data-license-agreement
This US Spanish Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, making it well suited to building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
The dataset features 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
Such domain-rich variety ensures model generalization across common real estate support conversations.
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.
Detailed metadata accompanies each participant and conversation:
This enables smart filtering, dialect-focused model training, and structured dataset exploration.
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
https://www.verifiedmarketresearch.com/privacy-policy/
US Language Learner Market size was estimated at USD 74.06 Billion in 2024 and is projected to reach USD 234.55 Billion by 2031, growing at a CAGR of 15.50% from 2024 to 2031.
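The growth figures above can be checked with the standard compound annual growth rate formula. The sketch below is a consistency check on the three numbers in the summary; the number of compounding periods behind the published rate is not stated, so both plausible readings of "2024 to 2031" are computed, and the function name is hypothetical.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate over the given number of periods."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


START_USD_BN = 74.06   # 2024 estimate from the summary
END_USD_BN = 234.55    # 2031 projection from the summary

# Seven periods if 2024 is the base year, eight if growth is counted
# across 2024 through 2031 inclusive.
rate_7 = cagr(START_USD_BN, END_USD_BN, 7)
rate_8 = cagr(START_USD_BN, END_USD_BN, 8)
```

With eight compounding periods the implied rate is roughly 15.5%, matching the published figure. With seven periods it comes out closer to 18%. The summary's numbers are therefore internally consistent only under the eight-period reading.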
Key Market Drivers
• Increase in Globalized Communication: Heightened demand for multilingual communication skills is observed due to the growing interconnectedness of the world, fostered by international trade and travel. This demand is particularly true in the US, where businesses are increasingly operated on a global scale. As a result, the need to learn new languages to effectively collaborate with international partners is driven by globalization.
• Evolving Educational Landscape: A shift towards incorporating language learning at a younger age is being observed in the educational system in the US. This shift is driven by a recognition of the cognitive benefits of multilingualism and the growing importance of foreign languages in the job market. As a result, language learning at a foundational level is exposed to a larger segment of the US population, creating a more fertile ground for continued language acquisition later in life.
• Technological Advancements in Learning Methods: The rise of mobile technology and online learning platforms has significantly impacted the US language learner market. Language learning has been made more accessible and convenient than ever before by these advancements. A vast array of interactive and personalized language courses can now be accessed by learners on their own time and schedule. This surge in language learning participation within the US is fueled by this ease of access.
• Growing Hispanic Population: A significant and rapidly growing Hispanic population is observed in the US. This demographic shift has led to a heightened demand for Spanish language learning within the country. Spanish language skills are increasingly seen as valuable not only for personal communication but also for professional opportunities in a diverse workforce. The growth of the Spanish language learning segment within the US market is propelled by this demand.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Financial toxicity (FT) reflects multi-dimensional personal economic hardships borne by cancer patients. It is unknown whether measures of FT, to date derived largely from English-speakers, adequately capture economic experiences and financial hardships of medically underserved low English proficiency US Hispanic cancer patients. We piloted a Spanish language FT instrument in this population.
Methods: We piloted a Spanish version of the Economic Strain and Resilience in Cancer (ENRICh) FT measure using qualitative cognitive interviews and surveys in un-/under-insured or medically underserved, low English proficiency, Spanish-speaking Hispanics (UN-Spanish, n = 23) receiving ambulatory oncology care at a public healthcare safety net hospital in the Houston metropolitan area. Exploratory analyses compared ENRICh FT scores amongst the UN-Spanish group to: (1) un-/under-insured English-speaking Hispanics (UN-English, n = 23) from the same public facility and (2) insured English-speaking Hispanics (INS-English, n = 31) from an academic comprehensive cancer center. Multivariable logistic models compared the outcome of severe FT (score > 6).
Results: UN-Spanish Hispanic participants reported high acceptability of the instrument (0% responded that the instrument was “very difficult to answer” and 4% that it was “very difficult to understand the questions”; 8% responded that it was “very difficult to remember resources used” and 8% that it was “very difficult to remember the burdens experienced”; and 4% responded that it was “very uncomfortable to respond”). Internal consistency of the FT measure was high (Cronbach’s α = 0.906). In qualitative responses, UN-Spanish Hispanics frequently identified a total lack of credit, savings, or income and food insecurity as aspects contributing to FT. UN-Spanish and UN-English Hispanic patients were younger, had lower education and income, resided in socioeconomically deprived neighborhoods and had more advanced cancer vs. INS-English Hispanics. There was a higher likelihood of severe FT in UN-Spanish (OR = 2.73, 95% CI 0.77–9.70; p = 0.12) and UN-English (OR = 4.13, 95% CI 1.13–15.12; p = 0.03) vs. INS-English Hispanics. A higher likelihood of severely depleted FT coping resources occurred in UN-Spanish (OR = 4.00, 95% CI 1.07–14.92; p = 0.04) and UN-English (OR = 5.73, 95% CI 1.49–22.1; p = 0.01) vs. INS-English. The likelihood of FT did not differ between UN-Spanish and UN-English in either model (p = 0.59 and p = 0.62 respectively).
Conclusion: In medically underserved, uninsured Hispanic patients with cancer, comprehensive Spanish-language FT assessment in low English proficiency participants was feasible, acceptable, and internally consistent. Future studies employing tailored FT assessment and intervention should encompass the key privations and hardships in this population.
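The odds ratios, confidence intervals, and p-values reported above can be cross-checked against each other. The sketch below assumes the intervals are Wald intervals, symmetric on the log-odds scale, which is the usual output of a logistic model but is not stated in the abstract. The function name is hypothetical, and the inputs are the figures quoted in the Results.

```python
import math
from statistics import NormalDist

Z_95 = 1.96  # z-value for a 95 percent confidence interval


def wald_p_from_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided p-value implied by an OR and its 95% CI."""
    # Standard error of log(OR), recovered from the interval width.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Severe FT vs. INS-English, as reported in the abstract.
p_un_spanish = wald_p_from_ci(2.73, 0.77, 9.70)
p_un_english = wald_p_from_ci(4.13, 1.13, 15.12)
```

Under this assumption the recovered p-values come out near 0.12 and 0.03, in line with the reported values. This kind of check is a quick way to catch transcription errors in an interval or a p-value. The wide intervals also reflect the small group sizes (n = 23, 23, and 31).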
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 8 rows and is filtered to books whose subject is Spanish language-Latin America-Spoken Spanish. It features 9 columns, including author, publication date, language, and book publisher.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the US Spanish Call Center Speech Dataset for the Healthcare domain, designed to enhance the development of call center speech recognition models specifically for the Healthcare industry. This dataset is curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.
This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Healthcare domain, designed to build robust and accurate customer service speech technology.
This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.
This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.
To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:
These ready-to-use transcriptions accelerate the development of the Healthcare domain call center conversational AI and ASR models for the US Spanish language.
The dataset provides comprehensive metadata for each conversation and participant:
This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of US Spanish call center speech recognition models.
This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Healthcare domain. Potential use cases include: