
Number of native Spanish speakers worldwide 2024, by country
statista.com
Updated Jan 15, 2025
Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
Dataset updated
Jan 15, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
World
Description
Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.
Spanish, a world language
As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, most of them in the Americas; it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which are among the non-Hispanic countries with the highest number of Spanish speakers.
The second most spoken language in the U.S.
According to the most recent data, Spanish is the most widely spoken language in the U.S. other than English, with 12 times as many speakers as the second-ranked language. This comes as no surprise given the long history of migration from Latin American countries to the U.S. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of people naturalized in the U.S. were Spanish-speaking countries.
Spanish speakers in countries where Spanish is not an official language 2024...
statista.com
Updated Jan 15, 2025
Statista (2025). Spanish speakers in countries where Spanish is not an official language 2024 [Dataset]. https://www.statista.com/statistics/1276290/number-spanish-speakers-non-hispanic-countries-worldwide/
Dataset updated
Jan 15, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
World
Description
The United States is the non-Hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people with a native command of the language in 2024. However, the European Union had the largest group of non-native speakers with limited proficiency in Spanish, at around 28 million people. For comparison, Mexico is the country with the largest number of native Spanish speakers in the world as of 2024.
Hispanic population U.S. 2023, by state
statista.com
Updated Oct 18, 2024
Statista (2024). Hispanic population U.S. 2023, by state [Dataset]. https://www.statista.com/statistics/259850/hispanic-population-of-the-us-by-state/
Dataset updated
Oct 18, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description
In 2023, California had the highest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents that year.
History of Hispanic people
Hispanic people are those whose heritage stems from a former Spanish colony. Spanish colonization of most of Central and South America began in the late 15th century, after Christopher Columbus arrived in the Americas in 1492. The Spanish Empire expanded its territory throughout Central and South America and into parts of what is now the United States, although its colonization did not extend to the Northeast. Although the number of Hispanic people living in the United States has increased, the median income of Hispanic households has fluctuated only slightly since 1990.
Hispanic population in the United States
Hispanic people are the second-largest ethnic group in the United States, and Spanish is the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States earned between 50,000 and 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990 but has been declining since 2010, except in 2020 and 2021 because of the impact of the coronavirus (COVID-19) pandemic.
Ranking of languages spoken at home in the U.S. 2023
statista.com
Updated Apr 14, 2025
Statista (2025). Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description
In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, 998,179 people spoke Russian at home that year.
2013 American Community Survey - Table Packages: Detailed Language Spoken in...
catalog.data.gov
Updated Jul 19, 2023
U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
United States
Description
This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
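The county and core-based statistical area cutoffs above amount to a two-part eligibility rule. The sketch below is a minimal illustration, assuming you already have each area's total population and its count of speakers of languages other than English and Spanish; the `Area` record, the `has_table_package` function, and the example counties are hypothetical and not part of the Census release. The nation, the 50 states, Washington, D.C., and Puerto Rico are covered regardless of these thresholds.

```python
from dataclasses import dataclass


@dataclass
class Area:
    # Hypothetical record for a county or core-based statistical area.
    name: str
    total_population: int
    other_language_speakers: int  # speakers of languages other than English and Spanish


def has_table_package(area: Area) -> bool:
    """True if the area meets both cutoffs described for the 2009-2013 tabulations."""
    return area.total_population >= 100_000 and area.other_language_speakers >= 25_000


# Hypothetical areas: only County A clears both thresholds.
areas = [
    Area("County A", 250_000, 30_000),
    Area("County B", 250_000, 10_000),  # too few other-language speakers
    Area("County C", 90_000, 40_000),   # total population below 100,000
]
eligible = [a.name for a in areas if has_table_package(a)]
print(eligible)  # ['County A']
```

Both comparisons use "or more" (`>=`), matching the wording of the thresholds.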
Population of the Limited English Proficient (LEP) Speakers by Community...
datasets.ai
data.cityofnewyork.us
Updated Aug 6, 2024
City of New York (2024). Population of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://datasets.ai/datasets/population-of-the-limited-english-proficient-lep-speakers-by-community-district
Dataset updated
Aug 6, 2024
Dataset authored and provided by
City of New York
Description
Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.
Share of U.S. population speaking a language besides English at home 2023,...
statista.com
Updated Jun 23, 2025
Statista (2025). Share of U.S. population speaking a language besides English at home 2023, by state [Dataset]. https://www.statista.com/statistics/312940/share-of-us-population-speaking-a-language-other-than-english-at-home-by-state/
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description
As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.
US Spanish Speecon database
catalogue.elra.info
Updated Feb 22, 2007
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2007). US Spanish Speecon database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0211/
Dataset updated
Feb 22, 2007
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Area covered
United States
Description
The US Spanish Speecon database is divided into 2 sets:
1) The first set comprises the recordings of 550 adult Spanish speakers in the US (255 males, 295 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place), and consists of about 208 hours of audio data.
2) The second set comprises the recordings of 50 child Spanish speakers in the US (28 boys, 22 girls), recorded over 4 microphone channels in 1 recording environment (children's room), and consists of about 14.7 hours of audio data.
This database is partitioned into 22 DVDs (first set) and 3 DVDs (second set). The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications.
Each of the four speech channels is recorded at 16 kHz, 16 bit, uncompressed unsigned integers in Intel format (lo-hi byte order). Each signal file has a corresponding ASCII SAM label file containing the relevant descriptive information.
Each speaker uttered the following items:
•Calibration data: 6 noise recordings; the "silence word" recording
•Free spontaneous items (adults only): 2 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
•17 elicited spontaneous items (adults only): 3 dates, 2 times, 3 proper names, 2 city names, 1 letter sequence, 2 answers to questions, 3 telephone numbers, 1 language
•Read speech: 30 phonetically rich sentences uttered by adults and 60 uttered by children; 5 phonetically rich words (adults only); 4 isolated digits; 1 isolated digit sequence; 4 connected digit sequences; 1 telephone number; 3 natural numbers; 1 money amount; 2 time phrases (T1: analogue, T2: digital); 3 dates (D1: analogue, D2: relative and general date, D3: digital); 3 letter sequences; 1 proper name; 2 city or street names; 2 questions; 2 special keyboard characters; 1 Web address; 1 email address
•208 application-specific words and phrases per session (adults); 74 toy commands, 14 phone commands and 34 general commands (children)
The following age distribution was obtained:
•Adults: 223 speakers are between 15 and 30, 191 speakers are between 31 and 45, and 136 speakers are over 46.
•Children: 15 speakers are between 8 and 10, 35 speakers are between 11 and 14.
A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
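The signal format above (16 kHz, 16 bit, lo-hi byte order, with descriptive information kept in separate SAM label files) is enough to decode samples and estimate durations. The sketch below is a minimal illustration, assuming each signal file holds one headerless channel; the function names are hypothetical. It follows the description in decoding unsigned values (`struct` format `H`); if the files turn out to hold signed PCM, the format character would be `h` instead.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # per channel, as stated in the Speecon description
BYTES_PER_SAMPLE = 2     # 16 bit


def decode_samples(raw: bytes) -> list[int]:
    """Decode little-endian ("lo-hi", Intel) 16-bit samples as unsigned integers.

    Assumes a headerless file; any trailing odd byte is ignored.
    """
    count = len(raw) // BYTES_PER_SAMPLE
    return list(struct.unpack(f"<{count}H", raw[: count * BYTES_PER_SAMPLE]))


def duration_seconds(num_bytes: int) -> float:
    """Duration of one single-channel signal file, computed from its size in bytes."""
    return num_bytes / (BYTES_PER_SAMPLE * SAMPLE_RATE_HZ)


raw = bytes([0x01, 0x00, 0x00, 0x01])  # two samples: low byte first
print(decode_samples(raw))              # [1, 256]
print(duration_seconds(32_000))         # 1.0
```

At 32,000 bytes per second per channel, one hour of a single channel occupies about 115 MB, which helps explain why the adult set spans 22 DVDs.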
Data from: Language Spoken at Home
linc.osbm.nc.gov
ncosbm.opendatasoft.com
Updated Oct 3, 2024
(2024). Language Spoken at Home [Dataset]. https://linc.osbm.nc.gov/explore/dataset/language-spoken-at-home/
Available download formats: geojson, csv, json, excel
Dataset updated
Oct 3, 2024
Description
Language spoken at home and ability to speak English for the population age 5 and over, as reported by the US Census Bureau's American Community Survey (ACS) 5-year estimates table C16001.
2020 American Community Survey: B16005G | NATIVITY BY LANGUAGE SPOKEN AT...
data.census.gov
ACS, 2020 American Community Survey: B16005G | NATIVITY BY LANGUAGE SPOKEN AT HOME BY ABILITY TO SPEAK ENGLISH FOR THE POPULATION 5 YEARS AND OVER (TWO OR MORE RACES) (ACS 5-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT5Y2020.B16005G
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2020
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020 the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.
Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.
For information on definitions of the OMB-defined racial classifications, see the "Race" and "Race Concepts" sections of the American Community Survey and Puerto Rico Community Survey 2019 Subject Definitions document at https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2019_ACSSubjectDefinitions.pdf.
The Hispanic origin and race codes were updated in 2020. For more information on the Hispanic origin and race code changes, please visit the American Community Survey Technical Documentation website.
The 2016-2020 American Community Survey (ACS) data generally reflect the September 2018 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances, the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineation lists due to differences in the effective dates of the geographic entities.
Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.
Explanation of Symbols:
•"-": The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution.
•"N": The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
•"(X)": The estimate or margin of error is not applicable or not available.
•"median-": The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
•"median+": The median falls in the highest interval of an open-ended distribution (for example "250,000+").
•"**": The margin of error could not be computed because there were an insufficient number of sample observations.
•"***": The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
•"*****": A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
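The margin-of-error note above can be turned into numbers directly. The sketch below is a minimal illustration: it computes the lower and upper 90 percent confidence bounds as described (estimate minus and plus the MOE) and converts a published 90 percent MOE into a standard error using the z-value 1.645 that the Census Bureau uses for 90 percent confidence. The function names and the example estimate are hypothetical.

```python
Z_90 = 1.645  # z-value for a 90 percent confidence level
Z_95 = 1.96   # z-value for a 95 percent confidence level


def confidence_bounds(estimate: float, moe90: float) -> tuple[float, float]:
    """Lower and upper 90 percent confidence bounds: estimate -/+ MOE."""
    return estimate - moe90, estimate + moe90


def standard_error(moe90: float) -> float:
    """Convert a published 90 percent MOE into a standard error."""
    return moe90 / Z_90


def moe_at_95(moe90: float) -> float:
    """Rescale a 90 percent MOE to a 95 percent MOE via the standard error."""
    return standard_error(moe90) * Z_95


# Hypothetical table cell: estimate 12,000 with a 90 percent MOE of 1,645.
print(confidence_bounds(12_000, 1_645))  # (10355, 13645)
print(round(standard_error(1_645), 6))   # 1000.0
print(round(moe_at_95(1_645), 6))        # 1960.0
```

Cells flagged "*****" can be passed with an MOE of 0, which yields a zero-width interval, consistent with the note that such estimates have no sampling error.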
Virginia Population by Language Spoken at Home by Ability to Speak English...
data.virginia.gov
csv
Updated Jan 3, 2025
Office of INTERMODAL Planning and Investment (2025). Virginia Population by Language Spoken at Home by Ability to Speak English by Census Block Group (ACS 5-Year) [Dataset]. https://data.virginia.gov/dataset/virginia-population-by-language-spoken-at-home-by-ability-to-speak-english-by-census-block-group
Available download formats: csv (28410756)
Dataset updated
Jan 3, 2025
Dataset authored and provided by
Office of INTERMODAL Planning and Investment
License
Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
Area covered
Virginia
Description
2013-2023 Virginia Population by Age by Language Spoken at Home by Ability to Speak English for the Population 5 years and over by Census Block Group. Contains estimates and margins of error.

Source: U.S. Census Bureau, American Community Survey 5-Year Estimates, Table B16004. Data accessed from the Census Bureau's API for the American Community Survey (https://www.census.gov/data/developers/data-sets.html).

The United States Census Bureau's American Community Survey (ACS):
- What is the American Community Survey? (https://www.census.gov/programs-surveys/acs/about.html)
- Geography & ACS (https://www.census.gov/programs-surveys/acs/geography-acs.html)
- Technical Documentation (https://www.census.gov/programs-surveys/acs/technical-documentation.html)

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. (https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html)

Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section. (https://www.census.gov/acs/www/methodology/sample_size_and_data_quality/)

Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation https://www.census.gov/programs-surveys/acs/technical-documentation.html). The effect of nonsampling error is not represented in these tables.
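Because this dataset reports estimates and margins of error at the block-group level, users who add block groups together also need an MOE for the total. A common approach, and the approximation the Census Bureau describes for sums of estimates, is the square root of the sum of the squared MOEs. The sketch below is a minimal illustration with hypothetical block-group values; treat the combined MOE as approximate, since the formula assumes the component estimates are independent.

```python
import math


def sum_with_moe(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Sum (estimate, 90 percent MOE) pairs.

    The MOE of the sum is approximated as sqrt(MOE_1^2 + MOE_2^2 + ...).
    """
    total = sum(est for est, _ in pairs)
    moe = math.sqrt(sum(m * m for _, m in pairs))
    return total, moe


# Hypothetical (estimate, MOE) pairs for two block groups.
block_groups = [(120, 30), (85, 40)]
print(sum_with_moe(block_groups))  # (205, 50.0)
```

Note that the combined MOE (50) is smaller than the simple sum of the MOEs (70), since sampling errors in independent estimates partly offset each other.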
LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED...
catalog.data.gov
Updated Jan 31, 2025
City of Seattle ArcGIS Online (2025). LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS (B16003) [Dataset]. https://catalog.data.gov/dataset/language-spoken-at-home-for-the-population-5-years-and-over-in-limited-english-speaking-ho
Dataset updated
Jan 31, 2025
Dataset provided by
https://arcgis.com/
Description
Table from the American Community Survey (ACS) B16003 of age by language spoken at home for the population 5 years and over in limited English-speaking households. These are multiple, nonoverlapping vintages of the 5-year ACS estimates of population and housing attributes starting in 2010, shown by the corresponding census tract vintage. The dataset also includes the most recent release, updated annually. King County, Washington census tracts with nonoverlapping vintages of the 5-year American Community Survey (ACS) estimates starting in 2010. Vintage identified in the "ACS Vintage" field. The census tract boundaries match the vintage of the ACS data (currently 2010 and 2020), so please note the geographic changes between the decades. Tracts have been coded as being within the City of Seattle as well as assigned to neighborhood groups called "Community Reporting Areas". These areas were created after the 2000 census to provide geographically consistent neighborhoods through time for reporting U.S. Census Bureau data. This is not an attempt to identify neighborhood boundaries as defined by neighborhoods themselves.
Vintages: 2010, 2015, 2020, 2021, 2022, 2023
ACS Table(s): B16003
Data downloaded from: https://data.c…
The most spoken languages worldwide 2025
statista.com
Updated Apr 14, 2025
Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description
In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, compared with 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widespread languages that year.
Languages in the United States
The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers in the United States that year.
Different languages at home
The percentage of people in the United States speaking a language other than English at home varies from state to state. California has the highest percentage: about 45 percent of its population spoke a language other than English at home in 2023.
Percentage of Hispanic population in the U.S. by state 2023
statista.com
Updated Oct 21, 2024
Statista (2024). Percentage of Hispanic population in the U.S. by state 2023 [Dataset]. https://www.statista.com/statistics/259865/percentage-of-hispanic-population-in-the-us-by-state/
Dataset updated
Oct 21, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description
In 2022, around 48.59 percent of New Mexico's population was of Hispanic origin, compared to the national percentage of 19.45. California, Texas, and Arizona also registered shares over 30 percent.
Data from: Demographically adjusted normative data for the Wisconsin Card...
figshare.com
xlsx
Updated Jun 2, 2023
María J. Marquine; David Yassai-Gonzalez; Alan Perez-Tejada; Anya Umlauf; Lily Kamalyan; Alejandra Morlett Paredes; Paola Suarez; Monica Rivera Mindt; Donald Franklin; Lidia Artiola i Fortuny; Mariana Cherner; Robert K. Heaton (2023). Demographically adjusted normative data for the Wisconsin Card Sorting Test-64 item: Results from the Neuropsychological Norms for the U.S.–Mexico Border Region in Spanish (NP-NUMBRS) project [Dataset]. http://doi.org/10.6084/m9.figshare.13312580.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.13312580.v1
Dataset updated
Jun 2, 2023
Dataset provided by
Taylor & Francis
Authors
María J. Marquine; David Yassai-Gonzalez; Alan Perez-Tejada; Anya Umlauf; Lily Kamalyan; Alejandra Morlett Paredes; Paola Suarez; Monica Rivera Mindt; Donald Franklin; Lidia Artiola i Fortuny; Mariana Cherner; Robert K. Heaton
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Mexico-United States border, Mexico, United States
Description
The Wisconsin Card Sorting Test (WCST) is among the most commonly used tests of executive functioning. We aimed to generate normative data on the 64-item version of this test (WCST-64) for Spanish-speakers living in the U.S.–Mexico Border region. Participants included 189 native Spanish-speakers (Age: 19–60 years; Education: 0–20 years; 59.3% female) from the Neuropsychological Norms for the U.S.–Mexico Border Region in Spanish (NP-NUMBRS) project who completed the WCST-64. Univariable and interactive associations between demographic variables and raw scores were examined via Spearman correlations, Wilcoxon rank-sum tests, and linear regressions. T-scores for various WCST-64 measures (Total Errors, Perseverative Responses, Perseverative Errors, Conceptual Level Responses, and Number of Categories) were obtained using fractional polynomial equations with weights for age, education, and gender. Percentile scores were reported for Failures to Maintain Set. Rates of impairment (T-score < 40) were calculated by applying the newly developed norms and published norms for non-Hispanic English-speaking Whites and Blacks. Older age was associated with worse performance and more education was linked to better performance on most WCST-64 raw scores, with stronger education effects among females than males. The norms developed here resulted in expected rates of impairment (14–16% across measures). Applying published norms for non-Hispanic Blacks resulted in generally comparable impairment rates. In contrast, applying previously published norms for non-Hispanic Whites overestimated impairment (38–52% across measures). These data will enhance interpretation of performance on the WCST-64 for Spanish-speakers living in the U.S.–Mexico Border region. Future work will need to examine the generalizability of these norms to other Hispanic/Latino groups.
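Why 14–16% counts as an "expected" impairment rate follows from the T-score scale: T-scores have a mean of 50 and a standard deviation of 10, so the cutoff T < 40 sits one standard deviation below the mean. The sketch below is a minimal illustration under the assumption that T-scores are normally distributed in a population matching the norms; it does not reproduce the study's fractional polynomial equations, and the helper name is hypothetical.

```python
from statistics import NormalDist

# Conventional T-score scale: mean 50, standard deviation 10.
T_SCALE = NormalDist(mu=50, sigma=10)
IMPAIRMENT_CUTOFF = 40  # impairment defined as T < 40, as in the study


def t_from_z(z: float) -> float:
    """Convert a z-score into a T-score."""
    return 50 + 10 * z


# Share of a normally distributed population falling below T = 40.
expected_rate = T_SCALE.cdf(IMPAIRMENT_CUTOFF)
print(f"{expected_rate:.1%}")  # 15.9%
print(t_from_z(-1.0))          # 40.0
```

Under this assumption, well-fitting norms should flag roughly 16% of a healthy sample, close to the 14–16% reported; the 38–52% rates obtained with the non-Hispanic White norms are far above that baseline.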
US Spanish Call Center Data for Realestate AI
futurebeeai.com
wav
Updated Aug 1, 2022
FutureBee AI (2022). US Spanish Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-spanish-usa
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US Spanish Call Center Speech Dataset for the real estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking real estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, making it well suited to building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
Speech Data
The dataset features 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.
•Participant Diversity:
  •Speakers: 60 native US Spanish speakers from our verified contributor community.
  •Regions: Representing different states across the US to ensure accent and dialect variation.
  •Participant Profile: Gender mix of 60% male and 40% female, with ages ranging from 18 to 70.
•Recording Details:
  •Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
  •Call Duration: Average 5–15 minutes per call.
  •Audio Format: Stereo WAV, 16-bit, recorded at 8 kHz and 16 kHz.
  •Recording Environment: Captured in noise-free and echo-free conditions.
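Since the recordings ship as stereo 16-bit WAV files at two different sample rates, it can help to check each file's header before training. The sketch below is a minimal illustration using Python's standard `wave` module; `describe_wav` and `make_test_wav` are hypothetical helpers, and the in-memory silent file stands in for a real call because the dataset's file naming and layout are not described here.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the header fields of a PCM WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": 8 * wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_test_wav(rate: int, seconds: float) -> bytes:
    """Build a silent stereo 16-bit WAV in memory as a stand-in for a call recording."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)   # stereo
        wav.setsampwidth(2)   # 2 bytes = 16 bit
        wav.setframerate(rate)
        # One frame = 2 channels x 2 bytes of silence.
        wav.writeframes(b"\x00\x00" * 2 * int(rate * seconds))
    return buf.getvalue()


print(describe_wav(make_test_wav(8_000, 0.5)))
# {'channels': 2, 'bits_per_sample': 16, 'sample_rate_hz': 8000, 'duration_s': 0.5}
```

Grouping files by `sample_rate_hz` makes it easy to train separate 8 kHz and 16 kHz models or to resample one group to match the other.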

Topic Diversity
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
•Inbound Calls:
  •Property Inquiries
  •Rental Availability
  •Renovation Consultation
  •Property Features & Amenities
  •Investment Property Evaluation
  •Ownership History & Legal Info, and more
•Outbound Calls:
  •New Listing Notifications
  •Post-Purchase Follow-ups
  •Property Recommendations
  •Value Updates
  •Customer Satisfaction Surveys, and others
Such domain-rich variety ensures model generalization across common real estate support conversations.
Transcription
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
•Transcription Includes:
  •Speaker-Segmented Dialogues
  •Time-coded Segments
  •Non-speech Tags (e.g., background noise, pauses)
•High transcription accuracy with word error rate below 5% via dual-layer human review.
These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.
Metadata
Detailed metadata accompanies each participant and conversation:
•Participant Metadata: ID, age, gender, location, accent, and dialect.
•Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.
This enables smart filtering, dialect-focused model training, and structured dataset exploration.
Usage and Applications
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
US Language Learner Market By Language (Spanish, Mandarin Chinese, French,...
verifiedmarketresearch.com
Updated Jun 23, 2024
Cite
The citation is currently not available for this dataset.
Dataset updated
Jun 23, 2024
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2024 - 2031
Area covered
United States
Description
US Language Learner Market size was estimated at USD 74.06 Billion in 2024 and is projected to reach USD 234.55 Billion by 2031, growing at a CAGR of 15.50% from 2024 to 2031.
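The projection can be checked with the compound-growth relation V_end = V_start × (1 + r)^n. The short Python sketch below (function names are illustrative) suggests that the stated figures are consistent with eight annual compounding periods at 15.50%; counting seven periods from 2024 to 2031 would instead imply a rate of roughly 17.9%. Which convention the report uses is not stated in the listing.

```python
def project(start: float, rate: float, periods: int) -> float:
    """Value after compounding `start` at `rate` per period for `periods` periods."""
    return start * (1.0 + rate) ** periods


def implied_cagr(start: float, end: float, periods: int) -> float:
    """Constant per-period growth rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / periods) - 1.0


START_2024 = 74.06  # USD billion, from the listing
END_2031 = 234.55   # USD billion, from the listing
STATED_CAGR = 0.155

for n in (7, 8):
    projected = project(START_2024, STATED_CAGR, n)
    implied = implied_cagr(START_2024, END_2031, n)
    print(f"n={n}: projected {projected:.2f} bn, implied CAGR {implied:.2%}")
```

With n = 8 the projection lands on about USD 234.55 billion; with n = 7 it gives about USD 203 billion.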

Key Market Drivers
•Increase in Globalized Communication: The growing interconnectedness of the world, fostered by international trade and travel, has heightened demand for multilingual communication skills. This is particularly true in the US, where businesses increasingly operate on a global scale, so globalization drives the need to learn new languages to collaborate effectively with international partners.
•Evolving Educational Landscape: The US educational system is shifting toward introducing language learning at a younger age, driven by recognition of the cognitive benefits of multilingualism and the growing importance of foreign languages in the job market. As a result, a larger segment of the US population is exposed to foundational language learning, creating more fertile ground for continued language acquisition later in life.
•Technological Advancements in Learning Methods: The rise of mobile technology and online learning platforms has significantly affected the US language learner market. These advancements have made language learning more accessible and convenient than ever: learners can now access a vast array of interactive and personalized courses on their own schedule, and this ease of access is fueling a surge in participation within the US.
•Growing Hispanic Population: The US has a significant and rapidly growing Hispanic population. This demographic shift has heightened demand for Spanish language learning in the country, as Spanish skills are increasingly valued not only for personal communication but also for professional opportunities in a diverse workforce. This demand is propelling the growth of the Spanish language learning segment of the US market.
Data_Sheet_1_Pilot study of a Spanish language measure of financial toxicity...
frontiersin.figshare.com
docx
Updated Jul 10, 2023
Cite
Julia J. Shi; Gwendolyn J. McGinnis; Susan K. Peterson; Nicolette Taku; Ying-Shiuan Chen; Robert K. Yu; Chi-Fang Wu; Tito R. Mendoza; Sanjay S. Shete; Hilary Ma; Robert J. Volk; Sharon H. Giordano; Ya-Chen T. Shih; Diem-Khanh Nguyen; Kelsey W. Kaiser; Grace L. Smith (2023). Data_Sheet_1_Pilot study of a Spanish language measure of financial toxicity in underserved Hispanic cancer patients with low English proficiency.docx [Dataset]. http://doi.org/10.3389/fpsyg.2023.1188783.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpsyg.2023.1188783.s001
Dataset updated
Jul 10, 2023
Dataset provided by
Frontiers
Authors
Julia J. Shi; Gwendolyn J. McGinnis; Susan K. Peterson; Nicolette Taku; Ying-Shiuan Chen; Robert K. Yu; Chi-Fang Wu; Tito R. Mendoza; Sanjay S. Shete; Hilary Ma; Robert J. Volk; Sharon H. Giordano; Ya-Chen T. Shih; Diem-Khanh Nguyen; Kelsey W. Kaiser; Grace L. Smith
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Financial toxicity (FT) reflects the multi-dimensional personal economic hardships borne by cancer patients. It is unknown whether FT measures, which to date have been derived largely from English speakers, adequately capture the economic experiences and financial hardships of medically underserved US Hispanic cancer patients with low English proficiency. We piloted a Spanish-language FT instrument in this population.
Methods: We piloted a Spanish version of the Economic Strain and Resilience in Cancer (ENRICh) FT measure using qualitative cognitive interviews and surveys in un-/under-insured or medically underserved, low English proficiency, Spanish-speaking Hispanics (UN-Spanish, n = 23) receiving ambulatory oncology care at a public safety-net hospital in the Houston metropolitan area. Exploratory analyses compared ENRICh FT scores in the UN-Spanish group with those of (1) un-/under-insured English-speaking Hispanics (UN-English, n = 23) from the same public facility and (2) insured English-speaking Hispanics (INS-English, n = 31) from an academic comprehensive cancer center. Multivariable logistic models compared the outcome of severe FT (score > 6).
Results: UN-Spanish Hispanic participants reported high acceptability of the instrument: 0% responded that the instrument was “very difficult to answer”, 4% that it was “very difficult to understand the questions”, 8% that it was “very difficult to remember resources used”, 8% that it was “very difficult to remember the burdens experienced”, and 4% that it was “very uncomfortable to respond”. Internal consistency of the FT measure was high (Cronbach’s α = 0.906). In qualitative responses, UN-Spanish Hispanics frequently identified a total lack of credit, savings, or income, as well as food insecurity, as contributors to FT. UN-Spanish and UN-English Hispanic patients were younger, had lower education and income, resided in socioeconomically deprived neighborhoods, and had more advanced cancer than INS-English Hispanics. Severe FT was more likely in UN-Spanish (OR = 2.73, 95% CI 0.77–9.70; p = 0.12) and UN-English (OR = 4.13, 95% CI 1.13–15.12; p = 0.03) than in INS-English Hispanics. Severely depleted FT coping resources were more likely in UN-Spanish (OR = 4.00, 95% CI 1.07–14.92; p = 0.04) and UN-English (OR = 5.73, 95% CI 1.49–22.1; p = 0.01) than in INS-English. The likelihood of FT did not differ between UN-Spanish and UN-English in either model (p = 0.59 and p = 0.62, respectively).
Conclusion: In medically underserved, uninsured Hispanic patients with cancer, comprehensive Spanish-language FT assessment in low English proficiency participants was feasible, acceptable, and internally consistent. Future studies employing tailored FT assessment and intervention should encompass the key privations and hardships in this population.
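For readers checking the reported statistics: a Wald-type 95% CI for an odds ratio is symmetric on the log scale, so the standard error of ln(OR) can be recovered as (ln U − ln L) / (2 × 1.96), and a two-sided p-value follows from z = ln(OR) / SE. The sketch below assumes the intervals are Wald intervals (typical of logistic regression output, but not stated in the abstract); under that assumption it reproduces the four reported p-values to two decimals.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def p_from_or_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Two-sided Wald p-value recovered from an odds ratio and its 95% CI.

    Assumes the CI was computed as exp(ln(OR) +/- 1.96 * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


# (label, OR, lower, upper, reported p) as given in the abstract
REPORTED = [
    ("Severe FT, UN-Spanish", 2.73, 0.77, 9.70, 0.12),
    ("Severe FT, UN-English", 4.13, 1.13, 15.12, 0.03),
    ("Depleted coping, UN-Spanish", 4.00, 1.07, 14.92, 0.04),
    ("Depleted coping, UN-English", 5.73, 1.49, 22.1, 0.01),
]

for label, odds, lo, hi, p_reported in REPORTED:
    p = p_from_or_ci(odds, lo, hi)
    print(f"{label}: recomputed p = {p:.3f} (reported {p_reported})")
```

The same log-scale symmetry explains why each reported OR sits close to the geometric mean of its CI bounds.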
Dataset of books about Spanish language-Latin America-Spoken Spanish
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books about Spanish language-Latin America-Spoken Spanish [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Spanish+language-Latin+America-Spoken+Spanish&j=1&j0=book_subjects
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Latin America
Description
This dataset is about books. It has 8 rows, filtered to books whose subject is Spanish language-Latin America-Spoken Spanish, and 9 columns, including author, publication date, language, and book publisher.
Healthcare Call Center Speech Data: Spanish (USA)
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Healthcare Call Center Speech Data: Spanish (USA) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-spanish-usa
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US Spanish Call Center Speech Dataset for the Healthcare domain, designed to enhance the development of call center speech recognition models specifically for the healthcare industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.
Speech Data
This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Healthcare domain, designed to build robust and accurate customer service speech technology.
•Participant Diversity:
•Speakers: 60 expert native US Spanish speakers from the FutureBeeAI Community.
•Regions: Different US states, ensuring a balanced representation of US accents, dialects, and demographics.
•Participant Profile: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
•Recording Details:
•Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
•Call Duration: Average call length between 5 and 15 minutes.
•Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
•Environment: Without background noise and without echo.

Topic Diversity
This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.
•Inbound Calls:
•Appointment Scheduling
•New Patient Registration
•Surgery Consultation
•Consultation regarding Diet, and many more
•Outbound Calls:
•Appointment Reminder
•Health and Wellness Subscription Programs
•Lab Tests Results
•Health Risk Assessments
•Preventive Care Reminders, and many more
This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.
Transcription
To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:
•Speaker-wise Segmentation: Time-coded segments for both agents and customers.
•Non-Speech Labels: Tags and labels for non-speech elements.
•Word Error Rate: Below 5%, achieved through two layers of quality assurance.
These ready-to-use transcriptions accelerate the development of healthcare call center conversational AI and ASR models for US Spanish.
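The listing does not publish the JSON schema, so the sketch below assumes a hypothetical layout: a `segments` array whose items carry `speaker`, `start`, `end`, and `text` fields, with times in seconds. It shows how speaker-wise, time-coded segments could be loaded and summarized, here as total talk time per speaker; the field names and the sample transcript are illustrative and would need adapting to the actual files.

```python
import json
from collections import defaultdict

# Hypothetical transcript: the real schema is not given in the dataset listing.
SAMPLE = """
{
  "segments": [
    {"speaker": "agent",    "start": 0.0, "end": 4.5,  "text": "Buenos días, ¿en qué puedo ayudarle?"},
    {"speaker": "customer", "start": 4.8, "end": 9.3,  "text": "Quisiera programar una cita."},
    {"speaker": "agent",    "start": 9.6, "end": 12.0, "text": "Claro, ¿para qué día?"}
  ]
}
"""


def talk_time_by_speaker(transcript: dict) -> dict:
    """Sum segment durations (end - start, in seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in transcript["segments"]:
        if seg["end"] < seg["start"]:
            raise ValueError(f"segment ends before it starts: {seg}")
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


data = json.loads(SAMPLE)
print(talk_time_by_speaker(data))
```

Summaries like this are a quick sanity check on segmentation (e.g. spotting calls where one side barely speaks) before training.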
Metadata
The dataset provides comprehensive metadata for each conversation and participant:
•Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.
•Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.
This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of US Spanish call center speech recognition models.
Usage and Applications
This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Healthcare domain. Potential use cases include: