65 datasets found

Hispanic population U.S. 2023, by state
statista.com
Updated Oct 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Hispanic population U.S. 2023, by state [Dataset]. https://www.statista.com/statistics/259850/hispanic-population-of-the-us-by-state/
Explore at:
Dataset updated
Oct 18, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2023
Area covered
United States
Description
In 2023, California had the highest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents in that year. History of Hispanic people Hispanic people are those whose heritage stems from a former Spanish colony. The Spanish Empire colonized most of Central and Latin America in the 15th century, which began when Christopher Columbus arrived in the Americas in 1492. The Spanish Empire expanded its territory throughout Central America and South America, but the colonization of the United States did not include the Northeastern part of the United States. Despite the number of Hispanic people living in the United States having increased, the median income of Hispanic households has fluctuated slightly since 1990. Hispanic population in the United States Hispanic people are the second-largest ethnic group in the United States, making Spanish the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States made between 50,000 to 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990, but has been on the decline since 2010, with the exception of 2020 and 2021, due to the impact of the coronavirus (COVID-19) pandemic.
Number of native Spanish speakers worldwide 2024, by country
statista.com
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
Explore at:
Dataset updated
Jan 15, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
World
Description
Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia was the nation with the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million. Spanish, a world language As of 2023, Spanish ranked as the fourth most spoken language in the world, only behind English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, the majority on the American continent, nonetheless, it's also one of the official languages of Equatorial Guinea in Africa. Other countries have a strong influence, like the United States, Morocco, or Brazil, countries included in the list of non-Hispanic countries with the highest number of Spanish speakers. The second most spoken language in the U.S. In the most recent data, Spanish ranked as the language, other than English, with the highest number of speakers, with 12 times more speakers as the second place. Which comes to no surprise following the long history of migrations from Latin American countries to the Northern country. Moreover, only during the fiscal year 2022. 5 out of the top 10 countries of origin of naturalized people in the U.S. came from Spanish-speaking countries.
Spanish speakers in countries where Spanish is not an official language 2024...
statista.com
Updated Jan 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Spanish speakers in countries where Spanish is not an official language 2024 [Dataset]. https://www.statista.com/statistics/1276290/number-spanish-speakers-non-hispanic-countries-worldwide/
Explore at:
Dataset updated
Jan 15, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
World
Description
The United States is the non-hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people with a native command of the language in 2024. However, the European Union had the largest group of non-native speakers with limited proficiency of Spanish, at around 28 million people. Furthermore, Mexico is the country with the largest number of native Spanish speakers in the world as of 2024.
Percentage of Hispanic population in the U.S. by state 2023
statista.com
Updated Oct 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Percentage of Hispanic population in the U.S. by state 2023 [Dataset]. https://www.statista.com/statistics/259865/percentage-of-hispanic-population-in-the-us-by-state/
Explore at:
Dataset updated
Oct 21, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2023
Area covered
United States
Description
In 2022, around 48.59 percent of New Mexico's population was of Hispanic origin, compared to the national percentage of 19.45. California, Texas, and Arizona also registered shares over 30 percent. The distribution of the U.S. population by ethnicity can be accessed here.
F
US Spanish General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). US Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-us-spanish
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US Spanish communication.
Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic US accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
•
Speakers: 60 verified native US Spanish speakers from FutureBeeAI’s contributor community.

•
Regions: Representing various provinces of USA to ensure dialectal diversity and demographic balance.

•
Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.

•Recording Details:
•
Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.

•
Duration: Each conversation ranges from 15 to 60 minutes.

•
Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.

•
Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•
Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.

•
Recording Metadata: Topic, duration, audio format, device type, and sample rate.

Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple Spanish speech and language AI applications:
•
ASR Development: Train accurate speech-to-text systems for US Spanish.

•
Voice Assistants: Build smart assistants capable of understanding natural US conversations.

<span
F
US Spanish Call Center Data for Realestate AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). US Spanish Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-spanish-usa
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US Spanish Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish -speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents ideal for building robust ASR models.
Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.
Speech Data
The dataset features 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics from inquiries to investment advice offering deep domain coverage for AI model development.
•Participant Diversity:
•
Speakers: 60 native US Spanish speakers from our verified contributor community.

•
Regions: Representing different provinces across USA to ensure accent and dialect variation.

•
Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.

•Recording Details:
•
Conversation Nature: Naturally flowing, unscripted agent-customer discussions.

•
Call Duration: Average 5–15 minutes per call.

•
Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.

•
Recording Environment: Captured in noise-free and echo-free conditions.

Topic Diversity
This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.
•Inbound Calls:
•Property Inquiries
•Rental Availability
•Renovation Consultation
•Property Features & Amenities
•Investment Property Evaluation
•Ownership History & Legal Info, and more
•Outbound Calls:
•New Listing Notifications
•Post-Purchase Follow-ups
•Property Recommendations
•Value Updates
•Customer Satisfaction Surveys, and others
Such domain-rich variety ensures model generalization across common real estate support conversations.
Transcription
All recordings are accompanied by precise, manually verified transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., background noise, pauses)
•High transcription accuracy with word error rate below 5% via dual-layer human review.
These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.
Metadata
Detailed metadata accompanies each participant and conversation:
•
Participant Metadata: ID, age, gender, location, accent, and dialect.

•
Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

This enables smart filtering, dialect-focused model training, and structured dataset exploration.
Usage and Applications
This dataset is ideal for voice AI and NLP systems built for the real estate sector:
The most spoken languages worldwide 2025
statista.com
Updated Apr 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
Explore at:
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2025
Area covered
World
Description
In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year. Languages in the United States The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which over than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese),1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year. Different languages at home The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.
2013 American Community Survey - Table Packages: Detailed Language Spoken in...
catalog.data.gov
Updated Jul 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
Explore at:
Dataset updated
Jul 19, 2023
Dataset provided by
United States Census Bureauhttp://census.gov/
Area covered
United States
Description
This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
F
US Spanish Call Center Data for Healthcare AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). US Spanish Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-spanish-usa
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US Spanish Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Spanish speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
Speech Data
The dataset features 30 Hours of dual-channel call center conversations between native US Spanish speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
•Participant Diversity:
•
Speakers: 60 verified native US Spanish speakers from our contributor community.

•
Regions: Diverse provinces across USA to ensure broad dialectal representation.

•
Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.

•RecordingDetails:
•
Conversation Nature: Naturally flowing, unscripted conversations.

•
Call Duration: Each session ranges between 5 to 15 minutes.

•
Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.

•
Recording Environment: Captured in clear conditions without background noise or echo.

Topic Diversity
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
•Inbound Calls:
•Appointment Scheduling
•New Patient Registration
•Surgical Consultation
•Dietary Advice and Consultations
•Insurance Coverage Inquiries
•Follow-up Treatment Requests, and more
•OutboundCalls:
•Appointment Reminders
•Preventive Care Campaigns
•Test Results & Lab Reports
•Health Risk Assessment Calls
•Vaccination Updates
•Wellness Subscription Outreach, and more
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Transcription
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
•Transcription Includes:
•Speaker-identified Dialogues
•Time-coded Segments
•Non-speech Annotations (e.g., silence, cough)
•High transcription accuracy with word error rate is below 5%, backed by dual-layer QA checks.
Metadata
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
•
Participant Metadata: ID, gender, age, region, accent, and dialect.

•
Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

Usage and Applications
This dataset can be used across a range of healthcare and voice AI use cases:
•
<b
Number of students learning Spanish worldwide 2024, by country
statista.com
Updated Jan 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Number of students learning Spanish worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/1276319/number-spanish-language-students-country-worldwide/
Explore at:
Dataset updated
Jan 22, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Spain, Worldwide
Description
The United States is the country with the largest number of Spanish language students, at approximately 8.59 million people in 2024. The second country is Brazil, with around 4.05 million students of the Spanish language. Moreover, the United States is also the non-hispanic country with the largest number of native Spanish speakers in the world.
b
Percent of Residents - Hispanic
data.baltimorecity.gov
hub.arcgis.com
Updated Feb 27, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Baltimore Neighborhood Indicators Alliance (2020). Percent of Residents - Hispanic [Dataset]. https://data.baltimorecity.gov/maps/bc346d573ee74963beaa8a8b69eb7dfb
Explore at:
Dataset updated
Feb 27, 2020
Dataset authored and provided by
Baltimore Neighborhood Indicators Alliance
Area covered

Description
The percentage of persons, out of the total number of persons living in an area, self-identifying their ethnicity as Hispanic or Latino. Hispanic origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before they arrived in the United States. People who identify their origin as Hispanic, Latino, or Spanish may be of any race. Source: U.S. Census Bureau, American Community SurveyYears Available: 2010, 2011-2015, 2012-2016, 2013-2017, 2014-2018, 2015-2019, 2020, 2017-2021, 2018-2022, 2019-2023Please note: We do not recommend comparing overlapping years of data due to the nature of this dataset. For more information, please visit: https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data.html
F
US Spanish Call Center Data for Retail & E-Commerce AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). US Spanish Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-spanish-usa
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US Spanish Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.
Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.
•Participant Diversity:
•
Speakers: 60 native US Spanish speakers from our verified contributor pool.

•
Regions: Representing multiple provinces across USA to ensure coverage of various accents and dialects.

•
Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.

•Recording Details:
•
Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.

•
Call Duration: Ranges from 5 to 15 minutes.

•
Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.

•
Recording Environment: Captured in clean conditions with no echo or background noise.

Topic Diversity
This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.
•Inbound Calls:
•Product Inquiries
•Order Cancellations
•Refund & Exchange Requests
•Subscription Queries, and more
•Outbound Calls:
•Order Confirmations
•Upselling & Promotions
•Account Updates
•Loyalty Program Offers
•Customer Verifications, and others
Such variety enhances your model’s ability to generalize across retail-specific voice interactions.
Transcription
All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•30 hours-coded Segments
•Non-speech Tags (e.g., pauses, cough)
•High transcription accuracy with word error rate < 5% due to double-layered quality checks.
These transcriptions are production-ready, making model training faster and more accurate.
Metadata
Rich metadata is available for each participant and conversation:
•
Participant Metadata: ID, age, gender, accent, dialect, and location.

•
Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.
Usage and Applications
This dataset is ideal for a range of voice AI and NLP applications:
•
Automatic Speech Recognition (ASR): Fine-tune Spanish speech-to-text systems.

<span
F
Consumer Unit Characteristics: Percent Not Hispanic or Latino by Number of...
fred.stlouisfed.org
json
Updated Sep 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Consumer Unit Characteristics: Percent Not Hispanic or Latino by Number of Earners: Consumer Units of Two or More People, Three or More Earners [Dataset]. https://fred.stlouisfed.org/series/CXU980286LB0707M
Explore at:
jsonAvailable download formats
Dataset updated
Sep 25, 2024
License
https://fred.stlouisfed.org/legal/#copyright-public-domainhttps://fred.stlouisfed.org/legal/#copyright-public-domain
Description
Graph and download economic data for Consumer Unit Characteristics: Percent Not Hispanic or Latino by Number of Earners: Consumer Units of Two or More People, Three or More Earners (CXU980286LB0707M) from 2004 to 2023 about consumer unit, latino, hispanic, percent, persons, consumer, and USA.
ACS Language Spoken at Home Variables - Boundaries
hub.arcgis.com
hrtc-oc-cerf.hub.arcgis.com
+3more
Updated Oct 20, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Esri (2018). ACS Language Spoken at Home Variables - Boundaries [Dataset]. https://hub.arcgis.com/maps/527ea2b5ba814c8ca1c34a2945e1b751
Explore at:
Dataset updated
Oct 20, 2018
Dataset authored and provided by
Esrihttp://esri.com/
Area covered

Description
This layer shows language group of language spoken at home by age. This is shown by tract, county, and state boundaries. This service is updated annually to contain the most currently released American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. This layer is symbolized to show the percentage of the population age 5+ who speak Spanish at home. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right. Current Vintage: 2019-2023ACS Table(s): B16007Data downloaded from: Census Bureau's API for American Community Survey Date of API call: December 12, 2024National Figures: data.census.govThe United States Census Bureau's American Community Survey (ACS):About the SurveyGeography & ACSTechnical DocumentationNews & UpdatesThis ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.Data Note from the Census:Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.Data Processing Notes:This layer is updated automatically when the most current vintage of ACS data is released each year, usually in December. The layer always contains the latest available ACS 5-year estimates. It is updated annually within days of the Census Bureau's release schedule. Click here to learn more about ACS data releases.Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2023 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).The States layer contains 52 records - all US states, Washington D.C., and Puerto RicoCensus tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.The estimate is controlled. A statistical test for sampling variability is not appropriate.The data for this geographic area cannot be displayed because the number of sample cases is too small.
Hispanic population in the U.S. 2023, by origin
statista.com
Updated Oct 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Hispanic population in the U.S. 2023, by origin [Dataset]. https://www.statista.com/statistics/234852/us-hispanic-population/
Explore at:
Dataset updated
Oct 21, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2023
Area covered
United States
Description
As of 2023, around 37.99 million people of Mexican descent were living in the United States - the largest of any Hispanic group. Puerto Ricans, Salvadorans, Cubans, and Dominicans rounded out the top five Hispanic groups living in the U.S. in that year.
F
US Spanish Call Center Data for Delivery & Logistics AI
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). US Spanish Call Center Data for Delivery & Logistics AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-spanish-usa
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
This US Spanish Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.
Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.
Speech Data
The dataset contains 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions offering a rich, real-world training base for AI models.
•Participant Diversity:
•
Speakers: 60 native US Spanish speakers from our verified contributor pool.

•
Regions: Multiple provinces of USA for accent and dialect diversity.

•
Participant Profile: Balanced gender distribution (60% male, 40% female) with ages ranging from 18 to 70.

•Recording Details:
•
Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.

•
Call Duration: 5 to 15 minutes on average.

•
Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.

•
Recording Environment: Captured in clean, noise-free, echo-free conditions.

Topic Diversity
This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.
•Inbound Calls:
•Order Tracking
•Delivery Complaints
•Undeliverable Addresses
•Return Process Enquiries
•Delivery Method Selection
•Order Modifications, and more
•Outbound Calls:
•Delivery Confirmations
•Subscription Offer Calls
•Incorrect Address Follow-ups
•Missed Delivery Notifications
•Delivery Feedback Surveys
•Out-of-Stock Alerts, and others
This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.
Transcription
All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.
•Transcription Includes:
•Speaker-Segmented Dialogues
•Time-coded Segments
•Non-speech Tags (e.g., pauses, noise)
•High transcription accuracy with word error rate under 5% via dual-layer quality checks.
These transcriptions support fast, reliable model development for Spanish voice AI applications in the delivery sector.
Metadata
Detailed metadata is included for each participant and conversation:
•
Participant Metadata: ID, age, gender, region, accent, dialect.

•
Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

This metadata aids in training specialized models, filtering demographics, and running advanced analytics.
Usage and Applications
This
Census of Population and Housing, 2000 [United States]: Summary File 2, Utah...
icpsr.umich.edu
ascii, sas, spss +1
Updated May 24, 2013
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States. Bureau of the Census (2013). Census of Population and Housing, 2000 [United States]: Summary File 2, Utah [Dataset]. http://doi.org/10.3886/ICPSR13277.v2
Explore at:
spss, sas, stata, asciiAvailable download formats
Unique identifier
https://doi.org/10.3886/ICPSR13277.v2
Dataset updated
May 24, 2013
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
United States. Bureau of the Census
License
https://www.icpsr.umich.edu/web/ICPSR/studies/13277/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/13277/terms
Time period covered
2000
Area covered
Utah, United States
Description
Summary File 2 contains 100-percent United States decennial Census data, which is the information compiled from the questions asked of all people and about every housing unit. Population items include sex, age, race, Hispanic or Latino origin, household relationship, and group quarters occupancy. Housing items include occupancy status, vacancy status, and tenure (owner-occupied or renter- occupied). The 100-percent data are presented in 36 population tables ("PCT") and 11 housing tables ("HCT") down to the census tract level. Each table is iterated for 250 population groups: the total population, 132 race groups, 78 American Indian and Alaska Native tribe categories (reflecting 39 individual tribes), and 39 Hispanic or Latino groups. The presentation of tables for any of the 250 population groups is subject to a population threshold of 100 or more people, that is, if there were fewer than 100 people in a specific population group in a specific geographic area, their population and housing characteristics data are not available for that geographic area.
F
Consumer Unit Characteristics: Percent Not Hispanic or Latino by Size of...
fred.stlouisfed.org
json
Updated Sep 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Consumer Unit Characteristics: Percent Not Hispanic or Latino by Size of Consumer Unit: Two or More People in Consumer Unit [Dataset]. https://fred.stlouisfed.org/series/CXU980286LB0503M
Explore at:
jsonAvailable download formats
Dataset updated
Sep 25, 2024
License
https://fred.stlouisfed.org/legal/#copyright-public-domainhttps://fred.stlouisfed.org/legal/#copyright-public-domain
Description
Graph and download economic data for Consumer Unit Characteristics: Percent Not Hispanic or Latino by Size of Consumer Unit: Two or More People in Consumer Unit (CXU980286LB0503M) from 2004 to 2023 about consumer unit, latino, hispanic, percent, persons, consumer, and USA.
d
Risk Communication on Social Media to Spanish-Speaking Populations
search.dataone.org
hydroshare.org
Updated Dec 5, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jared Stewart; Peter Howe; Yajie Li (2021). Risk Communication on Social Media to Spanish-Speaking Populations [Dataset]. https://search.dataone.org/view/sha256%3Af8835c7e6213de47ee7ecac507f1e082157c3d3fc3faf7382544243acc382d2b
Explore at:
Dataset updated
Dec 5, 2021
Dataset provided by
Hydroshare
Authors
Jared Stewart; Peter Howe; Yajie Li
Description
Heat is the leading cause of weather related fatalities in the United States. It is important that agencies and organizations understand heat and other extreme weather related risks, especially as climate change exacerbates the frequency and severity of extreme weather events. This research focused on communication strategies used by the National Weather Service, via official twitter feeds, from areas with high Hispanic populations. It can be concluded that many forecasting offices do not currently meet the communication needs of Spanish-Speaking populations and that critical alerts about life threatening risks should be made more frequently in Spanish.
Primary language spoken by the Medicaid and CHIP population
catalog.data.gov
data.virginia.gov
+2more
Updated Jul 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Medicare & Medicaid Services (2025). Primary language spoken by the Medicaid and CHIP population [Dataset]. https://catalog.data.gov/dataset/primary-language-spoken-by-the-medicaid-and-chip-population
Explore at:
Dataset updated
Jul 11, 2025
Dataset provided by
Centers for Medicare & Medicaid Services
Description
This data set includes annual counts and percentages of Medicaid and Children’s Health Insurance Program (CHIP) enrollees by primary language spoken (English, Spanish, and all other languages). Results are shown overall; by state; and by five subpopulation topics: race and ethnicity, age group, scope of Medicaid and CHIP benefits, urban or rural residence, and eligibility category. These results were generated using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files (TAF) Release 1 data and the Race/Ethnicity Imputation Companion File. This data set includes Medicaid and CHIP enrollees in all 50 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands who were enrolled for at least one day in the calendar year, except where otherwise noted. Enrollees in Guam, American Samoa, the Northern Mariana Islands, and select states with data quality issues with the primary language variable in TAF are not included. Results shown for the race and ethnicity subpopulation topic exclude enrollees in the U.S. Virgin Islands. Results shown overall (where subpopulation topic is "Total enrollees") exclude enrollees younger than age 5 and enrollees in the U.S. Virgin Islands. Results for states with TAF data quality issues in the year have a value of "Unusable data." Some rows in the data set have a value of "DS," which indicates that data were suppressed according to the Centers for Medicare & Medicaid Services’ Cell Suppression Policy for values between 1 and 10. This data set is based on the brief: "Primary language spoken by the Medicaid and CHIP population in 2020." Enrollees are assigned to a primary language category based on their reported ISO language code in TAF (English/missing, Spanish, and all other language codes) (Primary Language). Enrollees are assigned to a race and ethnicity subpopulation using the state-reported race and ethnicity information in TAF when it is available and of good quality; if it is missing or unreliable, race and ethnicity is indirectly estimated using an enhanced version of Bayesian Improved Surname Geocoding (BISG) (Race and ethnicity of the national Medicaid and CHIP population in 2020). Enrollees are assigned to an age group subpopulation using age as of December 31st of the calendar year. Enrollees are assigned to the comprehensive benefits or limited benefits subpopulation according to the criteria in the "Identifying Beneficiaries with Full-Scope, Comprehensive, and Limited Benefits in the TAF" DQ Atlas brief. Enrollees are assigned to an urban or rural subpopulation based on the 2010 Rural-Urban Commuting Area (RUCA) code associated with their home or mailing address ZIP code in TAF (Rural Medicaid and CHIP enrollees in 2020). Enrollees are assigned to an eligibility category subpopulation using their latest reported eligibility group code, CHIP code, and age in the calendar year. Please refer to the full brief for additional context about the methodology and detailed findings. Future updates to this data set will include more recent data years as the TAF data become available.

Facebook

Twitter

Click to copy link

Link copied

Cite

Statista (2024). Hispanic population U.S. 2023, by state [Dataset]. https://www.statista.com/statistics/259850/hispanic-population-of-the-us-by-state/

Hispanic population U.S. 2023, by state

Explore at:

16 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Oct 18, 2024

Dataset authored and provided by

Statistahttp://statista.com/

Time period covered

2023

Area covered

United States

Description

In 2023, California had the highest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents in that year. History of Hispanic people Hispanic people are those whose heritage stems from a former Spanish colony. The Spanish Empire colonized most of Central and Latin America in the 15th century, which began when Christopher Columbus arrived in the Americas in 1492. The Spanish Empire expanded its territory throughout Central America and South America, but the colonization of the United States did not include the Northeastern part of the United States. Despite the number of Hispanic people living in the United States having increased, the median income of Hispanic households has fluctuated slightly since 1990. Hispanic population in the United States Hispanic people are the second-largest ethnic group in the United States, making Spanish the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States made between 50,000 to 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990, but has been on the decline since 2010, with the exception of 2020 and 2021, due to the impact of the coronavirus (COVID-19) pandemic.

Clear search

Close search

Google apps

Main menu

Hispanic population U.S. 2023, by state

Number of native Spanish speakers worldwide 2024, by country

Spanish speakers in countries where Spanish is not an official language 2024...

Percentage of Hispanic population in the U.S. by state 2023

US Spanish General Conversation Speech Dataset for ASR

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

US Spanish Call Center Data for Realestate AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

The most spoken languages worldwide 2025

2013 American Community Survey - Table Packages: Detailed Language Spoken in...

US Spanish Call Center Data for Healthcare AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Number of students learning Spanish worldwide 2024, by country

Percent of Residents - Hispanic

US Spanish Call Center Data for Retail & E-Commerce AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Consumer Unit Characteristics: Percent Not Hispanic or Latino by Number of...

ACS Language Spoken at Home Variables - Boundaries

Hispanic population in the U.S. 2023, by origin

US Spanish Call Center Data for Delivery & Logistics AI

Introduction

Speech Data

Topic Diversity

Transcription

Metadata

Usage and Applications

Census of Population and Housing, 2000 [United States]: Summary File 2, Utah...

Consumer Unit Characteristics: Percent Not Hispanic or Latino by Size of...

Risk Communication on Social Media to Spanish-Speaking Populations

Primary language spoken by the Medicaid and CHIP population

Hispanic population U.S. 2023, by state