65 datasets found
  1. Hispanic population U.S. 2023, by state

    • statista.com
    Updated Oct 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Hispanic population U.S. 2023, by state [Dataset]. https://www.statista.com/statistics/259850/hispanic-population-of-the-us-by-state/
    Explore at:
    Dataset updated
    Oct 18, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, California had the highest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents in that year. History of Hispanic people Hispanic people are those whose heritage stems from a former Spanish colony. The Spanish Empire colonized most of Central and Latin America in the 15th century, which began when Christopher Columbus arrived in the Americas in 1492. The Spanish Empire expanded its territory throughout Central America and South America, but the colonization of the United States did not include the Northeastern part of the United States. Despite the number of Hispanic people living in the United States having increased, the median income of Hispanic households has fluctuated slightly since 1990. Hispanic population in the United States Hispanic people are the second-largest ethnic group in the United States, making Spanish the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States made between 50,000 to 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990, but has been on the decline since 2010, with the exception of 2020 and 2021, due to the impact of the coronavirus (COVID-19) pandemic.

  2. Number of native Spanish speakers worldwide 2024, by country

    • statista.com
    Updated Jan 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    World
    Description

    Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia was the nation with the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million. Spanish, a world language As of 2023, Spanish ranked as the fourth most spoken language in the world, only behind English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, the majority on the American continent, nonetheless, it's also one of the official languages of Equatorial Guinea in Africa. Other countries have a strong influence, like the United States, Morocco, or Brazil, countries included in the list of non-Hispanic countries with the highest number of Spanish speakers. The second most spoken language in the U.S. In the most recent data, Spanish ranked as the language, other than English, with the highest number of speakers, with 12 times more speakers as the second place. Which comes to no surprise following the long history of migrations from Latin American countries to the Northern country. Moreover, only during the fiscal year 2022. 5 out of the top 10 countries of origin of naturalized people in the U.S. came from Spanish-speaking countries.

  3. Spanish speakers in countries where Spanish is not an official language 2024...

    • statista.com
    Updated Jan 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Spanish speakers in countries where Spanish is not an official language 2024 [Dataset]. https://www.statista.com/statistics/1276290/number-spanish-speakers-non-hispanic-countries-worldwide/
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    World
    Description

    The United States is the non-hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people with a native command of the language in 2024. However, the European Union had the largest group of non-native speakers with limited proficiency of Spanish, at around 28 million people. Furthermore, Mexico is the country with the largest number of native Spanish speakers in the world as of 2024.

  4. Percentage of Hispanic population in the U.S. by state 2023

    • statista.com
    Updated Oct 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Percentage of Hispanic population in the U.S. by state 2023 [Dataset]. https://www.statista.com/statistics/259865/percentage-of-hispanic-population-in-the-us-by-state/
    Explore at:
    Dataset updated
    Oct 21, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2022, around 48.59 percent of New Mexico's population was of Hispanic origin, compared to the national percentage of 19.45. California, Texas, and Arizona also registered shares over 30 percent. The distribution of the U.S. population by ethnicity can be accessed here.

  5. F

    US Spanish General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). US Spanish General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-us-spanish
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US Spanish General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of Spanish speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US Spanish communication.

    Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade Spanish speech models that understand and respond to authentic US accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US Spanish. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native US Spanish speakers from FutureBeeAI’s contributor community.
    Regions: Representing various provinces of USA to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple Spanish speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for US Spanish.
    Voice Assistants: Build smart assistants capable of understanding natural US conversations.
    <span

  6. F

    US Spanish Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). US Spanish Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-spanish-usa
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US Spanish Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish -speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics from inquiries to investment advice offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native US Spanish speakers from our verified contributor community.
    Regions: Representing different provinces across USA to ensure accent and dialect variation.
    Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for Spanish real estate voice applications.

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:

  7. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year. Languages in the United States The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which over than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese),1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year. Different languages at home The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.

  8. 2013 American Community Survey - Table Packages: Detailed Language Spoken in...

    • catalog.data.gov
    Updated Jul 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    United States Census Bureauhttp://census.gov/
    Area covered
    United States
    Description

    This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.

  9. F

    US Spanish Call Center Data for Healthcare AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). US Spanish Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-spanish-usa
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US Spanish Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Spanish speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 Hours of dual-channel call center conversations between native US Spanish speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native US Spanish speakers from our contributor community.
    Regions: Diverse provinces across USA to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    RecordingDetails:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session ranges between 5 to 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    OutboundCalls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy with word error rate is below 5%, backed by dual-layer QA checks.

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:

    <b

  10. Number of students learning Spanish worldwide 2024, by country

    • statista.com
    Updated Jan 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Number of students learning Spanish worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/1276319/number-spanish-language-students-country-worldwide/
    Explore at:
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Spain, Worldwide
    Description

    The United States is the country with the largest number of Spanish language students, at approximately 8.59 million people in 2024. The second country is Brazil, with around 4.05 million students of the Spanish language. Moreover, the United States is also the non-hispanic country with the largest number of native Spanish speakers in the world.

  11. b

    Percent of Residents - Hispanic

    • data.baltimorecity.gov
    • hub.arcgis.com
    Updated Feb 27, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Baltimore Neighborhood Indicators Alliance (2020). Percent of Residents - Hispanic [Dataset]. https://data.baltimorecity.gov/maps/bc346d573ee74963beaa8a8b69eb7dfb
    Explore at:
    Dataset updated
    Feb 27, 2020
    Dataset authored and provided by
    Baltimore Neighborhood Indicators Alliance
    Area covered
    Description

    The percentage of persons, out of the total number of persons living in an area, self-identifying their ethnicity as Hispanic or Latino. Hispanic origin can be viewed as the heritage, nationality group, lineage, or country of birth of the person or the person’s parents or ancestors before they arrived in the United States. People who identify their origin as Hispanic, Latino, or Spanish may be of any race. Source: U.S. Census Bureau, American Community SurveyYears Available: 2010, 2011-2015, 2012-2016, 2013-2017, 2014-2018, 2015-2019, 2020, 2017-2021, 2018-2022, 2019-2023Please note: We do not recommend comparing overlapping years of data due to the nature of this dataset. For more information, please visit: https://www.census.gov/programs-surveys/acs/guidance/comparing-acs-data.html

  12. F

    US Spanish Call Center Data for Retail & E-Commerce AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). US Spanish Call Center Data for Retail & E-Commerce AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/retail-call-center-conversation-spanish-usa
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US Spanish Call Center Speech Dataset for the Retail and E-commerce industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish speakers. Featuring over 30 hours of real-world, unscripted audio, it provides authentic human-to-human customer service conversations vital for training robust ASR models.

    Curated by FutureBeeAI, this dataset empowers voice AI developers, data scientists, and language model researchers to build high-accuracy, production-ready models across retail-focused use cases.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured in realistic scenarios, these conversations span diverse retail topics from product inquiries to order cancellations, providing a wide context range for model training and testing.

    Participant Diversity:
    Speakers: 60 native US Spanish speakers from our verified contributor pool.
    Regions: Representing multiple provinces across USA to ensure coverage of various accents and dialects.
    Participant Profile: Balanced gender mix (60% male, 40% female) with age distribution from 18 to 70 years.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted interactions between agents and customers.
    Call Duration: Ranges from 5 to 15 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clean conditions with no echo or background noise.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls with varied conversational outcomes like positive, negative, and neutral, ensuring real-world scenario coverage.

    Inbound Calls:
    Product Inquiries
    Order Cancellations
    Refund & Exchange Requests
    Subscription Queries, and more
    Outbound Calls:
    Order Confirmations
    Upselling & Promotions
    Account Updates
    Loyalty Program Offers
    Customer Verifications, and others

    Such variety enhances your model’s ability to generalize across retail-specific voice interactions.

    Transcription

    All audio files are accompanied by manually curated, time-coded verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    30 hours-coded Segments
    Non-speech Tags (e.g., pauses, cough)
    High transcription accuracy with word error rate < 5% due to double-layered quality checks.

    These transcriptions are production-ready, making model training faster and more accurate.

    Metadata

    Rich metadata is available for each participant and conversation:

    Participant Metadata: ID, age, gender, accent, dialect, and location.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    This granularity supports advanced analytics, dialect filtering, and fine-tuned model evaluation.

    Usage and Applications

    This dataset is ideal for a range of voice AI and NLP applications:

    Automatic Speech Recognition (ASR): Fine-tune Spanish speech-to-text systems.
    <span

  13. F

    Consumer Unit Characteristics: Percent Not Hispanic or Latino by Number of...

    • fred.stlouisfed.org
    json
    Updated Sep 25, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Consumer Unit Characteristics: Percent Not Hispanic or Latino by Number of Earners: Consumer Units of Two or More People, Three or More Earners [Dataset]. https://fred.stlouisfed.org/series/CXU980286LB0707M
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Sep 25, 2024
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domainhttps://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for Consumer Unit Characteristics: Percent Not Hispanic or Latino by Number of Earners: Consumer Units of Two or More People, Three or More Earners (CXU980286LB0707M) from 2004 to 2023 about consumer unit, latino, hispanic, percent, persons, consumer, and USA.

  14. ACS Language Spoken at Home Variables - Boundaries

    • hub.arcgis.com
    • hrtc-oc-cerf.hub.arcgis.com
    • +3more
    Updated Oct 20, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Esri (2018). ACS Language Spoken at Home Variables - Boundaries [Dataset]. https://hub.arcgis.com/maps/527ea2b5ba814c8ca1c34a2945e1b751
    Explore at:
    Dataset updated
    Oct 20, 2018
    Dataset authored and provided by
    Esrihttp://esri.com/
    Area covered
    Description

    This layer shows language group of language spoken at home by age. This is shown by tract, county, and state boundaries. This service is updated annually to contain the most currently released American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. This layer is symbolized to show the percentage of the population age 5+ who speak Spanish at home. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right. Current Vintage: 2019-2023ACS Table(s): B16007Data downloaded from: Census Bureau's API for American Community Survey Date of API call: December 12, 2024National Figures: data.census.govThe United States Census Bureau's American Community Survey (ACS):About the SurveyGeography & ACSTechnical DocumentationNews & UpdatesThis ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.Data Note from the Census:Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.Data Processing Notes:This layer is updated automatically when the most current vintage of ACS data is released each year, usually in December. The layer always contains the latest available ACS 5-year estimates. It is updated annually within days of the Census Bureau's release schedule. Click here to learn more about ACS data releases.Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2023 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).The States layer contains 52 records - all US states, Washington D.C., and Puerto RicoCensus tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.The estimate is controlled. A statistical test for sampling variability is not appropriate.The data for this geographic area cannot be displayed because the number of sample cases is too small.

  15. Hispanic population in the U.S. 2023, by origin

    • statista.com
    Updated Oct 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Hispanic population in the U.S. 2023, by origin [Dataset]. https://www.statista.com/statistics/234852/us-hispanic-population/
    Explore at:
    Dataset updated
    Oct 21, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2023
    Area covered
    United States
    Description

    As of 2023, around 37.99 million people of Mexican descent were living in the United States - the largest of any Hispanic group. Puerto Ricans, Salvadorans, Cubans, and Dominicans rounded out the top five Hispanic groups living in the U.S. in that year.

  16. F

    US Spanish Call Center Data for Delivery & Logistics AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FutureBee AI (2022). US Spanish Call Center Data for Delivery & Logistics AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/delivery-call-center-conversation-spanish-usa
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US Spanish Call Center Speech Dataset for the Delivery and Logistics industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for Spanish-speaking customers. With over 30 hours of real-world, unscripted call center audio, this dataset captures authentic delivery-related conversations essential for training high-performance ASR models.

    Curated by FutureBeeAI, this dataset empowers AI teams, logistics tech providers, and NLP researchers to build accurate, production-ready models for customer support automation in delivery and logistics.

    Speech Data

    The dataset contains 30 hours of dual-channel call center recordings between native US Spanish speakers. Captured across various delivery and logistics service scenarios, these conversations cover everything from order tracking to missed delivery resolutions offering a rich, real-world training base for AI models.

    Participant Diversity:
    Speakers: 60 native US Spanish speakers from our verified contributor pool.
    Regions: Multiple provinces of USA for accent and dialect diversity.
    Participant Profile: Balanced gender distribution (60% male, 40% female) with ages ranging from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted customer-agent dialogues.
    Call Duration: 5 to 15 minutes on average.
    Audio Format: Stereo WAV, 16-bit depth, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in clean, noise-free, echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound delivery-related conversations, covering varied outcomes (positive, negative, neutral) to train adaptable voice models.

    Inbound Calls:
    Order Tracking
    Delivery Complaints
    Undeliverable Addresses
    Return Process Enquiries
    Delivery Method Selection
    Order Modifications, and more
    Outbound Calls:
    Delivery Confirmations
    Subscription Offer Calls
    Incorrect Address Follow-ups
    Missed Delivery Notifications
    Delivery Feedback Surveys
    Out-of-Stock Alerts, and others

    This comprehensive coverage reflects real-world logistics workflows, helping voice AI systems interpret context and intent with precision.

    Transcription

    All recordings come with high-quality, human-generated verbatim transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., pauses, noise)
    High transcription accuracy with word error rate under 5% via dual-layer quality checks.

    These transcriptions support fast, reliable model development for Spanish voice AI applications in the delivery sector.

    Metadata

    Detailed metadata is included for each participant and conversation:

    Participant Metadata: ID, age, gender, region, accent, dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical attributes.

    This metadata aids in training specialized models, filtering demographics, and running advanced analytics.

    Usage and Applications

    This

  17. Census of Population and Housing, 2000 [United States]: Summary File 2, Utah...

    • icpsr.umich.edu
    ascii, sas, spss +1
    Updated May 24, 2013
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    United States. Bureau of the Census (2013). Census of Population and Housing, 2000 [United States]: Summary File 2, Utah [Dataset]. http://doi.org/10.3886/ICPSR13277.v2
    Explore at:
    spss, sas, stata, asciiAvailable download formats
    Dataset updated
    May 24, 2013
    Dataset provided by
    Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
    Authors
    United States. Bureau of the Census
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/13277/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/13277/terms

    Time period covered
    2000
    Area covered
    Utah, United States
    Description

    Summary File 2 contains 100-percent United States decennial Census data, which is the information compiled from the questions asked of all people and about every housing unit. Population items include sex, age, race, Hispanic or Latino origin, household relationship, and group quarters occupancy. Housing items include occupancy status, vacancy status, and tenure (owner-occupied or renter- occupied). The 100-percent data are presented in 36 population tables ("PCT") and 11 housing tables ("HCT") down to the census tract level. Each table is iterated for 250 population groups: the total population, 132 race groups, 78 American Indian and Alaska Native tribe categories (reflecting 39 individual tribes), and 39 Hispanic or Latino groups. The presentation of tables for any of the 250 population groups is subject to a population threshold of 100 or more people, that is, if there were fewer than 100 people in a specific population group in a specific geographic area, their population and housing characteristics data are not available for that geographic area.

  18. F

    Consumer Unit Characteristics: Percent Not Hispanic or Latino by Size of...

    • fred.stlouisfed.org
    json
    Updated Sep 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Consumer Unit Characteristics: Percent Not Hispanic or Latino by Size of Consumer Unit: Two or More People in Consumer Unit [Dataset]. https://fred.stlouisfed.org/series/CXU980286LB0503M
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Sep 25, 2024
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domainhttps://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for Consumer Unit Characteristics: Percent Not Hispanic or Latino by Size of Consumer Unit: Two or More People in Consumer Unit (CXU980286LB0503M) from 2004 to 2023 about consumer unit, latino, hispanic, percent, persons, consumer, and USA.

  19. d

    Risk Communication on Social Media to Spanish-Speaking Populations

    • search.dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jared Stewart; Peter Howe; Yajie Li (2021). Risk Communication on Social Media to Spanish-Speaking Populations [Dataset]. https://search.dataone.org/view/sha256%3Af8835c7e6213de47ee7ecac507f1e082157c3d3fc3faf7382544243acc382d2b
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Jared Stewart; Peter Howe; Yajie Li
    Description

    Heat is the leading cause of weather related fatalities in the United States. It is important that agencies and organizations understand heat and other extreme weather related risks, especially as climate change exacerbates the frequency and severity of extreme weather events. This research focused on communication strategies used by the National Weather Service, via official twitter feeds, from areas with high Hispanic populations. It can be concluded that many forecasting offices do not currently meet the communication needs of Spanish-Speaking populations and that critical alerts about life threatening risks should be made more frequently in Spanish.

  20. Primary language spoken by the Medicaid and CHIP population

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Jul 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Centers for Medicare & Medicaid Services (2025). Primary language spoken by the Medicaid and CHIP population [Dataset]. https://catalog.data.gov/dataset/primary-language-spoken-by-the-medicaid-and-chip-population
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Centers for Medicare & Medicaid Services
    Description

    This data set includes annual counts and percentages of Medicaid and Children’s Health Insurance Program (CHIP) enrollees by primary language spoken (English, Spanish, and all other languages). Results are shown overall; by state; and by five subpopulation topics: race and ethnicity, age group, scope of Medicaid and CHIP benefits, urban or rural residence, and eligibility category. These results were generated using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files (TAF) Release 1 data and the Race/Ethnicity Imputation Companion File. This data set includes Medicaid and CHIP enrollees in all 50 states, the District of Columbia, Puerto Rico, and the U.S. Virgin Islands who were enrolled for at least one day in the calendar year, except where otherwise noted. Enrollees in Guam, American Samoa, the Northern Mariana Islands, and select states with data quality issues with the primary language variable in TAF are not included. Results shown for the race and ethnicity subpopulation topic exclude enrollees in the U.S. Virgin Islands. Results shown overall (where subpopulation topic is "Total enrollees") exclude enrollees younger than age 5 and enrollees in the U.S. Virgin Islands. Results for states with TAF data quality issues in the year have a value of "Unusable data." Some rows in the data set have a value of "DS," which indicates that data were suppressed according to the Centers for Medicare & Medicaid Services’ Cell Suppression Policy for values between 1 and 10. This data set is based on the brief: "Primary language spoken by the Medicaid and CHIP population in 2020." Enrollees are assigned to a primary language category based on their reported ISO language code in TAF (English/missing, Spanish, and all other language codes) (Primary Language). Enrollees are assigned to a race and ethnicity subpopulation using the state-reported race and ethnicity information in TAF when it is available and of good quality; if it is missing or unreliable, race and ethnicity is indirectly estimated using an enhanced version of Bayesian Improved Surname Geocoding (BISG) (Race and ethnicity of the national Medicaid and CHIP population in 2020). Enrollees are assigned to an age group subpopulation using age as of December 31st of the calendar year. Enrollees are assigned to the comprehensive benefits or limited benefits subpopulation according to the criteria in the "Identifying Beneficiaries with Full-Scope, Comprehensive, and Limited Benefits in the TAF" DQ Atlas brief. Enrollees are assigned to an urban or rural subpopulation based on the 2010 Rural-Urban Commuting Area (RUCA) code associated with their home or mailing address ZIP code in TAF (Rural Medicaid and CHIP enrollees in 2020). Enrollees are assigned to an eligibility category subpopulation using their latest reported eligibility group code, CHIP code, and age in the calendar year. Please refer to the full brief for additional context about the methodology and detailed findings. Future updates to this data set will include more recent data years as the TAF data become available.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Statista (2024). Hispanic population U.S. 2023, by state [Dataset]. https://www.statista.com/statistics/259850/hispanic-population-of-the-us-by-state/
Organization logo

Hispanic population U.S. 2023, by state

Explore at:
16 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Oct 18, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2023
Area covered
United States
Description

In 2023, California had the highest Hispanic population in the United States, with over 15.76 million people claiming Hispanic heritage. Texas, Florida, New York, and Illinois rounded out the top five states for Hispanic residents in that year. History of Hispanic people Hispanic people are those whose heritage stems from a former Spanish colony. The Spanish Empire colonized most of Central and Latin America in the 15th century, which began when Christopher Columbus arrived in the Americas in 1492. The Spanish Empire expanded its territory throughout Central America and South America, but the colonization of the United States did not include the Northeastern part of the United States. Despite the number of Hispanic people living in the United States having increased, the median income of Hispanic households has fluctuated slightly since 1990. Hispanic population in the United States Hispanic people are the second-largest ethnic group in the United States, making Spanish the second most common language spoken in the country. In 2021, about one-fifth of Hispanic households in the United States made between 50,000 to 74,999 U.S. dollars. The unemployment rate of Hispanic Americans has fluctuated significantly since 1990, but has been on the decline since 2010, with the exception of 2020 and 2021, due to the impact of the coronavirus (COVID-19) pandemic.

Search
Clear search
Close search
Google apps
Main menu