76 datasets found
  1. Share of U.S. population speaking a language besides English at home 2023,...

    • statista.com
    Updated Jun 23, 2025
    Cite
    Statista (2025). Share of U.S. population speaking a language besides English at home 2023, by state [Dataset]. https://www.statista.com/statistics/312940/share-of-us-population-speaking-a-language-other-than-english-at-home-by-state/
    Explore at:
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.

  2. Latin America: level of English proficiency 2023, by country

    • statista.com
    • ai-chatbox.pro
    Updated Dec 3, 2024
    Cite
    Statista (2024). Latin America: level of English proficiency 2023, by country [Dataset]. https://www.statista.com/statistics/1053066/english-proficiency-latin-america/
    Explore at:
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    LAC, Americas, Latin America
    Description

    Argentina scored 562 out of a maximum of 800 points in the English Proficiency Index 2023. That was the highest score among all Latin American countries included in the survey. The Argentine capital, Buenos Aires, also received the highest English proficiency score among all the Latin American cities analyzed. Mexico and Haiti received the lowest scores in the region.

  3. Population and Languages of the Limited English Proficient (LEP) Speakers by...

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Jan 19, 2024
    Cite
    data.cityofnewyork.us (2024). Population and Languages of the Limited English Proficient (LEP) Speakers by Community District [Dataset]. https://catalog.data.gov/dataset/population-and-languages-of-the-limited-english-proficient-lep-speakers-by-community-distr
    Explore at:
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    Many residents of New York City speak more than one language; a number of them speak and understand non-English languages more fluently than English. This dataset, derived from the Census Bureau's American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level. There are 59 community districts throughout NYC, with each district being represented by a Community Board.

  4. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widespread languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population spoke a language other than English at home in 2023.

  5. American English General Conversation Speech Dataset for ASR

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.

    Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.

    Speech Data

    The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.

    Participant Diversity:
    Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
    Regions: Representing various states of the United States of America to ensure dialectal diversity and demographic balance.
    Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
    Recording Details:
    Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
    Duration: Each conversation ranges from 15 to 60 minutes.
    Audio Format: Stereo WAV files, 16-bit depth, recorded at a 16 kHz sample rate.
    Environment: Quiet, echo-free settings with no background noise.
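
    The recording format stated above (stereo, 16-bit, 16 kHz WAV) can be checked against a file's header before it enters a training pipeline. The following is a minimal sketch using Python's standard `wave` module. The function name `check_wav_format` and the example file name are hypothetical and are not part of the dataset's own tooling.

```python
import wave


def check_wav_format(path, channels=2, sample_width_bytes=2, sample_rate=16000):
    """Compare a WAV file's header with an expected format.

    Defaults mirror the format stated in the listing: stereo (2 channels),
    16-bit depth (2 bytes per sample), 16 kHz sample rate.
    Returns a list of human-readable mismatches; an empty list means a match.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(
                f"channels: {wav.getnchannels()} (expected {channels})"
            )
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width: {wav.getsampwidth()} bytes "
                f"(expected {sample_width_bytes})"
            )
        if wav.getframerate() != sample_rate:
            problems.append(
                f"sample rate: {wav.getframerate()} Hz (expected {sample_rate})"
            )
    return problems
```

    Calling `check_wav_format("conversation_001.wav")` (a hypothetical file name) returns an empty list when the header matches, so files with a non-empty result can be set aside for resampling or review.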

    Topic Diversity

    The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.

    Sample Topics Include:
    Family & Relationships
    Food & Recipes
    Education & Career
    Healthcare Discussions
    Social Issues
    Technology & Gadgets
    Travel & Local Culture
    Shopping & Marketplace Experiences, and many more.

    Transcription

    Each audio file is paired with a human-verified, verbatim transcription available in JSON format.

    Transcription Highlights:
    Speaker-segmented dialogues
    Time-coded utterances
    Non-speech elements (pauses, laughter, etc.)
    High transcription accuracy, achieved through double QA pass, average WER < 5%
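
    Word error rate (WER), the metric quoted above, is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The listing does not say how its figure was computed, so the following is only a sketch of the conventional definition, with whitespace tokenization as an assumption and `word_error_rate` as a hypothetical helper name.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Tokenization is a plain whitespace split (an assumption); real pipelines
    usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" needs one substitution and one deletion, giving a WER of 2/6.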

    These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.

    Metadata

    The dataset comes with granular metadata for both speakers and recordings:

    Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
    Recording Metadata: Topic, duration, audio format, device type, and sample rate.

    Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.

    Usage and Applications

    This dataset is a versatile resource for multiple English speech and language AI applications:

    ASR Development: Train accurate speech-to-text systems for US English.
    Voice Assistants: Build smart assistants capable of understanding natural American conversations.

  6. Latin America: English proficiency level 2023, by city

    • statista.com
    Updated Dec 3, 2024
    Cite
    Statista (2024). Latin America: English proficiency level 2023, by city [Dataset]. https://www.statista.com/statistics/1053076/english-proficiency-cities-latin-america/
    Explore at:
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Latin America, LAC
    Description

    Buenos Aires, Argentina scored 584 out of a maximum of 800 points in the English Proficiency Index 2023. That was the highest score among all the Latin American cities surveyed. In that year, nine cities in the region scored in the "high proficiency" range (550-599 points).

  7. American English

    • deepfo.com
    csv, excel, html, xml
    Updated Jun 29, 2018
    Cite
    Deepfo.com by Polyolbion SL, Barcelona, Spain (2018). American English [Dataset]. https://deepfo.com/en/most/American-English
    Explore at:
    Available download formats: csv, xml, html, excel
    Dataset updated
    Jun 29, 2018
    Dataset authored and provided by
    Deepfo.com by Polyolbion SL, Barcelona, Spain
    License

    https://deepfo.com/documentacion.php?idioma=en

    Area covered
    United States
    Description

    American English. name, Ethnologue status, language family, office, IETF, alphabet, ISO 639-1, ISO 639-2, ISO 639-3, speakers, Manuscript, micronation, publisher, former country, is a dialect

  8. Santa Barbara Corpus of Spoken American English

    • marketplace.sshopencloud.eu
    Updated Apr 24, 2020
    Cite
    (2020). Santa Barbara Corpus of Spoken American English [Dataset]. https://marketplace.sshopencloud.eu/dataset/EmX4NA
    Explore at:
    Dataset updated
    Apr 24, 2020
    Area covered
    Santa Barbara, United States
    Description

    The Santa Barbara Corpus of Spoken American English is based on a large body of recordings of naturally occurring spoken interaction from all over the United States. The Santa Barbara Corpus represents a wide variety of people of different regional origins, ages, occupations, genders, and ethnic and social backgrounds. The predominant form of language use represented is face-to-face conversation, but the corpus also documents many other ways that people use language in their everyday lives: telephone conversations, card games, food preparation, on-the-job talk, classroom lectures, sermons, story-telling, town hall meetings, tour-guide spiels, and more. The Santa Barbara Corpus was compiled by researchers in the Linguistics Department of the University of California, Santa Barbara. The Director of the Santa Barbara Corpus is John W. Du Bois, working with Associate Editors Wallace L. Chafe and Sandra A. Thompson (all of UC Santa Barbara), and Charles Meyer (UMass, Boston). For the publication of Parts 3 and 4, the authors are John W. Du Bois and Robert Englebretson.

  9. England, AR Hispanic or Latino Population Distribution by Ancestries Dataset...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    Cite
    Neilsberg Research (2025). England, AR Hispanic or Latino Population Distribution by Ancestries Dataset : Detailed Breakdown of Hispanic or Latino Origins // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/england-ar-population-by-race/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    England, Arkansas
    Variables measured
    Hispanic or Latino population with Cuban ancestry, Hispanic or Latino population with Mexican ancestry, Hispanic or Latino population with Puerto Rican ancestry, Hispanic or Latino population with Other Hispanic or Latino ancestry, Hispanic or Latino population with Cuban ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Mexican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Puerto Rican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Other Hispanic or Latino ancestry as Percent of Total Hispanic Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Origin / Ancestry for Hispanic population and (b) respective population as a percentage of the total Hispanic population, we initially analyzed and categorized the data for each of the ancestries across the Hispanic or Latino population. It is ensured that the population estimates used in this dataset pertain exclusively to ancestries for the Hispanic or Latino population. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the England Hispanic or Latino population. It includes the distribution of the Hispanic or Latino population, of England, by their ancestries, as identified by the Census Bureau. The dataset can be utilized to understand the origin of the Hispanic or Latino population of England.

    Key observations

    Among the Hispanic population in England, regardless of the race, the largest group is of Mexican origin, with a population of 116 (100% of the total Hispanic population).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Origin for Hispanic or Latino population include:

    • Mexican
    • Puerto Rican
    • Cuban
    • Other Hispanic or Latino

    Variables / Data Columns

    • Origin: This column displays the origin of the Hispanic or Latino population of England.
    • Population: This column shows the population of the specific origin within the Hispanic or Latino population of England.
    • % of Total Hispanic Population: This column displays each Hispanic origin as a percentage of England's total Hispanic or Latino population. Please note that the sum of all percentages may not equal 100% due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for England Population by Race & Ethnicity. You can refer to the same here

  10. 2013 American Community Survey - Table Packages: Detailed Language Spoken in...

    • catalog.data.gov
    Updated Jul 19, 2023
    Cite
    U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    United States
    Description

    This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.

  11. 2020 American Community Survey: B16005G | NATIVITY BY LANGUAGE SPOKEN AT...

    • data.census.gov
    Cite
    ACS, 2020 American Community Survey: B16005G | NATIVITY BY LANGUAGE SPOKEN AT HOME BY ABILITY TO SPEAK ENGLISH FOR THE POPULATION 5 YEARS AND OVER (TWO OR MORE RACES) (ACS 5-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT5Y2020.B16005G
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2020
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020, the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    For information on definitions of the OMB-defined racial classifications, see the "Race" and "Race Concepts" sections of the American Community Survey and Puerto Rico Community Survey 2019 Subject Definitions document at https://www2.census.gov/programs-surveys/acs/tech_docs/subject_definitions/2019_ACSSubjectDefinitions.pdf.

    The Hispanic origin and race codes were updated in 2020. For more information on the Hispanic origin and race code changes, please visit the American Community Survey Technical Documentation website.

    The 2016-2020 American Community Survey (ACS) data generally reflect the September 2018 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances, the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineation lists due to differences in the effective dates of the geographic entities.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:

    • "-": The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution.
    • "N": The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    • "(X)": The estimate or margin of error is not applicable or not available.
    • "median-": The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
    • "median+": The median falls in the highest interval of an open-ended distribution (for example "250,000+").
    • "**": The margin of error could not be computed because there were an insufficient number of sample observations.
    • "***": The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
    • "*****": A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
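
    The margin-of-error interpretation described above reduces to simple arithmetic: the confidence bounds are the estimate minus and plus the published margin of error. The sketch below shows that calculation and, as an additional step not stated in the description, the commonly documented conversion of a 90 percent margin of error to a standard error by dividing by 1.645. The function names and the numbers in the usage note are hypothetical and are not taken from table B16005G.

```python
# z-score for a two-sided 90 percent confidence level (standard normal).
Z_90 = 1.645


def confidence_bounds(estimate, margin_of_error):
    """Return (lower, upper) bounds: estimate -/+ the published margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


def standard_error(margin_of_error, z=Z_90):
    """Convert a margin of error at the given z-score into a standard error."""
    return margin_of_error / z
```

    With a hypothetical estimate of 1,200 people and a margin of error of 150, `confidence_bounds(1200, 150)` gives the interval from 1,050 to 1,350. Estimates flagged with the symbols listed in the table notes have no usable margin of error and would need to be filtered out before applying these helpers.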

  12. CALLHOME American English Speech Dataset

    • paperswithcode.com
    Updated Mar 12, 2024
    Cite
    (2024). CALLHOME American English Speech Dataset [Dataset]. https://paperswithcode.com/dataset/callhome-american-english-speech
    Explore at:
    Dataset updated
    Mar 12, 2024
    Area covered
    United States
    Description

    The CALLHOME English Corpus is a collection of unscripted telephone conversations between native speakers of English. Here are the key details:

    Participants: 120 individuals.
    Type of Study: Naturalistic.
    Location: USA.
    Media Type: Audio.
    DOI: doi:10.21415/T5KP54.
    Contents: The corpus contains 120 telephone conversations, each lasting up to 30 minutes.
    Origins: All calls originated in North America, with 90 calls placed to various locations overseas and 30 calls within North America.
    Transcripts: The transcripts cover contiguous 5 or 10-minute segments from recorded conversations.
    Speaker Awareness: All speakers were aware that they were being recorded.
    Topics: Participants had no guidelines on what to talk about; most called family members or close friends overseas.
    Purpose: Collected primarily to support the project on Large Vocabulary Conversational Speech Recognition (LVCSR), sponsored by the U.S. Department of Defense.

  13. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Explore at:
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while German followed with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most of the online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  14. Global English Proficiency Test market size is USD 2965.5 million in 2024.

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jul 20, 2024
    Cite
    Cognitive Market Research (2024). Global English Proficiency Test market size is USD 2965.5 million in 2024. [Dataset]. https://www.cognitivemarketresearch.com/english-proficiency-test-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jul 20, 2024
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global English Proficiency Test market size is USD 2965.5 million in 2024. It will expand at a compound annual growth rate (CAGR) of 9.70% from 2024 to 2031.

    North America held the largest market share, more than 40% of the global revenue, with a market size of USD 1186.20 million in 2024, and will grow at a compound annual growth rate (CAGR) of 7.9% from 2024 to 2031.
    Europe accounted for a market share of over 30% of the global revenue with a market size of USD 889.65 million.
    Asia Pacific held a market share of around 23% of the global revenue with a market size of USD 682.07 million in 2024 and will grow at a compound annual growth rate (CAGR) of 11.7% from 2024 to 2031.
    Latin America had a market share of more than 5% of the global revenue with a market size of USD 148.28 million in 2024 and will grow at a compound annual growth rate (CAGR) of 9.1% from 2024 to 2031.
    Middle East and Africa had a market share of around 2% of the global revenue, with an estimated market size of USD 59.31 million in 2024, and will grow at a compound annual growth rate (CAGR) of 9.4% from 2024 to 2031.
    The Employers category is experiencing the fastest growth in the English Proficiency Test Market. The rising trend of multinational companies and the global nature of business operations necessitate a workforce proficient in English.
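
    The growth figures above follow the usual compound-growth formula, value = base × (1 + CAGR)^years. The sketch below applies it to the stated 2024 base and rates. It assumes annual compounding over the seven years from 2024 to 2031; the resulting 2031 values are arithmetic on the quoted inputs, not figures published in the report, and the helper name `project_with_cagr` is hypothetical.

```python
def project_with_cagr(base_value, cagr, years):
    """Project a value forward with annual compounding: base * (1 + cagr) ** years."""
    return base_value * (1 + cagr) ** years


# Inputs quoted in the listing (USD million, 2024 base year).
GLOBAL_2024 = 2965.5
YEARS_TO_2031 = 2031 - 2024  # seven compounding periods (assumption)

# Implied 2031 sizes under the stated CAGRs.
global_2031 = project_with_cagr(GLOBAL_2024, 0.097, YEARS_TO_2031)
north_america_2031 = project_with_cagr(1186.20, 0.079, YEARS_TO_2031)

# Cross-check of a quoted regional figure: 40% of the global 2024 size.
north_america_share_2024 = 0.40 * GLOBAL_2024
```

    The cross-check on the last line reproduces the quoted North America figure of USD 1186.20 million, which suggests the regional sizes were derived from rounded percentage shares of the global total.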
    

    Market Dynamics of English Proficiency Test Market

    Key Drivers for English Proficiency Test Market

    Increasing Global Mobility of Students and Professionals to Increase the Demand Globally
    

    The increasing global mobility of students and professionals is a significant driver in the English Proficiency Test Market. As educational institutions worldwide, particularly in English-speaking countries, attract a growing number of international students, the need for standardized English proficiency assessments becomes critical. Similarly, professionals seeking employment opportunities in multinational corporations or pursuing career advancements in global markets must demonstrate their English language capabilities. This trend is fueled by globalization and the widespread recognition of English as the lingua franca of business, academia, and technology, thereby boosting the demand for reliable and comprehensive English proficiency tests.

    Rising Demand for English in Non-English Speaking Regions to Propel Market Growth
    

    The rising demand for English language skills in non-English speaking regions is another crucial driver of the English Proficiency Test Market. As countries in Asia, Latin America, and Europe increasingly integrate into the global economy, proficiency in English becomes a valuable asset for individuals and businesses. Governments and educational systems in these regions are incorporating English language education into their curricula, and companies are investing in language training for their employees to enhance competitiveness. This growing emphasis on English proficiency is creating substantial opportunities for test providers to expand their offerings and cater to a broader audience, further propelling market growth.

    Restraint Factor for the English Proficiency Test Market

    High Cost of Test Preparation and Registration Fees to Limit the Sales
    

    A significant restraint in the English Proficiency Test Market is the high cost of test preparation and registration fees. Many potential test-takers, especially students and professionals from developing countries, find these costs prohibitive. The expense of preparatory courses, study materials, and the tests themselves can deter individuals from taking the exams, limiting their opportunities for education and employment in English-speaking regions. This financial barrier not only affects individuals but also impacts the overall market growth, as it reduces the number of people who can afford to demonstrate their English proficiency through standardized tests.

    Limited Accessibility in Rural and Remote Regions
    

    One key restraint in the English proficiency test market is the limited availability of authorized test centers and digital infrastructure in rural and remote areas. Many prospective candidates, especially in developing countries, face challenges in reaching testing locations or accessing reliable internet for online examinations. This restricts participation and limits the market’s growth potential in under...

  15. Data from: Audiovisual Database of Spoken American English

    • abacus.library.ubc.ca
    iso, txt
    Updated Mar 18, 2022
    Cite
    Abacus Data Network (2022). Audiovisual Database of Spoken American English [Dataset]. https://abacus.library.ubc.ca/dataset.xhtml?persistentId=hdl:11272.1/AB2/8KIBXB
    Explore at:
    Available download formats: iso (7945883648), txt (1308)
    Dataset updated
    Mar 18, 2022
    Dataset provided by
    Abacus Data Network
    Area covered
    United States
    Description

    Abstract

    Introduction

    The Audiovisual Database of Spoken American English, Linguistic Data Consortium (LDC) catalog number LDC2009V01 and ISBN 1-58563-496-4, was developed at Butler University, Indianapolis, IN in 2007 for use by a variety of researchers to evaluate speech production and speech recognition. It contains approximately seven hours of audiovisual recordings of fourteen American English speakers producing syllables, word lists and sentences used in both academic and clinical settings. All talkers were from the North Midland dialect region -- roughly defined as Indianapolis and north within the state of Indiana -- and had lived in that region for the majority of the time from birth to 18 years of age. Each participant read 238 different words and 166 different sentences.

    The sentences spoken were drawn from the following sources:
    Central Institute for the Deaf (CID) Everyday Sentences (Lists A-J)
    Northwestern University Auditory Test No. 6 (Lists I-IV)
    Vowels in /hVd/ context (separate words)
    Texas Instruments/Massachusetts Institute for Technology (TIMIT) sentences

    The CID Everyday Sentences were created in the 1950s from a sample developed by the Armed Forces National Research Committee on Hearing and Bio-Acoustics. They are considered to represent everyday American speech and have the following characteristics: the vocabulary is appropriate to adults; the words appear with high frequency in one or more of the well-known word counts of the English language; proper names and proper nouns are not used; common non-slang idioms and contractions are used freely; phonetic loading and "tongue-twisting" are avoided; redundancy is high; the level of abstraction is low; and grammatical structure varies freely.

    Northwestern University Auditory Test No. 6 is a phonemically-balanced set of monosyllabic English words used clinically to test speech perception in adults with hearing loss. The /hVd/ vowel list was created to elicit all of the vowel sounds of American English. The TIMIT sentences are a subset (34 sentences) of the 2342 phonetically-rich sentences read by speakers in the TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. TIMIT was designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT speakers were from eight dialect regions of the United States.

    The Audiovisual Database of Spoken American English will be of interest in various disciplines: to linguists for studies of phonetics, phonology, and prosody of American English; to speech scientists for investigations of motor speech production and auditory-visual speech perception; to engineers and computer scientists for investigations of machine audio-visual speech recognition (AVSR); and to speech and hearing scientists for clinical purposes, such as the examination and improvement of speech perception by listeners with hearing loss.

    Data

    Participants were recorded individually during a single session. A participant first completed a statement of informed consent and a questionnaire to gather biographical data and then was asked by the experimenter to mark his or her Indiana hometown on a state map. The experimenter and participant then moved to a small, sound-treated studio where the participant was seated in front of three navy blue baffles. A laptop computer was elevated to eye-level on a speaker stand and placed approximately 50-60 cm in front of the participant. Prompts were presented to the participant in a Microsoft PowerPoint presentation. The experimenter was seated directly next to the participant, but outside the camera angle, and advanced the PowerPoint slides at a comfortable pace.

    Participants were recorded with a Panasonic DVC-80 digital video camera to miniDV digital video cassette tapes. All participants wore a Sennheiser MKE-2060 directional/cardioid lapel microphone throughout the recordings. Each speaker produced a total of 94 segmented files which were converted from Final Cut Express to Quicktime (.mov) files and then saved in the appropriately marked folder. If a speaker mispronounced a sentence or word during the recording process, the mispronunciations were edited out of the segments to be archived. The remaining parts of the recording, including the correct repetition of each prompt, were then sequenced together to create a continuous and complete segment.

    The fourteen participants were between 19 and 61 years of age (with a mean age of 30 years) and native speakers of American English.

  16. Languages and English Ability - Seattle Neighborhoods

    • data-seattlecitygis.opendata.arcgis.com
    • data.seattle.gov
    • +4more
    Updated Feb 22, 2024
    Cite
    City of Seattle ArcGIS Online (2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://data-seattlecitygis.opendata.arcgis.com/datasets/SeattleCityGIS::languages-and-english-ability-seattle-neighborhoods
    Explore at:
    Dataset updated
    Feb 22, 2024
    Dataset authored and provided by
    City of Seattle ArcGIS Online
    Area covered
    Seattle
    Description

    Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.

    Table created for and used in the Neighborhood Profiles application.

    Vintages: 2023
    ACS Table(s): B16004, C16002
    Data downloaded from: Census Bureau's Explore Census Data

    The United States Census Bureau's American Community Survey (ACS): About the Survey, Geography & ACS, Technical Documentation, News & Updates

    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:

    Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).

    The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico.

    Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).

    Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.

    Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.

    Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
    The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
    The estimate is controlled. A statistical test for sampling variability is not appropriate.
    The data for this geographic area cannot be displayed because the number of sample cases is too small.
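
The negative-value rule in the processing notes above can be sketched in a few lines of Python. This is a minimal illustration, not the City of Seattle's actual processing code: the function name `clean_acs_value` is hypothetical, and because the description gives only the leading digits of each code ("-4444...", "-5555..."), the sketch matches on the "-5555" prefix instead of assuming an exact value.

```python
from typing import Optional


def clean_acs_value(raw: float) -> Optional[float]:
    """Map raw negative sentinel codes to None or 0; pass other values through.

    Assumption: sentinel codes are whole numbers whose digits start with the
    prefixes quoted in the dataset description.
    """
    if raw >= 0:
        return raw
    # Codes beginning with "-5555" are set to zero per the description.
    if str(int(raw)).startswith("-5555"):
        return 0.0
    # Every other negative code is treated as missing (null).
    return None


# Hypothetical raw values for illustration only; real codes may have more digits.
cleaned = [clean_acs_value(v) for v in (120.0, -55555.0, -44444.0)]
```

Returning `None` keeps missing estimates distinguishable from true zeros, which matters when the values are later summed across block groups.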

  17. 2023 American Community Survey: B16002 | Detailed Household Language by Household Limited English Speaking Status

    • data.census.gov
    Cite
    ACS, 2023 American Community Survey: B16002 | Detailed Household Language by Household Limited English Speaking Status (ACS 1-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT1Y2023.B16002?q=arabic%26
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2023
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties.

    Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.

    ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

    A "limited English speaking household" is one in which no member 14 years old and over (1) speaks only English or (2) speaks a non-English language and speaks English "very well." In other words, all members 14 years old and over have at least some difficulty with English. By definition, English-only households cannot belong to this group. Previous Census Bureau data products have referred to these households as "linguistically isolated" and "Households in which no one 14 and over speaks English only or speaks a language other than English at home and speaks English 'very well'." This table is directly comparable to tables from earlier years that used these labels.

    The household language assigned to the housing unit is the non-English language spoken by the first person with a non-English language in the following order: reference person, spouse, parent, sibling, child, grandchild, in-law, other relative, unmarried partner, housemate/roommate, roomer/boarder, foster child, or other nonrelative. If no member of the household age 5 and over speaks a language other than English at home then the household language is English only.

    In 2016, changes were made to the languages and language categories presented in tables B16001, C16001, and B16002. For more information, see: 2016 Language Data User note.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
    -: The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    N: The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    (X): The estimate or margin of error is not applicable or not available.
    median-: The median falls in the lowest interval of an open-ended distribution (for example "2,500-")
    median+: The median falls in the highest interval of an open-ended distribution (for example "250,000+").
    **: The margin of error could not be computed because there were an insufficient number of sample obser...
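
The "limited English speaking household" definition in the description above can be expressed as a small check. The following Python sketch is illustrative only: the `Member` class, its field names, and the ability labels other than "very well" are assumptions made for the example, not the Census Bureau's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    age: int
    speaks_only_english: bool
    # Self-reported English ability for speakers of another language.
    # Only "very well" is named in the description; other labels are illustrative.
    english_ability: Optional[str] = None


def is_limited_english_household(members: List[Member]) -> bool:
    """True if no member aged 14+ speaks only English or speaks English "very well"."""
    for m in members:
        if m.age < 14:
            continue  # members under 14 are not considered by the definition
        if m.speaks_only_english or m.english_ability == "very well":
            return False
    # Note: a household with no member aged 14+ also returns True here; the
    # description does not say how that edge case is handled.
    return True


household = [Member(40, False, "well"), Member(10, True)]
limited = is_limited_english_household(household)
```

In the usage example the only English-only speaker is 10 years old, so the household still counts as limited English speaking under this reading of the definition.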

  18. American English Call Center Data for Realestate AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). American English Call Center Data for Realestate AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-english-usa
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    United States
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This US English Call Center Speech Dataset for the Real Estate industry is purpose-built to accelerate the development of speech recognition, spoken language understanding, and conversational AI systems tailored for English-speaking Real Estate customers. With over 30 hours of unscripted, real-world audio, this dataset captures authentic conversations between customers and real estate agents, ideal for building robust ASR models.

    Curated by FutureBeeAI, this dataset equips voice AI developers, real estate tech platforms, and NLP researchers with the data needed to create high-accuracy, production-ready models for property-focused use cases.

    Speech Data

    The dataset features 30 hours of dual-channel call center recordings between native US English speakers. Captured in realistic real estate consultation and support contexts, these conversations span a wide array of property-related topics, from inquiries to investment advice, offering deep domain coverage for AI model development.

    Participant Diversity:
    Speakers: 60 native US English speakers from our verified contributor community.
    Regions: Representing different states across the United States of America to ensure accent and dialect variation.
    Participant Profile: Balanced gender mix (60% male, 40% female) and age range from 18 to 70.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted agent-customer discussions.
    Call Duration: Average 5–15 minutes per call.
    Audio Format: Stereo WAV, 16-bit, recorded at 8kHz and 16kHz.
    Recording Environment: Captured in noise-free and echo-free conditions.

    Topic Diversity

    This speech corpus includes both inbound and outbound calls, featuring positive, neutral, and negative outcomes across a wide range of real estate scenarios.

    Inbound Calls:
    Property Inquiries
    Rental Availability
    Renovation Consultation
    Property Features & Amenities
    Investment Property Evaluation
    Ownership History & Legal Info, and more
    Outbound Calls:
    New Listing Notifications
    Post-Purchase Follow-ups
    Property Recommendations
    Value Updates
    Customer Satisfaction Surveys, and others

    Such domain-rich variety ensures model generalization across common real estate support conversations.

    Transcription

    All recordings are accompanied by precise, manually verified transcriptions in JSON format.

    Transcription Includes:
    Speaker-Segmented Dialogues
    Time-coded Segments
    Non-speech Tags (e.g., background noise, pauses)
    High transcription accuracy with word error rate below 5% via dual-layer human review.

    These transcriptions streamline ASR and NLP development for English real estate voice applications.
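
The description does not say how the word error rate quoted above was measured. As a rough sketch, the conventional definition divides the word-level edit distance between a reference transcript and a hypothesis by the number of reference words; the function name `word_error_rate` and the whitespace tokenization below are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "the house has three bedrooms",
    "the house has two bedrooms",
)
```

Here one of five reference words is substituted, so `wer` is 0.2, well above the 5% threshold the description cites for its transcripts.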

    Metadata

    Detailed metadata accompanies each participant and conversation:

    Participant Metadata: ID, age, gender, location, accent, and dialect.
    Conversation Metadata: Topic, call type, sentiment, sample rate, and technical details.

    This enables smart filtering, dialect-focused model training, and structured dataset exploration.

    Usage and Applications

    This dataset is ideal for voice AI and NLP systems built for the real estate sector:


  19. CALLFRIEND American English-Southern Dialect Second Edition

    • abacus.library.ubc.ca
    iso, txt
    Updated Mar 18, 2022
    + more versions
    Cite
    Abacus Data Network (2022). CALLFRIEND American English-Southern Dialect Second Edition [Dataset]. https://abacus.library.ubc.ca/dataset.xhtml?persistentId=hdl:11272.1/AB2/O0EZK5
    Explore at:
    Available download formats: iso (2999220224), txt (1308)
    Dataset updated
    Mar 18, 2022
    Dataset provided by
    Abacus Data Network
    Area covered
    United States
    Description

    Abstract

    Introduction

    CALLFRIEND American English-Southern Dialect Second Edition was developed by LDC and consists of approximately 26 hours of unscripted telephone conversations between native speakers of Southern dialects of American English. This second edition updates the audio files to wav format, simplifies the directory structure and adds documentation and metadata. The first edition is available as CALLFRIEND American English-Southern Dialect (LDC96S47).

    The CALLFRIEND series is a collection of telephone conversations in several languages conducted by LDC in support of language identification technology development. Languages covered in the collection include American English, Canadian French, Egyptian Arabic, Farsi, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil and Vietnamese.

    Data

    All data was collected before July 1997. Participants could speak with a person of their choice on any topic; most called family members and friends. All calls originated in North America. The recorded conversations last up to 30 minutes.

    The data was recorded as 8kHz u-law SPH encoded stereo files, with one end of the phone call on each channel. In this release, files were converted to WAV format, and information from the original SPH headers is described in the documentation. SPH files are not included in this second edition. The audio files were originally split into train, dev and test folders of 20 recordings each, but they are combined in this release.

    Completed calls passed through a human auditing process to verify that the target language was spoken by the participants, to check the quality of the recordings, and to record information about dialect, noise and distortion.

  20. 2023 American Community Survey: B16001 | Language Spoken at Home by Ability...

    • data.census.gov
    Updated Sep 12, 2024
    + more versions
    Cite
    ACS (2024). 2023 American Community Survey: B16001 | Language Spoken at Home by Ability to Speak English for the Population 5 Years and Over (ACS 5-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/all/tables?q=B16001&g=040XX00US50
    Explore at:
    Dataset updated
    Sep 12, 2024
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2023
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties.

    Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2019-2023 American Community Survey 5-Year Estimates.

    ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

    In 2016, changes were made to the languages and language categories presented in tables B16001, C16001, and B16002. For more information, see: 2016 Language Data User note.

    Geographical restrictions have been applied to Table B16001 - LANGUAGE SPOKEN AT HOME BY ABILITY TO SPEAK ENGLISH FOR THE POPULATION 5 YEARS AND OVER for the 5-year data estimates. These restrictions are in place to protect data privacy for the speakers of smaller languages. Geographic areas published for the 5-year B16001 table include: Nation (010), States (040), Metropolitan Statistical Area-Metropolitan Divisions (314), Combined Statistical Areas (330), Congressional Districts (500), and Public Use Microdata Sample Areas (PUMAs) (795). For more information on these geographical delineations, see the Metropolitan Statistical Area Reference Files. County and tract-level data are no longer available for table B16001; for specific language data for these smaller geographies, please use table C16001. Additional languages are also available in the Public Use Microdata Sample (PUMS), at the State and Public Use Microdata Sample Area (PUMA) levels of geography.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
    -: The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    N: The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    (X): The estimate or margin of error is not applicable or not available.
    median-: The median falls in the lowest interval of an open-ended distribution (for example "2,500-")
    median+: The median falls in the highest interval of an open-ended distribution (for example "250,000+").
    **: The margin of error could not be computed because there were an insufficient number of sample observations.
    ***: The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
    *****: A margin o...
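
The margin-of-error note in the description above amounts to simple arithmetic, sketched here in Python. The function names and sample numbers are hypothetical. The 1.645 divisor used to recover a standard error is the standard normal critical value for a two-sided 90 percent interval; it is not stated in the description and is included as an assumption.

```python
from typing import Tuple

# Assumption: z-score for a two-sided 90 percent interval (not given in the text).
Z_90 = 1.645


def confidence_bounds(estimate: float, moe: float) -> Tuple[float, float]:
    """Lower and upper 90 percent confidence bounds: estimate -/+ margin of error."""
    return estimate - moe, estimate + moe


def standard_error(moe: float) -> float:
    """Approximate standard error implied by a 90 percent margin of error."""
    return moe / Z_90


# Hypothetical estimate of 1,000 speakers with a margin of error of 150.
lower, upper = confidence_bounds(1000, 150)
```

With these made-up inputs the interval runs from 850 to 1,150. The sketch does not clamp the lower bound at zero, which may be worth adding for small estimates with large margins.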

Cite
Statista (2025). Share of U.S. population speaking a language besides English at home 2023, by state [Dataset]. https://www.statista.com/statistics/312940/share-of-us-population-speaking-a-language-other-than-english-at-home-by-state/

Share of U.S. population speaking a language besides English at home 2023, by state

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
United States
Description

As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.
