100+ datasets found
  1. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of its population speaking a language other than English is California, where about 45 percent of the population spoke a language other than English at home in 2023.

  2. Number of native Spanish speakers worldwide 2024, by country

    • statista.com
    Updated Jan 15, 2025
    Cite
    Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia was the nation with the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.

    Spanish, a world language

    As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, most of them on the American continent, and it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which appear in the list of non-Hispanic countries with the highest number of Spanish speakers.

    The second most spoken language in the U.S.

    In the most recent data, Spanish ranked as the language other than English with the highest number of speakers, with 12 times as many speakers as the second-ranked language. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, during fiscal year 2022 alone, 5 of the top 10 countries of origin of naturalized people in the U.S. were Spanish-speaking countries.

  3. Ranking of languages spoken at home in the U.S. 2023

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people were speaking Russian at home during the same year. The distribution of the U.S. population by ethnicity can be accessed here. A ranking of the most spoken languages across the world can be accessed here.

  4. 2017 San Diego County Demographics - Language Spoken at Home for the...

    • data.sandiegocounty.gov
    application/rdfxml +5
    Updated Feb 22, 2020
    Cite
    County of San Diego (2020). 2017 San Diego County Demographics - Language Spoken at Home for the Population 5 Years and Ability to Speak English (Detailed) [Dataset]. https://data.sandiegocounty.gov/Demographics/2017-San-Diego-County-Demographics-Language-Spoken/b7iq-x9dz
    Available download formats: csv, xml, application/rdfxml, application/rssxml, tsv, json
    Dataset updated
    Feb 22, 2020
    Dataset authored and provided by
    County of San Diego
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    San Diego County
    Description

    Language questions were only asked of persons 5 years and older. The language question is about current use of a non-English language at home, not about ability to speak another language or the use of such a language in the past or elsewhere. People who speak a language other than English outside of the home are not reported as speaking a language other than English. Respondents who spoke a language other than English at home were also asked whether they could speak English "very well" or less than "very well". For more information, see how the Census Bureau measures Language Use at https://www.census.gov/topics/population/language-use/about.html.

    Source: U.S. Census Bureau; 2013-2017 American Community Survey 5-Year Estimates, Table C16001.

  5. 2013 American Community Survey - Table Packages: Detailed Language Spoken in...

    • catalog.data.gov
    Updated Jul 19, 2023
    Cite
    U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    United States
    Description

    This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
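
    The geography thresholds in this description can be read as a simple eligibility rule for counties and core-based statistical areas. The following is a minimal sketch of that rule only; the `Area` class, its field names, and the sample figures are hypothetical and are not part of the Census Bureau's files.

```python
from dataclasses import dataclass

# Thresholds quoted in the dataset description.
MIN_TOTAL_POPULATION = 100_000
MIN_OTHER_LANGUAGE_SPEAKERS = 25_000


@dataclass
class Area:
    """Hypothetical record for a county or core-based statistical area."""
    name: str
    total_population: int
    # Speakers of languages other than English and Spanish.
    other_language_speakers: int


def meets_thresholds(area: Area) -> bool:
    """Return True if the area satisfies both thresholds from the description."""
    return (
        area.total_population >= MIN_TOTAL_POPULATION
        and area.other_language_speakers >= MIN_OTHER_LANGUAGE_SPEAKERS
    )


# Made-up figures, for illustration only.
areas = [
    Area("Example County A", 250_000, 40_000),
    Area("Example County B", 250_000, 10_000),
    Area("Example County C", 80_000, 30_000),
]
tabulated = [a.name for a in areas if meets_thresholds(a)]
print(tabulated)
```

    The nation, the states, Washington, D.C., and Puerto Rico are listed in the description without thresholds, so the check above would apply only to counties and core-based statistical areas.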

  6. Common languages used for web content 2025, by share of websites

    • ai-chatbox.pro
    • statista.com
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.ai-chatbox.pro/?_=%2Fstatistics%2F262946%2Fshare-of-the-most-common-languages-on-the-internet%2F%23XgboD02vawLKoDs%2BT%2BQLIV8B6B4Q9itA
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    World
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The combined internet user base of the two countries, as of January 2023, was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  7. Languages and English Ability - Seattle Neighborhoods

    • data-seattlecitygis.opendata.arcgis.com
    • data.seattle.gov
    • +4 more
    Updated Feb 22, 2024
    Cite
    City of Seattle ArcGIS Online (2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://data-seattlecitygis.opendata.arcgis.com/datasets/SeattleCityGIS::languages-and-english-ability-seattle-neighborhoods
    Dataset updated
    Feb 22, 2024
    Dataset authored and provided by
    City of Seattle ArcGIS Online
    Area covered
    Seattle
    Description

    Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.

    Table created for and used in the Neighborhood Profiles application.

    Vintages: 2023
    ACS Table(s): B16004, C16002
    Data downloaded from: Census Bureau's Explore Census Data

    The United States Census Bureau's American Community Survey (ACS):
    • About the Survey
    • Geography & ACS
    • Technical Documentation
    • News & Updates

    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:

    • Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters). The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico.
    • Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
    • Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
    • Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
    • Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
        • The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
        • Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
        • The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
        • The estimate is controlled. A statistical test for sampling variability is not appropriate.
        • The data for this geographic area cannot be displayed because the number of sample cases is too small.
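
    The Census data note above describes the confidence bounds as the estimate minus and plus the 90 percent margin of error. A minimal sketch of that arithmetic follows; the function name and the sample figures are hypothetical and are not taken from the dataset.

```python
def confidence_bounds(estimate: float, margin_of_error: float) -> tuple[float, float]:
    """Return (lower, upper) bounds: estimate minus and plus the margin of error."""
    if margin_of_error < 0:
        raise ValueError("margin of error must be non-negative")
    return estimate - margin_of_error, estimate + margin_of_error


# Made-up estimate and 90 percent margin of error, for illustration only.
lower, upper = confidence_bounds(1200, 150)
print(f"90 percent interval: {lower} to {upper}")
```

    Because the note says null margins of error mark cases where a statistical test is not appropriate, a caller would probably want to skip rows with a missing margin rather than treat them as zero.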

  8. Percent Spanish Speakers

    • king-snocoplanning.opendata.arcgis.com
    • hub.arcgis.com
    • +1 more
    Updated Aug 10, 2016
    Cite
    King County (2016). Percent Spanish Speakers [Dataset]. https://king-snocoplanning.opendata.arcgis.com/datasets/kingcounty::percent-spanish-speakers
    Dataset updated
    Aug 10, 2016
    Dataset authored and provided by
    King County
    Area covered
    Description

    Languages: Percent Spanish Speakers. Basic demographics by census tracts in King County based on current American Community Survey 5 Year Average (ACS). Included demographics are: total population; foreign born; median household income; English language proficiency; languages spoken; race and ethnicity; sex; and age. Numbers and derived percentages are estimates based on the current year's ACS. GEO_ID_TRT is the key field and may be used to join to other demographic Census data tables.

  9. Data from: Language Spoken at Home

    • linc.osbm.nc.gov
    • ncosbm.opendatasoft.com
    csv, excel, geojson +1
    Updated Oct 3, 2024
    Cite
    (2024). Language Spoken at Home [Dataset]. https://linc.osbm.nc.gov/explore/dataset/language-spoken-at-home/
    Available download formats: geojson, csv, json, excel
    Dataset updated
    Oct 3, 2024
    Description

    Language spoken at home and the ability to speak English for the population age 5 and over, as reported in the US Census Bureau's American Community Survey (ACS) 5-year estimates, table C16001.

  10. Percent of Population with Limited Ability to Speak English

    • coronavirus-resources.esri.com
    • data.amerigeoss.org
    • +1 more
    Updated Jul 3, 2019
    Cite
    Urban Observatory by Esri (2019). Percent of Population with Limited Ability to Speak English [Dataset]. https://coronavirus-resources.esri.com/maps/78a668915cbc4bf983330608f3d687aa
    Dataset updated
    Jul 3, 2019
    Dataset authored and provided by
    Urban Observatory by Esri
    Area covered
    Description

    This map shows the percent of population with a limited ability to speak English by census tract. Search for your community and investigate the top language needs in nearby census tracts.

    *DATA AS OF 2011-2015*

    Data Source: U.S. Census Bureau's American Community Survey 5-year estimates, 2011-2015, Table B16001.

    Complete list of all languages available in this data set (29): Spanish or Spanish Creole; French (including Patois, Cajun); French Creole; Italian; Portuguese; German; Yiddish; Greek; Russian; Polish; Serbo-Croatian; Armenian; Persian; Gujarati; Hindi; Urdu; Chinese; Japanese; Korean; Mon-Khmer, Cambodian; Hmong; Thai; Laotian; Vietnamese; Tagalog; Navajo; Hungarian; Arabic; Hebrew.

    Those who have limited English ability and speak other languages are included in the percentage depicted in the map, but other languages will not appear in the ranked list or in the table.

    Accompanying feature layer and viewing app are also available.

  11. Data_Sheet_1_Pilot study of a Spanish language measure of financial toxicity...

    • frontiersin.figshare.com
    docx
    Updated Jul 10, 2023
    Cite
    Julia J. Shi; Gwendolyn J. McGinnis; Susan K. Peterson; Nicolette Taku; Ying-Shiuan Chen; Robert K. Yu; Chi-Fang Wu; Tito R. Mendoza; Sanjay S. Shete; Hilary Ma; Robert J. Volk; Sharon H. Giordano; Ya-Chen T. Shih; Diem-Khanh Nguyen; Kelsey W. Kaiser; Grace L. Smith (2023). Data_Sheet_1_Pilot study of a Spanish language measure of financial toxicity in underserved Hispanic cancer patients with low English proficiency.docx [Dataset]. http://doi.org/10.3389/fpsyg.2023.1188783.s001
    Available download formats: docx
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Frontiers
    Authors
    Julia J. Shi; Gwendolyn J. McGinnis; Susan K. Peterson; Nicolette Taku; Ying-Shiuan Chen; Robert K. Yu; Chi-Fang Wu; Tito R. Mendoza; Sanjay S. Shete; Hilary Ma; Robert J. Volk; Sharon H. Giordano; Ya-Chen T. Shih; Diem-Khanh Nguyen; Kelsey W. Kaiser; Grace L. Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Financial toxicity (FT) reflects multi-dimensional personal economic hardships borne by cancer patients. It is unknown whether measures of FT, to date derived largely from English-speakers, adequately capture economic experiences and financial hardships of medically underserved low English proficiency US Hispanic cancer patients. We piloted a Spanish language FT instrument in this population.

    Methods

    We piloted a Spanish version of the Economic Strain and Resilience in Cancer (ENRICh) FT measure using qualitative cognitive interviews and surveys in un-/under-insured or medically underserved, low English proficiency, Spanish-speaking Hispanics (UN-Spanish, n = 23) receiving ambulatory oncology care at a public healthcare safety net hospital in the Houston metropolitan area. Exploratory analyses compared ENRICh FT scores amongst the UN-Spanish group to: (1) un-/under-insured English-speaking Hispanics (UN-English, n = 23) from the same public facility and (2) insured English-speaking Hispanics (INS-English, n = 31) from an academic comprehensive cancer center. Multivariable logistic models compared the outcome of severe FT (score > 6).

    Results

    UN-Spanish Hispanic participants reported high acceptability of the instrument (only 0% responded that the instrument was “very difficult to answer” and 4% that it was “very difficult to understand the questions”; 8% responded that it was “very difficult to remember resources used” and 8% that it was “very difficult to remember the burdens experienced”; and 4% responded that it was “very uncomfortable to respond”). Internal consistency of the FT measure was high (Cronbach’s α = 0.906). In qualitative responses, UN-Spanish Hispanics frequently identified a total lack of credit, savings, or income and food insecurity as aspects contributing to FT. UN-Spanish and UN-English Hispanic patients were younger, had lower education and income, resided in socioeconomically deprived neighborhoods and had more advanced cancer vs. INS-English Hispanics. There was a higher likelihood of severe FT in UN-Spanish (OR = 2.73, 95% CI 0.77–9.70; p = 0.12) and UN-English (OR = 4.13, 95% CI 1.13–15.12; p = 0.03) vs. INS-English Hispanics. A higher likelihood of severely depleted FT coping resources occurred in UN-Spanish (OR = 4.00, 95% CI 1.07–14.92; p = 0.04) and UN-English (OR = 5.73, 95% CI 1.49–22.1; p = 0.01) vs. INS-English. The likelihood of FT did not differ between UN-Spanish and UN-English in both models (p = 0.59 and p = 0.62 respectively).

    Conclusion

    In medically underserved, uninsured Hispanic patients with cancer, comprehensive Spanish-language FT assessment in low English proficiency participants was feasible, acceptable, and internally consistent. Future studies employing tailored FT assessment and intervention should encompass the key privations and hardships in this population.

  12. Linguistic Isolation (by Georgia House) 2017

    • opendata.atlantaregional.com
    Updated Jun 26, 2019
    Cite
    Georgia Association of Regional Commissions (2019). Linguistic Isolation (by Georgia House) 2017 [Dataset]. https://opendata.atlantaregional.com/datasets/linguistic-isolation-by-georgia-house-2017
    Dataset updated
    Jun 26, 2019
    Dataset provided by
    The Georgia Association of Regional Commissions
    Authors
    Georgia Association of Regional Commissions
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This layer was developed by the Research & Analytics Group of the Atlanta Regional Commission, using data from the U.S. Census Bureau’s American Community Survey 5-year estimates for 2013-2017, to show the number and percentage of the population 5 years and older that does not speak English at home and speaks English less than "very well", by Georgia House in the Atlanta region. The user should note that American Community Survey data represent estimates derived from a surveyed sample of the population, which creates some level of uncertainty, as opposed to an exact measure of the entire population (the full census count is only conducted once every 10 years and does not cover as many detailed characteristics of the population). Therefore, any measure reported by ACS should not be taken as an exact number; this is why a corresponding margin of error (MOE) is also given for ACS measures. The size of the MOE relative to its corresponding estimate value provides an indication of confidence in the accuracy of each estimate. Each MOE is expressed in the same units as its corresponding measure; for example, if the estimate value is expressed as a number, then its MOE will also be a number; if the estimate value is expressed as a percent, then its MOE will also be a percent. The user should also note that for relatively small geographic areas, such as census tracts shown here, ACS only releases combined 5-year estimates, meaning these estimates represent rolling averages of survey results that were collected over a 5-year span (in this case 2013-2017). Therefore, these data do not represent any one specific point in time or even one specific year. For geographic areas with larger populations, 3-year and 1-year estimates are also available. For further explanation of ACS estimates and margin of error, visit the Census ACS website.
    Naming conventions:

    Prefixes:
    • None: Count
    • p: Percent
    • r: Rate
    • m: Median
    • a: Mean (average)
    • t: Aggregate (total)
    • ch: Change in absolute terms (value in t2 - value in t1)
    • pch: Percent change ((value in t2 - value in t1) / value in t1)
    • chp: Change in percent (percent in t2 - percent in t1)

    Suffixes:
    • None: Change over two periods
    • _e: Estimate from most recent ACS
    • _m: Margin of Error from most recent ACS
    • _00: Decennial 2000

    Attributes:
    • SumLevel: Summary level of geographic unit (e.g., County, Tract, NSA, NPU, DSNI, SuperDistrict, etc)
    • GEOID: Census tract Federal Information Processing Series (FIPS) code
    • NAME: Name of geographic unit
    • Planning_Region: Planning region designation for ARC purposes
    • Acres: Total area within the tract (in acres)
    • SqMi: Total area within the tract (in square miles)
    • County: County identifier (combination of Federal Information Processing Series (FIPS) codes for state and county)
    • CountyName: County Name
    • Pop5P_e: # Population 5 years and over, 2017
    • Pop5P_m: # Population 5 years and over, 2017 (MOE)
    • EnglishOnly_e: # Speaks English only, 2017
    • EnglishOnly_m: # Speaks English only, 2017 (MOE)
    • pEnglishOnly_e: % Speaks English only, 2017
    • pEnglishOnly_m: % Speaks English only, 2017 (MOE)
    • NotEnglish_e: # Speaks language other than English at home, 2017
    • NotEnglish_m: # Speaks language other than English at home, 2017 (MOE)
    • pNotEnglish_e: % Speaks language other than English at home, 2017
    • pNotEnglish_m: % Speaks language other than English at home, 2017 (MOE)
    • EngLtVeryWell_e: # English not spoken at home, speaks English less than 'very well', 2017
    • EngLtVeryWell_m: # English not spoken at home, speaks English less than 'very well', 2017 (MOE)
    • pEngLtVeryWell_e: % English not spoken at home, speaks English less than 'very well', 2017
    • pEngLtVeryWell_m: % English not spoken at home, speaks English less than 'very well', 2017 (MOE)
    • Spanish_e: # Speaks Spanish at home, 2017
    • Spanish_m: # Speaks Spanish at home, 2017 (MOE)
    • pSpanish_e: % Speaks Spanish at home, 2017
    • pSpanish_m: % Speaks Spanish at home, 2017 (MOE)
    • SpanishEngLtVeryWell_e: # Speaks Spanish at home, speaks English less than 'very well', 2017
    • SpanishEngLtVeryWell_m: # Speaks Spanish at home, speaks English less than 'very well', 2017 (MOE)
    • pSpanishEngLtVeryWell_e: % Speaks Spanish at home, speaks English less than 'very well', 2017
    • pSpanishEngLtVeryWell_m: % Speaks Spanish at home, speaks English less than 'very well', 2017 (MOE)
    • IndoEurNotEnglish_e: # Speaks other Indo-European language at home, 2017
    • IndoEurNotEnglish_m: # Speaks other Indo-European language at home, 2017 (MOE)
    • pIndoEurNotEnglish_e: % Speaks other Indo-European language at home, 2017
    • pIndoEurNotEnglish_m: % Speaks other Indo-European language at home, 2017 (MOE)
    • IndoEurEngLtVeryWell_e: # Speaks other Indo-European language at home, speaks English less than 'very well', 2017
    • IndoEurEngLtVeryWell_m: # Speaks other Indo-European language at home, speaks English less than 'very well', 2017 (MOE)
    • pIndoEurEngLtVeryWell_e: % Speaks other Indo-European language at home, speaks English less than 'very well', 2017
    • pIndoEurEngLtVeryWell_m: % Speaks other Indo-European language at home, speaks English less than 'very well', 2017 (MOE)
    • AsianNotEnglish_e: # Speaks Asian language at home, 2017
    • AsianNotEnglish_m: # Speaks Asian language at home, 2017 (MOE)
    • pAsianNotEnglish_e: % Speaks Asian language at home, 2017
    • pAsianNotEnglish_m: % Speaks Asian language at home, 2017 (MOE)
    • AsianEngLtVeryWell_e: # Speaks Asian language at home, speaks English less than 'very well', 2017
    • AsianEngLtVeryWell_m: # Speaks Asian language at home, speaks English less than 'very well', 2017 (MOE)
    • pAsianEngLtVeryWell_e: % Speaks Asian language at home, speaks English less than 'very well', 2017
    • pAsianEngLtVeryWell_m: % Speaks Asian language at home, speaks English less than 'very well', 2017 (MOE)
    • OthLangNotEnglish_e: # Speaks other language at home, 2017
    • OthLangNotEnglish_m: # Speaks other language at home, 2017 (MOE)
    • pOthLangNotEnglish_e: % Speaks other language at home, 2017
    • pOthLangNotEnglish_m: % Speaks other language at home, 2017 (MOE)
    • OthLangEngLtVeryWell_e: # Speaks other language at home, speaks English less than 'very well', 2017
    • OthLangEngLtVeryWell_m: # Speaks other language at home, speaks English less than 'very well', 2017 (MOE)
    • pOthLangEngLtVeryWell_e: % Speaks other language at home, speaks English less than 'very well', 2017
    • pOthLangEngLtVeryWell_m: % Speaks other language at home, speaks English less than 'very well', 2017 (MOE)
    • last_edited_date: Last date the feature was edited by ARC

    Source: U.S. Census Bureau, Atlanta Regional Commission
    Date: 2013-2017

    For additional information, please visit the Census ACS website.
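
    The prefix and suffix conventions listed for this layer can be decoded mechanically for the measure fields. The sketch below assumes that a prefix is the run of lowercase letters before the first capital letter and that a suffix is one of `_e`, `_m`, or `_00`; that parsing rule and the `describe_field` helper are illustrative assumptions, not something the data provider documents.

```python
import re

# Prefix and suffix meanings as given in the naming conventions.
PREFIXES = {
    "": "Count",
    "p": "Percent",
    "r": "Rate",
    "m": "Median",
    "a": "Mean (average)",
    "t": "Aggregate (total)",
    "ch": "Change in absolute terms",
    "pch": "Percent change",
    "chp": "Change in percent",
}
SUFFIXES = {
    "": "Change over two periods",
    "_e": "Estimate from most recent ACS",
    "_m": "Margin of Error from most recent ACS",
    "_00": "Decennial 2000",
}

# Assumed shape: optional lowercase prefix, capitalized stem, optional suffix.
FIELD_PATTERN = re.compile(r"([a-z]*)([A-Z][A-Za-z0-9]*)(_e|_m|_00)?")


def describe_field(name: str) -> tuple[str, str, str]:
    """Split a measure field name into (prefix meaning, stem, suffix meaning)."""
    match = FIELD_PATTERN.fullmatch(name)
    if match is None or match.group(1) not in PREFIXES:
        raise ValueError(f"{name!r} does not follow the measure naming convention")
    prefix, stem, suffix = match.group(1), match.group(2), match.group(3) or ""
    return PREFIXES[prefix], stem, SUFFIXES[suffix]


print(describe_field("pSpanish_e"))
print(describe_field("Pop5P_m"))
```

    Descriptive attributes such as `GEOID`, `Acres`, or `last_edited_date` are not measures, so a caller would need to exclude them before applying this rule.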

  13. Data from: Hispanic-English Database

    • abacus.library.ubc.ca
    iso, txt
    Updated Nov 30, 2022
    Cite
    Abacus Data Network (2022). Hispanic-English Database [Dataset]. https://abacus.library.ubc.ca/dataset.xhtml;jsessionid=719087385d798ea8ac94b0e3997f?persistentId=hdl%3A11272.1%2FAB2%2FIIJZCH&version=&q=&fileTypeGroupFacet=%22Text%22&fileAccess=
    Available download formats: txt(1308), iso(3087785984)
    Dataset updated
    Nov 30, 2022
    Dataset provided by
    Abacus Data Network
    Description

    Abstract

    Introduction

    Hispanic-English Database contains approximately 30 hours of English and Spanish conversational and read speech with transcripts (24 hours) and metadata collected from 22 non-native English speakers between 1996 and 1998. The corpus was developed by Entropic Research Laboratory, Inc., a developer of speech recognition and speech synthesis software toolkits that was acquired by Microsoft in 1999. Participants were adult native speakers of Spanish as spoken in Central America and South America who resided in the Palo Alto, California area, had lived in the United States for at least one year and demonstrated a basic ability to understand, read and speak English. They read a total of 2200 sentences, 50 each in Spanish and English per speaker. The Spanish sentence prompts were a subset of the materials in LATINO-40 Spanish Read News, and the English sentence prompts were taken from the TIMIT database. Conversations were task-oriented, drawing on exercises similar to those used in English second language instruction and designed to engage the speakers in collaborative, problem-solving activities.

    Data

    Read speech was recorded on two wideband channels with a Shure SM10A head-mounted microphone in a quiet laboratory environment. The conversational speech was simultaneously recorded on four channels, two of which were used to place phone calls to each subject in two separate offices and to record the incoming speech of the two channels into separate files. The audio was originally saved under the Entropic Audio (ESPS) format using a 16kHz sampling rate and 16 bit samples. Audio files were converted to flac compressed .wav files from the ESPS format. ESPS headers were removed and are presented in this release as *.hdr files that include demographic and technical data. Transcripts were developed with the Entropic Annotator tool and are time-aligned with speaker turns. The transcription conventions were based on those used in the LDC Switchboard and CALLHOME collections. Transcript files are denoted with a .lab extension. Data files and their corresponding label files are stored in subdirectories named using a speaker-pair id and session number. The first three letters identify the speaker on channel A. The last three letters identify the speaker on channel B. Wideband audio files contain *.wb.flac in their file name, and narrow band audio files are denoted with a *.nb.flac in the file name.

  14. Spanish Language Datasets | 1.8M+ Sentences | Translation Data | TTS |...

    • datarade.ai
    Updated Jul 11, 2025
    Cite
    Oxford Languages (2025). Spanish Language Datasets | 1.8M+ Sentences | Translation Data | TTS | Dictionary Display | Translations | EU & LATAM Coverage [Dataset]. https://datarade.ai/data-products/spanish-language-datasets-1-8m-sentences-nlp-tts-dic-oxford-languages
    Explore at:
    Available download formats: .csv, .json, .mp3, .txt, .wav, .xls, .xml
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Oxford Languages (https://www.lexico.com/)
    Area covered
    Panama, Ecuador, Costa Rica, Honduras, Colombia, Chile, Bolivia (Plurinational State of), Nicaragua, Paraguay, Cuba
    Description

    Our Spanish language datasets are carefully compiled and annotated by language and linguistic experts; you can find them available for licensing:

    1. Spanish Monolingual Dictionary Data
    2. Spanish Bilingual Dictionary Data
    3. Spanish Sentences Data
    4. Synonyms and Antonyms Data
    5. Audio Data
    6. Word list Data

    Key Features (approximate numbers):

    1. Spanish Monolingual Dictionary Data

    Our Spanish monolingual dictionary data offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.

    • Headwords: 73,000
    • Senses: 123,000
    • Sentence examples: 104,000
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    2. Spanish Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is reviewed and updated annually by our in-house team of language experts. It offers significant coverage of the language, providing a large volume of high-quality translated words.

    • Translations: 221,300
    • Senses: 103,500
    • Example sentences: 74,500
    • Example translations: 83,800
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually
    3. Spanish Sentences Data

    Spanish sentences retrieved from the corpus are ideal for NLP model training, comprising approximately 20 million words. The sentences provide broad coverage of Spanish-speaking countries, and each is tagged with a particular country or dialect.

    • Sentences volume: 1,840,000
    • Format: XML and JSON formats
    • Delivery: Email (link-based file sharing) and REST API
    4. Spanish Synonyms and Antonyms Data

    This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.

    • Synonyms: 127,700
    • Antonyms: 9,500
    • Format: XML format
    • Delivery: Email (link-based file sharing)
    • Update frequency: annually
    5. Spanish Audio Data (word-level)

    Curated word-level audio data for the Spanish language, which covers all varieties of world Spanish, providing rich dialectal diversity in the Spanish language.

    • Audio files: 20,900
    • Format: XLSX (for index), MP3 and WAV (audio files)
    6. Spanish Word List Data

    This language data contains a carefully curated and comprehensive list of 450,000 Spanish words.

    • Wordforms: 450,000
    • Format: CSV and TXT formats
    • Delivery: Email (link-based file sharing)

    Use Cases:

    We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).

    If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Oxford.Languages@oup.com to start the conversation.

    Pricing:

    Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

    Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.

  15. Spanish-English website parallel corpus

    • data.europa.eu
    • live.european-language-grid.eu
    zip
    Updated Dec 14, 2017
    + more versions
    Cite
    Directorate-General for Communications Networks, Content and Technology (2017). Spanish-English website parallel corpus [Dataset]. https://data.europa.eu/data/datasets/elrc_339?locale=en
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 14, 2017
    Dataset authored and provided by
    Directorate-General for Communications Networks, Content and Technology
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a parallel corpus of bilingual texts crawled from multilingual websites, containing 21,007 translation units (TUs). Period of crawling: 15/11/2016 - 23/01/2017.

    A strict validation process was followed, which resulted in discarding:
    - TUs from crawled websites that do not comply with the PSI directive,
    - TUs with more than 99% of misspelled tokens,
    - TUs identified during the manual validation process, and all the TUs from websites whose error rate in the sample extracted for manual validation is strictly above the following thresholds: 50% of TUs with language identification errors, 50% of TUs with alignment errors, 50% of TUs with tokenization errors, 20% of TUs identified as machine translated content, 50% of TUs with translation errors.

    This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) actions SMART 2014/1074 and SMART 2015/1091. For further information on the project: http://lr-coordination.eu.
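
    The discard rules in the description above can be sketched in Python. This is only an illustration, not the ELRC validation code: the function names and dictionary keys are hypothetical, and the sketch assumes that a website is discarded as soon as any single error rate is strictly above its threshold, which the description does not state explicitly.

```python
# Site-level thresholds from the description, as fractions of the
# manually validated sample. The keys are hypothetical names for this sketch.
SITE_THRESHOLDS = {
    "language_identification": 0.50,
    "alignment": 0.50,
    "tokenization": 0.50,
    "machine_translated": 0.20,
    "translation": 0.50,
}

# TU-level rule: discard a TU with more than 99% misspelled tokens.
MISSPELLED_TOKEN_LIMIT = 0.99


def site_is_discarded(error_rates: dict[str, float]) -> bool:
    """Return True if any sampled error rate is strictly above its threshold.

    Assumption: exceeding one threshold is enough to discard the website.
    Rates missing from error_rates are treated as 0.0.
    """
    return any(
        error_rates.get(name, 0.0) > limit
        for name, limit in SITE_THRESHOLDS.items()
    )


def tu_is_discarded(misspelled: int, total_tokens: int) -> bool:
    """Return True if more than 99% of the TU's tokens are misspelled."""
    if total_tokens == 0:
        return False  # assumption: empty TUs are handled elsewhere
    return misspelled / total_tokens > MISSPELLED_TOKEN_LIMIT
```

    Because the comparison is strict, a website whose sampled alignment error rate is exactly 50% is kept under this reading.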

  16. Code-switching in bilingual children with DLD (Gross & Castilla-Earls, 2023)...

    • asha.figshare.com
    pdf
    Updated Oct 4, 2023
    Cite
    Megan C. Gross; Anny Castilla-Earls (2023). Code-switching in bilingual children with DLD (Gross & Castilla-Earls, 2023) [Dataset]. http://doi.org/10.23641/asha.23479574.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Oct 4, 2023
    Dataset provided by
    American Speech–Language–Hearing Association
    Authors
    Megan C. Gross; Anny Castilla-Earls
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Purpose: This study examined the frequency, direction, and structural characteristics of code-switching (CS) during narratives by Spanish–English bilingual children with and without developmental language disorder (DLD) to determine whether children with DLD exhibit unique features in their CS that may inform clinical decision-making.

    Method: Spanish–English bilingual children, aged 4;0–6;11 (years;months), with DLD (n = 33) and with typical language development (TLD; n = 33) participated in narrative retell and story generation tasks in Spanish and English. Instances of CS were classified as between utterance or within utterance; within-utterance CS was coded for type of grammatical structure. Children completed the morphosyntax subtests of the Bilingual English-Spanish Assessment to assist in identifying DLD and to index Spanish and English morphosyntactic proficiency.

    Results: In analyses examining the contributions of both DLD status and Spanish and English proficiency, the only significant effect of DLD was on the tendency to engage in between-utterance CS; children with DLD were more likely than TLD peers to produce whole utterances in English during the Spanish narrative task. Within-utterance CS was related to lower morphosyntax scores in the target language, but there was no effect of DLD. Both groups exhibited noun insertions as the most frequent type of within-utterance CS. However, children with DLD tended to exhibit more determiner and verb insertions than TLD peers and increased use of “congruent lexicalization,” that is, CS utterances that integrate content and function words from both languages.

    Conclusions: These findings reinforce that use of CS, particularly within-utterance CS, is a typical bilingual behavior even during narrative samples collected in a single-language context. However, language difficulties associated with DLD may emerge in how children code-switch, including use of between-utterance CS and unique patterns during within-utterance CS. Therefore, analyzing CS patterns may contribute to a more complete profile of children’s dual-language skills during assessment.

    Supplemental Material S1. Participant characteristics by site.
    Supplemental Material S2. Spearman-rho correlation tables.
    Supplemental Material S3. Examples of insertion types.

    Gross, M. C., & Castilla-Earls, A. (2023). Code-switching during narratives by bilingual children with and without developmental language disorder. Language, Speech, and Hearing Services in Schools, 54(3), 996–1019. https://doi.org/10.1044/2023_LSHSS-22-00149

  17. 388 Hours - Spanish Speaking English Speech Data by Mobile Phone

    • nexdata.ai
    Updated Oct 31, 2023
    + more versions
    Cite
    Nexdata (2023). 388 Hours - Spanish Speaking English Speech Data by Mobile Phone [Dataset]. https://www.nexdata.ai/datasets/speechrecog/990
    Explore at:
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Nexdata
    nexdata technology inc
    Authors
    Nexdata
    Variables measured
    Format, Country, Speaker, Language, Accuracy Rate, Content category, Recording device, Recording condition, Features of annotation
    Description

    English (Spain) scripted monologue smartphone speech dataset, collected from monologues based on given scripts and covering the generic domain, human-machine interaction, smart home commands, in-car commands, numbers, and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (891 people in total), which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; all our datasets are GDPR, CCPA, and PIPL compliant.

  18. English-Spanish Parallel Corpus for the Medical Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English-Spanish Parallel Corpus for the Medical Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/spanish-english-translated-parallel-corpus-for-medical-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The English-Spanish Medical Parallel Corpus is a professionally curated bilingual dataset designed to support the development of language models, translation systems, and NLP applications in the healthcare and medical sectors. With over 50,000 sentence pairs covering a wide range of medical topics, this dataset serves as a resource for improving multilingual AI systems in healthcare, one of the most critical domains.

    Dataset Content

    Volume and Translator Diversity
    Sentence Count: 50,000+ parallel sentences
    Translator Base: Contributions from over 200 native Spanish translators with subject matter familiarity
    Data Origin: All content is purpose-built and translation-ready, developed specifically for machine learning applications
    Sentence Diversity
    Length Range: Sentences range from 7 to 25 words
    Structural Variety: Includes simple, compound, and complex sentence structures
    Form Types: Covers questions, commands, affirmations, and negations
    Voice: Balanced inclusion of both active and passive constructions
    Bi-directional Translation: Includes both English-to-Spanish and Spanish-to-English sentence sets to enhance model performance in both directions
    Domain-relevant metaphors, idioms, and phrases
    Logical flow supported by a rich use of discourse markers and connectors

    Medical Domain Specifics

    Terminology Coverage

    The dataset reflects real-world terminology from across the medical field, including:

    Anatomy and physiology
    Diseases and symptoms
    Diagnosis and treatment protocols
    Pharmaceutical and drug-related terminology
    Medical devices, procedures, and administrative documentation
    Real-World Contexts

    This corpus features data drawn from various healthcare settings and content types such as:

    Patient-doctor dialogues and telehealth interactions
    Diagnosis summaries and treatment plans
    Clinical notes and discharge instructions
    Medical research abstracts and journal-style excerpts
    Drug descriptions, usage guidelines, and safety instructions
    Hospital policy and consent-related materials
    Informational content around wellness, supplements, and preventive care
    Cross-Domain Elements

    In addition to core medical language, the dataset also includes related content from:

    Healthtech and medical devices
    Wellness and self-care
    Nutrition and lifestyle medicine

    Format and Structure

    Available Formats: Delivered in Excel, with optional conversions to JSON, TMX, XML, XLIFF, or other localization-ready formats
    Fields Included:
    Serial Number
    Unique ID
    Source Sentence
    Source Word Count
    Target Sentence
    Target Word Count

    Applications and Use Cases


  19. Preferred Language Spoken in California Facilities

    • healthdata.gov
    • data.chhs.ca.gov
    • +1more
    application/rdfxml +5
    Updated Apr 8, 2025
    + more versions
    Cite
    chhs.data.ca.gov (2025). Preferred Language Spoken in California Facilities [Dataset]. https://healthdata.gov/State/Preferred-Language-Spoken-in-California-Facilities/e5vw-v44b
    Explore at:
    Available download formats: csv, application/rssxml, json, application/rdfxml, xml, tsv
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    chhs.data.ca.gov
    Area covered
    California
    Description

    The dataset contains combined counts for hospital discharges, emergency room encounters, and ambulatory surgeries by preferred language spoken at each facility. The nearly 100 languages collected in the patient-level data were combined into nine geographical or cultural groups: English Language, Spanish Language, Asian/Pacific Islander Languages, Middle Eastern Languages, European Languages, African Languages, Latin American Languages, Native American Languages, and Sign Language. See the Preferred Language Spoken Language List below for the exact separation of languages.

  20. Data from: English and Spanish Vowel Formants from DIAPIX-FL

    • data.niaid.nih.gov
    • research.science.eus
    • +2more
    Updated Dec 12, 2024
    Cite
    Garcia Lecumberri, Maria Luisa (2024). English and Spanish Vowel Formants from DIAPIX-FL [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14411924
    Explore at:
    Dataset updated
    Dec 12, 2024
    Dataset provided by
    Garcia Lecumberri, Maria Luisa
    Cooke, Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data table contains acoustic parameters for each vowel (of length at least 30ms) from the DIAPIX-FL corpus of first and second language task-oriented speech spoken by Spanish and English talkers, available at https://datashare.ed.ac.uk/handle/10283/346

    Each row corresponds to a single vowel and contains the following columns:

    l1: L1 of the speaker (either En or Sp)

    speaking: language that is being spoken (either En or Sp)

    speaker: anonymised speaker identifier

    frag: which DIAPIX-FL TCU this corresponds to

    nwords: number of words in TCU

    vowel: MRPA symbol corresponding to the vowel

    word: word from which the vowel was extracted

    length: duration of vowel in units of 10ms frames

    en: Praat estimate of median energy

    f0: estimated median fundamental frequency

    f1: estimated median first formant frequency (f1)

    f2: estimated median second formant frequency (f2)

    f3: estimated median third formant frequency (f3)
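
    As a small illustration of the length column described above, the following Python sketch converts a vowel length from 10 ms frames to milliseconds and checks it against the 30 ms minimum mentioned in the description. The function and constant names are hypothetical, and the sketch assumes that length holds a plain frame count.

```python
FRAME_MS = 10      # "length" is given in units of 10 ms frames
MIN_VOWEL_MS = 30  # the table only contains vowels of at least 30 ms


def vowel_duration_ms(length_frames: int) -> int:
    """Convert a vowel length in 10 ms frames to milliseconds."""
    return length_frames * FRAME_MS


def meets_minimum(length_frames: int) -> bool:
    """Return True if the vowel is at least 30 ms long."""
    return vowel_duration_ms(length_frames) >= MIN_VOWEL_MS
```

    Under this reading, every row in the table should have a length of at least 3 frames.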

Cite
Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/

The most spoken languages worldwide 2025

Explore at:
419 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description

In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widespread languages that year.

Languages in the United States

The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

Different languages at home

The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population spoke a language other than English at home in 2023.
