97 datasets found
  1. The most spoken languages worldwide 2023

    • statista.com
    Updated Jan 23, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2023 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Jan 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    World
    Description

    In 2023, around 1.5 billion people worldwide spoke English either natively or as a second language, slightly more than the 1.1 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which over 41 million people spoke at home in 2021. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.7 million Tagalog speakers and 1.5 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of its population speaking a language other than English is California. About 44 percent of California’s population spoke a language other than English at home in 2021.

  2. 2013 American Community Survey - Table Packages: Detailed Language Spoken in...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jul 19, 2023
    Cite
    U.S. Census Bureau (2023). 2013 American Community Survey - Table Packages: Detailed Language Spoken in the U.S. [Dataset]. https://catalog.data.gov/dataset/2013-american-community-survey-table-packages-detailed-language-spoken-in-the-u-s
    Explore at:
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    United States
    Description

    This data set uses the 2009-2013 American Community Survey to tabulate the number of speakers of languages spoken at home and the number of speakers of each language who speak English less than very well. These tabulations are available for the following geographies: nation; each of the 50 states, plus Washington, D.C. and Puerto Rico; counties with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish; core-based statistical areas (metropolitan statistical areas and micropolitan statistical areas) with 100,000 or more total population and 25,000 or more speakers of languages other than English and Spanish.
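
    The population thresholds in the description above can be read as a simple two-part filter. The sketch below is illustrative only: the function name `meets_tabulation_threshold` and its parameter names are hypothetical and are not part of any Census Bureau tool.

```python
def meets_tabulation_threshold(total_population: int,
                               other_language_speakers: int) -> bool:
    """Return True when a county or core-based statistical area meets both
    thresholds quoted in the description: 100,000 or more total population
    and 25,000 or more speakers of languages other than English and Spanish.

    Hypothetical helper for illustration; nation and state tabulations are
    published regardless of these thresholds.
    """
    return total_population >= 100_000 and other_language_speakers >= 25_000


# Example: a county of 250,000 people with 30,000 such speakers qualifies,
# while one with only 12,000 such speakers does not.
qualifies = meets_tabulation_threshold(250_000, 30_000)
does_not_qualify = meets_tabulation_threshold(250_000, 12_000)
```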

  3. Mother tongue by knowledge of official languages, language spoken most often...

    • ouvert.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Nov 8, 2023
    + more versions
    Cite
    Statistics Canada (2023). Mother tongue by knowledge of official languages, language spoken most often at home and other language(s) spoken regularly at home: Canada, provinces and territories, census divisions and census subdivisions [Dataset]. https://ouvert.canada.ca/data/dataset/69e0cbf2-457f-4313-8013-9252e722d5b8
    Explore at:
    Available download formats: csv, html, xml
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    Data on mother tongue, knowledge of official languages, language spoken most often at home and other language(s) spoken regularly at home and age for the population excluding institutional residents.

  4. Language spoken most often at home by age: Canada, provinces and...

    • ouvert.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Nov 8, 2023
    + more versions
    Cite
    Statistics Canada (2023). Language spoken most often at home by age: Canada, provinces and territories, census metropolitan areas and census agglomerations with parts [Dataset]. https://ouvert.canada.ca/data/dataset/87915210-5c40-47aa-86db-6c07b379e0bb
    Explore at:
    Available download formats: xml, html, csv
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    Data on language spoken most often at home by age for the population excluding institutional residents of Canada, provinces and territories, census metropolitan areas and census agglomerations.

  5. First official language spoken by language spoken most often at home: Canada...

    • datasets.ai
    • www150.statcan.gc.ca
    • +2more
    21, 55, 8
    Updated Sep 7, 2024
    + more versions
    Cite
    Statistics Canada | Statistique Canada (2024). First official language spoken by language spoken most often at home: Canada and forward sortation areas © [Dataset]. https://datasets.ai/datasets/a786d3d5-cdad-4ba0-af40-23ab394d0083
    Explore at:
    Available download formats: 21, 8, 55
    Dataset updated
    Sep 7, 2024
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Authors
    Statistics Canada | Statistique Canada
    Area covered
    Canada
    Description

    Data on first official language spoken, language spoken most often at home, age and gender for the population excluding institutional residents for Canada and forward sortation areas.

  6. First official language spoken by language spoken most often at home:...

    • ouvert.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Nov 8, 2023
    + more versions
    Cite
    Statistics Canada (2023). First official language spoken by language spoken most often at home: Canada, provinces and territories, census divisions and census subdivisions [Dataset]. https://ouvert.canada.ca/data/dataset/6306b46c-d85d-4237-9eb7-b206981f8f98
    Explore at:
    Available download formats: csv, html, xml
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    Data on first official language spoken, language spoken most often at home, age and gender for the population excluding institutional residents for Canada, provinces and territories, census divisions and census subdivisions.

  7. 2010-2014 ACS Language Spoken at Home Variables - Boundaries

    • hub.arcgis.com
    Updated Nov 20, 2020
    + more versions
    Cite
    Esri (2020). 2010-2014 ACS Language Spoken at Home Variables - Boundaries [Dataset]. https://hub.arcgis.com/maps/98bf5b2403c5456492df577ee3cee241
    Explore at:
    Dataset updated
    Nov 20, 2020
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    Description

    This layer contains 2010-2014 American Community Survey (ACS) 5-year data, and contains estimates and margins of error. The layer shows language group of language spoken at home by age. This is shown by tract, county, and state boundaries. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. This layer is symbolized to show the percentage of the population age 5+ who speak Spanish at home. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right.

    Vintage: 2010-2014
    ACS Table(s): B16007
    Data downloaded from: Census Bureau's API for American Community Survey
    Date of API call: November 11, 2020
    National Figures: data.census.gov

    The United States Census Bureau's American Community Survey (ACS): About the Survey; Geography & ACS; Technical Documentation; News & Updates

    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.

    Data Note from the Census:

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:

    • This layer has associated layers containing the most recent ACS data available by the U.S. Census Bureau. Click here to learn more about ACS data releases and click here for the associated boundaries layer. The reason this data is 5+ years different from the most recent vintage is due to the overlapping of survey years. It is recommended by the U.S. Census Bureau to compare non-overlapping datasets.
    • Boundaries come from the US Census TIGER geodatabases. Boundary vintage (2014) appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines clipped for cartographic purposes. For census tracts, the water cutouts are derived from a subset of the 2010 AWATER (Area Water) boundaries offered by TIGER. For state and county boundaries, the water and coastlines are derived from the coastlines of the 500k TIGER Cartographic Boundary Shapefiles. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
    • The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico.
    • Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
    • Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
    • Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
    • Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
      • The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
      • Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
      • The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
      • The estimate is controlled. A statistical test for sampling variability is not appropriate.
      • The data for this geographic area cannot be displayed because the number of sample cases is too small.
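
    The two rules in the notes above (the confidence bounds around an estimate, and the handling of negative placeholder values) can be sketched in a few lines of Python. This is a minimal illustration, not Esri's or the Census Bureau's processing code. The function names are hypothetical, and because the description elides the exact placeholder values ("-4444...", "-5555..."), the sketch matches on the leading digits only, which is an assumption.

```python
import math
from typing import Optional, Tuple


def confidence_bounds(estimate: float, margin_of_error: float) -> Tuple[float, float]:
    """Lower and upper bounds: estimate minus and plus the 90 percent margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


def clean_placeholder(value: Optional[float]) -> Optional[float]:
    """Apply the rule quoted in the notes to one raw API value.

    Negative values become None (null), except values whose digits start
    with -5555, which become zero. Prefix matching is an assumption made
    because the full placeholder values are elided in the description.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    if value >= 0:
        return value
    if str(int(value)).startswith("-5555"):
        return 0
    return None


# Example with made-up numbers: an estimate of 1,000 with a margin of error of 150.
lower, upper = confidence_bounds(1000, 150)
```

    Keeping the placeholder rule in one small function makes it easy to apply the same cleaning to both the estimate and the margin-of-error columns before computing bounds.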

  8. LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED...

    • catalog.data.gov
    Updated Jan 31, 2025
    + more versions
    Cite
    City of Seattle ArcGIS Online (2025). LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS (B16003) [Dataset]. https://catalog.data.gov/dataset/language-spoken-at-home-for-the-population-5-years-and-over-in-limited-english-speaking-ho
    Explore at:
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    https://arcgis.com/
    Description

    Table from the American Community Survey (ACS) B16003 of age by language spoken at home for the population 5 years and over in limited English-speaking households. These are multiple, nonoverlapping vintages of the 5-year ACS estimates of population and housing attributes starting in 2010 shown by the corresponding census tract vintage. Also includes the most recent release annually.

    King County, Washington census tracts with nonoverlapping vintages of the 5-year American Community Survey (ACS) estimates starting in 2010. Vintage identified in the "ACS Vintage" field.

    The census tract boundaries match the vintage of the ACS data (currently 2010 and 2020) so please note the geographic changes between the decades. Tracts have been coded as being within the City of Seattle as well as assigned to neighborhood groups called "Community Reporting Areas". These areas were created after the 2000 census to provide geographically consistent neighborhoods through time for reporting U.S. Census Bureau data. This is not an attempt to identify neighborhood boundaries as defined by neighborhoods themselves.

    Vintages: 2010, 2015, 2020, 2021, 2022, 2023
    ACS Table(s): B16003
    Data downloaded from: https://data.c...

  9. Census Data - Languages spoken in Chicago, 2008 – 2012

    • data.cityofchicago.org
    • depauliaonline.com
    • +3more
    application/rdfxml +5
    Updated Sep 12, 2014
    + more versions
    Cite
    U.S. Census Bureau (2014). Census Data - Languages spoken in Chicago, 2008 – 2012 [Dataset]. https://data.cityofchicago.org/Health-Human-Services/Census-Data-Languages-spoken-in-Chicago-2008-2012/a2fk-ec6q
    Explore at:
    Available download formats: application/rssxml, csv, application/rdfxml, xml, json, tsv
    Dataset updated
    Sep 12, 2014
    Dataset authored and provided by
    U.S. Census Bureau
    Area covered
    Chicago
    Description

    This dataset contains estimates of the number of residents aged 5 years or older in Chicago who “speak English less than very well,” by the non-English language spoken at home and community area of residence, for the years 2008 – 2012. See the full dataset description for more information at: https://data.cityofchicago.org/api/views/fpup-mc9v/files/dK6ZKRQZJ7XEugvUavf5MNrGNW11AjdWw0vkpj9EGjg?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\ECONOMIC_INDICATORS\Dataset_Description_Languages_2012_FOR_PORTAL_ONLY.pdf

  10. GlobalPhone Vietnamese

    • live.european-language-grid.eu
    • catalogue.elra.info
    audio format
    + more versions
    Cite
    GlobalPhone Vietnamese [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/2100
    Explore at:
    Available download formats: audio format
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The GlobalPhone corpus developed in collaboration with the Karlsruhe Institute of Technology (KIT) was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, multilingual speech and text database for language independent and language adaptive speech recognition as well as for language identification tasks.

    The entire GlobalPhone corpus enables the acquisition of acoustic-phonetic knowledge of the following 22 spoken languages: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Croatian (ELRA-S0195), Czech (ELRA-S0196), French (ELRA-S0197), German (ELRA-S0198), Hausa (ELRA-S0347), Japanese (ELRA-S0199), Korean (ELRA-S0200), Polish (ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202), Spanish (Latin America) (ELRA-S0203), Swahili (ELRA-S0375), Swedish (ELRA-S0204), Tamil (ELRA-S0205), Thai (ELRA-S0321), Turkish (ELRA-S0206), Ukrainian (ELRA-S0377), and Vietnamese (ELRA-S0322).

    In each language about 100 sentences were read by each of the 100 speakers. The read texts were selected from national newspapers available on the Internet to provide a large vocabulary. The read articles cover national and international political news as well as economic news. The speech is available in 16bit, 16kHz mono quality, recorded with a close-speaking microphone (Sennheiser 440-6). The transcriptions are internally validated and supplemented by special markers for spontaneous effects like stuttering, false starts, and non-verbal effects like laughing and hesitations. Speaker information like age, gender, occupation, etc. as well as information about the recording setup complement the database. The entire GlobalPhone corpus contains over 450 hours of speech spoken by more than 2100 native adult speakers.

    The data are compressed with the shorten program written by Tony Robinson. Alternatively, the data can be delivered uncompressed.

    The Vietnamese part of GlobalPhone was collected in summer 2009. In total 160 speakers were recorded, 140 of them in the cities of Hanoi and Ho Chi Minh City in Vietnam, and an additional set of 20 speakers in Karlsruhe, Germany. All speakers are Vietnamese native speakers, covering the main dialectal variants from South and North Vietnam. Of these 160 speakers, 70 were female and 90 were male. The majority of speakers are well educated, being graduate students and engineers. The age distribution of the speakers ranges from 18 to 65 years. Each speaker read between 50 and 200 utterances from newspaper articles, corresponding to roughly 9.5 minutes of speech or 138 utterances per person. In total, 22,112 utterances were recorded. The speech was recorded using a close-talking Sennheiser HM420 microphone in a push-to-talk scenario, using a laptop-based data collection toolkit developed in-house. All data were recorded at 16kHz and 16bit resolution in PCM format. The data collection took place in small-sized rooms with very low background noise. Information on recording place and environmental noise conditions is provided in a separate speaker session file for each speaker.

    The speech data was recorded in two phases. In the first phase, data was collected from 140 speakers in the cities of Hanoi and Ho Chi Minh City. In the second phase, utterances were selected from the text corpus in order to cover rare Vietnamese phonemes. This second recording phase was carried out with 20 Vietnamese graduate students who live in Karlsruhe. In sum, 22,112 utterances were spoken, corresponding to 25.25 hours of speech.

    The text data used for recording mainly came from the news posted in online editions of 15 Vietnamese newspaper websites, where the first 12 were used for the training set, while the last three were used for the development and evaluation set. The text data collected from the first 12 websites cover almost 4 million word tokens with a vocabulary of 30,000 words, resulting in an out-of-vocabulary rate of 0% on the development set and 0.067% on the evaluation set. For the text selection we followed the standard GlobalPhone protocols and focused on national and international politics and economics news (see [Schultz 2002]). The transcriptions are provided in Vietnamese-style Roman script, i.e. using several diacritics encoded in UTF-8. The Vietnamese data are organized in a training set of 140 speakers with 22.15 hours of speech, a development set of 10 speakers (6 from North and 4 from South Vietnam) with 1:40 hours of speech, and an evaluation set of 10 speakers with the same gender and dialect distribution as the development set with 1:30 hours of speech. More details on corpus statistics, collection scenario, and system building based on the Vietnamese part of GlobalPhone can be found in [Vu and Schultz, 2009, 2010].

    [Schultz 2002] Tanja Schultz (2002): GlobalPhone: A Multilingual Speech and Text Database developed at Karlsruhe University, Proceedings of the International Conference of Spoken Language Processing, ICSLP 2002, Denver, CO, September 2002.

    [Vu and Schultz, 2010] Ngoc Thang Vu, Tanja Schultz (2010): Optimization On Vietnamese Large Vocabulary Speech Recognition, 2nd Workshop on Spoken Languages Technologies for Under-resourced Languages, SLTU 2010, Penang, Malaysia, May 2010.

    [Vu and Schultz, 2009] Ngoc Thang Vu, Tanja Schultz (2009): Vietnamese Large Vocabulary Continuous Speech Recognition, Automatic Speech Recognition and Understanding, ASRU 2009, Merano.
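
    The per-speaker averages quoted for the Vietnamese recordings can be checked with a little arithmetic, and the stated recording format gives a rough size for the uncompressed audio. The sketch below takes the utterance total as 22,112 (an assumption that is consistent with the stated 138 utterances per person across 160 speakers) and assumes mono audio, as stated for the GlobalPhone corpus as a whole. The size figure covers raw PCM samples only and ignores any file headers or compression.

```python
SPEAKERS = 160
UTTERANCES = 22_112        # total utterances, read as twenty-two thousand
TOTAL_HOURS = 25.25        # total speech duration from the description
SAMPLE_RATE_HZ = 16_000    # 16 kHz
BYTES_PER_SAMPLE = 2       # 16-bit resolution
CHANNELS = 1               # assumption: mono

# Averages per speaker
utterances_per_speaker = UTTERANCES / SPEAKERS       # about 138
minutes_per_speaker = TOTAL_HOURS * 60 / SPEAKERS    # about 9.5

# Raw PCM payload for the whole Vietnamese set
total_seconds = TOTAL_HOURS * 3600
uncompressed_bytes = int(total_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
uncompressed_gigabytes = uncompressed_bytes / 1e9    # roughly 2.9 GB
```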

  11. Population by language spoken most often at home and geography, 1971 to...

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Jul 12, 2023
    Cite
    Statistics Canada (2023). Population by language spoken most often at home and geography, 1971 to 2016, inactive [Dataset]. https://open.canada.ca/data/dataset/3ba16b66-abb7-4789-a1ee-adde239f9cfd
    Explore at:
    Available download formats: html, csv, xml
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    Data on the language spoken most often at home by the population of Canada and Canada outside Quebec, and of all provinces and territories, for Census years 1971 to 2016.

  12. Language spoken most often at home by other language(s) spoken regularly at...

    • ouvert.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Nov 8, 2023
    + more versions
    Cite
    Statistics Canada (2023). Language spoken most often at home by other language(s) spoken regularly at home: Canada, provinces and territories, census metropolitan areas and census agglomerations with parts [Dataset]. https://ouvert.canada.ca/data/dataset/ea320917-f0e4-4baa-a0f5-642557d37bf3
    Explore at:
    Available download formats: html, xml, csv
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    Data on language spoken most often at home, other language(s) spoken regularly at home and age for the population excluding institutional residents for Canada, provinces and territories, census metropolitan areas and census agglomerations.

  13. Languages and English Ability - Seattle Neighborhoods

    • catalog.data.gov
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    Updated Dec 20, 2024
    + more versions
    Cite
    City of Seattle ArcGIS Online (2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://catalog.data.gov/dataset/languages-and-english-ability-seattle-neighborhoods
    Explore at:
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    https://arcgis.com/
    Area covered
    Seattle
    Description

    Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.

    Table created for and used in the Neighborhood Profiles application.

    Vintages: 2023
    ACS Table(s): B16004, C16002
    Data downloaded from: Census Bureau's Explore Census Data

    The United States Census Bureau's American Community Survey (ACS): About the Survey; Geography & ACS; Technical Documentation; News & Updates

    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for

  14. First official language spoken by language spoken most often at home: Census...

    • gimi9.com
    + more versions
    Cite
    First official language spoken by language spoken most often at home: Census metropolitan areas, tracted census agglomerations and census tracts | gimi9.com [Dataset]. https://gimi9.com/dataset/ca_90275ac0-2f6f-461d-a904-7cf7cf8251da
    Explore at:
    Description

    Data on first official language spoken, language spoken most often at home, age and gender for the population excluding institutional residents for census metropolitan areas, tracted census agglomerations and census tracts.

  15. Data from: Language Spoken at Home

    • linc.osbm.nc.gov
    csv, excel, geojson +1
    Updated Oct 3, 2024
    Cite
    (2024). Language Spoken at Home [Dataset]. https://linc.osbm.nc.gov/explore/dataset/language-spoken-at-home/
    Explore at:
    Available download formats: geojson, csv, json, excel
    Dataset updated
    Oct 3, 2024
    Description

    Language spoken at home and the ability to speak English for the population age 5 and over, as reported in the US Census Bureau's American Community Survey (ACS) 5-year estimates table C16001.

  16. Language Spoken at Home by Zip Code Tabulation Area 2012-2016

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Snow Labs (2021). Language Spoken at Home by Zip Code Tabulation Area 2012-2016 [Dataset]. https://www.johnsnowlabs.com/marketplace/language-spoken-at-home-by-zip-code-tabulation-area-2012-2016/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Time period covered
    2012 - 2016
    Area covered
    United States
    Description

    This American Community Survey (ACS) data set identifies the language spoken at home by zip code tabulation area within the United States, from 2012 through 2016. The dataset identifies languages spoken and how well English is spoken by Zip Code Tabulation Area.

  17. Languages spoken by tract, ACS

    • hub.arcgis.com
    • massachsuetts-environmental-justice-datasets-mass-eoeea.hub.arcgis.com
    Updated May 19, 2021
    Cite
    MA Executive Office of Energy and Environmental Affairs (2021). Languages spoken by tract, ACS [Dataset]. https://hub.arcgis.com/datasets/6c8c34fa83564796b564bdad99be912c
    Explore at:
    Dataset updated
    May 19, 2021
    Dataset authored and provided by
    MA Executive Office of Energy and Environmental Affairs
    Area covered
    Description

    The American Community Survey, Table B16001 provided detailed individual-level language estimates at the tract level for 42 non-English language categories, tabulated by English-speaking ability. Two sets of language data are included here, with population counts and percentages for both:
    • the tract population speaking languages other than English, regardless of English-speaking ability, identified by the language name, and
    • the languages spoken other than English by the tract population who do not speak English 'very well', identified by the language name followed by "_Enw".

    The default pop-up for this service presents the second of these data: languages spoken other than English by the tract population who do not speak English 'very well'.

    In part because of privacy concerns with the very small counts in some categories in Table B16001, the Census changed the American Community Survey estimates of the languages spoken by individuals. In 2016, the number of categories previously presented in Table B16001 was reduced to reflect the most commonly spoken languages, and several languages spoken in Massachusetts were grouped into generalized (i.e., "Other...") categories.

    Table B16001 has been renamed Table C16001 with these generalized categories. Therefore, the information presented in this data layer is not current, and these data cannot be updated.
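
    The "_Enw" naming convention described above lends itself to a simple split of the layer's fields into the two data sets. The sketch below is illustrative only: the function name and the sample field names ("Spanish", "Spanish_Enw", and so on) are hypothetical, since the description does not list the actual field names in the layer.

```python
from typing import Dict, List

ENW_SUFFIX = "_Enw"  # suffix quoted in the description


def split_language_fields(field_names: List[str]) -> Dict[str, List[str]]:
    """Group field names by whether they carry the "_Enw" suffix.

    "all_speakers" holds fields for speakers regardless of English ability;
    "not_very_well" holds fields for those who do not speak English 'very well'.
    """
    groups: Dict[str, List[str]] = {"all_speakers": [], "not_very_well": []}
    for name in field_names:
        key = "not_very_well" if name.endswith(ENW_SUFFIX) else "all_speakers"
        groups[key].append(name)
    return groups


# Hypothetical field names, for illustration only
example = split_language_fields(["Spanish", "Spanish_Enw", "Vietnamese", "Vietnamese_Enw"])
```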

  18. Top Languages Spoken in London Boroughs and MSOAs

    • data.europa.eu
    unknown
    Updated Jul 19, 2021
    + more versions
    Cite
    census2011@london.gov.uk (2021). Top Languages Spoken in London Boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=ga
    Explore at:
    Available download formats: unknown
    Dataset updated
    Jul 19, 2021
    Dataset authored and provided by
    census2011@london.gov.uk
    Area covered
    London
    Description

    This dataset shows the most spoken languages by borough and MSOAs in London. It provides numbers of the population aged 3+ who speak specified languages as their main language.

    Main language is from 2011 Census (detailed) - Census table QS204EW.

    This data is presented alongside Annual Population Survey (APS) data showing the top nationalities of residents in January - December 2019 by borough. The top 3 non-British nationalities are at the far right of the table. This is to highlight areas which may now have other common non-British languages spoken compared to 2011 (the year in which the Census information was gathered). The top non-British nationalities in 2019, which did not feature in 2011 as one of the most spoken non-British languages, are highlighted in column AD.

    The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such all figures must be treated with some caution. Estimates for non-British nationalities at borough level that are below 10,000 are considered too small to be reliable and should be treated with additional caution.

    MSOA codes have now been linked to House of Commons MSOA names.

  19. Ranking of languages spoken at home in the U.S. 2022

    • statista.com
    Updated Dec 9, 2024
    Cite
    Statista (2024). Ranking of languages spoken at home in the U.S. 2022 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Dataset updated
    Dec 9, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    United States
    Description

    In 2022, around 42.03 million people in the United States spoke Spanish at home. In comparison, approximately 974,829 people were speaking Russian at home during the same year. The distribution of the U.S. population by ethnicity can be accessed here. A ranking of the most spoken languages across the world can be accessed here.

  20. Mandarin (China) General Conversation Speech Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mandarin (China) General Conversation Speech Dataset [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-mandarin-china
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the Mandarin Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of Mandarin language speech recognition models, with a particular focus on Chinese accents and dialects.

    With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the Mandarin language spoken in China.

    Speech Data:

    This training dataset comprises 50 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 70 native Mandarin speakers from different states/provinces of China. This collaborative effort guarantees a balanced representation of Chinese accents, dialects, and demographics, reducing biases and promoting inclusivity.

    Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
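
    The stated audio format (stereo, 16-bit, 8 kHz WAV) can be checked on a downloaded file with a short sketch like the one below. It uses only Python's standard `wave` module; the function names, the `EXPECTED` dictionary, and the silent test clip are illustrative assumptions and are not part of the dataset or its tooling.

    ```python
    import io
    import wave

    # Format stated in the description: stereo, 16-bit (2 bytes), 8 kHz.
    EXPECTED = {"channels": 2, "sample_width_bytes": 2, "sample_rate_hz": 8000}


    def describe_wav(source):
        """Read basic format fields from a WAV file path or file-like object."""
        with wave.open(source, "rb") as wav:
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_width_bytes": wav.getsampwidth(),
                "sample_rate_hz": rate,
                "duration_s": wav.getnframes() / rate,
            }


    def format_mismatches(info, expected=EXPECTED):
        """Return {field: (actual, expected)} for every field that differs."""
        return {k: (info[k], v) for k, v in expected.items() if info[k] != v}


    def make_silent_wav(seconds=1, channels=2, sample_width=2, rate=8000):
        """Build an in-memory silent WAV clip, used here only as a stand-in."""
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(channels)
            wav.setsampwidth(sample_width)
            wav.setframerate(rate)
            wav.writeframes(b"\x00" * (seconds * rate * channels * sample_width))
        buffer.seek(0)
        return buffer


    info = describe_wav(make_silent_wav())
    print(format_mismatches(info))  # an empty dict means the format matches
    ```

    For a real recording, pass its file path to `describe_wav` in place of the synthetic clip.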

    Metadata:

    In addition to the audio recordings, the dataset provides metadata for each participant: age, gender, country, state, and dialect. Further metadata, such as recording device details, topic of recording, bit depth, and sample rate, is also provided.

    The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of Mandarin language speech recognition models.

    Transcription:

    This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format. Each transcription is segmented by speaker with time codes and includes non-speech labels and tags.

    Our goal is to expedite the deployment of Mandarin language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.

    Updates and Customization:

    We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.

    If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8 kHz to 48 kHz, allowing you to fine-tune your models for different audio recording setups. We can also customize the transcription to follow your specific guidelines and requirements, to further support your ASR development process.

    License:

    This audio dataset, created by FutureBeeAI, is now available for commercial use.

    Conclusion:

    Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.

Cite
Statista (2025). The most spoken languages worldwide 2023 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
The most spoken languages worldwide 2023

411 scholarly articles cite this dataset (View in Google Scholar)
