94 datasets found
  1. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, around 1.53 billion people worldwide spoke English either natively or as a second language, ahead of the 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. California has the highest share: about 45 percent of its population spoke a language other than English at home in 2023.

  2. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, used by 49.4 percent of websites. Spanish ranked second with six percent of web content, followed by German with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the two countries combined had more than a billion internet users. As a result, most online information is created in English, and even non-native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, there were 5.52 billion internet users worldwide. In the same period, Northern Europe and North America led the world in internet penetration, with around 97 percent of their populations online.

  3. Ranking of languages spoken at home in the U.S. 2023

    • statista.com
    Updated Apr 14, 2025
    Statista (2025). Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, around 43.37 million people in the United States spoke Spanish at home. By comparison, approximately 998,179 people spoke Russian at home in the same year.

  4. Top Languages Spoken in the United States

    • kaggle.com
    Updated Oct 22, 2022
    The Devastator (2022). Top Languages Spoken in the United States [Dataset]. https://www.kaggle.com/datasets/thedevastator/top-languages-spoken-in-the-united-states/code
    Dataset updated
    Oct 22, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Area covered
    United States
    Description

    Top Languages Spoken in the United States

    The Impact of Linguistics on Community and Business in America

    About this dataset

    Languages are an important part of daily life in the USA. This dataset includes a table of the most common languages spoken in the USA, as well as a larger spreadsheet broken down by CBSA (Core-Based Statistical Area, i.e. urban area).

    Language usage varies widely throughout the United States. According to the latest census data, over 350 different languages are represented in homes across the country. The following table and spreadsheet provide more detailed information on language usage throughout the various states and cities in the US:

    Columns:

    - index: Index column for the dataframe
    - Table with column headers in row 5 and row headers in column A: Contains language data for each CBSA (Core-Based Statistical Area)
    - Unnamed: 1: Rank of CBSA by total number of speakers of all languages
    - Unnamed: 2: Name of CBSA
    - Unnamed: 3: Population of CBSA
    - Unnamed: 4: Percent of population that speaks English very well
    - Unnamed: 5 through Unnamed: 58: Languages spoken by at least 0.1% of the population, with corresponding percentages

    How to use the dataset

    1. This dataset can be used to understand the linguistic diversity of the United States, and to compare languages spoken across different states and cities.
    2. This data can also be used to explore trends in language usage over time.
    3. Businesses can use this dataset to identify which languages are most commonly spoken in the areas in which they operate and tailor their marketing or customer service accordingly.
    4. Schools could use this dataset to plan language-learning programs based on the needs of their community.
    5. Policymakers could use this data to better understand linguistic diversity in the United States and design programs to support bilingualism or multilingualism.
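
    For use 3, once a CBSA row has been loaded, ranking its language columns is a one-line sort. A minimal sketch, assuming the row has already been turned into a mapping from language label to percentage (the Unnamed: 5 through Unnamed: 58 columns, renamed to their languages); the function name and figures are hypothetical, not taken from the dataset.

```python
from typing import Dict, List, Tuple


def top_languages(shares: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k languages with the highest share in one CBSA.

    `shares` is a hypothetical, pre-cleaned mapping of language label to
    the percentage of the CBSA population that speaks it.
    """
    # Sort by share (descending), then by name so ties are deterministic.
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Illustrative values only.
example = {"Spanish": 20.0, "Chinese": 3.0, "Tagalog": 3.0, "Vietnamese": 1.5}
print(top_languages(example, k=2))  # [('Spanish', 20.0), ('Chinese', 3.0)]
```

    Breaking ties alphabetically keeps the output stable across runs, which matters if the ranking feeds a report that is compared over time.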

    Acknowledgements

    This dataset was created by Gary Hoover. The data was sourced from https://www.kaggle.com/garyhoov/us-languages

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: Languages Spoken at Home by Urban Area = CBSA.csv

    File: US Languages Spoken at Home 2014.csv

    | Column name | Description |
    |:--|:--|
    | Table with column headers in row 5 and row headers in column A | |
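
    The column name "Table with column headers in row 5 and row headers in column A" suggests the first line of the CSV is a title, with the real headers on line 5. A minimal sketch using only the standard library, assuming that layout (header on the fifth line of the file, counting blank lines) and UTF-8 encoding; the function name and sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_offset_header_csv(f: TextIO, header_row: int = 5) -> List[Dict[str, str]]:
    """Read a CSV whose column headers sit on line `header_row` (1-based).

    Title/note lines above the header are discarded, blank header cells get
    placeholder names, and empty spacer rows are skipped.
    """
    reader = csv.reader(f)
    for _ in range(header_row - 1):
        next(reader, None)  # skip title and note lines above the header
    header = [name.strip() or f"column_{i}" for i, name in enumerate(next(reader))]
    rows: List[Dict[str, str]] = []
    for record in reader:
        if not any(cell.strip() for cell in record):
            continue  # spacer line between blocks
        rows.append(dict(zip(header, record)))
    return rows


# Synthetic stand-in for the assumed layout. The real file would be opened with
# open("US Languages Spoken at Home 2014.csv", newline="", encoding="utf-8"),
# where the encoding is an assumption.
sample = io.StringIO(
    "Table with column headers in row 5 and row headers in column A\n"
    "Source note\n\n\n"
    ",Speakers\n"
    "Spanish,100\n"
    ",\n"
    "French,20\n"
)
rows = read_offset_header_csv(sample)
print(rows[0])  # {'column_0': 'Spanish', 'Speakers': '100'}
```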

  5. Languages in Mexico 2020

    • statista.com
    Updated Apr 15, 2025
    Statista (2025). Languages in Mexico 2020 [Dataset]. https://www.statista.com/statistics/275440/languages-in-mexico/
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Mexico
    Description

    In 2020, about 93.8 percent of the Mexican population was monolingual in Spanish. Around five percent spoke a combination of Spanish and indigenous languages. Spanish is the third-most spoken native language worldwide, after Mandarin Chinese and Hindi.

    Mexican Spanish

    Spanish was first used in Mexico in the 16th century, during the Spanish colonization and conquest campaigns in what is now Mexico and the Caribbean. As of 2018, Mexico was the country with the largest number of native Spanish speakers worldwide. Mexican Spanish is influenced by English and Nahuatl and has about 120 million speakers. The Mexican government uses Spanish for most of its proceedings; however, it recognizes 68 national languages, 63 of which are indigenous.

    Indigenous languages spoken

    Of the indigenous languages spoken, two of the most widely used are Nahuatl and Maya. Due to a history of marginalization of indigenous groups, most indigenous languages are endangered, and many linguists warn that they could fall out of use within just a few decades. Legislative attempts such as the San Andrés Accords have been made to protect indigenous groups, who make up about 25 million of Mexico’s 125 million inhabitants, though the effectiveness of such measures remains to be seen.

  6. Data from: Kallaama: A Transcribed Speech Dataset about Agriculture in the...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Apr 3, 2024
    Elodie Gauthier; Aminata Ndiaye Diallo; Abdoulaye Guissé (2024). Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal [Dataset]. http://doi.org/10.5281/zenodo.10892569
    Available download formats: application/gzip
    Dataset updated
    Apr 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Elodie Gauthier; Aminata Ndiaye Diallo; Abdoulaye Guissé
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2023
    Area covered
    Senegal
    Description

    This dataset contains transcribed speech in Wolof, Pulaar and Sereer.

    The recordings are about agriculture. The speakers include farmers, agricultural advisers, and agri-food business managers. Recording types include interactive radio programmes, focus groups, voice messages, push messages and interviews, so spontaneous speech predominates. Audio quality may vary depending on the type of programme.

    Content description:

    • speech_dataset_wol.tar.gz: Wolof (ISO 639-3 code: wol) speech dataset containing 55 hours of transcribed speech, including almost 13 hours of content validated by an expert. It also contains an X-SAMPA lexicon (49,132 phonetised entries) and a text corpus (1,140,508 words).
    • speech_dataset_fuc.tar.gz: Pulaar (ISO 639-3 code: fuc) speech dataset containing nearly 32 hours of transcribed speech, including around 11 hours of content validated by an expert. It also contains a text corpus (742,024 words).
    • speech_dataset_srr.tar.gz: Sereer (ISO 639-3 code: srr) speech dataset containing 38 hours of transcribed speech, including nearly 11 hours of content validated by an expert.

    In total, these resources provide 125 hours of transcribed speech in the three most widely spoken languages in Senegal, including 35 hours of expert-checked transcriptions.
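
    The stated totals can be reconciled with the per-archive figures. A quick tally, taking the "almost", "nearly" and "around" values at face value as whole hours; the dictionary layout is only for illustration.

```python
# Hours per archive as given in the content description; approximate
# values ("almost", "nearly", "around") are treated as whole hours.
corpus_hours = {
    "wol": {"transcribed": 55, "expert_validated": 13},
    "fuc": {"transcribed": 32, "expert_validated": 11},
    "srr": {"transcribed": 38, "expert_validated": 11},
}

total_transcribed = sum(v["transcribed"] for v in corpus_hours.values())
total_validated = sum(v["expert_validated"] for v in corpus_hours.values())

print(total_transcribed, total_validated)  # 125 35
print(f"{total_validated / total_transcribed:.0%} expert-validated")  # 28% expert-validated
```

    Roughly a quarter of the audio has expert-checked transcriptions, which is worth keeping in mind when choosing evaluation splits.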

    This work is a result of the Kallaama project, funded by the Lacuna Fund for one year in 2023.

    See the GitHub repository (https://github.com/gauthelo/kallaama-speech-dataset) for more details about the dataset.

  7. Top Languages Spoken in London Boroughs and MSOAs

    • data.europa.eu
    unknown
    Updated Jul 19, 2021
    census2011@london.gov.uk (2021). Top Languages Spoken in London Boroughs and MSOAs [Dataset]. https://data.europa.eu/data/datasets/top-languages-spoken-in-london-boroughs-and-msoas?locale=ga
    Available download formats: unknown
    Dataset updated
    Jul 19, 2021
    Dataset authored and provided by
    census2011@london.gov.uk
    Area covered
    London
    Description

    This dataset shows the most spoken languages by borough and MSOA (Middle Layer Super Output Area) in London. It gives the number of residents aged 3 and over who speak each specified language as their main language.

    Main language data are from the 2011 Census (detailed), Census table QS204EW.

    This data is presented alongside Annual Population Survey (APS) data showing the top nationalities of residents by borough for January to December 2019. The top 3 non-British nationalities are at the far right of the table. This highlights areas where non-British languages other than those recorded in 2011 (the year in which the Census information was gathered) may now be commonly spoken. The top non-British nationalities in 2019 that did not feature in 2011 as one of the most spoken non-British languages are highlighted in column AD.

    The APS has a sample of around 320,000 people in the UK (around 28,000 in London), so all figures must be treated with some caution. Borough-level estimates for non-British nationalities below 10,000 are considered too small to be reliable and should be treated with additional caution.
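
    When working with the APS columns, the 10,000 reliability threshold can be applied mechanically. A minimal sketch, assuming the borough-level estimates have been read into a dictionary keyed by a borough/nationality label; the function name, keys and values are hypothetical.

```python
from typing import Dict, List

# Borough-level APS estimates below this value are considered unreliable.
RELIABILITY_THRESHOLD = 10_000


def flag_unreliable(estimates: Dict[str, int],
                    threshold: int = RELIABILITY_THRESHOLD) -> List[str]:
    """Return the keys whose estimate falls strictly below the threshold."""
    return sorted(key for key, value in estimates.items() if value < threshold)


# Hypothetical estimates, not figures from the dataset.
estimates = {
    "Borough A: Nationality X": 25_000,
    "Borough A: Nationality Y": 8_000,
    "Borough B: Nationality X": 10_000,  # exactly at the threshold, not below it
}
print(flag_unreliable(estimates))  # ['Borough A: Nationality Y']
```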

    MSOA codes have now been linked to House of Commons MSOA names.

  8. First Official Language Spoken (7), Detailed Language Spoken Most Often at...

    • datasets.ai
    • open.canada.ca
    55
    Updated Aug 6, 2024
    Statistics Canada | Statistique Canada (2024). First Official Language Spoken (7), Detailed Language Spoken Most Often at Home (232), Age Groups (17A) and Sex (3) for the Population Excluding Institutional Residents of Canada, Provinces, Territories, Census Divisions and Census Subdivisions, 2011 Census [Dataset]. https://datasets.ai/datasets/af80f73b-5820-45ca-8348-420fbf25e72f
    55 available download formats
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Authors
    Statistics Canada | Statistique Canada
    Area covered
    Canada
    Description

    This table is part of a series of tables that present a portrait of Canada based on the various census topics. The tables range in complexity and levels of geography. Content varies from a simple overview of the country to complex cross-tabulations; the tables may also cover several censuses.

  9. Most spoken Indian languages worldwide 2025

    • statista.com
    Updated Jun 11, 2025
    Statista (2025). Most spoken Indian languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/1614099/worldwide-indian-languages-spoken/
    Dataset updated
    Jun 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    India
    Description

    As of 2025, ***** was the most spoken Indian language worldwide and ranked third globally, with approximately *** million speakers. ******* was the second most spoken Indian language, with approximately *** million speakers globally.

  10. Data from: Language Spoken at Home

    • linc.osbm.nc.gov
    • ncosbm.opendatasoft.com
    csv, excel, geojson +1
    Updated Oct 3, 2024
    (2024). Language Spoken at Home [Dataset]. https://linc.osbm.nc.gov/explore/dataset/language-spoken-at-home/
    Available download formats: geojson, csv, json, excel
    Dataset updated
    Oct 3, 2024
    Description

    Language spoken at home and ability to speak English for the population aged 5 and over, as reported in the US Census Bureau's American Community Survey (ACS) 5-year estimates, table C16001.
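
    A common derived figure from table C16001 is the share of the population aged 5 and over that speaks a language other than English at home, i.e. the table total minus the "speak only English" count, divided by the total. A minimal sketch, assuming the variable codes C16001_001E (total) and C16001_002E (speak only English) and the Census Data API's JSON layout of a header row followed by data rows; both are assumptions to check against the Census Bureau's variable list, and the figures are made up.

```python
import json
from typing import Dict

# Assumed ACS variable codes for table C16001; verify before relying on them.
TOTAL = "C16001_001E"         # total population 5 years and over
ONLY_ENGLISH = "C16001_002E"  # speak only English


def other_language_share(api_json: str) -> Dict[str, float]:
    """Map each geography NAME to the percent speaking a non-English language at home."""
    header, *records = json.loads(api_json)  # first row holds the column names
    col = {name: i for i, name in enumerate(header)}
    shares: Dict[str, float] = {}
    for rec in records:
        total = int(rec[col[TOTAL]])
        only_english = int(rec[col[ONLY_ENGLISH]])
        # Everyone outside "speak only English" speaks another language at home.
        shares[rec[col["NAME"]]] = (
            100.0 * (total - only_english) / total if total else 0.0
        )
    return shares


# Made-up figures in the assumed list-of-lists response shape.
response = '[["NAME", "C16001_001E", "C16001_002E"], ["Example area", "1000", "880"]]'
print(other_language_share(response))  # {'Example area': 12.0}
```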

  11. Mother Tongue (8), Knowledge of Official Languages (5), Language Spoken Most...

    • gimi9.com
    Updated May 3, 2012
    (2012). Mother Tongue (8), Knowledge of Official Languages (5), Language Spoken Most Often at Home (8), Other Language Spoken Regularly at Home (9), Age Groups (7) and Sex (3) for the Population Excluding Institutional Residents of Canada, Provinces, Territories, | gimi9.com [Dataset]. https://gimi9.com/dataset/ca_aa852f12-309c-465a-89ba-87a8dac27094
    Dataset updated
    May 3, 2012
    Area covered
    Canada
    Description

    This table is part of a series of tables that present a portrait of Canada based on the various census topics. The tables range in complexity and levels of geography. Content varies from a simple overview of the country to complex cross-tabulations; the tables may also cover several censuses.

  12. Detailed Language Spoken Most Often at Home (103), Other Language Spoken...

    • datasets.ai
    • open.canada.ca
    55
    Updated Aug 27, 2024
    Statistics Canada | Statistique Canada (2024). Detailed Language Spoken Most Often at Home (103), Other Language Spoken Regularly at Home (9), Generation Status (4) and Sex (3) for the Population 15 Years and Over of Census Metropolitan Areas, Tracted Census Agglomerations and Census Tracts, 2006 Census - 20% Sample Data [Dataset]. https://datasets.ai/datasets/bd21a93c-61aa-4777-9a2c-d9aa63ffc93d
    55 available download formats
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Authors
    Statistics Canada | Statistique Canada
    Description

    This table is part of a series of tables that present a portrait of Canada based on the various census topics. The tables range in complexity and levels of geography. Content varies from a simple overview of the country to complex cross-tabulations; the tables may also cover several censuses.

  13. Mother Tongue (10), First Official Language Spoken (7), Language Spoken Most...

    • data.urbandatacentre.ca
    Updated Oct 1, 2024
    (2024). Mother Tongue (10), First Official Language Spoken (7), Language Spoken Most Often at Home (10), Other Language(s) Spoken Regularly at Home (11), Age (27) and Sex (3) for the Population Excluding Institutional Residents of Canada, Provinces and Territories, Census Metropolitan Areas and Census Agglomerations, 2016 Census - 100% Data - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-a753d1c9-c5d1-43f5-9d45-ede896fdaf36
    Dataset updated
    Oct 1, 2024
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    This table is part of a series of tables that present a portrait of Canada based on the various census topics. The tables range in complexity and levels of geography. Content varies from a simple overview of the country to complex cross-tabulations; the tables may also cover several censuses.

  14. Mother Tongue (8), Knowledge of Official Languages (5), Language Spoken Most...

    • data.wu.ac.at
    • datasets.ai
    • +1more
    xml
    Updated Dec 1, 2016
    Statistics Canada | Statistique Canada (2016). Mother Tongue (8), Knowledge of Official Languages (5), Language Spoken Most Often at Home (8), Other Language Spoken Regularly at Home (9), Age Groups (7) and Sex (3) for the Population Excluding Institutional Residents of Canada, Provinces, Territories, Census Metropolitan Areas and Census Agglomerations, 2011 Census [Dataset]. https://data.wu.ac.at/schema/www_data_gc_ca/YWE4NTJmMTItMzA5Yy00NjVhLTg5YmEtODdhOGRhYzI3MDk0
    Available download formats: xml
    Dataset updated
    Dec 1, 2016
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    This table is part of a series of tables that present a portrait of Canada based on the various census topics. The tables range in complexity and levels of geography. Content varies from a simple overview of the country to complex cross-tabulations; the tables may also cover several censuses.

  15. Table 2.5 - Speakers of foreign languages by language spoken by NUTS3...

    • census.geohive.ie
    Updated Dec 1, 2023
    Central Statistics Office (2023). Table 2.5 - Speakers of foreign languages by language spoken by NUTS3 (Census 2022) [Dataset]. https://census.geohive.ie/datasets/5332fbecfe0e4c27ae4a9fa3bc49fe72
    Dataset updated
    Dec 1, 2023
    Dataset provided by
    Central Statistics Office Ireland (https://www.cso.ie/en/)
    Authors
    Central Statistics Office
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Speakers of foreign languages by language spoken, by NUTS3 region (Census 2022 Theme 2, Table 5). Census 2022 Table 2.5 gives the number of speakers of foreign languages by language spoken. Census 2022 Theme 2 covers Migration, Ethnicity, Religion and Foreign Languages.

    The Nomenclature of Territorial Units for Statistics (NUTS) was created by Eurostat to define territorial units for the production of regional statistics across the European Union. In 2003 the NUTS classification was established within a legal framework (Regulation (EC) No 1059/2003). Changes made under the 2014 Local Government Act prompted a revision of the Irish NUTS 2 and NUTS 3 regions. The main changes at NUTS 3 level were the transfer of South Tipperary from the South-East into the Mid-West NUTS 3 region and the movement of Louth from the Border to the Mid-East NUTS 3 region. NUTS 3 regions are grouped into three NUTS 2 regions (Northern and Western, Southern, Eastern and Midland), which correspond to the Regional Assemblies established in the 2014 Local Government Act. The revised NUTS boundaries have been given legal status under Commission Regulation (EU) 2016/2066.

    Coordinate reference system: Irish Transverse Mercator (EPSG 2157). These boundaries are based on 20m generalised boundaries (NUTS3 Regions 2015) sourced from the Tailte Éireann Open Data Portal. This dataset is provided by Tailte Éireann.

  16. Census 2021 - Main language

    • data.leicester.gov.uk
    csv, excel, json
    Updated Apr 25, 2023
    (2023). Census 2021 - Main language [Dataset]. https://data.leicester.gov.uk/explore/dataset/census-2021-leicester-main-language-detailed/
    Available download formats: excel, json, csv
    Dataset updated
    Apr 25, 2023
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March 2021. The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.

    Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. A dashboard showcasing various datasets from the census is also published, allowing users to view data for Leicester and compare it with national statistics. Further information about the census and full datasets can be found on the ONS website: https://www.ons.gov.uk/census/aboutcensus/censusproducts

    Main language

    This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their main language. The estimates are as at Census Day, 21 March 2021. Main language is a person's first or preferred language; they may speak other languages as well. A main language is recorded only for residents aged 3 and over. Residents aged under 3 appear as ‘Does not apply’. Note that some organisations exclude those under 3 when calculating percentages for this variable. This dataset contains information for Leicester City and England overall.
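
    Whether ‘Does not apply’ (residents under 3) is kept in the denominator changes every percentage. A minimal sketch of both conventions, assuming the counts have been loaded as a dictionary keyed by category label; the function name and counts are hypothetical.

```python
from typing import Dict


def main_language_percentages(counts: Dict[str, int],
                              exclude_does_not_apply: bool = True) -> Dict[str, float]:
    """Convert main-language counts to percentages rounded to one decimal.

    'Does not apply' holds residents aged under 3; excluding it uses only
    residents aged 3 and over as the denominator.
    """
    if exclude_does_not_apply:
        counts = {k: v for k, v in counts.items() if k != "Does not apply"}
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


# Hypothetical counts for illustration only.
counts = {"English": 700, "Other language": 250, "Does not apply": 50}
print(main_language_percentages(counts))
# {'English': 73.7, 'Other language': 26.3}
print(main_language_percentages(counts, exclude_does_not_apply=False))
# {'English': 70.0, 'Other language': 25.0, 'Does not apply': 5.0}
```

    When comparing Leicester with England or with figures published elsewhere, check which convention the other source used before comparing percentages.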

  17. Leibniz-ZAS corpus of MAIN

    • data.niaid.nih.gov
    Updated May 6, 2021
    Rizaeva, Zarina (2021). Leibniz-ZAS corpus of MAIN [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4724969
    Dataset updated
    May 6, 2021
    Dataset provided by
    Sternharz, Alyona
    Gagarina, Natalia
    Topaj, Nathalie
    Rizaeva, Zarina
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The presented dataset is part of the narrative corpus collected at Leibniz-Centre General Linguistics (Leibniz-ZAS). It contains transcriptions of oral narratives elicited with the Multilingual Assessment Instrument for Narratives (MAIN), developed as part of the LITMUS battery of tests in the framework of COST Action IS0804 Language Impairment in a Multilingual Society: Linguistic Patterns and the Road to Assessment. Narratives were elicited in Russian, Turkish and German in the telling elicitation mode using two MAIN picture stories, Baby Birds and Baby Goats. The data were collected during two large-scale longitudinal studies conducted at ZAS in the framework of the Berlin Interdisciplinary Network for Multilingualism (BIVEM) and Interdisciplinary Research Alliance (IFV) projects. The participants were Russian-German and Turkish-German bilingual children from different areas of Berlin. Their language development was closely documented every year from early kindergarten up to the end of the third grade of primary school (age 2;9 to 10;4 years). It is the longest and largest study of language development in bilingual children in Germany, allowing for cross-sectional and longitudinal analyses from a cross-linguistic perspective.

    The narratives were audio recorded and transcribed in the standardized CHAT format (MacWhinney, 2000) using the CLAN program, following the CHILDES transcription rules, for later analysis. The transcriptions can be used to analyze the narrative abilities of bilingual children at the macro- and microstructural levels.
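
    Because the transcripts follow CHAT conventions (headers start with @, main tiers with *CODE:, dependent tiers with %, and a line starting with a tab continues the previous tier), a speaker's utterances can be pulled out with plain string handling. A minimal sketch, assuming the target child is coded CHI (the usual CHILDES code); for real analyses CLAN is the established tool, and the sample transcript below is invented.

```python
from typing import List, Optional


def child_utterances(chat_text: str, speaker: str = "CHI") -> List[str]:
    """Collect main-tier utterances for one speaker from a CHAT transcript."""
    prefix = f"*{speaker}:"
    utterances: List[str] = []
    current: Optional[str] = None  # utterance being assembled, if any
    for line in chat_text.splitlines():
        if line.startswith("\t") and current is not None:
            current += " " + line.strip()  # continuation of the current tier
            continue
        if current is not None:
            utterances.append(current)  # any new tier closes the utterance
            current = None
        if line.startswith(prefix):
            current = line[len(prefix):].strip()
    if current is not None:
        utterances.append(current)
    return utterances


# Invented transcript fragment for illustration.
sample = (
    "@Begin\n"
    "@Participants:\tCHI Target_Child, INV Investigator\n"
    "*INV:\tand then ?\n"
    "*CHI:\tthe bird\n"
    "\tflies away .\n"
    "%com:\tpoints at picture\n"
    "*CHI:\tthe cat comes .\n"
    "@End\n"
)
print(child_utterances(sample))  # ['the bird flies away .', 'the cat comes .']
```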

    In total, the dataset contains 210 transcriptions of narratives from 29 participants (10 Russian-German bilingual children and 19 Turkish-German bilingual children), whose narratives were elicited at up to five testing points after the initial testing (pretest). The testing points are referred to as posttests, post1 to post6; this dataset does not contain data from post5, as oral narratives were not elicited at the end of the second grade. The corresponding age ranges at each testing point are given below for each part of the dataset. The dataset is divided into two parts: the Russian-German and the Turkish-German narrative corpus.

    The narrative corpus of Russian-German bilingual children includes two folders with narratives elicited in Russian and German, at 5 testing points.

    Total number of transcriptions=100

    Number of children=10

    Total age range=2;9-10;4

    Age range of children for narratives in Russian at each testing point:

    post 1: 2;9-4;3 (kindergarten)

    post 2: 3;9-5;2 (kindergarten)

    post 3: 4;9-6;1 (kindergarten)

    post 4: 6;9-7;6 (end of first grade)

    post 6: 8;7-9;10 (end of third grade)

    Age range of children for narratives in German at each testing point:

    post 1: 2;10-4;3 (kindergarten)

    post 2: 3;9-5;3 (kindergarten)

    post 3: 4;9-6;2 (kindergarten)

    post 4: 6;9-7;6 (end of first grade)

    post 6: 8;8-10;4 (end of third grade)

    The narrative corpus of Turkish-German bilingual children includes two folders.

    One folder contains narratives elicited in German at the earlier 3 testing points, which allows the analysis of early narrative development in one language.

    Total number of transcriptions=30

    Number of children=10

    Total age range=3;5-6;4

    Age range of children for narratives in German at each testing point:

    post 1: 3;5-4;3 (kindergarten)

    post 2: 4;4-5;4 (kindergarten)

    post 3: 5;3-6;4 (kindergarten)

    Another folder contains narratives elicited in both languages, Turkish and German, at 4 testing points starting from post2 and allowing for the analysis of narrative development up to the third grade in both languages.

    Total number of transcriptions=80

    Number of children=10

    Total age range=3;10-9;9

    Age range of children for narratives in Turkish at each testing point:

    post 2: 3;10-5;1 (kindergarten)

    post 3: 4;9-6;1 (kindergarten)

    post 4: 6;5-7;8 (end of first grade)

    post 6: 8;6-9;9 (end of third grade)

    Age range of children for narratives in German at each testing point:

    post 2: 4;1-5;4 (kindergarten)

    post 3: 5;1-6;4 (kindergarten)

    post 4: 6;6-7;8 (end of first grade)

    post 6: 8;5-9;8 (end of third grade)
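
    The overall total of 210 transcriptions follows from the folder descriptions above, assuming one narrative per child, language and testing point (the description does not state this explicitly). A quick arithmetic check:

```python
# Transcriptions = children x testing points x elicitation languages,
# assuming one narrative per child, language and testing point.
russian_german = 10 * 5 * 2        # Russian + German at post1-post4 and post6
turkish_german_early = 10 * 3 * 1  # German only at post1-post3
turkish_german_both = 10 * 4 * 2   # Turkish + German at post2-post4 and post6

total = russian_german + turkish_german_early + turkish_german_both
print(russian_german, turkish_german_early, turkish_german_both, total)  # 100 30 80 210
```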

    The files are named according to the following pattern: child’s code (letters refer to child’s first languages: r-Russian, t-Turkish), test (MAIN), story (bb=Baby Birds, bg=Baby Goats), language of elicitation (de/ru/tr), testing point (1=post1, 2=post2 etc.), and child’s age (year/month). Here is an example: r009_MAIN_bb_de_4_610.
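
    The naming pattern can be split back into its fields for filtering or grouping. A minimal sketch, assuming file stems (without extension, e.g. via pathlib's Path.stem) contain exactly six underscore-separated fields and that the age field ends in a two-digit month count, so 610 reads as 6;10; that age encoding is an assumption inferred from the single example. The type and function names are hypothetical.

```python
from typing import NamedTuple

FIRST_LANGUAGE = {"r": "Russian", "t": "Turkish"}
STORIES = {"bb": "Baby Birds", "bg": "Baby Goats"}


class NarrativeFile(NamedTuple):
    child: str
    first_language: str
    test: str
    story: str
    language: str
    testing_point: str
    age_years: int
    age_months: int


def parse_main_filename(stem: str) -> NarrativeFile:
    """Split a stem such as 'r009_MAIN_bb_de_4_610' into its fields.

    Assumption: the last two digits of the age field are months.
    """
    child, test, story, language, point, age = stem.split("_")
    return NarrativeFile(
        child=child,
        first_language=FIRST_LANGUAGE[child[0]],
        test=test,
        story=STORIES[story],
        language=language,
        testing_point=f"post{point}",
        age_years=int(age[:-2]),
        age_months=int(age[-2:]),
    )


info = parse_main_filename("r009_MAIN_bb_de_4_610")
print(info.first_language, info.story, info.testing_point, info.age_years, info.age_months)
# Russian Baby Birds post4 6 10
```

    The parsed age of 6;10 falls inside the German post4 range for the Russian-German children (6;9-7;6), which is consistent with the assumed encoding.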

  18. Data from: Domestic and International Common Language Database (DICL)

    • researchdiscovery.drexel.edu
    Updated Mar 11, 2025
    Tamara Gurevich; Peter Herman; Farid Toubal; Yoto Yotov (2025). Domestic and International Common Language Database (DICL) [Dataset]. https://researchdiscovery.drexel.edu/esploro/outputs/dataset/Domestic-and-International-Common-Language-Database/991022032773104721
    Dataset provided by
    United States International Trade Commission
    Authors
    Tamara Gurevich; Peter Herman; Farid Toubal; Yoto Yotov
    Time period covered
    2024
    Description

    The database contains index measures of linguistic similarity, both domestic and international. The domestic measures capture linguistic similarities among populations within a single country, while the international indexes capture language similarities between two different countries. The eight indices reflect three different aspects of language: common official languages, common native and acquired spoken languages, and linguistic proximity across different languages. The database has many uses, for example in models of bilateral flows (including FDI, migration, and international trade) and in regional or country-level analyses. Extensive and detailed coverage: bilateral indexes for 242 countries, based on 6,674 individual languages.

  19. Detailed Language Spoken Most Often at Home (186), Other Language Spoken...

    • data.urbandatacentre.ca
    Updated Oct 1, 2024
    (2024). Detailed Language Spoken Most Often at Home (186), Other Language Spoken Regularly at Home (9), Mother Tongue (8), Age Groups (17A) and Sex (3) for the Population of Canada, Provinces, Territories, Census Metropolitan Areas and Census Agglomerations, 2006 Census - 20% Sample Data - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-a535a1c1-39b5-444c-ae09-1848f115039c
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Area covered
    Canada
    Description

    This table is part of a series of tables that present a portrait of Canada based on the various census topics. The tables range in complexity and levels of geography. Content varies from a simple overview of the country to complex cross-tabulations; the tables may also cover several censuses.

  20. Census 21 - Main Language MSOA

    • data.leicester.gov.uk
    Updated Aug 22, 2023
    (2023). Census 21 - Main Language MSOA [Dataset]. https://data.leicester.gov.uk/explore/dataset/census-21-main-language-msoa/
    Available download formats: json, geojson, excel, csv
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March 2021. The census asks every household questions about the people who live there and the type of home they live in, building a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads. Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. A dashboard showcasing various census datasets also lets users view data for the MSOAs of Leicester and compare it with overall Leicester statistics. Further information about the census and full datasets can be found on the ONS website: https://www.ons.gov.uk/census/aboutcensus/censusproducts

    Main language

    This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their main language. The estimates are as at Census Day, 21 March 2021.

    Main language is a person's first or preferred language; they may speak other languages as well. A main language is recorded only for residents aged 3 and over. Residents under 3 appear as ‘Does not apply’. Note that some organisations exclude residents under 3 when calculating percentages for this variable.

    This dataset contains information for the MSOAs of Leicester City.
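    Whether under-3s are excluded changes the denominator, and so every percentage. A minimal Python sketch, using made-up counts and category labels (not Leicester figures), shows both conventions side by side; `main_language_shares` is a hypothetical helper, and the only label taken from the description is ‘Does not apply’.

```python
from __future__ import annotations


def main_language_shares(counts: dict[str, int],
                         exclude_under_3: bool) -> dict[str, float]:
    """Percentage of residents in each main-language category.

    `counts` maps a category to a number of usual residents and may include
    the 'Does not apply' category used for residents under 3.
    """
    kept = {
        k: v for k, v in counts.items()
        if not (exclude_under_3 and k == "Does not apply")
    }
    total = sum(kept.values())
    return {k: 100 * v / total for k, v in kept.items()}


# Hypothetical counts for illustration only.
counts = {"English": 700, "Other": 250, "Does not apply": 50}

for flag in (False, True):
    shares = main_language_shares(counts, exclude_under_3=flag)
    print(flag, {k: round(v, 1) for k, v in shares.items()})
# False {'English': 70.0, 'Other': 25.0, 'Does not apply': 5.0}
# True {'English': 73.7, 'Other': 26.3}
```

    Excluding under-3s raises every remaining share, because the denominator shrinks from all usual residents to residents aged 3 and over; figures computed under the two conventions are therefore not directly comparable.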

The most spoken languages worldwide 2025

438 scholarly articles cite this dataset (Google Scholar).