100+ datasets found
  1. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.

  2. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while content in German followed, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  3. Distribution of the Austronesian Language Family and Major Subgroupings

    • ecaidata.org
    • data.depositar.io
    jgw, jpeg
    Updated Mar 7, 2016
    Cite
    ECAI Pacific Language Mapping (2016). Distribution of the Austronesian Language Family and Major Subgroupings [Dataset]. https://ecaidata.org/dataset/distribution-of-the-austronesian-language-family-and-major-subgroupings
    Available download formats: jpeg, jgw
    Dataset updated
    Mar 7, 2016
    Dataset provided by
    ECAI Pacific Language Mapping
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ECAI Austronesian Team working with Paul Li and Academia Sinica has digitized several maps concerning the dispersal and classification of Austronesian languages, including: Austronesian Languages Zones by Peter Bellwood.

  4. Spoken Language Statistics

    • zenodo.org
    bin, pdf, txt
    Updated Aug 4, 2024
    Cite
    Alex Bampoulidis (2024). Spoken Language Statistics [Dataset]. http://doi.org/10.5281/zenodo.55708
    Available download formats: bin, pdf, txt
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alex Bampoulidis
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Find out which are the top 10 most spoken languages in the world according to GeoNames, and preserve the data containing the information needed, as some countries get split or merged, some languages go extinct, etc.

  5. Recommendations for the suitable contents of the geospatial datasets...

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Timo Rantanen; Harri Tolvanen; Meeli Roose; Jussi Ylikoski; Outi Vesakoski (2023). Recommendations for the suitable contents of the geospatial datasets presenting the distribution of languages including the benefits of each, and our solutions (selected in the case study) concerning the Uralic languages. [Dataset]. http://doi.org/10.1371/journal.pone.0269648.t001
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Timo Rantanen; Harri Tolvanen; Meeli Roose; Jussi Ylikoski; Outi Vesakoski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recommendations for the suitable contents of the geospatial datasets presenting the distribution of languages including the benefits of each, and our solutions (selected in the case study) concerning the Uralic languages.

  6. Ranking of languages spoken at home in the U.S. 2023

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people were speaking Russian at home during the same year.

  7. Language spoken at home by single and multiple responses of language spoken...

    • www150.statcan.gc.ca
    • canwin-datahub.ad.umanitoba.ca
    • +2 more
    Updated Aug 17, 2022
    + more versions
    Cite
    Government of Canada, Statistics Canada (2022). Language spoken at home by single and multiple responses of language spoken at home and mother tongue: Canada, provinces and territories, census metropolitan areas and census agglomerations with parts [Dataset]. http://doi.org/10.25318/9810020101-eng
    Dataset updated
    Aug 17, 2022
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Data on language spoken at home by single and multiple responses of language spoken at home, mother tongue and age for the population excluding institutional residents for Canada, provinces and territories, census metropolitan areas and census agglomerations.

  8. Languages in Mexico 2020

    • statista.com
    Updated Apr 15, 2025
    Cite
    Statista (2025). Languages in Mexico 2020 [Dataset]. https://www.statista.com/statistics/275440/languages-in-mexico/
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Mexico
    Description

    In 2020, about 93.8 percent of the Mexican population was monolingual in Spanish. Around five percent spoke a combination of Spanish and indigenous languages. Spanish is the third-most spoken native language worldwide, after Mandarin Chinese and Hindi.

    Mexican Spanish

    Spanish was first used in Mexico in the 16th century, at the time of Spanish colonization during the Conquest campaigns of what is now Mexico and the Caribbean. As of 2018, Mexico is the country with the largest number of native Spanish speakers worldwide. Mexican Spanish is influenced by English and Nahuatl, and has about 120 million users. The Mexican government uses Spanish in the majority of its proceedings; however, it recognizes 68 national languages, 63 of which are indigenous.

    Indigenous languages spoken

    Of the indigenous languages spoken, two of the most widely used are Nahuatl and Maya. Due to a history of marginalization of indigenous groups, most indigenous languages are endangered, and many linguists warn they might cease to be used within just a few decades. In recent years, legislative attempts such as the San Andrés Accords have been made to protect indigenous groups, who make up about 25 million of Mexico’s 125 million total inhabitants, though the efficacy of such measures is yet to be seen.

  9. Distribution of Students Across Grade Levels in World Languages Institute

    • publicschoolreview.com
    Updated Dec 1, 2021
    + more versions
    Cite
    Public School Review (2021). Distribution of Students Across Grade Levels in World Languages Institute [Dataset]. https://www.publicschoolreview.com/world-languages-institute-profile
    Dataset updated
    Dec 1, 2021
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset tracks annual distribution of students across grade levels in World Languages Institute

  10. Languages and English Ability - Seattle Neighborhoods

    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    • data.seattle.gov
    • +4 more
    Updated Feb 22, 2024
    + more versions
    Cite
    City of Seattle ArcGIS Online (2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://arc-gis-hub-home-arcgishub.hub.arcgis.com/datasets/5ebf54a443194f1080ffde06d1d381b5
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    Authors
    City of Seattle ArcGIS Online
    Area covered
    Seattle
    Description

    Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.

    Table created for and used in the Neighborhood Profiles application.

    Vintages: 2023
    ACS Table(s): B16004, C16002
    Data downloaded from: Census Bureau's Explore Census Data

    The United States Census Bureau's American Community Survey (ACS): About the Survey, Geography & ACS, Technical Documentation, News & Updates.

    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:

    • Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb_(year)_a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
    • The States layer contains 52 records: all US states, Washington D.C., and Puerto Rico.
    • Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
    • Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
    • Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
    • Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
      • The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
      • Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
      • The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
      • The estimate is controlled. A statistical test for sampling variability is not appropriate.
      • The data for this geographic area cannot be displayed because the number of sample cases is too small.
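The Census data note in this description defines the lower and upper confidence bounds as the estimate minus and plus the 90 percent margin of error. A minimal sketch of that arithmetic follows; the function name `confidence_bounds` and the sample numbers are illustrative assumptions, not fields or values from the dataset. Because the description says some margins of error are set to null, the sketch returns `None` in that case.

```python
from typing import Optional, Tuple


def confidence_bounds(
    estimate: float, margin_of_error: Optional[float]
) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) bounds as estimate -/+ margin of error.

    Returns None when the margin of error is missing (null), since no
    interval can be formed in that case.
    """
    if margin_of_error is None:
        return None
    return (estimate - margin_of_error, estimate + margin_of_error)


# Hypothetical example values, not taken from the Seattle table.
bounds = confidence_bounds(1200, 150)
print(bounds)
```

The interval is only as meaningful as the published margin of error; it does not account for the nonsampling error that the data note mentions.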

  11. The number of dataset files divided into the original published studies...

    • figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Timo Rantanen; Harri Tolvanen; Meeli Roose; Jussi Ylikoski; Outi Vesakoski (2023). The number of dataset files divided into the original published studies (original) and expert-modified distributions (expert) with two overall time periods. [Dataset]. http://doi.org/10.1371/journal.pone.0269648.t002
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Timo Rantanen; Harri Tolvanen; Meeli Roose; Jussi Ylikoski; Outi Vesakoski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The number of dataset files divided into the original published studies (original) and expert-modified distributions (expert) with two overall time periods.

  12. Geographical database of the Uralic languages

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 14, 2022
    Cite
    Timo Rantanen; Outi Vesakoski; Jussi Ylikoski; Harri Tolvanen (2022). Geographical database of the Uralic languages [Dataset]. http://doi.org/10.5281/zenodo.4784188
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Timo Rantanen; Outi Vesakoski; Jussi Ylikoski; Harri Tolvanen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    How to cite

    When you use the datasets or maps, please also cite the following paper, which introduces the whole process from data collection, harmonization and visualization to releasing the data:

    Rantanen, T., Tolvanen, H., Roose, M., Ylikoski, J. & Vesakoski, O. (2022) “Best practices for spatial language data harmonization, sharing and map creation - A case study of Uralic” PLoS ONE 17(6): e0269648. https://doi.org/10.1371/journal.pone.0269648.

    Overview

    The Geographical database of the Uralic languages consists of past and current distributions of the Uralic languages both as the original digital spatial datasets and as finalized maps. The database has been collected by the interdisciplinary BEDLAN (Biological Evolution and Diversification of LANguages) research team in collaboration with experts of Uralic languages. The work has been financed by the University of Turku (UTU–BGG), Kone Foundation (UraLex, AikaSyyni), the Academy of Finland (URKO), UiT – The Arctic University of Norway and the University of Oulu, as well as the Finno-Ugrian Society. The data have been compiled for the purposes of doing spatial linguistic and multidisciplinary research, and to visually present the state-of-the-art knowledge of the Uralic languages and their dialects. Geographic distributions are visualized as vector data primarily by using polygon objects (speaker areas or language areas), and in some rare cases, by using points. Based on the language distributions, coordinates for the languages and their dialects (point locations) have also been defined.

  13. Distribution of books per language by Suzanne Wise

    • workwithdata.com
    Updated Apr 17, 2025
    + more versions
    Cite
    Work With Data (2025). Distribution of books per language by Suzanne Wise [Dataset]. https://www.workwithdata.com/charts/books?agg=count&chart=bar&f=1&fcol0=author&fop0=%3D&fval0=Suzanne+Wise&x=language&y=records
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This bar chart displays books by language using the aggregation count. The data is filtered where the author is Suzanne Wise. The data is about books.

  14. Research-Language-7. Working Papers on language distribution

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 12, 2023
    Cite
    Manning, Patrick (2023). Research-Language-7. Working Papers on language distribution [Dataset]. http://doi.org/10.7910/DVN/LFOZXH
    Dataset updated
    Nov 12, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Manning, Patrick
    Description

    Working Papers on classification of language phyla, preparation of GIS-based maps, estimation of migration paths for language groups, and related topics - prepared at the University of Pittsburgh.

  15. Distribution of Students Across Grade Levels in Academy For Science Foreign...

    • publicschoolreview.com
    + more versions
    Cite
    Public School Review, Distribution of Students Across Grade Levels in Academy For Science Foreign Language [Dataset]. https://www.publicschoolreview.com/academy-for-science-foreign-language-profile
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset tracks annual distribution of students across grade levels in Academy For Science Foreign Language

  16. The distribution of languages and language families in the North

    • catalogue.arctic-sdi.org
    Updated Aug 23, 2024
    Cite
    (2024). The distribution of languages and language families in the North [Dataset]. https://catalogue.arctic-sdi.org/geonetwork/srv/search?keyword=inhabited
    Dataset updated
    Aug 23, 2024
    Description

    The North is inhabited by an array of peoples with different cultures and language groupings. For this report, information was compiled on 89 northern languages, which account for a little more than 1% of the world's living languages. These can be grouped into six distinct language families plus three isolated languages presently unconnected to any other language grouping (Fig. 20.1).

    Source: Conservation of Arctic Flora and Fauna (CAFF), 2013, Akureyri. Arctic Biodiversity Assessment. Status and Trends in Arctic biodiversity. Linguistic Diversity (Chapter 20), page 656.

  17. Distribution of Students Across Grade Levels in North Academy Of World...

    • publicschoolreview.com
    Updated Oct 6, 2024
    Cite
    Public School Review (2024). Distribution of Students Across Grade Levels in North Academy Of World Languages [Dataset]. https://www.publicschoolreview.com/north-academy-of-world-languages-profile
    Dataset updated
    Oct 6, 2024
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset tracks annual distribution of students across grade levels in North Academy Of World Languages

  18. Distribution of Students Across Grade Levels in Robert Randall World...

    • publicschoolreview.com
    Updated Jan 21, 2025
    Cite
    Public School Review (2025). Distribution of Students Across Grade Levels in Robert Randall World Languages [Dataset]. https://www.publicschoolreview.com/robert-randall-world-languages-profile
    Dataset updated
    Jan 21, 2025
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    This dataset tracks annual distribution of students across grade levels in Robert Randall World Languages

  19. GlobalPhone Chinese-Mandarin

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Jun 26, 2017
    + more versions
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). GlobalPhone Chinese-Mandarin [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0193/
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The GlobalPhone corpus developed in collaboration with the Karlsruhe Institute of Technology (KIT) was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, multilingual speech and text database for language independent and language adaptive speech recognition as well as for language identification tasks.

    The entire GlobalPhone corpus enables the acquisition of acoustic-phonetic knowledge of the following 22 spoken languages: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Croatian (ELRA-S0195), Czech (ELRA-S0196), French (ELRA-S0197), German (ELRA-S0198), Hausa (ELRA-S0347), Japanese (ELRA-S0199), Korean (ELRA-S0200), Polish (ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202), Spanish (Latin America) (ELRA-S0203), Swahili (ELRA-S0375), Swedish (ELRA-S0204), Tamil (ELRA-S0205), Thai (ELRA-S0321), Turkish (ELRA-S0206), Ukrainian (ELRA-S0377), and Vietnamese (ELRA-S0322).

    In each language about 100 sentences were read from each of the 100 speakers. The read texts were selected from national newspapers available via Internet to provide a large vocabulary. The read articles cover national and international political news as well as economic news. The speech is available in 16bit, 16kHz mono quality, recorded with a close-speaking microphone (Sennheiser 440-6). The transcriptions are internally validated and supplemented by special markers for spontaneous effects like stuttering, false starts, and non-verbal effects like laughing and hesitations. Speaker information like age, gender, occupation, etc. as well as information about the recording setup complement the database. The entire GlobalPhone corpus contains over 450 hours of speech spoken by more than 2100 native adult speakers.

    Data is shortened by means of the shorten program written by Tony Robinson. Alternatively, the data could be delivered unshortened.

    The Chinese-Mandarin corpus was produced using the People's Daily newspaper. It contains recordings of 132 speakers (64 males, 68 females) recorded in Beijing, Wuhan and Hekou, China. The following age distribution has been obtained: 16 speakers are below 19, 96 speakers are between 20 and 29, 16 speakers are between 30 and 39, 3 speakers are between 40 and 49 (1 speaker age is unknown).

  20. Language distribution of songs in the weekly Spotify charts in France...

    • statista.com
    Updated Jul 18, 2025
    Cite
    Statista (2025). Language distribution of songs in the weekly Spotify charts in France 2017-2020 [Dataset]. https://www.statista.com/statistics/1294318/distribution-languages-song-weekly-spotify-charts-france/
    Dataset updated
    Jul 18, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2017 - Nov 2020
    Area covered
    France
    Description

    The songs appearing in the weekly Spotify charts in France from 2017 to 2020 were, by a vast majority, sung either in English or in French, while Spanish-language songs played only a minor role there (************ percent). According to the source, songs sung in French have gradually been taking over the spots from those in English within the weekly Spotify charts. Whereas English and French were almost equally distributed in 2017, French-language songs had gained a share of around ** percent by 2020, compared to English-language songs with ** percent.

The most spoken languages worldwide 2025

419 scholarly articles cite this dataset