44 datasets found
  1. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population spoke a language other than English at home in 2023.

  2. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  3. Ranking of languages spoken at home in the U.S. 2023

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people were speaking Russian at home during the same year. The distribution of the U.S. population by ethnicity can be accessed here. A ranking of the most spoken languages across the world can be accessed here.

  4. MCB_languages_county

    • kaggle.com
    Updated Oct 1, 2019
    Cite
    Marisol Brewster (2019). MCB_languages_county [Dataset]. https://www.kaggle.com/mcbrewster/mcb-languages-county/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 1, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Marisol Brewster
    Description

    Context

    This is a dataset I found online through the Google Dataset Search portal.

    Content

    The American Community Survey (ACS) 2009-2013 multi-year data are used to list all languages spoken in the United States that were reported during the sample period. These tables provide detailed counts of many more languages than the 39 languages and language groups that are published annually as a part of the routine ACS data release. This is the second tabulation beyond 39 languages since ACS began.

    The tables include all languages that were reported in each geography during the 2009 to 2013 sampling period. For the purpose of tabulation, reported languages are classified in one of 380 possible languages or language groups. Because the data are a sample of the total population, there may be languages spoken that are not reported, either because the ACS did not sample the households where those languages are spoken, or because the person filling out the survey did not report the language or reported another language instead.

    The tables also provide information about self-reported English-speaking ability. Respondents who reported speaking a language other than English were asked to indicate their ability to speak English in one of the following categories: "Very well," "Well," "Not well," or "Not at all." The data on ability to speak English represent the person’s own perception about his or her own ability or, because ACS questionnaires are usually completed by one household member, the responses may represent the perception of another household member.

    These tables are also available through the Census Bureau's application programming interface (API). Please see the developers page for additional details on how to use the API to access these data.

    Acknowledgements

    Sources:

    Google Dataset Search: https://toolbox.google.com/datasetsearch

    2009-2013 American Community Survey

    Original dataset: https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html

    Downloaded From: https://data.world/kvaughn/languages-county

    Banner and thumbnail photo by Farzad Mohsenvand on Unsplash

  5. Language Named Authority List

    • data.europa.eu
    rdf xml, xml, zip
    Updated Sep 26, 2024
    Cite
    Publications Office of the European Union (2024). Language Named Authority List [Dataset]. https://data.europa.eu/data/datasets/language?locale=en
    Explore at:
    Available download formats: xml, rdf xml, zip
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Publications Office of the European Union (http://op.europa.eu/)
    European Union
    Authors
    Publications Office of the European Union
    License

    http://data.europa.eu/eli/dec/2011/833/oj

    Description

    Language is a controlled vocabulary that lists world languages and language varieties, including sign languages. Its main purpose is to support activities associated with the publication process. The full set of languages contains more than 8000 language varieties, each identified by a code equivalent to the ISO 639-3 code. Concepts are aligned with the ISO 639 international standard, which is issued in several parts: ISO 639-1 contains strictly two alphabetic letters (alpha-2); ISO 639-2/B (B = bibliographic) is used for bibliographic purposes (alpha-3); ISO 639-2/T (T = terminology) is used for technical purposes (alpha-3); ISO 639-3 covers all the languages and macro-languages of the world (alpha-3), and its values are compliant with ISO 639-2/T. If an authority code is needed for a language without an assigned ISO code, an alphanumeric code is created to avoid confusion with the strictly alphabetic ISO codes. Labels are provided in all 24 official EU languages for the most frequently used languages. Language is under the governance of the Interinstitutional Metadata and Formats Committee (IMFC). It is maintained by the Publications Office of the European Union and disseminated on the EU Vocabularies website. It is a corporate reference data asset covered by the Corporate Reference Data Management policy of the European Commission.
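
    The code shapes described above can be told apart mechanically. The following Python sketch is only an illustration: the function name `classify_language_code` and its return labels are assumptions made for this example, not part of the authority list, and it inspects the shape of a code only, not whether that code is actually assigned to a language.

```python
import re


def classify_language_code(code: str) -> str:
    """Classify a language code by its shape alone (hypothetical helper).

    Two letters   -> "alpha-2"      (the ISO 639-1 shape)
    Three letters -> "alpha-3"      (the ISO 639-2 / ISO 639-3 shape)
    Letters and digits mixed, or digits only -> "alphanumeric"
                                    (the shape used when no ISO code exists)
    Anything else -> "unknown"
    """
    if re.fullmatch(r"[A-Za-z]{2}", code):
        return "alpha-2"
    if re.fullmatch(r"[A-Za-z]{3}", code):
        return "alpha-3"
    # Assumption: any ASCII letter/digit string containing a digit counts
    # as an authority-created alphanumeric code.
    if re.fullmatch(r"[A-Za-z0-9]+", code) and any(ch.isdigit() for ch in code):
        return "alphanumeric"
    return "unknown"
```

    For instance, `classify_language_code("en")` returns `"alpha-2"` and `classify_language_code("eng")` returns `"alpha-3"`; a made-up code such as `"x1y"` is reported as `"alphanumeric"`.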

  6. Most used programming languages among developers worldwide 2024

    • statista.com
    Updated Feb 6, 2025
    Cite
    Statista (2025). Most used programming languages among developers worldwide 2024 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
    Dataset updated
    Feb 6, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 19, 2024 - Jun 20, 2024
    Area covered
    Worldwide
    Description

    As of 2024, JavaScript and HTML/CSS were the most commonly used programming languages among software developers around the world, with more than 62 percent of respondents stating that they used JavaScript and around 53 percent using HTML/CSS. Python, SQL, and TypeScript rounded out the top five most widely used programming languages around the world.

    Programming languages

    At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to add to their skills. Furthermore, programming knowledge is becoming an important skill to possess within various industries throughout the business world. Job seekers with skills in Python, R, and SQL will find their knowledge to be among the most highly desirable data science skills, which will likely assist in their search for employment.

  7. Number of native Spanish speakers worldwide 2024, by country

    • statista.com
    Updated Jan 15, 2025
    Cite
    Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia was the nation with the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.

    Spanish, a world language

    As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, the majority on the American continent, although it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong influence in other countries, such as the United States, Morocco, or Brazil, which are included in the list of non-Hispanic countries with the highest number of Spanish speakers.

    The second most spoken language in the U.S.

    In the most recent data, Spanish ranked as the language, other than English, with the highest number of speakers, with about 12 times as many speakers as the language in second place. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, during fiscal year 2022 alone, 5 of the top 10 countries of origin of naturalized people in the U.S. were Spanish-speaking countries.

  8. ‘Languages spoken across various nations’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Languages spoken across various nations’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-languages-spoken-across-various-nations-a8e8/latest
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Languages spoken across various nations’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shubhamptrivedi/languages-spoken-across-various-nations on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    I was fascinated by this type of data, as it gives a glimpse of the cultural diversity of a nation and of the kind of literary work to be expected from that nation.

    Content

    This dataset is a collection of all the languages that are spoken by the different nations around the world. Nowadays, most nations are bilingual or even trilingual in nature; this can be due to different cultures and different groups of people living in the same nation in harmony. This type of data can be very useful for linguistic research, market research, advertising purposes, and the list goes on.

    Acknowledgements

    This dataset was published on the site Infoplease which is a general information website.

    Inspiration

    I think this dataset can be useful for understanding which types of literary publication would achieve maximum penetration of the market base.

    --- Original source retains full ownership of the source dataset ---

  9. Numbers of certain historical figures for top 100 list of each language: N1...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky (2023). Numbers of certain historical figures for top 100 list of each language: N1 is the number of historical figures of a given language among the top 100 PageRank global historical figures; N2 is the number of historical figures of a given language among the top 100 PageRank historical figures for the given language edition; N3 is the number of historical figures of a given language among the top 100 2DRank global historical figures; N4 is the number of historical figures of a given language among the top 100 2DRank historical figures for the given language edition. [Dataset]. http://doi.org/10.1371/journal.pone.0114825.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Numbers of certain historical figures for top 100 list of each language: N1 is the number of historical figures of a given language among the top 100 PageRank global historical figures; N2 is the number of historical figures of a given language among the top 100 PageRank historical figures for the given language edition; N3 is the number of historical figures of a given language among the top 100 2DRank global historical figures; N4 is the number of historical figures of a given language among the top 100 2DRank historical figures for the given language edition.
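
    As a rough illustration of how counts such as N1 to N4 could be tallied, the Python sketch below counts how many entries in the top of a ranked list belong to each language. The function name `count_by_language`, the (name, language code) pair layout, and the sample data are assumptions made for this example; they are not taken from the dataset or the underlying paper.

```python
from collections import Counter


def count_by_language(ranked_figures, top_n=100):
    """Count how many of the first `top_n` ranked figures belong to each language.

    `ranked_figures` is a list of (name, language_code) pairs, best rank first.
    This layout is an assumption made for illustration.
    """
    return Counter(lang for _, lang in ranked_figures[:top_n])


# Hypothetical ranked list (placeholder names, not real results).
global_ranking = [
    ("Figure A", "EN"),
    ("Figure B", "FR"),
    ("Figure C", "EN"),
    ("Figure D", "DE"),
]

# Tally over the top 3 entries only, to keep the toy example small.
counts = count_by_language(global_ranking, top_n=3)
```

    Under these assumptions, an N1-style value for a language is its count in a global top-100 list, while an N2-style value is its count in the top-100 list of its own language edition; the same function applies to either list.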

  10. The most linguistically diverse countries worldwide 2025, by number of...

    • statista.com
    Updated Apr 15, 2025
    Cite
    Statista (2025). The most linguistically diverse countries worldwide 2025, by number of languages [Dataset]. https://www.statista.com/statistics/1224629/the-most-linguistically-diverse-countries-worldwide-by-number-of-languages/
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    Papua New Guinea is the most linguistically diverse country in the world. As of 2025, it was home to 840 different languages. Indonesia ranked second with 709 languages spoken. In the United States, 335 languages were spoken in that same year.

  11. ‘Extinct Languages’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Extinct Languages’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-extinct-languages-6686/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Extinct Languages’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/the-guardian/extinct-languages on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    A recent Guardian blog post asks: "How many endangered languages are there in the World and what are the chances they will die out completely?" The United Nations Educational, Scientific and Cultural Organization (UNESCO) regularly publishes a list of endangered languages, using a classification system that describes each language's danger (or completion) of extinction.

    Content

    The full detailed dataset includes names of languages, number of speakers, the names of countries where the language is still spoken, and the degree of endangerment. The UNESCO endangerment classification is as follows:

    • Vulnerable: most children speak the language, but it may be restricted to certain domains (e.g., home)
    • Definitely endangered: children no longer learn the language as a 'mother tongue' in the home
    • Severely endangered: language is spoken by grandparents and older generations; while the parent generation may understand it, they do not speak it to children or among themselves
    • Critically endangered: the youngest speakers are grandparents and older, and they speak the language partially and infrequently
    • Extinct: there are no speakers left
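
    The five levels listed above form an ordered scale, which can be sketched in Python as follows. The class name `Endangerment`, the numeric values 1 to 5, and the helper `parse_level` are assumptions made for this illustration; the dataset itself may encode the levels differently.

```python
from enum import IntEnum


class Endangerment(IntEnum):
    """Endangerment levels in the order listed above (numbering is assumed)."""
    VULNERABLE = 1
    DEFINITELY_ENDANGERED = 2
    SEVERELY_ENDANGERED = 3
    CRITICALLY_ENDANGERED = 4
    EXTINCT = 5


def parse_level(label: str) -> Endangerment:
    """Turn a label such as 'Severely endangered' into an Endangerment member."""
    return Endangerment[label.strip().upper().replace(" ", "_")]


# Because the members are integers, labels can be sorted by severity.
labels = ["Extinct", "Vulnerable", "Critically endangered"]
ordered = sorted(labels, key=parse_level)
```

    Using an integer-backed enumeration keeps comparisons such as "at least severely endangered" to a simple `>=` check.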

    Acknowledgements

    Data was originally organized and published by The Guardian, and can be accessed via this Datablog post.

    Inspiration

    • How can you best visualize this data?
    • Which rare languages are more isolated (Sicilian, for example) versus more spread out? Can you come up with a hypothesis for why that is the case?
    • Can you compare the number of rare speakers with more relatable figures? For example, are there more Romani speakers in the world than there are residents in a small city in the United States?

    --- Original source retains full ownership of the source dataset ---

  12. List of the top 10 global female historical figures by PageRank and 2DRank...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky (2023). List of the top 10 global female historical figures by PageRank and 2DRank for all the 24 Wikipedia editions. [Dataset]. http://doi.org/10.1371/journal.pone.0114825.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All names are represented by article titles in the English Wikipedia. Here, Θ_A is the ranking score of the algorithm A (Eq. 3); N_A is the number of appearances of a given person in the top 100 rank for all editions. CC is the birth country code and LC is the language code of the given historical figure.

  13. List of country code (CC), countries as birth places of historical figures,...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky (2023). List of country code (CC), countries as birth places of historical figures, and language code (LC) for each country. [Dataset]. http://doi.org/10.1371/journal.pone.0114825.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LC is determined by the most spoken language in the given country. Country codes are based on country codes of Internet top-level domains and language codes are based on language edition codes of Wikipedia; WR represents all languages other than the considered 24 languages.
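
    The assignment rule described above can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the function name `language_code`, the lookup table, the code spellings, and the choice to map unknown countries to WR are not taken from the dataset.

```python
def language_code(country_code, most_spoken, considered):
    """Return the language code (LC) for a country.

    `most_spoken` maps a country code to the code of its most spoken language.
    `considered` is the set of language codes under study (24 in the dataset).
    Any language outside that set is reported as "WR".
    Assumption: a country missing from `most_spoken` is also reported as "WR".
    """
    lang = most_spoken.get(country_code)
    return lang if lang in considered else "WR"


# Hypothetical lookup table and a tiny stand-in for the 24 considered languages.
most_spoken = {"US": "EN", "FR": "FR", "XX": "ZZ"}
considered = {"EN", "FR"}

us_lc = language_code("US", most_spoken, considered)
other_lc = language_code("XX", most_spoken, considered)
```

    With this toy table, `us_lc` is `"EN"`, while `other_lc` is `"WR"` because the made-up language `"ZZ"` is not among the considered codes.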

  14. Nigeria Language Areas

    • ebola-nga.opendata.arcgis.com
    Updated Dec 5, 2014
    Cite
    National Geospatial-Intelligence Agency (2014). Nigeria Language Areas [Dataset]. https://ebola-nga.opendata.arcgis.com/content/a8562de38b814219b331c7d49cc87ff4
    Dataset updated
    Dec 5, 2014
    Dataset authored and provided by
    National Geospatial-Intelligence Agency
    Description

    There are over 500 known languages in Nigeria. While the official language is English, its use is largely confined to urban elites. The most commonly used languages are Hausa, Yoruba, Igbo (Ibo) and Fulfulde. Edo, Efik, Adamawa Fulfulde, Idoma, and Central Kanuri are also widely spoken. The area of greatest diversity is the ‘Middle Belt’, the band of territory stretching across the country between the large language blocs of the north and the south. The reason for this diversity remains unclear, but three of Africa's four language families meet in the Middle Belt of Nigeria. This has had sociolinguistic consequences where frequent conflicts have erupted between the culture and language of particular groups.

    ISO3 - International Organization for Standardization 3-digit country code

    LANG_FAM - Language family

    LANG_SUBGP - Language sub-family

    SOURCE_DT - Primary source creation date

    SOURCE - Primary source

    Collection

    This shapefile, created using Anthromapper, consists of language layers based on the World Language Mapping System (WLMS). Geographical terrain features, combined with a watershed model, were also used to predict the likely extent of ethnic and linguistic influence. The HGIS and metadata were supplemented with anthropological information from peer-reviewed journals and published books. The interpretation of names often produces multiple spellings of the same language; therefore similarly spelled or phonetic titles may be referencing the same language group.

    The data included herein have not been derived from a registered survey and should be considered approximate unless otherwise defined. While rigorous steps have been taken to ensure the quality of each dataset, DigitalGlobe Analytics is not responsible for the accuracy and completeness of data compiled from outside sources.

    Sources (HGIS)

    Anthromapper. DigitalGlobe Analytics, April 2013.

    World Language Mapping System (WLMS) Version 16. World GeoDatasets, April 2013.

    Sources (Metadata)

    Blench, Roger. "Position Paper: The Dimensions of Ethnicity, Language, and Culture in Nigeria." Last modified 2013. Accessed March 26, 2013. http://www.rogerblench.info.

    Blench, Roger. "The Status of the Languages of Central Nigeria." Last modified 2013. Accessed March 26, 2013. http://www.rogerblench.info.

  15. Japan Centre of Excellence (JACEEX)

    • jaceex.com
    html
    Updated Jul 16, 2019
    Cite
    Japan Centre of Excellence (JACEEX) (2019). Japan Centre of Excellence (JACEEX) [Dataset]. https://www.jaceex.com/ssw
    Explore at:
    Available download formats: html
    Dataset updated
    Jul 16, 2019
    Dataset provided by
    https://www.jaceex.com/
    Authors
    Japan Centre of Excellence (JACEEX)
    Description

    Japan Centre of Excellence (JACEEX) is a brand under Jaceex Ventures LLP. Jaceex was formed with a vision to create a world-class workforce with the skill sets, work and business ethics, sincerity, devotion, and other positive traits found in the Japanese workforce, which has been responsible for building world-class enterprises. For Indian students and young people stepping into this world, our objective is to provide a life-changing opportunity in the form of skills and work in Japan. Japan Centre of Excellence (JACEEX) provides an integrated course schedule of learning through exploration, scrutiny, and self-reflection. We offer Japanese language and culture training at basic, intermediate, and high levels. Our training is designed to make trainees eligible to certify themselves through the globally recognised Japanese Language Proficiency Test (JLPT) examination. This will help in building careers with Japanese companies in Japan and in India, as well as in self-employment. We also offer a virtual live class platform.

  16. Enrollment numbers in language training Spain 2005 to 2023

    • statista.com
    Updated Jan 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Enrollment numbers in language training Spain 2005 to 2023 [Dataset]. https://www.statista.com/statistics/459491/enrollment-numbers-in-language-training-spain/
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Spain
    Description

    The number of enrollments in language schools in Spain reveals that Spaniards are well aware of the importance of foreign languages in modern times. During the 2022/23 academic year, almost 331,000 people were registered at language schools in Spain to add a new language to their curricula. In a globalized world, languages are taking on a much more important role in the job market. The most studied and spoken languages in the world include English, Mandarin, Hindi, and Spanish.

    The importance of language knowledge in the job market

    Enrollment numbers at language schools come as no surprise considering that foreign languages have become a vital asset for job seekers in recent years. English, the language most used in international affairs, unsurprisingly ranked first on the list of most valued languages on the Spanish job market, with approximately 65.2 percent of job openings that require foreign language skills demanding it. French followed far behind, with 17.38 percent of the job openings.

    Languages in the Spanish multimedia scene

    Most of the best-selling albums in Spain during 2022 were recorded in the country’s main language, Spanish, with 38 albums in the top 50. As for video games, 96 percent of the games produced in the country had English as a language option. Spanish was the second most used language, being present in 91 percent of productions.

  17. Data from: Knowledge from non-English-language studies broadens...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated May 19, 2025
    Cite
    Filipe Serrano; Valentina Marconi; Stefanie Deinet; Hannah Puleston; Helga Correa; Juan C. Díaz-Ricaurte; Carolina Farhat; Ricardo Luria-Manzano; Marcio Martins; Eletra Souza; Sergio Souza; Joao Vieira-Alencar; Paula Valdujo; Robin Freeman; Louise McRae (2025). Knowledge from non-English-language studies broadens contributions to conservation policy and helps to tackle bias in biodiversity data [Dataset]. http://doi.org/10.5061/dryad.ngf1vhj68
    Explore at:
    Available download formats: zip
    Dataset updated
    May 19, 2025
    Dataset provided by
    Instituto Salva Silvestres
    Universidade Federal do ABC
    Zoological Society of London
    The Biodiversity Consultancy
    University of the Amazon
    Universidade de São Paulo
    WWF Brazil
    Authors
    Filipe Serrano; Valentina Marconi; Stefanie Deinet; Hannah Puleston; Helga Correa; Juan C. Díaz-Ricaurte; Carolina Farhat; Ricardo Luria-Manzano; Marcio Martins; Eletra Souza; Sergio Souza; Joao Vieira-Alencar; Paula Valdujo; Robin Freeman; Louise McRae
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Local ecological evidence is key to informing conservation. However, many global biodiversity indicators neglect local ecological evidence published in languages other than English, potentially biasing our understanding of biodiversity trends in areas where English is not the dominant language. Brazil is a megadiverse country with a thriving national scientific publishing landscape. Here, using Brazil and a species abundance indicator as examples, we assess how well bilingual literature searches can both improve data coverage for a country where English is not the primary language and help tackle biases in biodiversity datasets. We conducted a comprehensive screening of articles containing abundance data for vertebrates published in 59 Brazilian journals (articles in Portuguese or English) and 79 international English-only journals. These were grouped into three datasets according to journal origin and article language (Brazilian-Portuguese, Brazilian-English and International). We analysed the taxonomic, spatial and temporal coverage of the datasets, compared their average abundance trends and investigated predictors of such trends with a modelling approach.

    Our results showed that including data published in Brazilian journals, especially those in Portuguese, strongly increased representation of Brazilian vertebrate species (by 10.1 times) and populations (by 7.6 times) in the dataset. Meanwhile, international journals featured a higher proportion of threatened species. There were no marked differences in spatial or temporal coverage between datasets, despite differing biases towards infrastructure. Overall, while country-level trends in relative abundance did not substantially change with the addition of data from Brazilian journals, uncertainty considerably decreased. We found that population trends in international journals showed stronger and more frequent decreases in average abundance than those in national journals, regardless of whether the latter were published in Portuguese or English.

    Policy implications

    Collecting data from local sources markedly strengthens global biodiversity databases by adding species not previously included in international datasets. Furthermore, the addition of these data helps to understand spatial and temporal biases that potentially influence abundance trends at both national and global level. We show how incorporating non-English-language studies in global databases and indicators could provide a more complete understanding of biodiversity trends and therefore better inform global conservation policy.

    Methods

    Data collection

    We collected time-series of vertebrate population abundance suitable for entry into the LPD (livingplanetindex.org), which provides the repository for one of the indicators in the GBF, the Living Planet Index (LPI, Ledger et al., 2023). Despite the continuous addition of new data, LPI coverage remains incomplete for some regions (Living Planet Report 2024 – A System in Peril, 2024). We collected data from three sets of sources: a) Portuguese-language articles from Brazilian journals (hereafter “Brazilian-Portuguese” dataset), b) English-language articles from Brazilian journals (“Brazilian-English” dataset) and c) English-language articles from non-Brazilian journals (“International” dataset). For a) and b), we first compiled a list of Brazilian biodiversity-related journals using the list of non-English-language journals in ecology and conservation published by the translatE project (www.translatesciences.com) as a starting point. The International dataset was obtained from the LPD team and sourced from the 78 journals they routinely monitor as part of their ongoing data searches.

    We excluded journals whose scope was not relevant to our work (e.g. those focusing on agroforestry or crop science), and taxon-specific journals (e.g. South American Journal of Herpetology), since they could introduce taxonomic bias to the data collection process. We considered only articles published between 1990 and 2015, and thus further excluded journals that published articles exclusively outside of this timeframe. We chose this period because of higher data availability (Deinet et al., 2024): less monitoring took place in earlier decades, and data availability for the last decade is lower because there is a lag between data being collected and trends becoming available in the literature. Finally, we excluded any journals that had inactive links or that were no longer available online. While we acknowledge that biodiversity data are available from a wider range of sources (grey literature, online databases, university theses etc.), here we limited our searches to peer-reviewed journals and articles published within a specific timeframe to standardise data collection and allow for comparison between datasets.

    We screened a total of 59 Brazilian journals; of these, nine accept articles only in English, 13 only in Portuguese and 37 in both languages. We systematically checked all articles of all issues published between 1990 and 2015. Articles that appeared to contain abundance data for vertebrate species based on title and/or abstract were further evaluated by reading the material and methods section. For an article to be included in our dataset, we followed the criteria applied for inclusion into the LPD (livingplanetindex.org/about_index#data): a) data must have been collected using comparable methods for at least two years for the same population, and b) units must be of population size, either a direct measure such as population counts or densities, or indices, or a reliable proxy such as breeding pairs, capture per unit effort or measures of biomass for a single species (e.g. fish data are often available in one of the latter two formats).

    Assessing search effectiveness and dataset representation

    We calculated the encounter rate of relevant articles (i.e. those that satisfied the criteria for inclusion in our datasets) for each journal as the proportion of such articles relative to the total number of articles screened for that journal. We assessed the taxonomic representation of each dataset by calculating the percentage of species of each vertebrate group (all fishes combined, amphibians, reptiles, birds and mammals) with relevant abundance data in relation to the number of species of these groups known to occur in Brazil. The total number of known species for each taxon was compiled from national-level sources (amphibians, Segalla et al., 2021; birds, Pacheco et al., 2021; mammals, Abreu et al., 2022; reptiles, Costa, Guedes and Bérnils, 2022) or through online databases (Fishbase, Froese and Pauly, 2024). We calculated accumulation curves using 1,000 permutations and applying the rarefaction method, using the vegan package (Oksanen et al., 2024). These represent the cumulative number of new species added with each article containing relevant data, allowing us to assess how additional data collection could increase coverage of abundance data across datasets.

    To compare species threat status among datasets, we used the category for each species available in the Brazilian (‘Sistema de Avaliação do Risco de Extinção da Biodiversidade – SALVE’, 2024) and IUCN Red List (IUCN, 2024), and calculated the percentage of species in each category per dataset. To assess and compare the temporal coverage of the different datasets, we calculated the number of populations and species across time. To assess geographic gaps, we mapped the locations of each population using QGIS version 3.6 (QGIS Development Team, 2019). We then quantified the bias of terrestrial records towards proximity to infrastructure (airports, cities, roads and waterbodies) at a 0.5° resolution (circa 55.5 km × 55.5 km at the equator) and a 2° buffer using posterior weights from the R package sampbias (Zizka, Antonelli and Silvestro, 2021). Higher posterior weights indicate a stronger bias effect.

    Generalised linear mixed models and population abundance trends

    We used the rlpi R package (Freeman et al., 2017) to calculate trends in relative abundance. We calculated the average lambda (logged annual rate of change) for each time-series by averaging the lambda values across all years between the start and the end year of the time-series. We then built generalised linear mixed models (GLMM) to test how average lambdas changed across language (Portuguese vs English), journal origin (national vs international), and taxonomic group, using location, journal name, and species as random intercepts (Table 1). We offset these by the number of sampled years to adjust summed lambda to a standardised measure, allowing comparison across observations with time series of different lengths, and plotted the beta coefficients (effect sizes) of all factors. Finally, we performed a post-hoc test to check pairwise differences between taxonomic groups (Table S2).

    To assess the influence of national-level data on global trends in relative abundance, we calculated the trends for both the International dataset and the two combined Brazilian datasets (Brazilian-Portuguese and Brazilian-English), using only years for which data were available for more than one species, to be able to estimate trend variation. We also plotted the trends for the Brazilian datasets separately. All analyses were performed in R 4.4.1 (R Core Team, 2024).
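
    As a rough illustration of two quantities described in the Methods, the sketch below computes a per-journal encounter rate and an average lambda for one time-series. It is not the authors' code (their analyses used R and the rlpi package). The function names are hypothetical, the base-10 logarithm is an assumption (the description only says "logged annual rate of change"), and the sketch assumes a complete series of strictly positive yearly values, skipping any interpolation or zero handling that a real pipeline would need.

```python
import math


def encounter_rate(relevant_articles: int, screened_articles: int) -> float:
    """Proportion of screened articles that met the inclusion criteria."""
    if screened_articles <= 0:
        raise ValueError("screened_articles must be positive")
    return relevant_articles / screened_articles


def average_lambda(abundances: list[float]) -> float:
    """Mean logged annual rate of change over consecutive years.

    Assumes one strictly positive abundance value per year, in order,
    and a base-10 logarithm (an assumption, not stated in the source).
    """
    if len(abundances) < 2:
        raise ValueError("need at least two yearly values")
    if any(value <= 0 for value in abundances):
        raise ValueError("abundances must be strictly positive")
    # One lambda per pair of consecutive years: log10(N[t+1] / N[t]).
    lambdas = [
        math.log10(later / earlier)
        for earlier, later in zip(abundances, abundances[1:])
    ]
    return sum(lambdas) / len(lambdas)


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the study.
    print(encounter_rate(5, 200))
    print(average_lambda([120.0, 100.0, 90.0, 95.0]))
```

    A negative average lambda indicates that the population declined on average over the sampled years, and a positive one indicates an increase.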

  18. List of Czech Exonyms

    • data.europa.eu
    Updated Sep 20, 2020
    Cite
    (2020). List of Czech Exonyms [Dataset]. https://data.europa.eu/data/datasets/cz-cuzk-exonyma-t
    Explore at:
    Dataset updated
    Sep 20, 2020
    Description

    The publication focuses on geographical names outside the Czech Republic, endonyms and Czech exonyms. Given the breadth of the subject, only the most commonly used names were chosen. The 3rd edition (2019) substantially corrects previous editions and includes a supplemented or corrected range of geographical names from the whole world. The most important change is that endonyms are stated not only in Latin script but also in the original script used in each country. The names are stated in the official language of the territory and are linked to the standardized Czech geographical names (exonyms). Information about the origin of the names is added selectively. The publication also includes additions and revisions to the publication Czech Names of the Seas and International Territories. Its 212 pages list 2696 Czech exonyms and their endonyms. The publication also contains a glossary of names of states and a glossary of language codes according to the corresponding ISO standards. ISBN 978-80-88197-20-1.

  19. List of global historical figures by PageRank and 2DRank for all 24...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky (2023). List of global historical figures by PageRank and 2DRank for all 24 Wikipedia editions. [Dataset]. http://doi.org/10.1371/journal.pone.0114825.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All names are represented by the corresponding article titles in the English Wikipedia. Here, ΘA is the ranking score of algorithm A (3); NA is the number of appearances of a given person in the top 100 rank for all editions.

  20. Share of U.S. population speaking a language besides English at home 2023,...

    • statista.com
    Updated Jun 23, 2025
    Cite
    Statista (2025). Share of U.S. population speaking a language besides English at home 2023, by state [Dataset]. https://www.statista.com/statistics/312940/share-of-us-population-speaking-a-language-other-than-english-at-home-by-state/
    Explore at:
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    As of 2023, more than ** percent of people in the United States spoke a language other than English at home. California had the highest share among all U.S. states, with ** percent of its population speaking a language other than English at home.

Cite
Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/

The most spoken languages worldwide 2025

Explore at:
429 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description

In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

Languages in the United States

The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

Different languages at home

The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.
