56 datasets found
  1. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, around 1.53 billion people worldwide spoke English either natively or as a second language, about 350 million more than the 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the country vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.

  2. Ranking of languages spoken at home in the U.S. 2023

    • statista.com
    Updated Apr 14, 2025
    Cite
    Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people spoke Russian at home in the same year.

  3. MCB_languages_county

    • kaggle.com
    Updated Oct 1, 2019
    Cite
    Marisol Brewster (2019). MCB_languages_county [Dataset]. https://www.kaggle.com/mcbrewster/mcb-languages-county/code
    Dataset updated
    Oct 1, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Marisol Brewster
    Description

    Context

    This is a dataset I found online through the Google Dataset Search portal.

    Content

    The American Community Survey (ACS) 2009-2013 multi-year data are used to list all languages spoken in the United States that were reported during the sample period. These tables provide detailed counts of many more languages than the 39 languages and language groups published annually as part of the routine ACS data release. This is the second tabulation covering more than 39 languages since the ACS began.

    The tables include all languages that were reported in each geography during the 2009 to 2013 sampling period. For the purpose of tabulation, reported languages are classified in one of 380 possible languages or language groups. Because the data are a sample of the total population, there may be languages spoken that are not reported, either because the ACS did not sample the households where those languages are spoken, or because the person filling out the survey did not report the language or reported another language instead.
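
    Why a sample can miss rare languages can be illustrated with a simple calculation. The sketch below assumes, for illustration only, that every household is selected independently with the same probability; the actual ACS design is more complex, and the sampling fraction and household counts used here are hypothetical.

```python
def prob_unreported(households: int, sampling_fraction: float) -> float:
    """Probability that none of `households` speaking a language is sampled.

    Hypothetical model: each household is selected independently with the
    same probability `sampling_fraction` over the whole period.
    """
    if not 0.0 <= sampling_fraction <= 1.0:
        raise ValueError("sampling_fraction must be between 0 and 1")
    return (1.0 - sampling_fraction) ** households


# Hypothetical sampling fraction of 10 percent.
for h in (1, 5, 20, 50):
    print(h, round(prob_unreported(h, 0.10), 3))
```

    Under these assumptions, a language spoken in only a handful of households has a substantial chance of going unreported, while one spoken in many households is almost certain to appear in the tables.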

    The tables also provide information about self-reported English-speaking ability. Respondents who reported speaking a language other than English were asked to indicate their ability to speak English in one of the following categories: "Very well," "Well," "Not well," or "Not at all." The data on ability to speak English represent the person’s own perception about his or her own ability or, because ACS questionnaires are usually completed by one household member, the responses may represent the perception of another household member.

    These tables are also available through the Census Bureau's application programming interface (API). Please see the developers page for additional details on how to use the API to access these data.

    Acknowledgements

    Sources:

    Google Dataset Search: https://toolbox.google.com/datasetsearch

    2009-2013 American Community Survey

    Original dataset: https://www.census.gov/data/tables/2013/demo/2009-2013-lang-tables.html

    Downloaded From: https://data.world/kvaughn/languages-county

    Banner and thumbnail photo by Farzad Mohsenvand on Unsplash

  4. The most linguistically diverse countries worldwide 2025, by number of...

    • statista.com
    Updated Apr 15, 2025
    Cite
    Statista (2025). The most linguistically diverse countries worldwide 2025, by number of languages [Dataset]. https://www.statista.com/statistics/1224629/the-most-linguistically-diverse-countries-worldwide-by-number-of-languages/
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    Papua New Guinea is the most linguistically diverse country in the world. As of 2025, it was home to 840 different languages. Indonesia ranked second with 709 languages spoken. In the United States, 335 languages were spoken in that same year.

  5. Script 1 to 9 and necessary data to run them

    • figshare.com
    Updated Sep 11, 2023
    Cite
    Michael Boissonneault (2023). Script 1 to 9 and necessary data to run them [Dataset]. http://doi.org/10.6084/m9.figshare.24117273.v1
    Available download formats: txt
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michael Boissonneault
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts (numbered from 1 to 9) to prepare data, perform calculations, and run analyses for the projection of speaker numbers for 27 Indigenous languages of Canada between 2001 and 2101. Contains data on first language collected during the censuses of 2001, 2006, 2011, 2016, and 2021, provided by Statistics Canada. Also contains fertility and mortality schedules taken from the 2022 World Population Prospects (UN), as well as other data files produced from the data and calculations described above.
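
    Projections of this kind typically combine speaker counts by age with mortality and fertility schedules. The sketch below shows one generic cohort-component step in Python; it is not taken from these R scripts. The five-year age groups, survival ratios, fertility rates, and the `transmission` share (the assumed fraction of births that acquire the language) are all hypothetical, and migration and child mortality are ignored for simplicity.

```python
from typing import List


def project_step(
    speakers: List[float],
    survival: List[float],
    fertility: List[float],
    transmission: float,
) -> List[float]:
    """Advance speaker counts by one age-group step (hypothetical sketch).

    speakers[i]  -- speakers in age group i at the start of the step
    survival[i]  -- probability of surviving from group i into the next group
    fertility[i] -- births per speaker in group i during the step
    transmission -- assumed share of births that become speakers
    """
    n = len(speakers)
    if n < 2 or len(survival) != n or len(fertility) != n:
        raise ValueError("need at least two age groups and matching schedules")
    nxt = [0.0] * n
    # Age each cohort forward by one group, applying survival.
    for i in range(n - 1):
        nxt[i + 1] = speakers[i] * survival[i]
    # The open-ended oldest group also keeps its own survivors.
    nxt[-1] += speakers[-1] * survival[-1]
    # New speakers enter the youngest group through births and transmission.
    births = sum(s * f for s, f in zip(speakers, fertility))
    nxt[0] = births * transmission
    return nxt


print(project_step([100.0, 100.0, 100.0], [0.9, 0.8, 0.5], [0.0, 0.4, 0.0], 0.5))
```

    Repeating the step over many periods, with `transmission` below the level needed to replace older speakers, is one way such models show a language shrinking toward dormancy.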

  6. Number of native Spanish speakers worldwide 2024, by country

    • statista.com
    Updated Jan 15, 2025
    Cite
    Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.

    Spanish, a world language

    As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of more than 20 countries, most of them in the Americas; it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which are among the non-Hispanic countries with the highest numbers of Spanish speakers.

    The second most spoken language in the U.S.

    In the most recent data, Spanish ranked as the language other than English with the highest number of speakers in the U.S., with 12 times as many speakers as the language in second place. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of people naturalized in the U.S. were Spanish-speaking countries.

  7. Statistics of the Languages spoken in South Africa. For each language, we...

    • plos.figshare.com
    Updated Jun 5, 2025
    Cite
    Koena Ronny Mabokela; Mpho Primus; Turgay Celik (2025). Statistics of the Languages spoken in South Africa. For each language, we report the ISO, the African subfamily, and the prevalent countries where the language is also spoken. [Dataset]. http://doi.org/10.1371/journal.pone.0325102.t001
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Koena Ronny Mabokela; Mpho Primus; Turgay Celik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Africa, Africa
    Description

    Statistics of the Languages spoken in South Africa. For each language, we report the ISO, the African subfamily, and the prevalent countries where the language is also spoken.

  8. Parameters characterizing the distribution of population size and area...

    • figshare.com
    Updated Jun 2, 2023
    Cite
    Susanna C. Manrubia; Jacob B. Axelsen; Damián H. Zanette (2023). Parameters characterizing the distribution of population size and area covered by human languages in several representative world regions. [Dataset]. http://doi.org/10.1371/journal.pone.0040137.t001
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Susanna C. Manrubia; Jacob B. Axelsen; Damián H. Zanette
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Columns represent the following data: and are the (logarithmic) averages of population size and area; corresponds to the slope of the major ellipse axis relating and , and measures their degree of correlation. Errors in both variables are shown. The values of and correspond to model parameters yielding the measured values of and within the estimated interval.
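
    The kinds of quantities described above (logarithmic averages of population size and area, the slope of the major axis of the ellipse relating them, and a correlation coefficient) can be computed from per-language (population, area) pairs roughly as follows. This is a minimal sketch assuming base-10 logarithms, the principal axis of the sample covariance ellipse, and Pearson correlation; the paper's exact conventions may differ.

```python
import math
from typing import List, Tuple


def ellipse_summary(pairs: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (mean log10 population, mean log10 area, major-axis slope, Pearson r)."""
    xs = [math.log10(pop) for pop, _ in pairs]
    ys = [math.log10(area) for _, area in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    if sxy == 0:
        raise ValueError("major-axis slope is undefined when the covariance is zero")
    # Slope of the covariance-matrix eigenvector with the largest eigenvalue.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    r = sxy / math.sqrt(sxx * syy)
    return mx, my, slope, r
```

    For example, hypothetical languages whose area grows as the square of their population fall on a line of slope 2 in log-log space, so the sketch returns a slope of 2 and a correlation of 1.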

  9. Human cultural Diversity - A Cross-national data set

    • search.dataone.org
    • knb.ecoinformatics.org
    Updated Aug 14, 2015
    Cite
    Michael E. Hochberg; National Center for Ecological Analysis and Synthesis; Howard Cornell; Daniel Nettle; NCEAS 6640: Hochberg: HumanSocialBehavior; Jean-François Guégan; Marc Choisy (2015). Human cultural Diversity - A Cross-national data set [Dataset]. http://doi.org/10.5063/AA/bowdish.246.10
    Dataset updated
    Aug 14, 2015
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    Michael E. Hochberg; National Center for Ecological Analysis and Synthesis; Howard Cornell; Daniel Nettle; NCEAS 6640: Hochberg: HumanSocialBehavior; Jean-François Guégan; Marc Choisy
    Variables measured
    CPI, GDP, SWB, Area, GDP2, Gini, Area2, Gini2, Trust, CivLib, and 50 more
    Description

    A cross-national data set of 21 variables was assembled for 212 countries from three sources (Barro and Lee 1994; Gordon 2005; CIA World Fact Book 2005). Our data set includes several proxy measures for national wealth, cultural diversity, social instability (at both national and international levels), and demography. Separate diversity measures were calculated for three cultural domains, namely language, religion, and ethnic groups. In addition, wealth variables (per capita GDP, and GINI, the coefficient of income inequality) were assembled, along with indicators of societal functioning drawn from the literature (especially Barro and Lee 1994), including indices of political rights (PRIGHTSB), revolutions and coups d'état (REVCOUP), and political instability (PINSTAB). Measures of international conflict were extracted from the social science literature, and the following were used: the proportion of the time between 1960 and 1985 that the country was involved in an external war (WARTIME), the number of international disputes in which the country was involved (TOTINTDISP), and an index of total military expenditure (TOTMILITEXP). Possible confounding variables such as population size (POPSIZE) and the number of international borders (NBINTBORDERS) were also included.
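
    The description does not say which diversity measure was used for language, religion, and ethnic groups. One common choice, assumed here purely for illustration, is the fractionalization index (Greenberg's diversity index): the probability that two people drawn at random, with replacement, belong to different groups. The group counts in the sketch are hypothetical.

```python
from typing import Sequence


def fractionalization(counts: Sequence[float]) -> float:
    """Return 1 - sum(p_i ** 2), where p_i is each group's population share."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 - sum((c / total) ** 2 for c in counts)


print(fractionalization([100]))             # a single group: no diversity
print(fractionalization([50, 50]))          # two equal groups
print(fractionalization([25, 25, 25, 25]))  # four equal groups
```

    The index is 0 for a country with a single group and approaches 1 as the population is split among many small groups.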

  10. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the two countries combined had over a billion internet users. This has led to most online information being created in English, so even non-native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, there were 5.52 billion internet users worldwide. In the same period, Northern Europe and North America led in internet penetration, with around 97 percent of their populations online.

  11. Bilingual Education for Children Market Report | Global Forecast From 2025...

    • dataintelo.com
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Bilingual Education for Children Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-bilingual-education-for-children-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Bilingual Education for Children Market Outlook

    The global market size for Bilingual Education for Children is projected to grow significantly from $5.2 billion in 2023 to approximately $12.5 billion by 2032, boasting a Compound Annual Growth Rate (CAGR) of 10%. Several factors contribute to this robust growth, including increasing globalization, a heightened emphasis on multicultural competencies, and the growing recognition of the cognitive benefits associated with bilingualism.
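
    The stated figures can be checked against the stated growth rate. Assuming annual compounding over the nine years from 2023 to 2032, the implied compound annual growth rate is (12.5 / 5.2)^(1/9) - 1, or roughly 10.2 percent, which is consistent with the rounded 10 percent. A short sketch of the calculation:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from start to end over `years` years."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Market size in billions of US dollars, 2023 -> 2032, as stated in the report.
rate = cagr(5.2, 12.5, 2032 - 2023)
print(f"{rate:.1%}")
```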

    One of the primary growth drivers for the bilingual education market is the increasing globalization and interconnectivity of economies. As businesses expand across borders and as migration rates rise, the demand for multilingual capabilities among the younger population has surged. Parents and educators alike recognize the importance of equipping children with the skills necessary to thrive in a globalized world, where being fluent in multiple languages can provide a competitive edge in the job market and open up a plethora of opportunities.

    Moreover, an increasing body of research highlighting the cognitive benefits of bilingualism is fostering greater acceptance and enthusiasm for bilingual education. Studies have shown that bilingual children tend to have better problem-solving skills, improved memory, and enhanced cognitive flexibility compared to their monolingual peers. These cognitive advantages, coupled with the cultural enrichment that comes from being proficient in more than one language, are driving parents to seek bilingual education programs for their children.

    Government policies and educational reforms in various countries are also contributing significantly to the growth of the bilingual education market. Many nations are recognizing the importance of bilingualism and are incorporating language learning into their educational curricula. For instance, the European Union has a long-standing policy of multilingualism, encouraging citizens to learn at least two foreign languages. Similarly, countries like Canada and the United States have various state and federal programs that support bilingual education in public schools.

    Regionally, North America and Europe are leading the market, owing to their diverse populations and strong emphasis on multicultural education. However, the Asia Pacific region is expected to see the highest growth rate during the forecast period, driven by a rising middle-class population, increased investment in education, and a growing emphasis on English proficiency, which is seen as a gateway to global opportunities.

    In the context of bilingual education, English picture books for children play a crucial role in facilitating language acquisition and literacy development. These books serve as an engaging tool for young learners, combining visual storytelling with simple text to enhance comprehension and retention. By integrating picture books into bilingual programs, educators can create a more immersive and enjoyable learning experience for children. Picture books not only aid vocabulary building but also introduce cultural narratives, helping children connect with diverse perspectives. As demand for bilingual education grows, English picture books become increasingly significant, offering a bridge between languages and fostering a love of reading among children.

    Program Type Analysis

    The Program Type segment in the bilingual education for children market is divided into various subcategories, including Dual Language Immersion, Transitional Bilingual Education, Two-Way Bilingual Education, and Others. Dual Language Immersion programs have been particularly popular due to their balanced approach, where students are taught in two languages for an equal amount of time. This method not only fosters language proficiency but also ensures that content learning is not compromised. Schools and institutions are increasingly adopting this approach, which is reflected in the growing investments and enrollments in Dual Language Immersion programs.

    Transitional Bilingual Education is another significant sub-segment that aims to provide students with the necessary skills to transition from their native language to the target language, usually the languag

  12. Data from: Projected speaker numbers and dormancy risks of Canada's...

    • zenodo.org
    Updated Dec 3, 2024
    Cite
    Michael Boissonneault (2024). Projected speaker numbers and dormancy risks of Canada's Indigenous languages [Dataset]. http://doi.org/10.5281/zenodo.14267791
    Available download formats: csv, bin
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Boissonneault
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Canada
    Description

    Data and code for the paper "Projected speaker numbers and dormancy risks of Canada’s Indigenous languages".

    The CSV files contain speaker numbers by age and language (Indigenous mother tongue, unique responses).

    There is one file per year (2001, 2006, 2011, 2016, 2021).

    Data were provided by Statistics Canada.

    These include:

    - indigenousmothertongue2001.csv
    - indigenousmothertongue2006.csv
    - indigenousmothertongue2011.csv
    - indigenousmothertongue2016.csv
    - indigenousmothertongue2021.csv

    Additionally, the file 'coordinates.xlsx' contains the geographic coordinates necessary for Fig. 1. Information comes from Ethnologue with modifications.

    Also included is the life table information produced by World Population Prospects 2024 (wpp 2024 files), available at https://population.un.org/wpp/Download/Standard/Mortality/. These files are provided here for convenience and to guard against later updates by the WPP.

    The full R code used to produce the results described in the paper is also included (RevisedScript_ProjectCanIndigLangs_Final.R).

  13. Gallup World Poll

    • stanford.redivis.com
    • redivis.com
    Updated Jul 10, 2025
    Cite
    Stanford University Libraries (2025). Gallup World Poll [Dataset]. http://doi.org/10.57761/xkms-eq09
    Available download formats: parquet, sas, stata, arrow, spss, application/jsonl, avro, csv
    Dataset updated
    Jul 10, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford University Libraries
    Area covered
    World
    Description

    Abstract

    Gallup’s World Poll continually surveys residents in more than 150 countries and areas, representing more than 98% of the world’s adult population, using randomly selected, nationally representative samples. Gallup typically surveys 1,000 individuals in each country or area, using a standard set of core questions that has been translated into the major languages of the respective country. In some regions, supplemental questions are asked in addition to core questions. Face-to-face interviews are approximately 1 hour, while telephone interviews are about 30 minutes. In many countries, the survey is conducted once per year, and fieldwork is generally completed in two to four weeks. The Country Dataset Details document displays each country’s sample size, month/year of the data collection, mode of interviewing, languages employed, design effect, margin of error and details about sample coverage.
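
    The margin of error reported for each country depends on the sample size and the design effect. Assuming the usual formula for a proportion at 95 percent confidence (z = 1.96, worst case p = 0.5), inflated by the design effect, a sample of about 1,000 gives roughly ±3 percentage points before any design effect is applied. The design-effect values below are hypothetical; the Country Dataset Details document gives the actual figures.

```python
import math


def margin_of_error(n: int, deff: float = 1.0, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a confidence interval for a proportion, inflated by the design effect."""
    return z * math.sqrt(deff * p * (1 - p) / n)


for deff in (1.0, 1.5):  # hypothetical design effects
    print(deff, round(100 * margin_of_error(1000, deff), 1), "percentage points")
```

    A design effect above 1 reflects clustering and weighting in the sample and widens the interval accordingly.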

    The data was last updated March 2025.

    Bulk Data Access

    Data access is required to view this section.

  14. Languages of the Middle East

    • hub.arcgis.com
    Updated Mar 16, 2012
    Cite
    MEMIROnline (2012). Languages of the Middle East [Dataset]. https://hub.arcgis.com/items/d29dbbe0ce4342ddae47c3b33bc0be94
    Dataset updated
    Mar 16, 2012
    Dataset authored and provided by
    MEMIROnline
    Area covered
    Middle East
    Description

    “Middle East” Languages
    Independent Study, February 16, 2012
    Amanda Doyle
    Co-authors: Kevin Ragborg, Marc Puricelli, and Maria Lindell

    Despite the relatively small geographical size of the “Middle East,” there is great diversity of spoken languages within the region. The most commonly spoken language of the “Middle East” is Arabic, a Semitic language closely related to Hebrew that developed beginning in the 8th century BC. Currently, around 280 million people speak Arabic in the “Middle East” and North Africa, across the countries from Morocco to Iraq. The Qur’an, the central religious text of Islam, is only allowed to be written in Arabic, giving the language a very important role in the Muslim world. Unlike some other languages, Arabic has many different dialects, which can make it difficult for speakers from different parts of the Arabic-speaking world to understand one another (3).

    The next major language of the Middle East is Persian, or Farsi, the national language of Iran. Persian is spoken by an estimated 65 million people, most of whom are concentrated in Iran, but there are significant Persian-speaking populations in Afghanistan and the United Arab Emirates. Younger than Arabic, Persian developed around 400 BC and is closely related to Hindi and Urdu. There are three main dialects of Persian: Iranian Persian (spoken in Iran), Dari Persian (spoken in Afghanistan), and Tajik Persian (spoken in Tajikistan) (4).

    Hebrew is spoken by roughly 3.8 million people in the “Middle East,” but this population is now concentrated in Israel and the neighboring countries. Not all Jews, or even all Israeli Jews, speak Hebrew: centuries ago, Hebrew ceased to be a working language. However, owing to Jewish nationalism, the Zionist movement, and the need for a unifying language among immigrants to Israel, the language has been revived.

    Turkish, the national language of Turkey and the main spoken language of the Turkish nation, is also spoken by roughly 170,000 people in Cyprus and by minorities in the Fertile Crescent area. Kurdish is the language that unifies the Kurds, a nation that spans a large geographical range from Beirut to Afghanistan. Additionally, almost all countries in the “Middle East” have several minority languages, such as Berber, spoken by many North Africans, including in some parts of northwestern Egypt. Azeri, a minority Turkic language, is often spoken in northwestern Iran. Turkic tribes in the southern Zagros Mountains in Iran speak Qashqai, while Baluchi is spoken in southeastern and eastern Iran by the Baluch peoples and by migrants in the United Arab Emirates and Oman. Nomadic tribes in the Zagros Mountains can be found speaking Luri. Lastly, because of its historical significance, Armenian is spoken by minorities in urban centers such as Beirut, Damascus, Aleppo, Tehran, and Cairo (1).

    Works Cited

    (1) Held, Colbert C. Middle East Patterns – Places, Peoples and Politics. 2nd ed. Boulder, CO: Westview Press, Inc., 1994, pp. 76-80.
    (2) The World Factbook. Central Intelligence Agency. 2011. https://www.cia.gov/library/publications/the-world-factbook/fields/2098.html?countryName=Jordan&countryCode=jo&regionCode=me&#jo.
    (3) "Learn Arabic - All About the Arabic Language." Innovative Language Learning. Web. 28 Mar. 2011. http://innovativelanguage.com/languagelearning/arabic-language.
    (4) UCLA, Language Materials Projects. "Persian Language." Iran Chamber Society. Web. 29 Mar. 2011. http://www.iranchamber.com/literature/articles/persian_language.php.

  15. Global Financial Inclusion (Global Findex) Database 2017 - Lithuania

    • catalog.ihsn.org
    • microdata.worldbank.org
    Updated Dec 5, 2019
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2019). Global Financial Inclusion (Global Findex) Database 2017 - Lithuania [Dataset]. https://catalog.ihsn.org/catalog/8303
    Dataset updated
    Dec 5, 2019
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2017
    Area covered
    Lithuania
    Description

    Abstract

    Financial inclusion is critical in reducing poverty and achieving inclusive economic growth. When people can participate in the financial system, they are better able to start and expand businesses, invest in their children’s education, and absorb financial shocks. Yet prior to 2011, little was known about the extent of financial inclusion and the degree to which such groups as the poor, women, and rural residents were excluded from formal financial systems.

    By collecting detailed indicators about how adults around the world manage their day-to-day finances, the Global Findex allows policy makers, researchers, businesses, and development practitioners to track how the use of financial services has changed over time. The database can also be used to identify gaps in access to the formal financial system and design policies to expand financial inclusion.

    Geographic coverage

    National coverage

    Analysis unit

    Individuals

    Universe

    The target population is the civilian, non-institutionalized population 15 years and above.

    Kind of data

    Observation data/ratings [obs]

    Sampling procedure

    The indicators in the 2017 Global Findex database are drawn from survey data covering almost 150,000 people in 144 economies, representing more than 97 percent of the world's population (see Table A.1 of the Global Findex Database 2017 Report for a list of the economies included). The survey was carried out over the 2017 calendar year by Gallup, Inc., as part of its Gallup World Poll, which since 2005 has annually conducted surveys of approximately 1,000 people in each of more than 160 economies and in over 150 languages, using randomly selected, nationally representative samples. The target population is the entire civilian, noninstitutionalized population age 15 and above.

    Interview procedure

    Surveys are conducted face to face in economies where telephone coverage represents less than 80 percent of the population or where this is the customary methodology. In most economies the fieldwork is completed in two to four weeks.

    In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used.
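
    Selecting primary sampling units with probability proportional to population size is often implemented as systematic PPS sampling over cumulative population totals. The sketch below illustrates that generic technique with hypothetical unit names and sizes; it is not Gallup's actual selection code, and it covers only the first stage (not random-route household selection or respondent selection).

```python
import bisect
import random
from itertools import accumulate
from typing import Dict, List, Optional


def pps_systematic(sizes: Dict[str, int], k: int, seed: Optional[int] = None) -> List[str]:
    """Select k units with probability proportional to size (systematic PPS).

    A unit larger than the sampling interval can be selected more than once.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("sizes must sum to a positive number")
    interval = cumulative[-1] / k
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for j in range(k):
        point = start + j * interval
        # First unit whose cumulative size exceeds the selection point.
        i = bisect.bisect_right(cumulative, point)
        picks.append(names[min(i, len(names) - 1)])
    return picks


print(pps_systematic({"district_a": 900, "district_b": 50, "district_c": 50}, 2, seed=42))
```

    Because each selection point falls at a fixed interval along the cumulative totals, a unit's chance of selection is proportional to its size; real designs usually take units larger than the interval with certainty instead of selecting them repeatedly.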

    Respondents are randomly selected within the selected households. Each eligible household member is listed and the handheld survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
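    The within-household step can be modelled as a uniform random draw over eligible members, optionally restricted to the interviewer's gender. This sketch covers only the device-based random draw. It does not reproduce the Kish grid tables used for paper surveys, and the `Member` record and `select_respondent` function are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    age: int
    gender: str


def select_respondent(
    household: list[Member],
    interviewer_gender: Optional[str] = None,
    min_age: int = 15,
    rng: Optional[random.Random] = None,
) -> Optional[Member]:
    """Randomly pick one eligible member aged min_age or older; None if nobody qualifies."""
    if rng is None:
        rng = random.Random()
    eligible = [m for m in household if m.age >= min_age]
    if interviewer_gender is not None:
        # Gender matching: only members of the interviewer's gender are eligible.
        eligible = [m for m in eligible if m.gender == interviewer_gender]
    return rng.choice(eligible) if eligible else None
```

    The default `min_age` of 15 mirrors the survey's target population of people aged 15 and above.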

    In economies where telephone interviewing is employed, random digit dialing or a nationally representative list of phone numbers is used. In most economies where cell phone penetration is high, a dual sampling frame is used. Random selection of respondents is achieved by using either the latest birthday or household enumeration method. At least three attempts are made to reach a person in each household, spread over different days and times of day.
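    The "latest birthday" rule picks the eligible household member whose birthday most recently passed on or before the interview date. Below is a minimal sketch with hypothetical names. Treating a 29 February birthday as 1 March in non-leap years is an assumed convention, not one stated in the source.

```python
import calendar
from datetime import date


def last_birthday(month: int, day: int, today: date) -> date:
    """Most recent occurrence of (month, day) on or before `today`."""

    def in_year(year: int) -> date:
        if (month, day) == (2, 29) and not calendar.isleap(year):
            # Assumed convention: leap-day birthdays fall on 1 March otherwise.
            return date(year, 3, 1)
        return date(year, month, day)

    candidate = in_year(today.year)
    return candidate if candidate <= today else in_year(today.year - 1)


def latest_birthday_member(birthdays: dict[str, tuple[int, int]], today: date) -> str:
    """Return the name whose (month, day) birthday occurred most recently."""
    return max(birthdays, key=lambda name: last_birthday(*birthdays[name], today))


print(latest_birthday_member({"ana": (3, 2), "ben": (6, 10), "cai": (12, 25)}, date(2017, 6, 15)))
# prints: ben
```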

    The sample size was 1000.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup Inc. also provided valuable input. The questionnaire was piloted in multiple countries, using focus groups, cognitive interviews, and field testing. The questionnaire is available in more than 140 languages upon request.

    Questions on cash on delivery, saving using an informal savings club or person outside the family, domestic remittances, and agricultural payments are only asked in developing economies and a few other selected countries. The question on mobile money accounts was only asked in economies that were part of the Mobile Money for the Unbanked (MMU) database of the GSMA at the time the interviews were being held.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2018. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution. Washington, DC: World Bank.

  16. Hyperparameters used for our models.

    • plos.figshare.com
    xls
    Updated Jun 5, 2025
    Cite
    Koena Ronny Mabokela; Mpho Primus; Turgay Celik (2025). Hyperparameters used for our models. [Dataset]. http://doi.org/10.1371/journal.pone.0325102.t008
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Koena Ronny Mabokela; Mpho Primus; Turgay Celik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While sentiment analysis systems excel in high-resource languages, most African languages, which have limited resources, remain under-represented. This gap leaves a significant portion of the world’s population without access to such technologies in their native languages. Multilingual pre-trained language models (PLMs), however, offer a promising approach for sentiment analysis in low-resource languages. Although the absence of large datasets in African languages makes it difficult to develop dedicated PLMs, fine-tuning and task adaptation of existing multilingual PLMs offer an alternative. This paper explores the use of multilingual PLMs for sentiment analysis in five Southern African languages: Sepedi, Sesotho, Setswana, isiXhosa, and isiZulu. We fine-tune existing PLMs for this specific task rather than training models from scratch. Our work expands on the SAfriSenti corpus, a Twitter sentiment dataset for these languages. We employ various annotation techniques to create a labelled dataset and perform benchmark experiments with several multilingual PLMs. Our findings demonstrate the effectiveness of multilingual PLMs, particularly for closely related languages: for the Sotho-Tswana languages, the ensemble PLM method achieved an average weighted F1 score above 63%. The closely related Nguni languages achieved an even higher average weighted F1 score, exceeding 77%, highlighting the potential of PLMs for sentiment analysis in South African languages.
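    The results above are reported as average weighted F1 scores. Weighted F1 computes an F1 score per class and averages them, weighting each class by its support (the number of true instances of that class). The from-scratch sketch below is intended to match scikit-learn's `f1_score(y_true, y_pred, average="weighted")`; the sentiment labels are made up and are not drawn from SAfriSenti.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Weight each class's F1 by its share of the true labels.
        score += f1 * n_true / total
    return score


print(weighted_f1(["pos", "pos", "neg", "neu"], ["pos", "neg", "neg", "neu"]))
```

    Here "pos" and "neg" each get F1 = 2/3 and "neu" gets 1, so the support-weighted average is 0.75 (up to floating-point rounding). Classes that appear only in the predictions have zero support and therefore zero weight.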

  17. Data from: GeoNames

    • data.wu.ac.at
    • huggingface.co
    zip
    Updated Oct 10, 2013
    Cite
    Open Geospatial Data (2013). GeoNames [Dataset]. https://data.wu.ac.at/schema/datahub_io/MzE1MTQ4YWYtZmQyOC00ZWJjLTg3MDEtZWVkMDExNTE3MDA0
    Available download formats: zip
    Dataset updated
    Oct 10, 2013
    Dataset provided by
    Open Geospatial Consortium (https://www.ogc.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The geonames.org geographical database is available for download free of charge under a Creative Commons Attribution license. It contains over eight million geographical names and consists of 6.3 million unique features, of which 2.2 million are populated places, along with 1.8 million alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes.

    The data is accessible free of charge through a number of web services and a daily database export. Geonames.org already serves more than 3 million web service requests per day.

    GeoNames integrates geographical data such as place names in various languages, elevation, population, and other attributes from various sources. All latitude/longitude coordinates are in WGS84 (World Geodetic System 1984). Users may manually edit, correct, and add new names through a user-friendly wiki interface.

    TODO

    This is a large dataset, and many specially exported subsets of the data are available at http://download.geonames.org/export/dump/; it might be worth turning these into separate datasets (or at least listing them here in Resources).
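    The dump files at that address are tab-separated text. The sketch below parses one row into a few fields, assuming the 19-column layout described in the readme distributed with the dump (feature class and feature code in the 7th and 8th columns, population in the 15th); verify those positions against the readme before relying on them. The nine single-letter feature classes are listed with abbreviated descriptions, and `Place` and `parse_row` are hypothetical names.

```python
from dataclasses import dataclass

# The nine GeoNames feature classes (descriptions abbreviated).
FEATURE_CLASSES = {
    "A": "country, state, region",
    "H": "stream, lake",
    "L": "parks, area",
    "P": "city, village",
    "R": "road, railroad",
    "S": "spot, building, farm",
    "T": "mountain, hill, rock",
    "U": "undersea",
    "V": "forest, heath",
}

EXPECTED_COLUMNS = 19  # assumed layout of the main dump files


@dataclass
class Place:
    geonameid: int
    name: str
    latitude: float   # WGS84 decimal degrees
    longitude: float  # WGS84 decimal degrees
    feature_class: str
    feature_code: str
    country_code: str
    population: int


def parse_row(line: str) -> Place:
    """Parse one tab-separated dump row (0-based column indices below)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != EXPECTED_COLUMNS:
        raise ValueError(f"expected {EXPECTED_COLUMNS} columns, got {len(cols)}")
    if cols[6] and cols[6] not in FEATURE_CLASSES:
        raise ValueError(f"unknown feature class {cols[6]!r}")
    return Place(
        geonameid=int(cols[0]),
        name=cols[1],
        latitude=float(cols[4]),
        longitude=float(cols[5]),
        feature_class=cols[6],
        feature_code=cols[7],
        country_code=cols[8],
        population=int(cols[14] or 0),  # empty population treated as 0
    )
```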

    Linked Data

    Geonames locations are available as linked data, see dataset:geonames-semantic-web

  18. Spanish speakers in countries where Spanish is not an official language 2024...

    • statista.com
    Updated Jan 15, 2025
    Cite
    Statista (2025). Spanish speakers in countries where Spanish is not an official language 2024 [Dataset]. https://www.statista.com/statistics/1276290/number-spanish-speakers-non-hispanic-countries-worldwide/
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    The United States is the non-Hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people having a native command of the language in 2024. The European Union, meanwhile, had the largest group of non-native speakers with limited proficiency in Spanish, at around 28 million people. Overall, Mexico is the country with the largest number of native Spanish speakers in the world as of 2024.

  19. Gallup World Poll 2013, June - Afghanistan, Angola, Albania...and 183 more

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Jun 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gallup, Inc. (2022). Gallup World Poll 2013, June - Afghanistan, Angola, Albania...and 183 more [Dataset]. https://catalog.ihsn.org/catalog/8494
    Dataset updated
    Jun 14, 2022
    Dataset authored and provided by
    Gallup, Inc. (http://gallup.com/)
    Time period covered
    2005 - 2012
    Area covered
    Albania, Angola, Afghanistan
    Description

    Abstract

    Gallup Worldwide Research continually surveys residents in more than 150 countries, representing more than 98% of the world's adult population, using randomly selected, nationally representative samples. Gallup typically surveys 1,000 individuals in each country, using a standard set of core questions that has been translated into the major languages of the respective country. In some regions, supplemental questions are asked in addition to core questions. Face-to-face interviews are approximately 1 hour, while telephone interviews are about 30 minutes. In many countries, the survey is conducted once per year, and fieldwork is generally completed in two to four weeks. The Country Dataset Details spreadsheet displays each country's sample size, month/year of the data collection, mode of interviewing, languages employed, design effect, margin of error, and details about sample coverage.
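    The Country Dataset Details report a design effect and margin of error for each country. A common textbook approximation, assumed here rather than taken from Gallup's documentation, is MOE = z·sqrt(deff·p(1−p)/n) at 95% confidence (z ≈ 1.96), using the conservative p = 0.5. The function name is hypothetical.

```python
import math


def margin_of_error(n: int, deff: float = 1.0, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate half-width of a confidence interval for a proportion.

    The simple-random-sampling variance p(1-p)/n is inflated by the design
    effect `deff` to account for clustering and weighting.
    """
    return z * math.sqrt(deff * p * (1 - p) / n)


for deff in (1.0, 1.5):
    print(f"n=1000, deff={deff}: ±{100 * margin_of_error(1000, deff):.1f} points")
```

    For the typical sample of 1,000 this gives about ±3.1 percentage points with no design effect, widening to about ±3.8 points for a design effect of 1.5.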

    Gallup is entirely responsible for the management, design, and control of Gallup Worldwide Research. For the past 70 years, Gallup has been committed to the principle that accurately collecting and disseminating the opinions and aspirations of people around the globe is vital to understanding our world. Gallup's mission is to provide information in an objective, reliable, and scientifically grounded manner. Gallup is not associated with any political orientation, party, or advocacy group and does not accept partisan entities as clients. Any individual, institution, or governmental agency may access the Gallup Worldwide Research regardless of nationality. The identities of clients and all surveyed respondents will remain confidential.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLING AND DATA COLLECTION METHODOLOGY

    With some exceptions, all samples are probability based and nationally representative of the resident population aged 15 and older. The coverage area is the entire country, including rural areas, and the sampling frame represents the entire civilian, non-institutionalized population aged 15 and older. Exceptions include areas where the safety of interviewing staff is threatened, sparsely populated islands in some countries, and areas that interviewers can reach only by foot, animal, or small boat.

    Telephone surveys are used in countries where telephone coverage represents at least 80% of the population or is the customary survey methodology (see the Country Dataset Details for detailed information for each country). In Central and Eastern Europe, as well as in the developing world, including much of Latin America, the former Soviet Union countries, nearly all of Asia, the Middle East, and Africa, an area frame design is used for face-to-face interviewing.

    The typical Gallup Worldwide Research survey includes at least 1,000 surveys of individuals. In some countries, oversamples are collected in major cities or areas of special interest. Additionally, in some large countries, such as China and Russia, sample sizes of at least 2,000 are collected. In rare instances, the sample size is between 500 and 1,000. See the Country Dataset Details for detailed information for each country.

    FACE-TO-FACE SURVEY DESIGN

    FIRST STAGE

    In countries where face-to-face surveys are conducted, the first stage of sampling is the identification of 100 to 135 ultimate clusters (Sampling Units), consisting of clusters of households. Sampling units are stratified by population size and/or geography, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Samples are drawn independently of any samples drawn for surveys conducted in previous years.

    There are two methods for sample stratification:

    METHOD 1: The sample is stratified into 100 to 125 ultimate clusters drawn proportional to the national population, using the following strata:

    1) Areas with population of at least 1 million
    2) Areas 500,000-999,999
    3) Areas 100,000-499,999
    4) Areas 50,000-99,999
    5) Areas 10,000-49,999
    6) Areas with less than 10,000
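    Assigning an area to one of these strata, and splitting the clusters across strata in proportion to population, can be sketched as below. The largest-remainder rounding rule and the function names are assumptions for illustration; the source does not say how fractional allocations are rounded.

```python
from bisect import bisect_right

# Lower population bounds of strata 5, 4, 3, 2 and 1 (ascending order).
LOWER_BOUNDS = [10_000, 50_000, 100_000, 500_000, 1_000_000]


def stratum(population: int) -> int:
    """Method 1 stratum: 1 = at least 1 million, ..., 6 = less than 10,000."""
    return 6 - bisect_right(LOWER_BOUNDS, population)


def allocate_clusters(stratum_pop: dict[int, int], total_clusters: int) -> dict[int, int]:
    """Split clusters across strata in proportion to population (largest remainder)."""
    total_pop = sum(stratum_pop.values())
    quotas = {s: total_clusters * pop / total_pop for s, pop in stratum_pop.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = total_clusters - sum(alloc.values())
    # Hand remaining clusters to the strata with the largest fractional parts.
    for s in sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc
```

    Using `bisect_right` makes each lower bound inclusive, so an area of exactly 10,000 falls in stratum 5 and one of exactly 1 million in stratum 1, matching the ranges listed above.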

    The strata could include additional strata to reflect populations that exceed 1 million as well as areas with populations of less than 10,000.

    Worldwide Research Methodology and Codebook Copyright © 2008-2012 Gallup, Inc. All rights reserved.

    METHOD 2:

    A multi-stage design is used. The country is first stratified by large geographic units, and then by smaller units within geography. A minimum of 33 Primary Sampling Units (PSUs), which are first stage sampling units, are selected. The sample design results in 100 to 125 ultimate clusters.

    SECOND STAGE

    Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day, and where possible, on different days. If an interviewer cannot obtain an interview at the initial sampled household, he or she uses a simple substitution method. Refer to Appendix C for a more in-depth description of random route procedures.

    THIRD STAGE

    Respondents are randomly selected within the selected households. Interviewers list all eligible household members and their ages or birthdays. The respondent is selected by means of the Kish grid (refer to Appendix C) in countries where face-to-face interviewing is used. The interviewer does not inform the person who answers the door of the selection criteria until after the respondent has been identified. In a few Middle Eastern and Asian countries where cultural restrictions dictate gender matching, respondents are randomly selected using the Kish grid from among all eligible adults of the matching gender.

    TELEPHONE SURVEY DESIGN

    In countries where telephone interviewing is employed, random-digit-dial (RDD) or a nationally representative list of phone numbers is used. In select countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of day. Callback appointments are made when they fall within the survey data collection period.

    PANEL SURVEY DESIGN

    Prior to 2009, United States data were collected using The Gallup Panel. The Gallup Panel is a probability-based, nationally representative panel whose members are all recruited via random-digit-dial methodology; it is used only in the United States. Participants who elect to join the panel commit to completing two to three surveys per month, with the typical survey lasting 10 to 15 minutes. The Gallup Worldwide Research panel survey is conducted over the telephone and takes approximately 30 minutes. No incentives are given to panel participants.

    Research instrument

    QUESTION DESIGN

    Many of the Worldwide Research questions are items that Gallup has used for years. When developing additional questions, Gallup employed its worldwide network of research and political scientists to better understand key issues in question development, question construction, and data gathering. Hundreds of items were developed, tested, piloted, and finalized. The best questions were retained for the core questionnaire and organized into indexes. Most items have a simple dichotomous ("yes or no") response set to minimize data contamination caused by cultural differences in response styles and to facilitate cross-cultural comparisons.

    The Gallup Worldwide Research measures key indicators such as Law and Order, Food and Shelter, Job Creation, Migration, Financial Wellbeing, Personal Health, Civic Engagement, and Evaluative Wellbeing and demonstrates their correlations with world development indicators such as GDP and Brain Gain. These indicators assist leaders in understanding the broad context of national interests and establishing organization-specific correlations between leading indexes and lagging economic outcomes.

    Gallup organizes its core group of indicators into the Gallup World Path. The Path is an organizational conceptualization of the seven indexes and is not to be construed as a causal model. The individual indexes have many properties of a strong theoretical framework. A more in-depth description of the questions and Gallup indexes is included in the indexes section of this document. In addition to World Path indexes, Gallup Worldwide Research questions also measure opinions about national institutions, corruption, youth development, community basics, diversity, optimism, communications, religiosity, and numerous other topics. For many regions of the world, additional questions that are specific to that region or country are included in surveys. Region-specific questions have been developed for predominantly Muslim nations, former Soviet Union countries, the Balkans, sub-Saharan Africa, Latin America, China and India, South Asia, and Israel and the Palestinian Territories.

    The questionnaire is translated into the major conversational languages of each country. The translation process starts with an English, French, or Spanish version, depending on the region. One of two translation methods may be used.

    METHOD 1: Two independent translations are completed. An independent third party, with some knowledge of survey research methods, adjudicates the differences. A professional translator translates the final version back into the source language.

    METHOD 2: A translator

  20. Bilingual School Education Market Report | Global Forecast From 2025 To 2033...

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2024). Bilingual School Education Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-bilingual-school-education-market
    Available download formats: csv, pptx, pdf
    Dataset updated
    Sep 23, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Bilingual School Education Market Outlook



    The global bilingual school education market size reached approximately USD 29 billion in 2023 and is projected to reach USD 47 billion by 2032, growing at a compound annual growth rate (CAGR) of 5.5% during the forecast period. The growth of this market is driven by the increasing importance of multilingualism in an interconnected global economy, rising demand for quality education, and the need for cultural integration in diverse societies.
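    The headline figures can be checked with the standard compound-growth formula, end = start × (1 + r)^years. Treating 2023 as the base year gives nine years to 2032; this is an assumption, since the listing gives the covered period as 2024-2032. The helper names are hypothetical.

```python
def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1 / years) - 1


print(f"{project(29, 0.055, 9):.1f}")  # USD billions in 2032
print(f"{cagr(29, 47, 9):.2%}")
```

    This prints roughly 47.0 and 5.51%, so USD 29 billion growing at 5.5% a year over nine years is consistent with the projected USD 47 billion.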



    One of the primary growth factors for the bilingual school education market is the rising demand for multilingual proficiency among students. In an increasingly globalized world, the ability to communicate in multiple languages is becoming a valuable skill, essential for both personal and professional development. Parents and educators alike recognize that bilingual education not only enhances cognitive abilities but also opens up a world of opportunities in terms of career and cultural experiences. This growing awareness is driving the demand for bilingual school programs worldwide.



    Technological advancements in educational tools and resources are also contributing significantly to the market's growth. The integration of digital platforms and e-learning modules has revolutionized the way languages are taught, making language learning more engaging and accessible. The availability of online resources and language learning apps has made it easier for schools to implement bilingual programs effectively. Additionally, the hybrid model of learning, which combines in-person and online education, has gained popularity, especially since the COVID-19 pandemic, further bolstering market growth.



    Government policies and initiatives supporting bilingual education are playing a crucial role in market expansion. Many countries have recognized the importance of bilingualism and have implemented policies to promote bilingual education in schools. For instance, in the United States, several states have adopted bilingual education programs in public schools to cater to the needs of an increasingly diverse student population. Similarly, European countries have a long-standing tradition of promoting multilingualism, which is reflected in their education systems. These initiatives are expected to continue driving the growth of the bilingual school education market.



    The regional outlook for the bilingual school education market shows significant growth potential in regions such as Asia Pacific, North America, and Europe. The Asia Pacific region, in particular, is witnessing rapid growth due to the increasing demand for English proficiency alongside native languages. North America remains a strong market due to its diverse population and established bilingual education programs. Europe continues to lead in promoting multilingualism through its educational policies and cultural emphasis on learning multiple languages from an early age.



    Type Analysis



    The bilingual school education market can be segmented by type into primary education, secondary education, and higher education. Primary education holds a significant share of the market as the foundation for bilingual proficiency is often laid during the early years of a child's education. The cognitive benefits of learning multiple languages at a young age are well-documented, leading to a strong preference for bilingual programs at the primary level. Schools offering primary bilingual education focus on creating an immersive environment where students can naturally acquire language skills through various subjects taught in both languages.



    Secondary education also plays a crucial role in the bilingual school education market. At this stage, students build on the language skills acquired during primary education and begin to apply them in more complex academic and social contexts. Secondary schools offering bilingual programs typically adopt a more structured approach, with specific subjects taught in the second language. This level of education is critical for preparing students for higher education and future careers, where bilingual proficiency can be a significant advantage.



    Higher education is the final segment in the type analysis of the bilingual school education market. Universities and colleges offering bilingual programs are becoming increasingly popular as they cater to the growing demand for multilingual professionals. These institutions often provide specialized courses and degrees that require proficiency in multiple languages. Higher education institutions also benefit from attracting international students

Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/

The most spoken languages worldwide 2025

429 scholarly articles cite this dataset.
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2025
Area covered
World