77 datasets found
  1. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, around 1.53 billion people worldwide spoke English either natively or as a second language, more than the 1.18 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.

  2. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Explore at:
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the combined internet user base of both countries was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led worldwide in internet penetration rates, with around 97 percent of their populations accessing the internet.

  3. Ranking of languages spoken at home in the U.S. 2023

    • statista.com
    Updated Apr 14, 2025
    Cite
    Ranking of languages spoken at home in the U.S. 2023 [Dataset]. https://www.statista.com/statistics/183483/ranking-of-languages-spoken-at-home-in-the-us-in-2008/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, around 43.37 million people in the United States spoke Spanish at home. In comparison, approximately 998,179 people were speaking Russian at home during the same year. The distribution of the U.S. population by ethnicity can be accessed here. A ranking of the most spoken languages across the world can be accessed here.

  4. Most common languages spoken in India 2011

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Most common languages spoken in India 2011 [Dataset]. https://www.statista.com/statistics/616508/most-common-languages-india/
    Explore at:
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2011
    Area covered
    India
    Description

    Hindi, with over *** million native speakers, was the most spoken language across Indian homes, followed by Bengali with ** million speakers, as of 2011 census data. English native speakers accounted for about *** thousand during the measured time period.

    The colonial rule in India

    One of the most remarkable and widespread legacies that British colonial rule left behind was the English language. Before independence, English was used solely for higher education and in government and administrative processes. Since independence, however, Hindi has been claimed as the language with official government patronage. This led to resistance from the southern states of India, where Hindi did not have prominence. Consequently, the Official Languages Act of 1963 was enacted by the parliament, which ensured the continued use of English for official purposes in conjunction with Hindi.

    Multi-linguistic cultures

    India has approximately ** major languages that are written in about ** different scripts. While the country's official languages are both English and Hindi, Hindi remains the most preferred language used online, especially in the northern rural areas. The use of English is becoming increasingly popular in urban areas. In addition, almost every state in India has its own official language that is studied in primary and secondary school as an obligatory second language. Among the most prominent are Bengali, Marathi, and Telugu.

  5. Number of native Spanish speakers worldwide 2024, by country

    • statista.com
    Updated Jan 15, 2025
    Cite
    Statista (2025). Number of native Spanish speakers worldwide 2024, by country [Dataset]. https://www.statista.com/statistics/991020/number-native-spanish-speakers-country-worldwide/
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia was the nation with the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.

    Spanish, a world language

    As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, the majority on the American continent. It is also one of the official languages of Equatorial Guinea in Africa. Spanish has a strong presence in other countries as well, such as the United States, Morocco, and Brazil, which are included in the list of non-Hispanic countries with the highest number of Spanish speakers.

    The second most spoken language in the U.S.

    In the most recent data, Spanish ranked as the language, other than English, with the highest number of speakers, with 12 times more speakers than the language in second place. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, during fiscal year 2022 alone, 5 of the top 10 countries of origin of naturalized people in the U.S. were Spanish-speaking countries.

  6. Distribution of Students Across Grade Levels in Palisades World Language...

    • publicschoolreview.com
    Cite
    Public School Review, Distribution of Students Across Grade Levels in Palisades World Language School [Dataset]. https://www.publicschoolreview.com/palisades-world-language-school-profile
    Explore at:
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset tracks annual distribution of students across grade levels in Palisades World Language School

  7. Language Services market size was estimated at USD 58.9 billion in 2022!

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Feb 19, 2024
    Cite
    Cognitive Market Research (2024). Language Services market size was estimated at USD 58.9 billion in 2022! [Dataset]. https://www.cognitivemarketresearch.com/language-services-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Feb 19, 2024
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global language services market size was estimated at USD 58.9 billion in 2022 and is expected to grow at a compound annual growth rate (CAGR) of 6.2% from 2023 to 2030.

    Which Factors Drive the Language Services Market Growth?

    Cross-border contact has become more intense due to globalization, increasing the need for translation, localization, and interpretation services. Language solutions are required by growing multinational businesses, e-commerce, and multilingual customer service. Growth is also fueled by government programs that support accessibility and multilingualism. Technology advancements, including AI-driven translation tools, increase productivity and widen the market.

    These developments empower businesses to offer better-tailored solutions and services, which, in turn, contribute to the growth of the Language Services industry.

    For instance, BIG Language Solutions, a well-known international provider of language services, announced in April 2022 that it had acquired the Milan-based company Lawlinguists, which offers legal translation services. The purchase adds Italy, Germany, and Spain to BIG's European footprint and gives its clients access to a wider range of legal translation services, resources, and technology.

    (Source: biglanguage.com/blog/big-acquires-lawlinguists-expands-legal-offering-and-european-presence/)

    Globalization and Internationalization to Provide Viable Market Output
    

    A significant market driver for language services has been globalization. Communication in various languages is becoming increasingly important as firms grow internationally. The expansion of international trade, e-commerce, and cross-border investments all contribute to this trend. Companies must translate, localize, and adapt their products and services to local languages and cultures to remain competitive in the global market.

    There are approximately 7,139 languages spoken in the world today. However, many of these languages are endangered, with experts estimating that around 40% of languages are at risk of extinction.

    (Source: www.ohchr.org/en/stories/2019/10/many-indigenous-languages-are-danger-extinction)

    Multinational corporations with diverse workforces and clients from various language backgrounds have become popular due to globalization. These enterprises rely on translation services to eliminate language barriers to guarantee efficient internal communication and seamless relations with external parties. Language solutions, including document, website, and marketing material translation and conference and meeting interpretation services, greatly aid international collaboration and understanding.

    Technological Advancements to Propel Market Growth

    Localization of Digital Content
    

    Factors Restraining Growth of the Language Services Market

    Machine Translation Limitations to Hinder Market Growth
    

    The limitations of machine translation restrain the language services market. While machine translation quality has improved thanks to technological developments in AI, it still falls short of human translation in accuracy and nuance, especially for complicated or specialized information. Machine translation systems frequently struggle with context and idiomatic expressions, which can make translations sound awkward or inaccurate to native speakers. This limitation is especially important in fields like law, medicine, and marketing, where accuracy and cultural appropriateness are key.

    How COVID-19 Impacted the Language Services Market?

    The pandemic accelerated digital transformation and remote work, driving up demand for translation and localization services as organizations sought to reach a worldwide audience. Translations in the medical and scientific fields increased as information sharing became essential. At the same time, travel restrictions hampered on-site interpreting services, increasing the demand for remote interpreting services. Because the pandemic put an emphasis on efficient intercultural communication, businesses, the medical community, and governments all prioritized language services to enable proper information flow and support during the crisis.

    What is Language Services?

    A language service is a professional service used for communication and understanding between different cultural groups. It facilitates effective comm...

  8. Most used programming languages among developers worldwide 2024

    • statista.com
    Updated Feb 6, 2025
    Cite
    Statista (2025). Most used programming languages among developers worldwide 2024 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
    Explore at:
    Dataset updated
    Feb 6, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 19, 2024 - Jun 20, 2024
    Area covered
    Worldwide
    Description

    As of 2024, JavaScript and HTML/CSS were the most commonly used programming languages among software developers around the world, with more than 62 percent of respondents stating that they used JavaScript and around 53 percent using HTML/CSS. Python, SQL, and TypeScript rounded out the top five most widely used programming languages around the world.

    Programming languages

    At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today's society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to add to their skills. Furthermore, programming knowledge is becoming an important skill within various industries throughout the business world. Job seekers with skills in Python, R, and SQL will find their knowledge to be among the most highly desirable data science skills, which will likely assist in their search for employment.

  9. Languages in Canada 2022

    • statista.com
    Updated Apr 15, 2025
    Cite
    Statista (2025). Languages in Canada 2022 [Dataset]. https://www.statista.com/statistics/271218/languages-in-canada/
    Explore at:
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    Canada
    Description

    The statistic reflects the distribution of languages in Canada in 2022. In 2022, 87.1 percent of the total population in Canada spoke English as their native tongue.

  10. Data from: Common Phone: A Multilingual Dataset for Robust Acoustic...

    • explore.openaire.eu
    • zenodo.org
    Updated Jan 17, 2022
    Cite
    Philipp Klumpp; Tomás Arias-Vergara; Paula Andrea Pérez-Toro; Elmar Nöth; Juan Rafael Orozco-Arroyave (2022). Common Phone: A Multilingual Dataset for Robust Acoustic Modelling [Dataset]. http://doi.org/10.5281/zenodo.5846137
    Explore at:
    Dataset updated
    Jan 17, 2022
    Authors
    Philipp Klumpp; Tomás Arias-Vergara; Paula Andrea Pérez-Toro; Elmar Nöth; Juan Rafael Orozco-Arroyave
    Description

    Release Date: 17.01.22

    Welcome to Common Phone 1.0

    Legal Information

    Common Phone is a subset of the Common Voice corpus collected by Mozilla Corporation. By using Common Phone, you agree to the Common Voice Legal Terms. Common Phone is maintained and distributed by speech researchers at the Pattern Recognition Lab of Friedrich-Alexander-University Erlangen-Nuremberg (FAU) under the CC0 license. As with Common Voice, you must not make any attempt to identify speakers who contributed to Common Phone.

    About Common Phone

    This corpus aims to provide a basis for Machine Learning (ML) researchers and enthusiasts to train and test their models against a wide variety of speakers, hardware/software ecosystems and acoustic conditions to improve generalization and availability of ML in real-world speech applications. The current version of Common Phone comprises 116.5 hours of speech samples, collected from 11,246 speakers in 6 languages:

    Language   Speakers (train / dev / test)   Hours (train / dev / test)
    English    4716 / 771 / 774                14.1 / 2.3 / 2.3
    French     796 / 138 / 135                 13.6 / 2.3 / 2.2
    German     1176 / 202 / 206                14.5 / 2.5 / 2.6
    Italian    1031 / 176 / 178                14.6 / 2.5 / 2.5
    Spanish    508 / 88 / 91                   16.5 / 3.0 / 3.1
    Russian    190 / 34 / 36                   12.7 / 2.6 / 2.8
    Total      8417 / 1409 / 1420              85.8 / 15.2 / 15.5

    The presented train, dev and test splits are not identical to those shipped with Common Voice. Speaker separation among splits was realized by using only those speakers who had provided age and gender information. This information can only be provided as a registered user on the website. When logged in, the session ID of contributed recordings is always linked to your user, so recordings could easily be linked to individual speakers. Keep in mind this would not be possible for unregistered users, as their session ID changes if they decide to contribute more than once.

    During speaker selection, we considered that some speakers had contributed to more than one of the six Common Voice datasets (one for each language). In Common Phone, a speaker will only appear in one language.

    The dataset is structured as follows:

    • Six top-level directories, one for each language.
    • Each language folder contains [train|dev|test].csv files listing audio files, respective speaker ID and plain text transcript.
    • meta.csv provides speaker information: age group, gender, language, accent (if available) and which of the three splits this speaker was assigned to.
    • File names match corresponding audio file names except their extension.
    • /grids/ contains phonetic transcription for every audio file in Praat TextGrid format.
    • /mp3/ contains audio files in mp3, identical to those of Common Voice, e.g., sampling rates have been preserved and may vary for different files.
    • /wav/ contains raw audio files in 16 bits/sample, 16 kHz single channel. They were created from the original mp3 audio. We provide them for convenience; keep in mind that their source had undergone MP3 compression.

    Where does the phonetic annotation come from?

    Phonetic annotation was computed via BAS Web Services. We used the regular Pipeline (G2P-MAUS) without ASR to create an alignment of text transcripts with audio signals. We chose International Phonetic Alphabet (IPA) output symbols as they work well even in a multi-lingual setup. Common Phone annotation comprises 101 phonetic symbols, including silence.

    Why Common Phone?

    • Large number of speakers and varying acoustic conditions to improve robustness of ML models
    • Time-aligned IPA phonetic transcription for every audio sample
    • Gender-balanced and age-group-matched (equal number of female/male speakers in every age group)
    • Support for six different languages to leverage multi-lingual approaches
    • Original MP3 files plus standard WAVE files

    Is there any publication available?

    Yes, a paper describing Common Phone in detail is currently under revision for LREC 2022. You can access a pre-print version on arXiv entitled "Common Phone: A Multilingual Dataset for Robust Acoustic Modelling".

    References: Klumpp, Philipp et al. (2022); "Common Phone: A Multilingual Dataset for Robust Acoustic Modelling" https://arxiv.org/abs/2201.05912
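
    The directory layout described for Common Phone can be navigated with a few path helpers. The following is a minimal sketch, not code shipped with the dataset: the root folder, the language directory name, the file stem, and the ".TextGrid" extension are all assumptions made for illustration, and only the grids/, mp3/ and wav/ folder names and the [train|dev|test].csv file names come from the description.

```python
from pathlib import Path

SPLITS = {"train", "dev", "test"}


def split_csv(root: Path, language: str, split: str) -> Path:
    """Path of the CSV listing audio files, speaker IDs and transcripts."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return root / language / f"{split}.csv"


def artifact_paths(root: Path, language: str, stem: str) -> dict[str, Path]:
    """Paths of the three per-recording files that share one file stem.

    The description says file names match except for their extension;
    the exact extension of the Praat files is assumed here.
    """
    lang_dir = root / language
    return {
        "textgrid": lang_dir / "grids" / f"{stem}.TextGrid",  # assumed extension
        "mp3": lang_dir / "mp3" / f"{stem}.mp3",
        "wav": lang_dir / "wav" / f"{stem}.wav",
    }


if __name__ == "__main__":
    # Hypothetical root, language folder and stem, for illustration only.
    for kind, path in artifact_paths(Path("CommonPhone"), "en", "clip_0001").items():
        print(kind, path)
```

    Keeping the stem as the single key makes it straightforward to pair an audio file with its phonetic annotation once the real folder and file names are known.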

  11. Trends in Hispanic Student Percentage (2022-2023): Palisades World Language...

    • publicschoolreview.com
    Cite
    Public School Review, Trends in Hispanic Student Percentage (2022-2023): Palisades World Language School vs. Oregon vs. Lake Oswego SD 7j School District [Dataset]. https://www.publicschoolreview.com/palisades-world-language-school-profile
    Explore at:
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Lake Oswego
    Description

    This dataset tracks annual Hispanic student percentage from 2022 to 2023 for Palisades World Language School vs. Oregon and Lake Oswego SD 7j School District

  12. Japan Centre of Excellence (JACEEX)

    • jaceex.com
    html
    Updated Jul 16, 2019
    Cite
    Japan Centre of Excellence (JACEEX) (2019). Japan Centre of Excellence (JACEEX) [Dataset]. https://www.jaceex.com/ssw
    Explore at:
    Available download formats: html
    Dataset updated
    Jul 16, 2019
    Dataset provided by
    https://www.jaceex.com/
    Authors
    Japan Centre of Excellence (JACEEX)
    Area covered
    Description

    Japan Centre of Excellence (JACEEX) is a brand under Jaceex Ventures LLP. Jaceex was formed with a vision to create a world-class workforce with the skill sets, work and business ethics, sincerity, devotion, and other positive traits found in the Japanese workforce, which has been responsible for building world-class enterprises. For Indian students and youths stepping into this world, our objective is to provide a life-changing opportunity in the form of skills and work in Japan. Japan Centre of Excellence (JACEEX) provides an integrated course schedule of learning through exploration, scrutiny and self-reflection. We offer Japanese Language and Culture training at Basic, Intermediate and High levels. Our training is designed to make trainees eligible to certify themselves with the globally recognised Japanese Language Proficiency Test (JLPT) examination. This will help in building careers with Japanese companies in Japan and in India, and also in self-employment. We also have a Virtual Live class platform.

  13. Data from: Five Years of COVID-19 Discourse on Instagram: A Labeled...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Oct 21, 2024
    Cite
    Nirmalya Thakur, Ph.D. (2024). Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.13896353
    Explore at:
    Available download formats: bin
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nirmalya Thakur, Ph.D.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 6, 2024
    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

    Abstract

    The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.

    For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

    The Instagram posts in this dataset are present in 161 different languages out of which the top 10 languages in terms of frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), Turkish (4632 posts)

    There are 535,021 distinct hashtags in this dataset with the top 10 hashtags in terms of frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), #coronavirusoutbreak (34567 posts)

    The following is a description of the attributes present in this dataset

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
    • Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
    • Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
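
    The attribute list above maps naturally onto a typed record. The sketch below is only an illustration under stated assumptions: it assumes the column headers match the attribute names exactly as listed, which the description does not confirm, and the sample row in the usage section is invented, not taken from the dataset. The date format (MM/DD/YYYY) and the three sentiment labels come from the description.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime

SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class Post:
    post_id: str
    description: str
    published: date
    language_code: str
    full_language: str
    sentiment: str


def parse_post(row: dict[str, str]) -> Post:
    """Convert one row (keyed by the assumed column headers) into a Post."""
    sentiment = row["Sentiment"].strip().lower()
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {row['Sentiment']!r}")
    return Post(
        post_id=row["Post ID"],
        description=row["Post Description"],
        # Dates are described as MM/DD/YYYY.
        published=datetime.strptime(row["Date"], "%m/%d/%Y").date(),
        language_code=row["Language code"],
        full_language=row["Full Language"],
        sentiment=sentiment,
    )


def sentiment_by_language(posts: list[Post]) -> dict[str, Counter]:
    """Count sentiment labels separately for each language code."""
    counts: dict[str, Counter] = {}
    for post in posts:
        counts.setdefault(post.language_code, Counter())[post.sentiment] += 1
    return counts


if __name__ == "__main__":
    # Invented row, for illustration only.
    sample = {
        "Post ID": "example-1",
        "Post Description": "Stay safe everyone",
        "Date": "03/15/2020",
        "Language code": "en",
        "Full Language": "English",
        "Sentiment": "Neutral",
    }
    print(sentiment_by_language([parse_post(sample)]))
```

    Grouping by language code first is one way to start on the first research question listed in the description, which asks how sentiment varies across languages.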

    Open Research Questions

    This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

    1. How does sentiment toward COVID-19 vary across different languages?
    2. How has public sentiment toward COVID-19 evolved from 2020 to the present?
    3. How do cultural differences affect social media discourse about COVID-19 across various languages?
    4. How has COVID-19 impacted mental health, as reflected in social media posts across different languages?
    5. How effective were public health campaigns in shifting public sentiment in different languages?
    6. What patterns of vaccine hesitancy or support are present in different languages?
    7. How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?
    8. What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?
    9. How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?
    10. What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).

  14. Comparison of linguistic communities detection between census and Twitter.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Fabio Lamanna; Maxime Lenormand; María Henar Salas-Olmedo; Gustavo Romanillos; Bruno Gonçalves; José J. Ramasco (2023). Comparison of linguistic communities detection between census and Twitter. [Dataset]. http://doi.org/10.1371/journal.pone.0191612.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Fabio Lamanna; Maxime Lenormand; María Henar Salas-Olmedo; Gustavo Romanillos; Bruno Gonçalves; José J. Ramasco
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global Moran’s I for a set of common languages detected in Barcelona, London and Madrid. The z-values are calculated after 99 permutations. The last column refers to the quality and significance of the spatial autocorrelations detected.
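
    For readers unfamiliar with the statistic, global Moran's I and a permutation-based z-value can be sketched as follows. This is a generic textbook formulation, not the authors' code: the spatial weights matrix, the toy values, and the random seed are assumptions, and the description does not say which weights or software were used. Only the number of permutations (99) comes from the description.

```python
import random
import statistics


def morans_i(values: list[float], weights: list[list[float]]) -> float:
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


def permutation_z(
    values: list[float],
    weights: list[list[float]],
    permutations: int = 99,
    seed: int = 0,
) -> float:
    """z-value of the observed I against randomly shuffled values."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    simulated = []
    for _ in range(permutations):
        rng.shuffle(shuffled)
        simulated.append(morans_i(shuffled, weights))
    # Raises ZeroDivisionError if every shuffle yields the same I.
    return (observed - statistics.fmean(simulated)) / statistics.stdev(simulated)


if __name__ == "__main__":
    # Toy example: four locations on a line, neighbours share weight 1.
    line_weights = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    toy_values = [1.0, 2.0, 3.0, 4.0]
    print(morans_i(toy_values, line_weights))
    print(permutation_z(toy_values, line_weights))
```

    A positive I with a large z-value indicates that similar values cluster in space more than random shuffling would produce.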

  15. Does standardization matter? Evaluating the potential of the Common European...

    • dataverse.no
    • search.dataone.org
    application/x-stata +3
    Updated Jun 16, 2025
    Cite
    Miriam Schmaus; Miriam Schmaus (2025). Does standardization matter? Evaluating the potential of the Common European Framework of Reference for Languages (CEFR) to foster labour market inclusion of immigrants (DISCEFRN): Vignette study dataset [Dataset]. http://doi.org/10.18710/6YMZLS
    Explore at:
    Available download formats: txt(5406), application/x-stata(1310453), text/comma-separated-values(8819399), pdf(284687), pdf(126320)
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    DataverseNO
    Authors
    Miriam Schmaus; Miriam Schmaus
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2024 - Dec 31, 2024
    Dataset funded by
    Marie Skłodowska-Curie Actions (MSCA, Horizon Europe Actions); Postdoctoral fellowship project (Miriam Schmaus): Grant agreement ID: 101065566; DOI: 10.3030/101065566
    The European Union's Marie Skłodowska-Curie Actions (MSCA, Horizon Europe Actions)
    Description

    This dataset contains the information collected in the survey experiment carried out within the DISCEFRN project (see metadata section Funding Information). Within DISCEFRN, we combined web-scraped job vacancy data from the Norwegian labour market with a factorial survey experiment that exploits real-world variation in CEFR requirements within these ads (n vignette ratings = 10,495; n employers = 1,527). The aim was to examine whether fictitious applicants with a refugee background face less language-based discrimination at the individual level among employers who use standardized language requirements in their (real-world) ads than among those who do not. To this end, we varied applicant characteristics related to ethnic origin and to formal (CEFR certificate) and informal language indicators (e.g. spelling, argumentation, professional reference on unobservable relational skills) within vignettes, and collected information on job, firm and employer characteristics (most notably attitudes towards different refugee groups) with standard survey items. This allowed us to assess whether CEFR requirements primarily mitigate biased applicant evaluations related to language-based statistical/error discrimination (less relevance of informal language indicators), to discrimination tastes (less relevance of group-related attitudes), or both. The dataset contains all information on the survey experiment. It is a stand-alone dataset and contains all data needed to reproduce the associated publications (see metadata field Publications) or to be reused for other research interests. It can still be linked to additional DISCEFRN datasets, i.e. the web-scraped dataset, which also holds information on employers that did not participate in the survey experiment (https://doi.org/10.18710/K6WA0V).

  16. Instructor-led Language Training Market Report | Global Forecast From 2025...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Instructor-led Language Training Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-instructor-led-language-training-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Instructor-led Language Training Market Outlook



    The global instructor-led language training market size was valued at approximately USD 8 billion in 2023 and is projected to grow significantly, reaching nearly USD 12.5 billion by 2032, with a compound annual growth rate (CAGR) of around 5%. The growth of this market is being driven by several factors, including the increasing globalization of businesses, the rising demand for multilingual employees, and the growing emphasis on effective communication skills in both personal and professional settings. As the world becomes more interconnected, the ability to communicate in multiple languages is increasingly seen as a valuable asset, leading to a surge in demand for language training programs.
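The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula. The snippet below is a minimal sketch; it assumes a nine-year compounding window (2023 to 2032), which the description implies but does not state, and the function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures from the description, in USD billion; the 9-year window is an assumption.
rate = cagr(8.0, 12.5, 2032 - 2023)
print(f"Implied CAGR: {rate:.2%}")
```

Under these assumptions the implied rate comes out at roughly 5%, consistent with the figure in the description.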



    One of the primary growth factors for the instructor-led language training market is the globalization of businesses and the need for companies to operate effectively across different linguistic and cultural contexts. As companies expand their operations into new regions, the ability to communicate with local clients, partners, and employees becomes crucial. This has led to a growing demand for language training programs that can equip employees with the necessary language skills. Moreover, the rise of remote work and virtual teams has further emphasized the need for effective communication across diverse geographies, fueling the demand for language training.



    Another significant factor contributing to the growth of this market is the increasing emphasis on personal development and lifelong learning. In a rapidly changing world, individuals are increasingly seeking to enhance their skills and knowledge to remain competitive in the job market. Language learning is seen as a key component of personal development, providing individuals with the ability to connect with different cultures and communities. As a result, there is a growing demand for language training programs that are tailored to individual learning needs and preferences, offering flexibility and convenience.



    The rise of digital technology and the increasing availability of online learning platforms have also played a crucial role in the growth of the instructor-led language training market. While traditional in-person language classes remain popular, virtual language training programs have gained significant traction due to their convenience and accessibility. These programs allow learners to access high-quality language instruction from anywhere in the world, making language learning more accessible to a wider audience. The integration of technology in language training programs has also enabled the development of innovative teaching methodologies and interactive learning experiences, further driving the growth of this market.



    In the context of globalization and the increasing need for multilingual communication, Study Abroad Training has emerged as a crucial component in language education. This type of training provides learners with immersive experiences in foreign countries, allowing them to practice language skills in real-world settings while gaining cultural insights. Study Abroad Training not only enhances language proficiency but also broadens learners' perspectives, making them more adaptable and culturally aware. As more students and professionals seek international exposure, the demand for Study Abroad Training is expected to rise, contributing to the growth of the language training market. This trend highlights the importance of experiential learning in achieving language fluency and intercultural competence.



    Regionally, the instructor-led language training market is experiencing significant growth across various parts of the world. North America and Europe are currently the largest markets for language training, driven by the presence of a large number of multinational companies and a strong emphasis on language education. However, the Asia Pacific region is expected to witness the highest growth during the forecast period, driven by the rapid economic development in countries like China and India and the increasing demand for English language proficiency. The growing importance of language skills in Latin America and the Middle East & Africa is also expected to contribute to the growth of the instructor-led language training market in these regions.



    Training Type Analysis



    The instructor-led language training market is segmented by training type into in-person and virtual training. In-person training remains a traditio

  17. PersianQuAD

    • kaggle.com
    Updated Mar 1, 2022
    Cite
    Jamshid Mozafari (2022). PersianQuAD [Dataset]. https://www.kaggle.com/jamshidjdmy/persianquad/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 1, 2022
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Jamshid Mozafari
    Description

    To address the need for a high-quality QA dataset for the Persian language, we propose a model for creating datasets for deep-learning-based QA systems. We deploy the proposed model to create PersianQuAD, the first native question answering dataset for the Persian language. PersianQuAD contains approximately 20,000 "question, paragraph, answer" triplets on Persian Wikipedia articles and is the first large-scale native QA dataset for the Persian language created by native annotators.

    The proposed model consists of four steps: 1) Wikipedia article selection, 2) question-answer collection, 3) three-candidates test set preparation, and 4) data quality monitoring. We analysed PersianQuAD and showed that it contains questions of varying types and difficulties and hence is a good representation of real-world questions in the Persian language. We built three QA systems using MBERT, ALBERT-FA and ParsBERT. The best system uses MBERT and achieves an F1 score of 82.97% and an Exact Match of 78.8%. The results show that the resulting dataset performs well for training deep-learning-based QA systems. We have made our dataset and QA models freely available and hope that this encourages the development of new QA datasets and systems for different languages, and leads to further advances in machine comprehension.
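For readers unfamiliar with the two reported metrics, the sketch below shows how Exact Match and token-level F1 are commonly computed for extractive QA in the SQuAD style. It is a simplified illustration, not the authors' evaluation script: it assumes whitespace tokenization and skips any Persian-specific text normalization, and the function names are hypothetical.

```python
from collections import Counter

def exact_match(prediction, reference):
    """1.0 if the predicted answer equals the reference token for token, else 0.0."""
    return float(prediction.split() == reference.split())

def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# A prediction with one extra token still earns partial F1 credit but no Exact Match.
print(exact_match("a b c", "a b"), token_f1("a b c", "a b"))
```

Dataset-level scores are usually the average of these per-question values, often taking the best score over several reference answers per question.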

  18. Language Learning Software Market Report | Global Forecast From 2025 To 2033...

    • dataintelo.com
    csv, pdf, pptx
    Updated Dec 3, 2024
    Cite
    Dataintelo (2024). Language Learning Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-language-learning-software-market
    Explore at:
    Available download formats: pptx, pdf, csv
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Language Learning Software Market Outlook



    The global language learning software market size in 2023 is estimated to be approximately USD 14.5 billion and is projected to grow significantly, reaching USD 33.5 billion by 2032, with a compound annual growth rate (CAGR) of 9.5% from 2024 to 2032. The market's robust expansion is primarily driven by the growing demand for multilingual proficiency in an increasingly globalized world. As businesses operate across borders and cultures, the need for language skills becomes more vital than ever. Digital transformation and technological advancements have further accelerated the adoption of language learning software, providing users with interactive, flexible, and personalized learning experiences.



    One of the key growth factors of the language learning software market is the increasing emphasis on language skills in the academic sector. Schools and universities worldwide are integrating digital language learning tools into their curricula to enhance students' linguistic capabilities. This trend is particularly pronounced in regions where English is not the first language, as English has become the lingua franca of international communication. Language learning software offers an effective, engaging, and scalable solution for educational institutions to teach languages, supporting diverse learning styles and paces. Furthermore, the convenience of on-demand access and the ability to track progress are making such software an attractive choice for educators.



    Another significant driver of market growth is the rise of corporate training programs focused on enhancing employees' language skills. As organizations expand globally, bridging language barriers becomes crucial for successful operations, negotiations, and customer interactions. Consequently, businesses are investing in language learning software to train their workforce. The software allows companies to provide uniform language training across diverse geographic locations, ensuring that employees are equipped with the necessary skills to communicate effectively with international clients and colleagues. This corporate demand is further fueled by the software's ability to offer tailored learning paths and real-time performance analytics, thus maximizing the return on investment.



    The proliferation of smart devices and increasing internet penetration have propelled the popularity of language learning apps, contributing significantly to market growth. Apps offer unparalleled accessibility, enabling users to learn languages at their convenience, whether on the go or at home. This flexibility is particularly appealing to individual learners who are juggling busy schedules. The availability of engaging, gamified content and social learning features in these apps enhances user retention and motivation, further boosting their adoption. Moreover, advancements in artificial intelligence and machine learning are enabling more sophisticated and personalized learning experiences, driving continued market expansion.



    Regionally, the Asia Pacific market is expected to exhibit the highest growth rate over the forecast period, driven by the increasing importance of English in business and education sectors. Countries like China, India, and Japan are witnessing a surge in demand for English language learning solutions, fueled by globalization and competitive academic and corporate landscapes. North America, being home to some of the largest providers of language learning software, holds a significant market share. However, Europe also presents promising growth opportunities, particularly due to the diverse linguistic landscape and the emphasis on multilingualism in education and business. The Middle East & Africa and Latin America are gradually recognizing the benefits of language proficiency, contributing to the global market growth.



    Product Type Analysis



    The language learning software market is segmented by product type into self-paced e-learning, online tutoring, apps-based learning, and others, each offering unique advantages and catering to different learning preferences. Self-paced e-learning remains one of the most popular segments, offering learners the flexibility to access course materials and content at their convenience. This mode is particularly suitable for individuals who prefer to learn at their own pace, without the constraints of a fixed schedule. The asynchronous nature of self-paced e-learning allows learners to revisit challenging concepts, ensuring a comprehensive understanding before progressing. This segment is also favored for its cost-effectiveness and the breadth of courses availabl

  19. Online Language Learning Market Analysis, Size, and Forecast 2025-2029: APAC...

    • technavio.com
    Updated Dec 15, 2024
    Cite
    Technavio (2024). Online Language Learning Market Analysis, Size, and Forecast 2025-2029: APAC (China, India, Japan, South Korea), Europe (France, Germany, Italy, Spain, UK), North America (Canada), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/online-language-learning-market-industry-analysis
    Explore at:
    Dataset updated
    Dec 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global
    Description


    Online Language Learning Market Size 2025-2029

    The online language learning market size is forecast to increase by USD 81.55 billion at a CAGR of 27.5% between 2024 and 2029.

    The market is experiencing significant growth due to its cost-effective and flexible nature, making it an attractive alternative to traditional language classes. The convenience of learning at one's own pace and location, coupled with the affordability of online courses and tutoring, is driving the market's expansion. Furthermore, the integration of artificial intelligence (AI) in language learning platforms is revolutionizing the industry by providing personalized learning experiences and real-time feedback. However, the market faces challenges as well. Open resources, such as free language learning websites and applications, pose a significant threat by offering similar services at no cost.
    These platforms, while not as comprehensive as paid offerings, can still attract price-sensitive consumers and limit the revenue potential for market participants. Companies must differentiate themselves by offering unique features, superior learning outcomes, or a more engaging user experience to justify their premium pricing. To navigate this challenge, strategic partnerships, collaborations, and continuous innovation in AI technology could provide competitive advantages.
    

    What will be the Size of the Online Language Learning Market during the forecast period?


    The market continues to evolve, driven by the growing demand for effective and engaging language learning solutions. Corporate language training is a significant sector within this market, as businesses recognize the importance of multilingualism in expanding their global reach. ESL learning, or English as a Second Language, is another thriving area, catering to the needs of advanced language learners and those seeking to improve their proficiency in English. Travel language learning is another application of the market, with individuals increasingly recognizing the value of being able to communicate effectively in foreign countries.
    Virtual classrooms and mobile language learning have also gained popularity, making language learning more accessible and convenient for learners. Language acquisition and second language acquisition are ongoing processes, with learners continually seeking new ways to improve their proficiency. Grammar exercises, pronunciation practice, and personalized learning are some of the strategies used to enhance language learning methodology. Translation services are another application of the market, providing solutions for individuals and businesses to communicate effectively across language barriers. The market for language learning games is vast and diverse, with new technologies and approaches continually emerging to meet the evolving needs of language learners. Language learning tips, vocabulary builders, and language assessment tools are essential resources for learners, helping them to optimize their learning experience and track their progress. The market is a dynamic and ever-evolving landscape, with continuous innovation and growth expected in the years to come.
    

    How is this Online Language Learning Industry segmented?

    The online language learning industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    End-user
      Courses
      Solutions
      Apps

    Language
      English
      Mandarin
      Spanish
      Others

    Delivery Format
      Live Online Classes
      Self-Paced Online Courses
      Hybrid Learning

    Target Learner
      School Students
      University Students
      Working Professionals
      Adults for Personal Development

    End User Type
      Individual Learners
      Educational Institutions

    Geography
      North America
        US
        Canada
      Europe
        France
        Germany
        Italy
        Spain
        UK
      Middle East and Africa
        UAE
      APAC
        China
        India
        Japan
        South Korea
      South America
        Brazil
      Rest of World (ROW)
    

    By End-user Insights

    The courses segment is estimated to witness significant growth during the forecast period.

    Online language learning has become a popular and accessible solution for individuals seeking to expand their linguistic abilities. Courses form the foundation of this learning journey, encompassing digital content and courseware that facilitate language acquisition. The affordability of online language courses, compared to traditional classroom programs, broadens accessibility to a larger audience, including those with financi

  20. Trends in Diversity Score (2022-2023): Palisades World Language School vs....

    • publicschoolreview.com
    Cite
    Public School Review, Trends in Diversity Score (2022-2023): Palisades World Language School vs. Oregon vs. Lake Oswego SD 7j School District [Dataset]. https://www.publicschoolreview.com/palisades-world-language-school-profile
    Explore at:
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Lake Oswego
    Description

    This dataset tracks the annual diversity score from 2022 to 2023 for Palisades World Language School compared with Oregon and the Lake Oswego SD 7j School District.

The most spoken languages worldwide 2025

429 scholarly articles cite this dataset (View in Google Scholar)