In 2023, around 1.5 billion people worldwide spoke English either natively or as a second language, somewhat more than the 1.1 billion Mandarin Chinese speakers at the time of the survey. Hindi and Spanish were the third and fourth most widely spoken languages that year.
Languages in the United States
The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation and other official pronouncements. The United States is a land of immigrants, and the languages spoken there vary as a result of its multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which over 41 million people spoke at home in 2021. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.7 million Tagalog speakers and 1.5 million Vietnamese speakers counted in the United States that year.
Different languages at home
The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 44 percent of the population spoke a language other than English at home in 2021.
Mexico is the country with the largest number of native Spanish speakers in the world. As of 2024, 132.5 million people in Mexico spoke Spanish with a native command of the language. Colombia had the second-highest number of native Spanish speakers, at around 52.7 million. Spain came in third, with 48 million, and Argentina fourth, with 46 million.
Spanish, a world language
As of 2023, Spanish ranked as the fourth most spoken language in the world, behind only English, Chinese, and Hindi, with over half a billion speakers. Spanish is the official language of over 20 countries, most of them in the Americas; it is also one of the official languages of Equatorial Guinea in Africa. Spanish also has a strong presence in other countries, such as the United States, Morocco, and Brazil, which are among the non-Hispanic countries with the highest number of Spanish speakers.
The second most spoken language in the U.S.
In the most recent data, Spanish ranked as the language other than English with the highest number of speakers in the United States, with 12 times as many speakers as the second-ranked language. This comes as no surprise given the long history of migration from Latin American countries to the United States. Moreover, in fiscal year 2022 alone, 5 of the top 10 countries of origin of naturalized people in the U.S. were Spanish-speaking countries.
In 2022, around 42.03 million people in the United States spoke Spanish at home. In comparison, 974,829 people spoke Russian at home during the same year.
Papua New Guinea is the most linguistically diverse country in the world. As of 2021, 840 different languages were spoken across the country. Indonesia ranked second, with 711 different languages. In the United States, 328 languages were spoken in that same year.
As of 2021, about 43.9 percent of California's population spoke a language other than English at home.
As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.
English as the leading online language
The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the combined internet user base of the two countries was over a billion individuals. As a result, most online information is created in English, and even people who are not native speakers may use it for convenience.
Global internet usage by regions
As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.
The United States is the non-Hispanic country with the largest number of native Spanish speakers in the world, with approximately 41.89 million people with a native command of the language in 2024. However, the European Union had the largest group of non-native speakers with limited proficiency in Spanish, at around 28 million people. Overall, Mexico is the country with the largest number of native Spanish speakers in the world as of 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is adapted from raw data with fully anonymized results on the State Examination of Dutch as a Second Language. This exam is officially administered by the Board of Tests and Examinations (College voor Toetsen en Examens, or CvTE). See cvte.nl/about-cvte. The Board of Tests and Examinations is mandated by the Dutch government.
The article accompanying the dataset:
Schepens, Job, Roeland van Hout, and T. Florian Jaeger. “Big Data Suggest Strong Constraints of Linguistic Similarity on Adult Language Learning.” Cognition 194 (January 1, 2020): 104056. https://doi.org/10.1016/j.cognition.2019.104056.
Every row in the dataset represents the first official testing score of a unique learner.
The columns contain the following information, based on questionnaires filled in at the time of the exam:
"L1" - The first language of the learner
"C" - The country of birth
"L1L2" - The combination of first and best additional language besides Dutch
"L2" - The best additional language besides Dutch
"AaA" - Age at Arrival in the Netherlands in years (starting date of residence)
"LoR" - Length of residence in the Netherlands in years
"Edu.day" - Duration of daily education (1 low, 2 middle, 3 high, 4 very high). From 1992 until 2006, learners' education was measured by means of a side-by-side matrix question in the learner's questionnaire. Learners were asked to mark which type of education they had received (elementary, secondary, or tertiary schooling) by filling in how many years they had been enrolled, in which country, and whether or not they had graduated. Based on this information we were able to estimate how many years of daily education learners had received from six years of age onwards. Since 2006, the question has been asked directly: how many years of formal education on a daily basis has the learner had from six years of age onwards? Possible answering categories are: 1) 0 through 5 years; 2) 6 through 10 years; 3) 11 through 15 years; 4) 16 years or more. The answers from both periods have been merged into this single categorical variable.
"Sex" - Gender
"Family" - Language Family
"ISO639.3" - Language ID code according to Ethnologue
"Enroll" - Proportion of school-aged youth enrolled in secondary education according to the World Bank. The World Bank regularly reports education data for a large number of countries around the world. We took the gross enrollment rate in secondary schooling per country in the year the learner arrived in the Netherlands as an indicator of a country's educational accessibility at the time learners left their country of origin.
"STEX_speaking_score" - The STEX test score for speaking proficiency.
"Dissimilarity_morphological" - Morphological dissimilarity
"Dissimilarity_lexical" - Lexical dissimilarity
"Dissimilarity_phonological_new_features" - Phonological dissimilarity (in terms of new features)
"Dissimilarity_phonological_new_categories" - Phonological dissimilarity (in terms of new sounds)
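The four answering categories described for "Edu.day" can be written as a small lookup. The following is a minimal sketch, not the dataset authors' processing code: the function name `edu_day_category` is hypothetical, and it assumes that years of daily education are given as whole, non-negative numbers.

```python
def edu_day_category(years: int) -> int:
    """Map years of daily education (from age six onwards) to an Edu.day code.

    Categories follow the questionnaire description:
    1) 0 through 5 years, 2) 6 through 10, 3) 11 through 15, 4) 16 or more.
    Assumption: `years` is a whole, non-negative number of years.
    """
    if years < 0:
        raise ValueError("years of education cannot be negative")
    if years <= 5:
        return 1  # low
    if years <= 10:
        return 2  # middle
    if years <= 15:
        return 3  # high
    return 4      # very high


if __name__ == "__main__":
    for y in (0, 5, 6, 10, 11, 15, 16, 20):
        print(y, edu_day_category(y))
```

The boundary values (5, 10, 15) belong to the lower category, matching the "through" wording of the questionnaire.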
A few rows of the data:
"L1","C","L1L2","L2","AaA","LoR","Edu.day","Sex","Family","ISO639.3","Enroll","STEX_speaking_score","Dissimilarity_morphological","Dissimilarity_lexical","Dissimilarity_phonological_new_features","Dissimilarity_phonological_new_categories"
"English","UnitedStates","EnglishMonolingual","Monolingual",34,0,4,"Female","Indo-European","eng ",94,541,0.0094,0.083191,11,19
"English","UnitedStates","EnglishGerman","German",25,16,3,"Female","Indo-European","eng ",94,603,0.0094,0.083191,11,19
"English","UnitedStates","EnglishFrench","French",32,3,4,"Male","Indo-European","eng ",94,562,0.0094,0.083191,11,19
"English","UnitedStates","EnglishSpanish","Spanish",27,8,4,"Male","Indo-European","eng ",94,537,0.0094,0.083191,11,19
"English","UnitedStates","EnglishMonolingual","Monolingual",47,5,3,"Male","Indo-European","eng ",94,505,0.0094,0.083191,11,19
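The sample rows above can be read with Python's standard `csv` module. This is a rough sketch under stated assumptions: the helper names `load_rows` and `mean_score_by` are hypothetical, the numeric column types are inferred from these five rows only, and missing values (which the full file may contain) are not handled. Note that the "ISO639.3" values in the sample carry a trailing space ("eng "), which the sketch strips.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# The header and rows shown above, copied verbatim.
SAMPLE = '''
"L1","C","L1L2","L2","AaA","LoR","Edu.day","Sex","Family","ISO639.3","Enroll","STEX_speaking_score","Dissimilarity_morphological","Dissimilarity_lexical","Dissimilarity_phonological_new_features","Dissimilarity_phonological_new_categories"
"English","UnitedStates","EnglishMonolingual","Monolingual",34,0,4,"Female","Indo-European","eng ",94,541,0.0094,0.083191,11,19
"English","UnitedStates","EnglishGerman","German",25,16,3,"Female","Indo-European","eng ",94,603,0.0094,0.083191,11,19
"English","UnitedStates","EnglishFrench","French",32,3,4,"Male","Indo-European","eng ",94,562,0.0094,0.083191,11,19
"English","UnitedStates","EnglishSpanish","Spanish",27,8,4,"Male","Indo-European","eng ",94,537,0.0094,0.083191,11,19
"English","UnitedStates","EnglishMonolingual","Monolingual",47,5,3,"Male","Indo-European","eng ",94,505,0.0094,0.083191,11,19
'''

# Column types are an assumption based on the sample rows.
INT_COLUMNS = ("AaA", "LoR", "Edu.day", "STEX_speaking_score")
FLOAT_COLUMNS = (
    "Enroll",
    "Dissimilarity_morphological",
    "Dissimilarity_lexical",
    "Dissimilarity_phonological_new_features",
    "Dissimilarity_phonological_new_categories",
)


def load_rows(text):
    """Parse CSV text into a list of dicts with typed numeric columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text.strip())):
        row = dict(raw)
        row["ISO639.3"] = row["ISO639.3"].strip()  # drop trailing space
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        for col in FLOAT_COLUMNS:
            row[col] = float(row[col])
        rows.append(row)
    return rows


def mean_score_by(rows, key):
    """Average STEX speaking score per distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["STEX_speaking_score"])
    return {value: mean(scores) for value, scores in groups.items()}


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    print(mean_score_by(rows, "L2"))
```

Five rows are far too few to say anything about learners; the grouping only illustrates how the "L2" and "STEX_speaking_score" columns relate.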
There is a longstanding debate in cognitive science surrounding the source of commonalities among languages of the world. Indeed, there are many potential explanations for such commonalities: accidents of history, common processes of language change, memory limitations, constraints on linguistic representations, and so on. Recent research has used psycholinguistic experiments to provide empirical evidence linking common linguistic patterns to specific features of human cognition, but these experiments tend to use English speakers, who in many cases have direct experience with precisely the common patterns of interest. Here, we highlight the importance of testing populations whose languages go against cross-linguistic trends. We investigate whether monolingual speakers of Kîîtharaka, which has an unusual way of ordering words, mirror the word-order preferences of English speakers. We find that they do, supporting the hypothesis that universal cognitive representations play a role in shaping word order.
Languages can be very different from each other. For example, just focussing on the order of words, languages like English put adjectives before nouns ('red house') while languages like Thai put them afterwards ('house red'). Similarly, languages like Vietnamese put numerals before nouns ('three houses'), while others, like Kitharaka (spoken in Kenya), put numerals after ('houses three'). If word ordering were simply due to happenstance, we would expect to see all different orders appearing in equal proportion across languages, but we don't find that. In fact, some orders are very common, some are very rare, and some don't seem to appear at all. For example, many languages are ordered like English ('three red houses'), and many are also ordered like Thai, which is exactly the reverse ('houses red three'). But the Kitharaka order ('houses three red') is much rarer, and its mirror image ('red three houses') never seems to occur. Why is this?
One of the major controversies in the language sciences is whether we need to appeal to the basic set-up of the human mind to explain the ways languages can vary, or whether these properties are instead a result of cultural differences in communication and social interaction. A great deal of recent work coming from the perspective of psychology assumes the latter: that the properties of language can be boiled down to communication, interaction and the vagaries of history, while most work in linguistics assumes the former: there must be biases in the human mind that allow us to learn languages of particular types more easily than others. This project seeks to resolve that issue.
In order to do this, we test how well people learn languages of various types, to see whether their behaviour follows the general tendencies we see across real languages. Importantly, we use artificially constructed languages, rather than natural languages, in order to make sure that they only differ in the crucial respects. For example, we present English speakers with artificial languages that use word orders from Thai and Kitharaka. If Thai orders are more common across languages than Kitharaka ones because the former are easier to learn, then we should see this reflected in the behaviour of learners in our experiments. We can also see whether such patterns are always harder to learn, or whether speaking a language which uses them, like Kitharaka, makes them easier to pick up in a new language. To do this, our experiments compare English, Thai, Vietnamese and Kitharaka speakers. If our learners all show the same kinds of patterns in learning our artificial languages as we find across real languages, that will suggest that the way languages vary is not random, nor is it entirely a product of historical facts. Rather, it would suggest that there are universal cognitive biases at play.
We plan to look at not just the basic question of what orders appear, but also two other well-known cases where languages don't seem to vary randomly. The first relates to how words like adjectives and numbers are placed relative to the nouns they modify: most languages place them either both before or both after the noun (like English and Thai), rather than putting them on opposite sides (e.g., 'two houses red', like Vietnamese). We will test whether this type of pattern is always easier to learn in a new language. Second, we will look at whether people prefer to learn languages with suffixes (e.g., 'cat-s') rather than prefixes (e.g., 'un-happy'). Both types are present in English, but most languages have (more) suffixes. Our project will shed light on whether there are universal cognitive biases in language learning, whether such biases are at play for the particular phenomena we look at, and how people's native languages affect these biases.
As of 2022, there were over 2,000 living languages in Africa. With 520 languages, Nigeria accounted for around a fourth of the total languages spoken in Africa. Cameroon and the Democratic Republic of the Congo followed, each with over 200 living languages.
Africa's linguistic diversity
Africa had more than 2,000 living languages as of 2021. More than 900 of them were classified as vigorous due to their widespread use in face-to-face communication across all age groups. From a regional perspective, East Africa was Africa's most linguistically diverse region.
Embracing diversity: Africans welcome ethnic diversity in neighbors
In terms of cultural diversity, in 2021, the Republic of Chad was the most culturally diverse nation in Africa and in the world. It obtained a cultural diversity index score of 0.85, with Cameroon and Nigeria following closely behind, scoring 0.84 and 0.83, respectively. The index measures diversity on a scale from zero to one, where one signifies the highest level of diversity and zero the lowest. Moreover, a survey conducted between 2019 and 2021 showed that Africans would be happy to have neighbors of a different ethnicity.
The number of enrollments in language schools in Spain suggests that Spaniards are well aware of the importance of foreign languages in modern times. During the 2022/23 academic year, almost 331,000 people were registered at language schools in Spain to add a new language to their résumés. In a globalized world, languages are taking on a much more important role in the job market. The most studied and spoken languages in the world include English, Mandarin, Hindi and Spanish.
The importance of language knowledge in the job market
Enrollment numbers at language schools come as no surprise considering that foreign languages have become a vital asset for job seekers in recent years. English, the language most used in international affairs, unsurprisingly ranked first on the list of most valued languages on the Spanish job market: approximately 65.2 percent of job openings that require foreign language skills asked for it. French stood far behind, with 17.38 percent of the job openings.
Languages in the Spanish multimedia scene
Most of the best-selling albums in Spain during 2022 were recorded in Spanish, the country’s main language, with 38 albums in the top 50. As for video games, 96 percent of the games produced in the country had English as a language option. Spanish was the second most used language, present in 91 percent of productions.
Hindi, with over 528 million native speakers, was the most spoken language across Indian homes, followed by Bengali with 97 million speakers, according to 2011 census data. Native English speakers numbered about 260 thousand in the same census.
The colonial rule in India
One of the most remarkable and widespread legacies that British colonial rule left behind was the English language. Before independence, English was the sole language used for higher education and in government and administrative processes. Since independence, however, Hindi has been promoted as the language with official government patronage. This led to resistance from the southern states of India, where Hindi did not have prominence. Consequently, the parliament enacted the Official Languages Act of 1963, which ensured the continued use of English for official purposes in conjunction with Hindi.
Multi-linguistic cultures
India has approximately 22 major languages that are written in about 13 different scripts. While the country’s official languages are both English and Hindi, Hindi remains the most preferred language online, especially in the northern rural areas. The use of English is becoming increasingly popular in urban areas. In addition, almost every state in India has its own official language that is studied in primary and secondary school as an obligatory second language. Among the most prominent are Bengali, Marathi, and Telugu.
Argentina scored 562 out of a maximum of 800 points in the English Proficiency Index 2023. That was the highest score among all Latin American countries included in the survey. The Argentine capital, Buenos Aires, also received the highest English proficiency score among all the Latin American cities analyzed. Mexico and Haiti received the lowest scores in the region.
Outside Russia, the Russian language was spoken by over 79 million people in CIS countries in 2019. Furthermore, more than 13 million residents of Eastern European and Balkan countries were Russian speakers. Russian was the eighth most widely spoken language worldwide as of 2019.
Given its diverse range of languages and high level of economic development, it is perhaps not surprising that Europe is home to the largest language services market in the world, comprising almost half of the global market.
Language services globally
The language services market covers a broad range of activities, from language instruction to professional translation services to localization and voice-over services for media such as film, television and video games. With the world becoming increasingly interconnected through technology, this market has more than doubled since 2009, with an expected global value of almost 50 billion U.S. dollars in 2019. There is good reason to expect this market to continue growing, especially given that the market share of the Asia Pacific region is relatively low, yet the region is home to five of the ten most commonly spoken languages in the world.
Machine translation
Technology is playing an increasingly important role in the language services industry. Machine translation, which is the process of using software to translate from one language to another, is a fast-growing field that is expected to more than triple in size from 2017 to 2024. Accordingly, the two largest providers in the global language services market, TransPerfect and Lionbridge, are investing heavily in this area, offering software-based ‘artificial intelligence’ translation in conjunction with their more traditional translation services.
As of 2021, Akan was the most spoken local language in Ghana, encompassing Twi dialects such as Fante, Akuapem, Akyem, Ahafo, and Asante. Akan was spoken by over nine million people in the country. It was followed by the Ewe, Abron, and Dagbani languages, with approximately 3.8 million, 1.2 million, and 1.2 million speakers, respectively. English is the official language of Ghana, with nearly ten million speakers.
Singapore scored 631 out of a maximum of 800 points in the English Proficiency Index 2022, the highest score across the selected Asian countries and territories. In contrast, Thailand reached an English Proficiency Index score of 416 that year.
This statistic presents the leading European countries by their level of English proficiency as of March 2019. According to data provided by Klazz, Sweden had the highest percentage of people who were proficient in English at 71 percent of the population.
This graph shows the population of the U.S. by race and ethnic group from 2000 to 2023. In 2023, there were around 21.39 million people of Asian origin living in the United States.
U.S. population
Currently, the white population makes up the vast majority of the United States’ population, accounting for some 252.07 million people in 2023. This group accounts for the largest share of the population in every region, and its share is especially high in the Midwest. The Black or African American resident population totaled 45.76 million people in the same year. The overall population of the United States is expected to keep increasing each year, from 320.92 million people in 2015 to a projected 341.69 million by 2027. Population density has increased accordingly, reaching 36.3 inhabitants per square kilometer as of 2021. Despite being one of the most populous countries in the world, following China and India, the United States is not even among the 150 most densely populated countries because of its large land mass. Monaco is the most densely populated country in the world, with a population density of 24,621.5 inhabitants per square kilometer as of 2021. As the U.S. population continues to grow, the Hispanic population has followed a similar trend, rising from 35.7 million inhabitants in 2000 to some 62.65 million in 2021. This growing population group is a significant source of population growth in the country due to both high immigration and birth rates. The United States is one of the most racially diverse countries in the world.
The statistic reflects the distribution of languages in Canada in 2022. In 2022, 87.1 percent of the total population in Canada spoke English as their native tongue.