As of February 2025, English was the most popular language for web content, used on over 49.4 percent of websites. Spanish ranked second with six percent of web content, followed by German with 5.6 percent.

English as the leading online language
The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the combined internet user base of the two countries exceeded one billion individuals. This has led to most online information being created in English; consequently, even non-native speakers may use it for convenience.

Global internet usage by regions
As of October 2024, there were 5.52 billion internet users worldwide. In the same period, Northern Europe and North America led the world in internet penetration, with around 97 percent of their populations accessing the internet.
According to a 2023 survey, ** percent of internet users in urban India preferred using the internet in English. Meanwhile, ** percent of users accessed the internet in Indian languages, with Hindi being the most preferred language among them. Over *** million internet users reside in the urban areas of India.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Canadian Internet use survey, Internet use, by language used to search for information, for Canada in 2005. (Terminated)
This statistic displays the number of Indian and English language internet users across India from 2011 to 2021. In 2016, the number of English internet users amounted to about *** million and was projected to increase to *** million in 2021. For Indian language users, this number was about *** million users in 2016, and was projected to reach *** million in 2021.
This statistic represents the number of non-English digital payment internet users across India in 2016, by language. Hindi internet users accounted for the highest number of digital payment users, at about ** million, followed by Tamil internet users at about **** million during the measured period.
This statistic represents the digital payment adoption rates among non-English internet users across India in 2016, based on language, with a forecast for 2021. The adoption rate among Telugu users was the highest in 2016 at about 37 percent, and was projected to reach about 53 percent in 2021.
As of February 2025, China ranked first among the countries with the most internet users worldwide. The country had 1.11 billion internet users, more than triple the third-ranked United States, with around 322 million internet users. Overall, the four BRIC markets together had over two billion internet users, and all four are among the ten countries with more than 100 million internet users.

Worldwide internet usage
As of October 2024, there were more than five billion internet users worldwide. There are, however, stark differences in user distribution by region: Eastern Asia is home to 1.34 billion internet users, while African and Middle Eastern regions had lower user figures. Moreover, urban areas showed a higher percentage of internet access than rural areas.

Internet use in China
China ranks first in the list of countries with the most internet users. Due to its ongoing, fast-paced economic development and a cultural inclination towards technology, more than a billion of China's estimated 1.4 billion people are online. As of the third quarter of 2023, around 87 percent of Chinese internet users reported using WeChat, the most popular social network in the country. On average, Chinese internet users spent five hours and 33 minutes online daily.
In the third quarter of 2023, over 55 percent of the Peruvian population over six years old who speak native languages such as Quechua or Aymara reported having used the internet. Internet penetration in Peru has been growing steadily, reaching 74 percent of the country's population in 2022.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Internet use’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/19704278-bundesamt-fur-statistik-bfs on 16 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset presents the half-yearly figures of Internet usage of persons aged 14 and over, by type of user (closer user group, additional user group), age group, gender, educational level, monthly income, language region and place (home, workplace, on the move/mobile access), since 1997. The descriptions of the variables in the CSV file are available in the appendix.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset, consisting of 29 XLSX files, contains the data for all metrics and the accumulated information described in the methodology, results and discussion sections of the research article "Exploring the Dominance of the English Language on the Websites of EU Countries".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This Flash Eurobarometer studied how Europeans use different languages online. According to the pan-EU survey, 90% of European internet users prefer to browse the internet in their own language, while 55% at least occasionally use a language other than their own when online. However, 44% feel they are missing interesting information because web pages are not in a language they understand.
This statistic represents the forecast share of non-English internet users across India in 2020, by language. Hindi was projected to have the highest share of internet users in the country, at about ** percent, while Malayalam's share was about ***** percent during the measured period.
The Canadian Internet Use Survey (CIUS) measures the extent to which, and the ways in which, individual Canadians use the Internet. Survey content includes the location of use (e.g., at home, at work), the frequency and intensity of use, the specific uses of the Internet from the home, the purchase of products and services (electronic commerce), and other issues related to Internet use (such as language of use and concerns over privacy). This content is supplemented by information on individual and household characteristics (e.g., age, income, education, family type) and some geographic detail (e.g., province, urban/rural, and CMA).

The Canadian Internet Use Survey results are widely disseminated to a variety of users. All levels of government can use CIUS to shape policies and programmes related to the Internet (e.g., uptake and barriers, high-speed access, Government On-Line and other communication initiatives) and electronic commerce. The Organisation for Economic Co-operation and Development (OECD) also uses the results for international benchmarking and comparison studies. The CIUS data support a wide range of research initiatives. In academia, microdata are made available to students and researchers within universities and colleges under the Data Liberation Initiative. The survey results are also used in the private sector for market research, as well as for consultation on regulatory issues related to the Internet. Finally, the results of the CIUS are widely quoted in the media, reflecting a high level of interest in the Internet and its users.

The CIUS replaces the Household Internet Use Survey (HIUS), conducted from 1997 to 2003, which focused on household Internet penetration. The new survey was redesigned to focus more on Internet use by individuals and to conform to international standards regarding Internet statistics.
Because CIUS collects information from the individual and HIUS was based on the household, it is not appropriate to directly compare results from 2005 with previous surveys.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
The Index of Internet Connectivity was developed as part of a package of measures to help monitor the UK's use of the Internet and the growth of e-commerce.
Source agency: Office for National Statistics
Designation: National Statistics
Language: English
Alternative title: Internet connectivity
This statistic gives information on the distribution of U.S. Hispanic internet users in 2015, by primary language. During the 2015 National Survey of Latinos conducted in November 2015, it was found that English was the dominant language for 31 percent of U.S. Hispanic internet users.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking, which in turn influences Wikipedia content by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.
The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.
WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
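The ±10-day window can be illustrated with a small Python sketch. It assumes, hypothetically, that an article's daily page views are available as a mapping from calendar date to view count and that the window is inclusive and counted in whole days around the post's creation date; `views_in_window` is an illustrative name, not part of the dataset.

```python
from datetime import date, datetime, timedelta

# Width of the window on each side of the posting date, per the description above.
WINDOW_DAYS = 10


def views_in_window(
    post_created_at: datetime,
    daily_views: dict[date, int],
    window_days: int = WINDOW_DAYS,
) -> dict[date, int]:
    """Return the daily page views that fall within ±window_days of the post date.

    Hypothetical helper: assumes an inclusive window measured in whole calendar days.
    """
    post_day = post_created_at.date()
    start = post_day - timedelta(days=window_days)
    end = post_day + timedelta(days=window_days)
    return {day: views for day, views in daily_views.items() if start <= day <= end}


if __name__ == "__main__":
    # Placeholder data: 40 consecutive days of made-up view counts.
    first = date(2021, 1, 1)
    views = {first + timedelta(days=i): 100 + i for i in range(40)}
    window = views_in_window(datetime(2021, 1, 15, 12, 0), views)
    print(len(window), min(window), max(window))  # 21 2021-01-05 2021-01-25
```

An inclusive ±10-day window always covers 21 calendar days when data exist for every day; if the dataset counts the window differently, only `start` and `end` need to change.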
Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.
Extensive processing, including regular expressions (regex), was applied to the Reddit post and comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were retrieved through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 to anonymize posts, comments, users, and subreddits.
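A minimal Python sketch of the two processing steps named above: URL extraction with a regular expression and SHA-256 hashing of IDs. The pattern `WIKI_URL_RE` and the helper names are hypothetical; the actual WikiReddit pipeline applied more extensive processing than this single pattern, and whether it salts IDs before hashing is not stated here.

```python
import hashlib
import re

# Hypothetical pattern: http(s) links to any wikipedia.org subdomain
# (e.g. en., de., en.m.). It stops at whitespace, quotes, ')', ']' and '>',
# so titles containing parentheses, e.g. /wiki/Mercury_(planet), get truncated.
WIKI_URL_RE = re.compile(
    r"https?://(?:[a-z0-9-]+\.)*wikipedia\.org/[^\s)\]>\"']+",
    re.IGNORECASE,
)

# Trailing sentence punctuation that is usually (not always) outside the URL.
TRAILING_PUNCT = ".,;:!?"


def extract_wikipedia_urls(text: str) -> list[str]:
    """Return the Wikipedia URLs found in a Reddit post or comment body."""
    return [match.rstrip(TRAILING_PUNCT) for match in WIKI_URL_RE.findall(text)]


def anonymize_id(reddit_id: str) -> str:
    """Return the SHA-256 hex digest of a Reddit ID (no salt is applied here)."""
    return hashlib.sha256(reddit_id.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    body = (
        "See [this](https://en.wikipedia.org/wiki/Quechua_languages) "
        "and https://de.m.wikipedia.org/wiki/Aymara."
    )
    print(extract_wikipedia_urls(body))
    print(anonymize_id("abc123"))
```

Hashing is deterministic, so the same raw ID always maps to the same digest, which keeps posts, comments, and subreddits joinable across tables without exposing the original IDs.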
We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia, and could help extend such analyses to disparities in the types of external communities in which Wikipedia is used and in how it is used. Fourth, and relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine whether homogeneity within the Reddit and Wikipedia audiences shapes topic patterns, and assess whether these relationships mitigate or amplify problematic engagement online.
The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942
Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.
posts
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost. |
post_id | TEXT | Unique identifier for the Reddit post. |
created_at | TIMESTAMP | The timestamp when the post was created. |
updated_at | TIMESTAMP | The timestamp when the post was last updated. |
language_code | TEXT | The language code of the post. |
score | INTEGER | The score (upvotes minus downvotes) of the post. |
upvote_ratio | REAL | The ratio of upvotes to total votes. |
gildings | INTEGER | Number of awards (gildings) received by the post. |
num_comments | INTEGER | Number of comments on the post. |
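Because `score` is defined as upvotes minus downvotes and `upvote_ratio` as upvotes over total votes, the two columns together imply approximate vote counts: with s = u - d and r = u / (u + d), the total is u + d = s / (2r - 1) whenever r ≠ 0.5. A Python sketch of that algebra follows; `estimate_votes` is a hypothetical helper, and since published scores and ratios may be rounded or fuzzed, its output should be treated as an estimate.

```python
def estimate_votes(score: int, upvote_ratio: float) -> tuple[float, float]:
    """Estimate (upvotes, downvotes) from a post's score and upvote ratio.

    Uses score = u - d and upvote_ratio = u / (u + d), so
    u + d = score / (2 * upvote_ratio - 1). Hypothetical helper; the result is
    only approximate if the stored values are rounded or fuzzed.
    """
    denominator = 2 * upvote_ratio - 1
    if abs(denominator) < 1e-9:
        # At a ratio of 0.5 the score is 0 for any total, so totals are unrecoverable.
        raise ValueError("upvote_ratio of 0.5 does not determine the vote totals")
    total = score / denominator
    upvotes = upvote_ratio * total
    return upvotes, total - upvotes


if __name__ == "__main__":
    ups, downs = estimate_votes(score=60, upvote_ratio=0.8)
    print(round(ups), round(downs))  # 80 20
```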
comments
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
post_id | TEXT | The ID of the Reddit post the comment belongs to. |
parent_id | TEXT | The ID of the parent comment (if a reply). |
comment_id | TEXT | Unique identifier for the comment. |
created_at | TIMESTAMP | The timestamp when the comment was created. |
last_modified_at | TIMESTAMP | The timestamp when the comment was last modified. |
score | INTEGER | The score (upvotes minus downvotes) of the comment. |
upvote_ratio | REAL | The ratio of upvotes to total votes for the comment. |
gilded | INTEGER | Number of awards (gildings) received by the comment. |
postlinks
Column Name | Type | Description |
---|---|---|
post_id | TEXT | Unique identifier for the Reddit post. |
end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the Reddit post. |
final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
final_url | TEXT | The final URL after redirections. |
redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |
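The `posts` and `postlinks` tables share `post_id`, so link-level attributes can be summarised by post attributes with a join. The sketch below assumes the SQL database is available as SQLite (the description says only "SQL database"), creates just the columns it needs, and fills them with made-up placeholder rows; it computes, per `language_code`, the number of links, the share that were redirected, and how many appeared in the post title.

```python
import sqlite3

# Assumption: the WikiReddit SQL database is available as SQLite. Only the columns
# used by the query are created here; the rows below are made-up placeholders.
SCHEMA = """
CREATE TABLE posts (post_id TEXT PRIMARY KEY, language_code TEXT, score INTEGER);
CREATE TABLE postlinks (
    post_id TEXT, final_url TEXT, redirected INTEGER, in_title INTEGER
);
"""

# Links per post language, share of links that were redirected, links in titles.
QUERY = """
SELECT p.language_code,
       COUNT(*)          AS n_links,
       AVG(l.redirected) AS share_redirected,
       SUM(l.in_title)   AS n_in_title
FROM postlinks AS l
JOIN posts AS p ON p.post_id = l.post_id
GROUP BY p.language_code
ORDER BY n_links DESC, p.language_code;
"""


def link_stats_by_language(conn: sqlite3.Connection) -> list[tuple]:
    """Run the per-language summary query and return all result rows."""
    return conn.execute(QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [("p1", "en", 10), ("p2", "de", 3)],
    )
    conn.executemany(
        "INSERT INTO postlinks VALUES (?, ?, ?, ?)",
        [
            ("p1", "https://en.wikipedia.org/wiki/Reddit", 0, 1),
            ("p1", "https://en.wikipedia.org/wiki/Wikipedia", 1, 0),
            ("p2", "https://de.wikipedia.org/wiki/Reddit", 0, 0),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in link_stats_by_language(build_demo_db()):
        print(row)
```

Since `redirected` and `in_title` are stored as 0/1 integers, `AVG` yields a share and `SUM` yields a count directly, with no extra `CASE` expressions.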
commentlinks
Column Name | Type | Description |
---|---|---|
comment_id | TEXT | Unique identifier for the Reddit comment. |
end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the comment. |
final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
https://www.datainsightsmarket.com/privacy-policy
The online language learning software market is experiencing robust growth, driven by increasing internet penetration, the rising demand for multilingual skills in a globalized workforce, and the convenience and affordability offered by digital platforms. The market's substantial size, estimated at $15 billion in 2025, reflects a considerable investment in language acquisition technologies. A Compound Annual Growth Rate (CAGR) of 15% is projected for the period 2025-2033, indicating a significant expansion to approximately $45 billion by 2033. This growth is fueled by several key trends, including the incorporation of gamification and AI-powered personalized learning experiences, an increase in the accessibility of diverse language courses, and the growing adoption of subscription-based models. While factors like the need for reliable internet access and concerns over the effectiveness of solely online learning represent potential restraints, these are being mitigated by advancements in technology and the integration of blended learning approaches. The market is segmented by software type (e.g., mobile apps, desktop software), language offered, and target demographic (e.g., students, professionals), with key players including well-established brands like Duolingo and Babbel, alongside newer entrants capitalizing on innovative learning methodologies. The competitive landscape is characterized by ongoing innovation in pedagogy and technology, driving continuous improvements in the user experience and learning outcomes. The regional distribution of the market showcases strong growth across North America, Europe, and Asia-Pacific. North America currently holds the largest market share, benefiting from high internet penetration and a strong demand for language skills, but Asia-Pacific is anticipated to experience the fastest growth in the coming years, fueled by a large, young, tech-savvy population and increasing disposable incomes. 
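The projection above follows the standard compound-growth relation: end value = start value × (1 + CAGR)^years. A small Python sketch, assuming the 2025 to 2033 period spans eight compounding years (the helper names are illustrative):

```python
def project_value(start: float, cagr: float, years: int) -> float:
    """Compound `start` at a constant annual growth rate `cagr` for `years` years."""
    return start * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # $15 billion in 2025 growing at 15% a year for the 8 years to 2033.
    print(round(project_value(15, 0.15, 8), 1))  # 45.9
    # Growth rate implied by going from $15 billion to $45 billion in 8 years.
    print(round(implied_cagr(15, 45, 8), 3))  # 0.147
```

With these inputs, 15 × 1.15^8 is about 45.9, consistent with the approximately $45 billion figure quoted above; conversely, reaching exactly $45 billion would imply a CAGR of roughly 14.7 percent.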
The presence of numerous established players and a surge in investment in EdTech startups suggests a highly competitive but dynamic market with substantial potential for future expansion. The ongoing refinement of language learning software, incorporating advancements in artificial intelligence and personalized learning strategies, positions this sector for sustained and significant growth throughout the forecast period.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
This publication has been discontinued as a result of the ONS Consultation on Statistical Products, 2013. The last edition of the Internet Access Quarterly Update was published on 14 May 2014, for Q1 2014. ONS will conduct a public consultation on future plans for the annual publication of estimates of Internet users and this will appear on the ONS public consultation page:
http://www.ons.gov.uk/ons/about-ons/get-involved/consultations/open-consultations/index.html
Source agency: Office for National Statistics
Designation: National Statistics
Language: English
Alternative title: Internet Access Quarterly Update
https://www.archivemarketresearch.com/privacy-policy
Online Language Learning Market Analysis
The global online language learning market is poised for significant growth over the forecast period of 2023-2030. Valued at USD 43.79 billion in 2022, the market is projected to reach USD 155.30 billion by 2030, exhibiting a CAGR of 17.7%. Globalization, technological advancements, and the growing popularity of distance learning are key drivers of demand. English, Chinese (Mandarin), and European languages dominate the market, while individual and institutional learners make up the primary user segments.

Market Trends and Drivers
Rising internet penetration, affordable smartphones and tablets, and innovative language learning platforms have transformed the online language learning landscape. Artificial intelligence (AI) enhances the learning experience through personalized feedback, adaptive learning paths, and virtual assistants, while video conferencing and live online classes enable real-time interaction with language instructors. The growing need for language skills in the global job market and the popularity of international travel further increase demand for online language learning solutions. A separate estimate projects the global online language learning market to reach USD 16.24 billion by 2028, a CAGR of 16.7% from 2021 to 2028, driven by demand for online learning platforms, increased access to language learning resources, and the need for language proficiency in a globalized business landscape.