95 datasets found

Common languages used for web content 2025, by share of websites
statista.com
ai-chatbox.pro
Updated Feb 11, 2025
Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
Dataset updated
Feb 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
Worldwide
Description
As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

English as the leading online language
The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The combined internet user base of the two countries was over a billion individuals as of January 2023. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

Global internet usage by regions
As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.
Preferred language to access the internet India 2023
statista.com
Updated Mar 28, 2024
Statista (2024). Preferred language to access the internet India 2023 [Dataset]. https://www.statista.com/statistics/1459294/india-internet-access-by-language/
Dataset updated
Mar 28, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
India
Description
According to a 2023 survey, 43 percent of internet users in urban India preferred using the Internet in English. Meanwhile, 57 percent of users accessed the internet in Indian languages, with Hindi being the most preferred language among them. Over 300 million internet users reside in the urban areas of India.
Number of Indian and English language internet users in India 2011-2021
statista.com
Updated Mar 15, 2022
Statista (2022). Number of Indian and English language internet users in India 2011-2021 [Dataset]. https://www.statista.com/statistics/718420/internet-user-base-by-language-india/
Dataset updated
Mar 15, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
India
Description
This statistic displays the number of Indian and English language internet users across India from 2011 to 2021. In 2016, the number of English internet users amounted to about 175 million and was projected to increase to 199 million in 2021. For Indian language users, this number was about 234 million users in 2016, and was projected to reach 536 million in 2021.
Data from: Exploring the Dominance of the English Language on the Websites...
data.niaid.nih.gov
zenodo.org
Updated Mar 5, 2020
Pergantis Minas (2020). Exploring the Dominance of the English Language on the Websites of EU Countries [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3698007
Dataset updated
Mar 5, 2020
Dataset provided by
Lamprogeorgos Aristeidis
Varlamis Iraklis
Giannakoulopoulos Andreas
Limniati Laida
Pergantis Minas
Konstantinou Nikos
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
European Union
Description
This dataset, in 29 xlsx files, contains the data for all metrics and accumulated information described in the methodology, results, and discussion sections of the research article "Exploring the Dominance of the English Language on the Websites of EU Countries".
Internet use, by language used to search for information
open.canada.ca
www150.statcan.gc.ca
csv, html, xml
Updated Jan 17, 2023
Statistics Canada (2023). Internet use, by language used to search for information [Dataset]. https://open.canada.ca/data/en/dataset/e2617831-7e2d-4da5-919f-47311eea3349
Available download formats: html, xml, csv
Dataset updated
Jan 17, 2023
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Description
Canadian Internet use survey, Internet use, by language used to search for information, for Canada in 2005. (Terminated)
Most common sources of language errors on the internet in Poland 2023
statista.com
Updated Feb 28, 2025
Statista (2025). Most common sources of language errors on the internet in Poland 2023 [Dataset]. https://www.statista.com/statistics/1098947/poland-most-common-places-for-language-errors-online/
Dataset updated
Feb 28, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2023
Area covered
Poland
Description
According to the source, 9,154 language errors were published each day on the internet in Poland in 2023. Over 38 percent of the mistakes were found on Facebook and 20.21 percent on Twitter.
Online Language Learning Market By Product (Institutional Learners and...
fnfresearch.com
pdf
Updated May 30, 2025
Facts and Factors (2025). Online Language Learning Market By Product (Institutional Learners and Individual Learners), By Language (Japanese, German, French, Chinese, Spanish, English, and Others), and By Region - Global and Regional Industry Trends, Market Insights, Data analysis, Historical Information, and Forecast 2022–2028 [Dataset]. https://www.fnfresearch.com/online-language-learning-market
Available download formats: pdf
Dataset updated
May 30, 2025
Dataset authored and provided by
Facts and Factors
License
https://www.fnfresearch.com/privacy-policy
Time period covered
2022 - 2030
Area covered
French, Global
Description
[209+ Pages Report] The global online language learning market size was valued at USD 14.2 billion in 2021 and is expected to reach a value of USD 28.5 billion by 2028 with growth at a CAGR of 18.8% during 2022-2028.
Internet adoption share among non-English users in India - by language 2016
statista.com
Updated Mar 15, 2022
Statista (2022). Internet adoption share among non-English users in India - by language 2016 [Dataset]. https://www.statista.com/statistics/718476/internet-adoption-levels-among-non-english-users-by-language-india/
Dataset updated
Mar 15, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2016
Area covered
India
Description
This statistic represents the share of internet adoption levels among non-English speakers across India in 2016, based on language. Tamil had the highest internet adoption levels during the measured period with about 42 percent, followed by Hindi and Kannada. Malayalam had the lowest in this list with about 27 percent.
Online language education market size growth rate in China 2012-2019
statista.com
ai-chatbox.pro
Updated Sep 23, 2019
Statista (2019). Online language education market size growth rate in China 2012-2019 [Dataset]. https://www.statista.com/statistics/968114/china-online-language-education-market-growth-rate/
Dataset updated
Sep 23, 2019
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2012 - 2015
Area covered
China
Description
This statistic shows the annual growth rate of the online language education market size in China from 2012 to 2015, with estimates up to 2019. In 2015, the size of the online language education market in China increased by almost 30 percent compared to the previous year.
Index of Internet Connectivity
data.wu.ac.at
data.europa.eu
html
Updated May 3, 2014
Office for National Statistics (2014). Index of Internet Connectivity [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/YWFmMDU2NzItMjA4NS00ZmJiLWJmOGMtZTE5MjNmZTE2NGUz
Available download formats: html
Dataset updated
May 3, 2014
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
The Index of Internet Connectivity was developed as part of a package of measures to help monitor the UK's use of the Internet and the growth of e-commerce.
Source agency: Office for National Statistics
Designation: National Statistics
Language: English
Alternative title: Internet connectivity

Data from: WikiReddit: Tracing Information and Attention Flows Between...

zenodo.org

bin

Updated May 4, 2025

Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms [Dataset]. http://doi.org/10.5281/zenodo.14653265

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.14653265

Dataset updated

May 4, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Jan 15, 2025

Description

Preprint

Gildersleve, P., Beers, A., Ito, V., Orozco, A., & Tripodi, F. (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms. arXiv [cs.CY]. https://doi.org/10.48550/arXiv.2502.04942

Accepted at the International AAAI Conference on Web and Social Media (ICWSM) 2025

Abstract

The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

Datasheet

Motivation

The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.

Composition

WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.

Collection Process

Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Reddit data is linked to Wikipedia data via the hyperlinks and article titles appearing in Reddit posts.

Preprocessing/cleaning/labeling

Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
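
The two preprocessing steps described above (regex extraction of Wikipedia URLs and SHA-256 hashing of Reddit IDs) can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: the pattern `WIKI_URL` and the helpers `extract_wikipedia_links` and `hash_id` are hypothetical names, and the real processing is described as considerably more extensive.

```python
import hashlib
import re

# Hypothetical pattern (an assumption, not the dataset's actual regex):
# matches links to any Wikipedia language subdomain, including mobile ones.
# Simplification: the title stops at whitespace, ")", "]" or ">", so titles
# that themselves contain parentheses would be cut short.
WIKI_URL = re.compile(
    r"https?://([a-z\-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s)\]>]+)",
    re.IGNORECASE,
)


def extract_wikipedia_links(text):
    """Return (language_subdomain, article_title) pairs found in the text."""
    return [(m.group(1).lower(), m.group(2)) for m in WIKI_URL.finditer(text)]


def hash_id(raw_id):
    """Return the SHA-256 hex digest of an identifier string."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


comment = "See https://en.wikipedia.org/wiki/Reddit and (https://de.m.wikipedia.org/wiki/Berlin)."
links = extract_wikipedia_links(comment)
hashed = hash_id("abc")  # "abc" stands in for a made-up Reddit ID
```

A plain digest like this is deterministic, so the same ID always maps to the same hash, which is what allows hashed IDs to be used as join keys across tables.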

Uses

We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.

Distribution

The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942

Maintenance

Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.

SQL Database Schema

Table: `posts`

Column Name	Type	Description
`subreddit_id`	TEXT	The unique identifier for the subreddit.
`crosspost_parent_id`	TEXT	The ID of the original Reddit post if this post is a crosspost.
`post_id`	TEXT	Unique identifier for the Reddit post.
`created_at`	TIMESTAMP	The timestamp when the post was created.
`updated_at`	TIMESTAMP	The timestamp when the post was last updated.
`language_code`	TEXT	The language code of the post.
`score`	INTEGER	The score (upvotes minus downvotes) of the post.
`upvote_ratio`	REAL	The ratio of upvotes to total votes.
`gildings`	INTEGER	Number of awards (gildings) received by the post.
`num_comments`	INTEGER	Number of comments on the post.

Table: `comments`

Column Name	Type	Description
`subreddit_id`	TEXT	The unique identifier for the subreddit.
`post_id`	TEXT	The ID of the Reddit post the comment belongs to.
`parent_id`	TEXT	The ID of the parent comment (if a reply).
`comment_id`	TEXT	Unique identifier for the comment.
`created_at`	TIMESTAMP	The timestamp when the comment was created.
`last_modified_at`	TIMESTAMP	The timestamp when the comment was last modified.
`score`	INTEGER	The score (upvotes minus downvotes) of the comment.
`upvote_ratio`	REAL	The ratio of upvotes to total votes for the comment.
`gilded`	INTEGER	Number of awards (gildings) received by the comment.

Table: `postlinks`

Column Name	Type	Description
`post_id`	TEXT	Unique identifier for the Reddit post.
`end_processed_valid`	INTEGER	Whether the extracted URL from the post resolves to a valid URL.
`end_processed_url`	TEXT	The extracted URL from the Reddit post.
`final_valid`	INTEGER	Whether the final URL from the post resolves to a valid URL after redirections.
`final_status`	INTEGER	HTTP status code of the final URL.
`final_url`	TEXT	The final URL after redirections.
`redirected`	INTEGER	Indicator of whether the posted URL was redirected (1) or not (0).
`in_title`	INTEGER	Indicator of whether the link appears in the post title (1) or post body (0).

Table: `commentlinks`

Column Name	Type	Description
`comment_id`	TEXT	Unique identifier for the Reddit comment.
`end_processed_valid`	INTEGER	Whether the extracted URL from the comment resolves to a valid URL.
`end_processed_url`	TEXT	The extracted URL from the comment.
`final_valid`	INTEGER	Whether the final URL from the comment resolves to a valid URL after redirections.
`final_status`	INTEGER	HTTP status code of the final
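
As a rough illustration of how the `posts` and `postlinks` tables listed above could be queried together, the sketch below builds a tiny in-memory SQLite database using a subset of the documented columns and counts valid links per post language. The rows are made up for illustration, the helper `valid_links_per_language` is a hypothetical name, and the choice of SQLite is an assumption, since the listing only says "SQL database".

```python
import sqlite3

# In-memory stand-in for the real database; only a few documented columns are used.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (post_id TEXT, language_code TEXT, score INTEGER);
    CREATE TABLE postlinks (
        post_id TEXT, final_valid INTEGER, final_url TEXT, in_title INTEGER
    );
    """
)

# Illustrative rows only; none of these values come from the dataset.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [("p1", "en", 10), ("p2", "de", 3)],
)
conn.executemany(
    "INSERT INTO postlinks VALUES (?, ?, ?, ?)",
    [
        ("p1", 1, "https://en.wikipedia.org/wiki/Reddit", 0),
        ("p1", 0, "https://en.wikipedia.org/wiki/Missing", 0),
        ("p2", 1, "https://de.wikipedia.org/wiki/Berlin", 1),
    ],
)


def valid_links_per_language(connection):
    """Count links whose final URL resolved, grouped by post language."""
    query = """
        SELECT p.language_code, COUNT(*)
        FROM posts AS p
        JOIN postlinks AS l ON l.post_id = p.post_id
        WHERE l.final_valid = 1
        GROUP BY p.language_code
        ORDER BY p.language_code
    """
    return connection.execute(query).fetchall()


counts = valid_links_per_language(conn)
```

The same join pattern would presumably apply to `comments` and `commentlinks` via `comment_id`.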

Table 15.2 - Number of households by type of internet connection by Language...
census.geohive.ie
Updated Dec 14, 2023
Central Statistics Office (2023). Table 15.2 - Number of households by type of internet connection by Language Planning Areas (Census 2022) [Dataset]. https://census.geohive.ie/datasets/c404272deb24416abe3cc1ae165367de
Dataset updated
Dec 14, 2023
Dataset provided by
Central Statistics Office Ireland (https://www.cso.ie/en/)
Authors
Central Statistics Office
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Number of households by type of internet connection by Limistéir Pleanála Teanga (Language Planning Areas). (Census 2022 Theme 15 Table 2)

Census 2022 table 15.2 is the number of households by type of internet connection. Attributes include a breakdown of households by access to the internet. Census 2022 theme 15 is Motor Car Availability and Internet Access.

Gaeltacht Language Planning Area Boundaries. In line with the provisions of the Gaeltacht Act 2012, the Minister for Arts, Heritage and the Gaeltacht has identified 26 Gaeltacht Language Planning Areas. Under the Act, the existing Gaeltacht will be redesignated as Gaeltacht Language Planning Areas provided that language plans are agreed by the communities in the various areas in accordance with the language planning criteria prescribed under the Act. Údarás na Gaeltachta is responsible under the Act for supporting organisations with regard to the preparation and implementation of the language plans in the Gaeltacht Language Planning Areas.

Coordinate reference system: Irish Transverse Mercator (EPSG 2157). These boundaries are based on 20m generalised boundaries sourced from Tailte Éireann Open Data Portal. This dataset is provided by Tailte Éireann, Limistéir Pleanála Teanga 2015.
Population aged 15 and over in the Basque Country Internet user by place of...
data.europa.eu
unknown
Updated Jul 10, 2023
Comunidad Autónoma de País Vasco (2023). Population aged 15 and over in the Basque Country Internet user by place of access and languages used, according to Historical Territory (%). [Dataset]. https://data.europa.eu/data/datasets/https-opendata-euskadi-eus-catalogo-poblacion-15-y-mas-anos-c-euskadi-usuaria-internet-lugar-acceso-e-idiomas-utilizados-territorio-historico-
Available download formats: unknown (18330), unknown (737)
Dataset updated
Jul 10, 2023
Dataset authored and provided by
Comunidad Autónoma de País Vasco
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Basque Country
Description
The statistical operation Information Society Survey-ESI-Familias provides periodic information on the implementation of the new Information and Communication Technologies (ICT) in the population of the Basque Country. In particular, it computes and describes the ICT equipment of the population in the home, in the study center, and in the workplace, and measures the level of use that is made of it, especially use related to the Internet. It allows the level of ICT implementation in Basque society to be compared with that of other countries in the surrounding region.
The most spoken languages worldwide 2025
statista.com
Updated Apr 14, 2025
Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
Dataset updated
Apr 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2025
Area covered
World
Description
In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widespread languages that year.

Languages in the United States
The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

Different languages at home
The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.
Top programming languages used for Internet of Things projects 2016
statista.com
Updated Apr 14, 2016
Statista (2016). Top programming languages used for Internet of Things projects 2016 [Dataset]. https://www.statista.com/statistics/658792/worldwide-internet-of-things-survey-programming-languages-used/
Dataset updated
Apr 14, 2016
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 11, 2016 - Mar 25, 2016
Area covered
Worldwide
Description
The statistic shows the distribution of programming languages used by Internet of Things developers, according to a survey conducted in 2016. At that time, 31.5 percent of respondents indicated that they were using Node.js when developing Internet of Things solutions.
Irish Language Statistics for Irish Language Networks (Census 2011-2016)
data.europa.eu
csv, xls
Updated Nov 12, 2024
Marine Institute (2024). Irish Language Statistics for Irish Language Networks (Census 2011-2016) [Dataset]. https://data.europa.eu/data/datasets/922691a0-c1b0-463b-9743-58f7fcd0ff81
Available download formats: csv, xls
Dataset updated
Nov 12, 2024
Dataset authored and provided by
Marine Institute
Description
Summary: This dataset shows statistics on the use of Irish in Irish-Language Networks from the 2011 and 2016 censuses. The Irish Language Networks are defined by Settlement or Electoral Divisional boundaries. This dataset is published online through the Language Planning Viewer run by the Department of Culture, Heritage and the Gaeltacht: http://arcg.is/2nkqdMb
Abstract: The dataset presents statistics from the 2011 and 2016 censuses relating to the use of Irish language for the Irish Language Networks. The Irish Language Networks are defined as settlement or Electoral Division boundaries. This dataset is published online through the Language Planning Viewer application run by the Department of Culture, Heritage and the Gaeltacht: http://arcg.is/2nkqdMb
Sanako Corp Online Language Learning Market Insights
statistics.technavio.org
Technavio, Sanako Corp Online Language Learning Market Insights [Dataset]. https://statistics.technavio.org/sanako-corp-online-language-learning-market-insights
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Worldwide
Description
The online language learning market is expected to grow at a CAGR of 20% during the forecast period. This market growth can be attributed to various factors including increasing enrollment of foreign students.

The online language learning market report offers several other valuable insights such as:

CAGR of the market during the forecast period 2020-2024
Detailed information on factors that will drive online language learning market growth during the next five years
Precise estimation of the online language learning market size and its contribution to the parent market
Accurate predictions on upcoming trends and changes in consumer behavior
The growth of the online language learning market industry across APAC, Europe, North America, South America, and MEA
A thorough analysis of the market's competitive landscape and detailed information on vendors
Comprehensive details of factors that will challenge the growth of online language learning market vendors
Data from: WSDL, Web Service Description Language
opendata.fi
vip.avoindata.fi
html
Updated Jan 15, 2018
Valtiovarainministeriö (2018). WSDL, Web Service Description Language [Dataset]. https://www.opendata.fi/data/dataset/wsdl-web-service-description-language
Available download formats: html
Dataset updated
Jan 15, 2018
Dataset provided by
Valtiovarainministeriö
Description
WSDL is an XML-based language defined by the W3C for describing a service offered over a network and based on web technologies, that is, a Web Service. (31.08.2011)
Main language of Steam users worldwide 2024
ai-chatbox.pro
statista.com
Updated Nov 25, 2024
Statista (2024). Main language of Steam users worldwide 2024 [Dataset]. https://www.ai-chatbox.pro/?_=%2Fstatistics%2F957319%2Fsteam-user-language%2F%23XgboD02vawLYpGJjSPEePEUG%2FVFd%2Bik%3D
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 2024
Area covered
World
Description
As of October 2024, an estimated 33.48 percent of Steam gaming platform users worldwide used Simplified Chinese as their main language. English was the second-most common language, selected by 32.68 percent of users.
Data from: Bosnian web corpus CLASSLA-web.bs 1.0
live.european-language-grid.eu
binary format
Updated Mar 25, 2024
(2024). Bosnian web corpus CLASSLA-web.bs 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23265
Available download formats: binary format
Dataset updated
Mar 25, 2024
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The Bosnian web corpus CLASSLA-web.bs 1.0 is based on the MaCoCu-bs 1.0 web corpus crawl (http://hdl.handle.net/11356/1808), which was additionally cleaned and enriched with linguistic and genre information. The CLASSLA-web.bs corpus is a part of the South Slavic CLASSLA-web corpus collection, which is the first collection of comparable corpora that encompasses the entire South Slavic language group.

The MaCoCu-bs 1.0 crawl was built by crawling the ".ba" internet top-level domain in 2021 and 2022, as well as extending the crawl dynamically to other domains. During the development of CLASSLA-web corpora, the MaCoCu web crawls were cleaned by removing paragraphs that are not in the target language, and by removing very short texts (fewer than 75 words or consisting only of paragraphs shorter than 70 characters). The corpus was also linguistically annotated with the CLASSLA-Stanza pipeline (https://github.com/clarinsi/classla). The linguistic processing involved tokenization, morphosyntactic annotation, and lemmatization. Additionally, the corpus was automatically annotated with genres using the Transformer-based X-GENRE classifier (https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier). The following genre categories are used: News, Information/Explanation, Promotion, Opinion/Argumentation, Instruction, Legal, Prose/Lyrical, Forum, Other and Mix.
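
The short-text rule described above (drop texts with fewer than 75 words, or texts consisting only of paragraphs shorter than 70 characters) can be sketched as a simple filter. This is an illustrative reading of that sentence, not the corpus authors' code: the function names are hypothetical, and counting words by whitespace splitting is an assumption.

```python
MIN_WORDS = 75            # threshold stated in the description
MIN_PARAGRAPH_CHARS = 70  # threshold stated in the description


def is_too_short(paragraphs):
    """Return True if a text (a list of paragraph strings) should be dropped."""
    # Assumption: words are counted by splitting on whitespace.
    word_count = sum(len(p.split()) for p in paragraphs)
    only_short_paragraphs = all(len(p) < MIN_PARAGRAPH_CHARS for p in paragraphs)
    return word_count < MIN_WORDS or only_short_paragraphs


def filter_texts(texts):
    """Keep only the texts that pass the length rule."""
    return [t for t in texts if not is_too_short(t)]


long_text = ["word " * 80]                  # 80 words in one 400-character paragraph
few_words = ["just a few words"]            # far below 75 words
fragments = ["a b c d e f g h i j"] * 10    # 100 words, but every paragraph is short
kept = filter_texts([long_text, few_words, fragments])
```

Note that the two conditions are independent: a text with many words is still dropped when every one of its paragraphs is under the character threshold.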

The corpus is available in vertical format, as used by Sketch Engine and CWB concordancers. Information is provided on the text, paragraph, sentence, and token levels. Each text is accompanied by the following metadata: text id, title, url, domain, top-level domain (tld, e.g., "com"), and predicted genre category. Each text is divided into paragraphs that are accompanied by the following metadata: paragraph id, the automatically identified language of the text in the paragraph, and paragraph quality. For quality, labels such as "short" or "good" are assigned based on paragraph length, URL and stopword density via the jusText tool (https://corpus.tools/wiki/Justext). Paragraphs are further divided into sentences, which carry their sentence id as metadata. Inside sentences, tokens are provided in tabular format with their linguistic annotation. Details about the structural and positional attributes are also given in the accompanying registry file, which was used to install the corpus on the CLARIN.SI concordancers.

Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing, with information reasonably sufficient to allow us to locate the material. (4) Write to the contact person for this resource, whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/

63 scholarly articles cite this dataset (View in Google Scholar)

Common languages used for web content 2025, by share of websites

Preferred language to access the internet India 2023

Number of Indian and English language internet users in India 2011-2021

Data from: Exploring the Dominance of the English Language on the Websites...

Internet use, by language used to search for information

Most common sources of language errors on the internet in Poland 2023

Online Language Learning Market By Product (Institutional Learners and...

Internet adoption share among non-English users in India - by language 2016

Online language education market size growth rate in China 2012-2019

Index of Internet Connectivity

Data from: WikiReddit: Tracing Information and Attention Flows Between...

Preprint

Abstract

Datasheet

Motivation

Composition

Collection Process

Preprocessing/cleaning/labeling

Uses

Distribution

Maintenance

SQL Database Schema

Table: posts

Table: comments

Table: postlinks

Table: commentlinks

Table 15.2 - Number of households by type of internet connection by Language...

Population aged 15 and over in the Basque Country Internet user by place of...

The most spoken languages worldwide 2025

Top programming languages used for Internet of Things projects 2016

Irish Language Statistics for Irish Language Networks (Census 2011-2016)

Sanako Corp Online Language Learning Market Insights

Data from: WSDL, Web Service Description Language

Main language of Steam users worldwide 2024

Data from: Bosnian web corpus CLASSLA-web.bs 1.0
