95 datasets found
  1. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    + more versions
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Explore at:
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the combined internet user base of the two countries was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  2. Preferred language to access the internet India 2023

    • statista.com
    Updated Mar 28, 2024
    Cite
    Statista (2024). Preferred language to access the internet India 2023 [Dataset]. https://www.statista.com/statistics/1459294/india-internet-access-by-language/
    Explore at:
    Dataset updated
    Mar 28, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    According to a 2023 survey, 43 percent of internet users in urban India preferred using the Internet in English. Meanwhile, 57 percent of users accessed the internet in Indian languages, with Hindi being the most preferred language among them. Over 300 million internet users reside in the urban areas of India.

  3. Number of Indian and English language internet users in India 2011-2021

    • statista.com
    Updated Mar 15, 2022
    Cite
    Statista (2022). Number of Indian and English language internet users in India 2011-2021 [Dataset]. https://www.statista.com/statistics/718420/internet-user-base-by-language-india/
    Explore at:
    Dataset updated
    Mar 15, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    This statistic displays the number of Indian and English language internet users across India from 2011 to 2021. In 2016, the number of English internet users amounted to about 175 million and was projected to increase to 199 million in 2021. For Indian language users, this number was about 234 million users in 2016, and was projected to reach 536 million in 2021.

  4. Data from: Exploring the Dominance of the English Language on the Websites...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 5, 2020
    Cite
    Pergantis Minas (2020). Exploring the Dominance of the English Language on the Websites of EU Countries [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3698007
    Explore at:
    Dataset updated
    Mar 5, 2020
    Dataset provided by
    Lamprogeorgos Aristeidis
    Varlamis Iraklis
    Giannakoulopoulos Andreas
    Limniati Laida
    Pergantis Minas
    Konstantinou Nikos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    European Union
    Description

    This Dataset, in 29 files of xlsx format, contains the data of all metrics and accumulated information as they are described in the methodology, results and discussion section of the research article "Exploring the Dominance of the English Language on the Websites of EU Countries".

  5. Internet use, by language used to search for information

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1 more
    csv, html, xml
    Updated Jan 17, 2023
    + more versions
    Cite
    Statistics Canada (2023). Internet use, by language used to search for information [Dataset]. https://open.canada.ca/data/en/dataset/e2617831-7e2d-4da5-919f-47311eea3349
    Explore at:
    Available download formats: html, xml, csv
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    Canadian Internet use survey, Internet use, by language used to search for information, for Canada in 2005. (Terminated)

  6. Most common sources of language errors on the internet in Poland 2023

    • statista.com
    Updated Feb 28, 2025
    Cite
    Statista (2025). Most common sources of language errors on the internet in Poland 2023 [Dataset]. https://www.statista.com/statistics/1098947/poland-most-common-places-for-language-errors-online/
    Explore at:
    Dataset updated
    Feb 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Poland
    Description

    According to the source, 9,154 language errors were published each day on the internet in Poland in 2023. Over 38 percent of mistakes were found on Facebook, 20.21 percent on Twitter.

  7. Online Language Learning Market By Product (Institutional Learners and...

    • fnfresearch.com
    pdf
    Updated May 30, 2025
    Cite
    Facts and Factors (2025). Online Language Learning Market By Product (Institutional Learners and Individual Learners), By Language (Japanese, German, French, Chinese, Spanish, English, and Others), and By Region - Global and Regional Industry Trends, Market Insights, Data analysis, Historical Information, and Forecast 2022–2028 [Dataset]. https://www.fnfresearch.com/online-language-learning-market
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2025
    Dataset authored and provided by
    Facts and Factors
    License

    https://www.fnfresearch.com/privacy-policy

    Time period covered
    2022 - 2030
    Area covered
    French, Global
    Description

    [209+ Pages Report] The global online language learning market size was valued at USD 14.2 billion in 2021 and is expected to reach a value of USD 28.5 billion by 2028 with growth at a CAGR of 18.8% during 2022-2028.

  8. Internet adoption share among non-English users in India - by language 2016

    • statista.com
    Updated Mar 15, 2022
    Cite
    Statista (2022). Internet adoption share among non-English users in India - by language 2016 [Dataset]. https://www.statista.com/statistics/718476/internet-adoption-levels-among-non-english-users-by-language-india/
    Explore at:
    Dataset updated
    Mar 15, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2016
    Area covered
    India
    Description

    This statistic represents the share of internet adoption levels among non-English speakers across India in 2016, based on language. Tamil had the highest internet adoption levels during the measured period with about 42 percent, followed by Hindi and Kannada. Malayalam had the lowest in this list with about 27 percent.

  9. Online language education market size growth rate in China 2012-2019

    • statista.com
    • ai-chatbox.pro
    Updated Sep 23, 2019
    + more versions
    Cite
    Statista (2019). Online language education market size growth rate in China 2012-2019 [Dataset]. https://www.statista.com/statistics/968114/china-online-language-education-market-growth-rate/
    Explore at:
    Dataset updated
    Sep 23, 2019
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2012 - 2015
    Area covered
    China
    Description

    This statistic shows the annual growth rate of online language education market size in China from 2012 to 2015 with estimates up until 2019. In 2015, the size of online language education market in China increased by almost 30 percent compared to the previous year.

  10. Index of Internet Connectivity

    • data.wu.ac.at
    • data.europa.eu
    html
    Updated May 3, 2014
    Cite
    Office for National Statistics (2014). Index of Internet Connectivity [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/YWFmMDU2NzItMjA4NS00ZmJiLWJmOGMtZTE5MjNmZTE2NGUz
    Explore at:
    Available download formats: html
    Dataset updated
    May 3, 2014
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    The Index of Internet Connectivity was developed as part of a package of measures to help monitor the UK's use of the Internet and the growth of e-commerce.
    Source agency: Office for National Statistics
    Designation: National Statistics
    Language: English
    Alternative title: Internet connectivity

  11. Data from: WikiReddit: Tracing Information and Attention Flows Between...

    • zenodo.org
    bin
    Updated May 4, 2025
    Cite
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms [Dataset]. http://doi.org/10.5281/zenodo.14653265
    Explore at:
    Available download formats: bin
    Dataset updated
    May 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 15, 2025
    Description

    Preprint

    Gildersleve, P., Beers, A., Ito, V., Orozco, A., & Tripodi, F. (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms. arXiv [Cs.CY]. https://doi.org/10.48550/arXiv.2502.04942
    Accepted at the International AAAI Conference on Web and Social Media (ICWSM) 2025

    Abstract

    The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

    Datasheet

    Motivation

    The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.

    Composition

    WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.

    Collection Process

    Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.

    Preprocessing/cleaning/labeling

    Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
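
    The two preprocessing steps described above (URL extraction with regex and SHA-256 hashing of IDs) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the pattern `WIKI_URL` and the helper names `extract_wikipedia_urls` and `anonymize_id` are hypothetical, and the datasheet does not say whether the IDs were salted before hashing, so the sketch hashes the raw ID only.

```python
import hashlib
import re

# Hypothetical pattern (an assumption, not taken from the dataset's code):
# matches links to any Wikipedia language subdomain, with an optional mobile ".m" part.
# Limitation: a URL is cut at the first ")" so titles such as "Mercury_(planet)"
# would be truncated; a production extractor would need to handle that case.
WIKI_URL = re.compile(
    r"https?://[a-z\-]+(?:\.m)?\.wikipedia\.org/wiki/[^\s)\]>\"']+",
    re.IGNORECASE,
)


def extract_wikipedia_urls(text: str) -> list[str]:
    """Return all Wikipedia article URLs found in a post or comment body."""
    return WIKI_URL.findall(text)


def anonymize_id(raw_id: str) -> str:
    """Hash an identifier with SHA-256 and return the 64-character hex digest."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    body = "See https://en.wikipedia.org/wiki/Reddit and (https://de.m.wikipedia.org/wiki/Berlin)."
    print(extract_wikipedia_urls(body))
    print(anonymize_id("t3_example"))  # "t3_example" is a made-up ID
```

    Because SHA-256 is deterministic, the same raw ID always maps to the same digest, which is what allows hashed post, comment, user, and subreddit IDs to be joined across tables.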

    Uses

    We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.

    Distribution

    The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942

    Maintenance

    Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.


    SQL Database Schema

    Table: posts

    Column Name | Type | Description
    subreddit_id | TEXT | The unique identifier for the subreddit.
    crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost.
    post_id | TEXT | Unique identifier for the Reddit post.
    created_at | TIMESTAMP | The timestamp when the post was created.
    updated_at | TIMESTAMP | The timestamp when the post was last updated.
    language_code | TEXT | The language code of the post.
    score | INTEGER | The score (upvotes minus downvotes) of the post.
    upvote_ratio | REAL | The ratio of upvotes to total votes.
    gildings | INTEGER | Number of awards (gildings) received by the post.
    num_comments | INTEGER | Number of comments on the post.

    Table: comments

    Column Name | Type | Description
    subreddit_id | TEXT | The unique identifier for the subreddit.
    post_id | TEXT | The ID of the Reddit post the comment belongs to.
    parent_id | TEXT | The ID of the parent comment (if a reply).
    comment_id | TEXT | Unique identifier for the comment.
    created_at | TIMESTAMP | The timestamp when the comment was created.
    last_modified_at | TIMESTAMP | The timestamp when the comment was last modified.
    score | INTEGER | The score (upvotes minus downvotes) of the comment.
    upvote_ratio | REAL | The ratio of upvotes to total votes for the comment.
    gilded | INTEGER | Number of awards (gildings) received by the comment.

    Table: postlinks

    Column Name | Type | Description
    post_id | TEXT | Unique identifier for the Reddit post.
    end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL.
    end_processed_url | TEXT | The extracted URL from the Reddit post.
    final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections.
    final_status | INTEGER | HTTP status code of the final URL.
    final_url | TEXT | The final URL after redirections.
    redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0).
    in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0).

    Table: commentlinks

    Column Name | Type | Description
    comment_id | TEXT | Unique identifier for the Reddit comment.
    end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL.
    end_processed_url | TEXT | The extracted URL from the comment.
    final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections.
    final_status | INTEGER | HTTP status code of the final
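
    As a rough illustration of how the posts and postlinks tables listed in the schema might be queried together, the sketch below builds a tiny in-memory database and counts valid links per post language. Several things here are assumptions: the use of SQLite (the listing does not name the SQL dialect), the reduced column set, the made-up rows, and the idea that `post_id` is the join key between the two tables. The helper names `build_demo_db` and `valid_links_per_language` are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a reduced, illustrative subset of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (post_id TEXT, subreddit_id TEXT, language_code TEXT, score INTEGER);
        CREATE TABLE postlinks (post_id TEXT, final_url TEXT, final_valid INTEGER, in_title INTEGER);
        """
    )
    # Made-up rows for demonstration only; real IDs in the dataset are SHA-256 hashes.
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?)",
        [("p1", "s1", "en", 10), ("p2", "s1", "en", 3), ("p3", "s2", "de", 7)],
    )
    conn.executemany(
        "INSERT INTO postlinks VALUES (?, ?, ?, ?)",
        [
            ("p1", "https://en.wikipedia.org/wiki/Reddit", 1, 0),
            ("p1", "https://en.wikipedia.org/wiki/Wikipedia", 1, 1),
            ("p2", "https://en.wikipedia.org/wiki/Broken", 0, 0),
            ("p3", "https://de.wikipedia.org/wiki/Berlin", 1, 0),
        ],
    )
    return conn


def valid_links_per_language(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count links whose final URL resolved (final_valid = 1), grouped by post language."""
    query = """
        SELECT p.language_code, COUNT(*) AS n_links
        FROM posts AS p
        JOIN postlinks AS l ON l.post_id = p.post_id
        WHERE l.final_valid = 1
        GROUP BY p.language_code
        ORDER BY n_links DESC, p.language_code
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(valid_links_per_language(build_demo_db()))
```

    The same join pattern would presumably apply to comments and commentlinks via `comment_id`, though the commentlinks column list is cut off in this listing.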

  12. Table 15.2 - Number of households by type of internet connection by Language...

    • census.geohive.ie
    Updated Dec 14, 2023
    Cite
    Central Statistics Office (2023). Table 15.2 - Number of households by type of internet connection by Language Planning Areas (Census 2022) [Dataset]. https://census.geohive.ie/datasets/c404272deb24416abe3cc1ae165367de
    Explore at:
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    Central Statistics Office Ireland (https://www.cso.ie/en/)
    Authors
    Central Statistics Office
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Description

    Number of households by type of internet connection by Limistéir Pleanála Teanga (Census 2022, Theme 15, Table 2). Census 2022 table 15.2 gives the number of households by type of internet connection. Attributes include a breakdown of households by access to the internet. Census 2022 theme 15 is Motor Car Availability and Internet Access.

    Gaeltacht Language Planning Area Boundaries. In line with the provisions of the Gaeltacht Act 2012, the Minister for Arts, Heritage and the Gaeltacht has identified 26 Gaeltacht Language Planning Areas. Under the Act, the existing Gaeltacht will be redesignated as Gaeltacht Language Planning Areas provided that language plans are agreed by the communities in the various areas in accordance with the language planning criteria prescribed under the Act. Údarás na Gaeltachta is responsible under the Act for supporting organisations with regard to the preparation and implementation of the language plans in the Gaeltacht Language Planning Areas.

    Coordinate reference system: Irish Transverse Mercator (EPSG 2157). These boundaries are based on 20m generalised boundaries sourced from Tailte Éireann Open Data Portal. This dataset is provided by Tailte Éireann, Limistéir Pleanála Teanga 2015.

  13. Population aged 15 and over in the Basque Country Internet user by place of...

    • data.europa.eu
    unknown
    Updated Jul 10, 2023
    + more versions
    Cite
    Comunidad Autónoma de País Vasco (2023). Population aged 15 and over in the Basque Country Internet user by place of access and languages used, according to Historical Territory (%). [Dataset]. https://data.europa.eu/data/datasets/https-opendata-euskadi-eus-catalogo-poblacion-15-y-mas-anos-c-euskadi-usuaria-internet-lugar-acceso-e-idiomas-utilizados-territorio-historico-
    Explore at:
    Available download formats: unknown (18330), unknown (737)
    Dataset updated
    Jul 10, 2023
    Dataset authored and provided by
    Comunidad Autónoma de País Vasco
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Basque Country
    Description

    The statistical operation Information Society Survey-ESI-Familias, provides periodic information on the implementation of the new Information and Communication Technologies -ICT- in the population of the Basque Country. In particular, it computes and describes the ICT equipment of the population both in the home and in the study center or in the workplace, and measures the level of use that is made of them, especially those related to the Internet. It allows us to compare the level of implementation of these ICT technologies in Basque society in relation to other countries in its environment.

  14. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widespread languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.

  15. Top programming languages used for Internet of Things projects 2016

    • statista.com
    Updated Apr 14, 2016
    Cite
    Statista (2016). Top programming languages used for Internet of Things projects 2016 [Dataset]. https://www.statista.com/statistics/658792/worldwide-internet-of-things-survey-programming-languages-used/
    Explore at:
    Dataset updated
    Apr 14, 2016
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 11, 2016 - Mar 25, 2016
    Area covered
    Worldwide
    Description

    The statistic shows distribution of programming languages used by Internet of Things developers, according to a survey conducted in 2016. At that time, 31.5 percent of respondents indicated that they were using Node.js when developing Internet of Things solutions.

  16. Irish Language Statistics for Irish Language Networks (Census 2011-2016)

    • data.europa.eu
    csv, xls
    Updated Nov 12, 2024
    Cite
    Marine Institute (2024). Irish Language Statistics for Irish Language Networks (Census 2011-2016) [Dataset]. https://data.europa.eu/data/datasets/922691a0-c1b0-463b-9743-58f7fcd0ff81
    Explore at:
    Available download formats: csv, xls
    Dataset updated
    Nov 12, 2024
    Dataset authored and provided by
    Marine Institute
    Description

    Summary: This dataset shows statistics on the use of Irish in Irish-Language Networks from the 2011 and 2016 censuses. The Irish Language Networks are defined by Settlement or Electoral Divisional boundaries. This dataset is published online through the Language Planning Viewer run by the Department of Culture, Heritage and the Gaeltacht: http://arcg.is/2nkqdMb

    Abstract: The dataset presents statistics from the 2011 and 2016 censuses relating to the use of Irish language for the Irish Language Networks. The Irish Language Networks are defined as settlement or Electoral Division boundaries. This dataset is published online through the Language Planning Viewer application run by the Department of Culture, Heritage and the Gaeltacht: http://arcg.is/2nkqdMb

  17. Sanako Corp Online Language Learning Market Insights

    • statistics.technavio.org
    + more versions
    Cite
    Technavio, Sanako Corp Online Language Learning Market Insights [Dataset]. https://statistics.technavio.org/sanako-corp-online-language-learning-market-insights
    Explore at:
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Worldwide
    Description

    The online language learning market is expected to grow at a CAGR of 20% during the forecast period. This market growth can be attributed to various factors including increasing enrollment of foreign students.

    The online language learning market report offers several other valuable insights such as:

    CAGR of the market during the forecast period 2020-2024
    Detailed information on factors that will drive online language learning market growth during the next five years
    Precise estimation of the online language learning market size and its contribution to the parent market
    Accurate predictions on upcoming trends and changes in consumer behavior
    The growth of the online language learning market industry across APAC, Europe, North America, South America, and MEA
    A thorough analysis of the market’s competitive landscape and detailed information on vendors
    Comprehensive details of factors that will challenge the growth of online language learning market vendors
    
  18. Data from: WSDL, Web Service Description Language

    • opendata.fi
    • vip.avoindata.fi
    • +1 more
    html
    Updated Jan 15, 2018
    + more versions
    Cite
    Valtiovarainministeriö (2018). WSDL, Web Service Description Language [Dataset]. https://www.opendata.fi/data/dataset/wsdl-web-service-description-language
    Explore at:
    Available download formats: html
    Dataset updated
    Jan 15, 2018
    Dataset provided by
    Valtiovarainministeriö
    Description

    WSDL is an XML-based language defined by the W3C for describing a service that is offered over a network and based on web technologies, i.e. a Web Service. (31.08.2011)

  19. Main language of Steam users worldwide 2024

    • ai-chatbox.pro
    • statista.com
    Updated Nov 25, 2024
    + more versions
    Cite
    Statista (2024). Main language of Steam users worldwide 2024 [Dataset]. https://www.ai-chatbox.pro/?_=%2Fstatistics%2F957319%2Fsteam-user-language%2F%23XgboD02vawLYpGJjSPEePEUG%2FVFd%2Bik%3D
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2024
    Area covered
    World
    Description

    As of October 2024, an estimated 33.48 percent of Steam gaming platform users worldwide used Simplified Chinese as their main language. English was the second-most common language, selected by 32.68 percent of users.

  20. Bosnian web corpus CLASSLA-web.bs 1.0

    • live.european-language-grid.eu
    binary format
    Updated Mar 25, 2024
    (2024). Bosnian web corpus CLASSLA-web.bs 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23265
    Available download formats: binary format
    Dataset updated
    Mar 25, 2024
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The Bosnian web corpus CLASSLA-web.bs 1.0 is based on the MaCoCu-bs 1.0 web corpus crawl (http://hdl.handle.net/11356/1808), which was additionally cleaned and enriched with linguistic and genre information. The CLASSLA-web.bs corpus is a part of the South Slavic CLASSLA-web corpus collection, which is the first collection of comparable corpora that encompasses the entire South Slavic language group.

    The MaCoCu-bs 1.0 crawl was built by crawling the ".ba" internet top-level domain in 2021 and 2022, as well as extending the crawl dynamically to other domains. During the development of CLASSLA-web corpora, the MaCoCu web crawls were cleaned by removing paragraphs that are not in the target language, and by removing very short texts (less than 75 words or consisting only of paragraphs shorter than 70 characters). The corpus was also linguistically annotated with the CLASSLA-Stanza pipeline (https://github.com/clarinsi/classla). The linguistic processing involved tokenization, morphosyntactic annotation, and lemmatization. Additionally, the corpus was automatically annotated with genres using the Transformer-based X-GENRE classifier (https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-classifier). The following genre categories are used: News, Information/Explanation, Promotion, Opinion/Argumentation, Instruction, Legal, Prose/Lyrical, Forum, Other and Mix.

    The corpus is available in vertical format, as used by Sketch Engine and CWB concordancers. Information is provided on the text, paragraph, sentence and token levels. Each text is accompanied by the following metadata: text id, title, url, domain, top-level domain (tld, e.g., "com"), and predicted genre category. Each text is divided into paragraphs that are accompanied by the following metadata: paragraph id, the automatically identified language of the text in the paragraph, and paragraph quality. For quality, labels such as "short" or "good" are assigned based on paragraph length, URL and stopword density via the jusText tool (https://corpus.tools/wiki/Justext). Paragraphs are further divided into sentences, each with a sentence id as metadata. Inside sentences, tokens are provided in tabular format with their linguistic annotation. Details about the structural and positional attributes are also given in the accompanying registry file, which was used to install the corpus on the CLARIN.SI concordancers.

    Notice and take down: Should you consider that our data contains material that is owned by you and should therefore not be reproduced here, please: (1) Clearly identify yourself, with detailed contact data such as an address, telephone number or email address at which you can be contacted. (2) Clearly identify the copyrighted work claimed to be infringed. (3) Clearly identify the material that is claimed to be infringing and information reasonably sufficient in order to allow us to locate the material. (4) Please write to the contact person for this resource whose email is available in the full item record. We will comply with legitimate requests by removing the affected sources from the next release of the corpus.

Common languages used for web content 2025, by share of websites
63 scholarly articles cite this dataset (View in Google Scholar)