100+ datasets found
  1. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Explore at:
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  2. Preferred language to access the internet India 2023

    • statista.com
    Updated Jun 25, 2025
    Cite
    Statista (2025). Preferred language to access the internet India 2023 [Dataset]. https://www.statista.com/statistics/1459294/india-internet-access-by-language/
    Explore at:
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    According to a 2023 survey, ** percent of internet users in urban India preferred using the internet in English. Meanwhile, ** percent of users accessed the internet in Indian languages, with Hindi being the most preferred language among them. Over *** million internet users reside in the urban areas of India.

  3. Internet use, by language used to search for information

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1 more
    csv, html, xml
    Updated Jan 17, 2023
    + more versions
    Cite
    Statistics Canada (2023). Internet use, by language used to search for information [Dataset]. https://open.canada.ca/data/en/dataset/e2617831-7e2d-4da5-919f-47311eea3349
    Explore at:
    Available download formats: html, xml, csv
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    Canadian Internet use survey, Internet use, by language used to search for information, for Canada in 2005. (Terminated)

  4. Top programming languages used for Internet of Things projects 2016

    • statista.com
    Updated Apr 14, 2016
    Cite
    Statista (2016). Top programming languages used for Internet of Things projects 2016 [Dataset]. https://www.statista.com/statistics/658792/worldwide-internet-of-things-survey-programming-languages-used/
    Explore at:
    Dataset updated
    Apr 14, 2016
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 11, 2016 - Mar 25, 2016
    Area covered
    Worldwide
    Description

    The statistic shows the distribution of programming languages used by Internet of Things developers, according to a survey conducted in 2016. At that time, **** percent of respondents indicated that they were using Node.js when developing Internet of Things solutions.

  5. Data_Sheet_1_The method behind the unprecedented production of indicators of...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    Cite
    Daniel Pimienta; Álvaro Blanco; Gilvan Müller de Oliveira (2023). Data_Sheet_1_The method behind the unprecedented production of indicators of the presence of languages in the Internet.docx [Dataset]. http://doi.org/10.3389/frma.2023.1149347.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Daniel Pimienta; Álvaro Blanco; Gilvan Müller de Oliveira
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Reliable and updated indicators of the presence of languages on the Internet are required to drive language policies efficiently, to forecast e-commerce markets, or to support further research in the field of digital support of languages. This article presents a complete description of the methodological elements involved in the production of an unprecedented set of indicators of the presence on the Internet of the 329 languages with more than 1 million L1 speakers. Special emphasis is given to the treatment of the comprehensive set of biases involved in the process, arising either from the method or from the various sources used in the modeling process. The biases related to other sources providing similar data are also discussed, and in particular, it is shown how the lack of consideration of the high level of multilingualism of the Web leads to a huge overestimation of the presence of English. The detailed list of sources is presented in the various annexes. For the first time in the history of the Internet, the production of indicators about the virtual presence of a large set of languages could allow progress in the fields of economy of languages, cyber-geography of languages and language policies for multilingualism.

  6. Number of Indian and English language internet users in India 2011-2021

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Number of Indian and English language internet users in India 2011-2021 [Dataset]. https://www.statista.com/statistics/718420/internet-user-base-by-language-india/
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    This statistic displays the number of Indian and English language internet users across India from 2011 to 2021. In 2016, the number of English internet users amounted to about *** million and was projected to increase to *** million in 2021. For Indian language users, this number was about *** million users in 2016, and was projected to reach *** million in 2021.

  7. Languages available on the web in establishments with a website in the...

    • euskadi.eus
    csv, xlsx
    Updated Jul 12, 2024
    + more versions
    Cite
    (2024). Languages available on the web in establishments with a website in the Basque Country according to province, activity branch (A38) and employment strata (%). [Dataset]. https://www.euskadi.eus/languages-available-on-the-web-in-establishments-with-a-website-in-the-basque-country-according-to-province-activity-branch-a38-and-employment-strata/web01-ejeduki/en/
    Explore at:
    Available download formats: xlsx (21.23), csv (3.51)
    Dataset updated
    Jul 12, 2024
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Basque Country
    Description

    The statistical operation Survey on the Information Society-ESI- Companies, provides regular information on the implementation of New Information and Communication Technology -ICT- in the companies of the Basque Country. Specifically, it records and describes the level of use of the Internet in the different establishments: the systems of Internet access, activities carried out via the Internet, as well as the availability of the website and its main characteristics. It also measures the implementation of E-commerce purchases and sales in economic activity and the means used to carry it out.

  8. Data from: Exploring the Dominance of the English Language on the Websites...

    • zenodo.org
    • data.niaid.nih.gov
    bin, xls
    Updated Mar 5, 2020
    Cite
    Giannakoulopoulos Andreas; Pergantis Minas; Konstantinou Nikos; Lamprogeorgos Aristeidis; Limniati Laida; Varlamis Iraklis (2020). Exploring the Dominance of the English Language on the Websites of EU Countries [Dataset]. http://doi.org/10.5281/zenodo.3698008
    Explore at:
    Available download formats: xls, bin
    Dataset updated
    Mar 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giannakoulopoulos Andreas; Pergantis Minas; Konstantinou Nikos; Lamprogeorgos Aristeidis; Limniati Laida; Varlamis Iraklis
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    European Union
    Description

    This Dataset, in 29 files of xlsx format, contains the data of all metrics and accumulated information as they are described in the methodology, results and discussion section of the research article "Exploring the Dominance of the English Language on the Websites of EU Countries".

  9. English Word Frequency

    • kaggle.com
    Updated Sep 6, 2017
    Cite
    Rachael Tatman (2017). English Word Frequency [Dataset]. https://www.kaggle.com/datasets/rtatman/english-word-frequency/discussion?sortBy=hot
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 6, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rachael Tatman
    Description

    Context:

    How frequently a word occurs in a language is an important piece of information for natural language processing and linguists. In natural language processing, very frequent words tend to be less informative than less frequent ones and are often removed during preprocessing. Human language users are also sensitive to word frequency. How often a word is used affects language processing in humans. For example, very frequent words are read and understood more quickly and can be understood more easily in background noise.

    Content:

    This dataset contains the counts of the 333,333 most commonly-used single words on the English language web, as derived from the Google Web Trillion Word Corpus.

    Acknowledgements:

    Data files were derived from the Google Web Trillion Word Corpus (as described by Thorsten Brants and Alex Franz, and distributed by the Linguistic Data Consortium) by Peter Norvig. You can find more information on these files and the code used to generate them here.

    The code used to generate this dataset is distributed under the MIT License.

    Inspiration:

    • Can you tag the part of speech of these words? Which parts of speech are most frequent? Is this similar to other languages, like Japanese?
    • What differences are there between the very frequent words in this dataset and the frequent words in other corpora, such as the Brown Corpus or the TIMIT corpus? What might these differences tell us about how language is used?
  10. Multi Indic Languages News Dataset

    • kaggle.com
    zip
    Updated Jun 14, 2020
    Cite
    Mohammad Shahebaz (2020). Multi Indic Languages News Dataset [Dataset]. https://www.kaggle.com/shaz13/multi-indic-languages-news-dataset
    Explore at:
    Available download formats: zip (608094970 bytes)
    Dataset updated
    Jun 14, 2020
    Authors
    Mohammad Shahebaz
    Description

    Dataset

    This dataset was created by Mohammad Shahebaz

    Contents

    It contains the following files:

  11. German Legal monolingual corpus from the contensts of the...

    • data.europa.eu
    • live.european-language-grid.eu
    zip
    Updated Jul 22, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Directorate-General for Communications Networks, Content and Technology (2019). German Legal monolingual corpus from the contensts of the https://www.gesetze-im-internet.de/ web site [Dataset]. https://data.europa.eu/euodp/en/data/dataset/elrc_2446
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 22, 2019
    Dataset authored and provided by
    Directorate-General for Communications Networks, Content and Technology
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    German Legal monolingual corpus from the contensts of the https://www.gesetze-im-internet.de/ web site

    This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) actions SMART 2014/1074 and SMART 2015/1091. For further information on the project: http://lr-coordination.eu.

  12. Dataset of books about Language and languages-Web-based instruction

    • workwithdata.com
    Updated Apr 17, 2025
    + more versions
    Cite
    Work With Data (2025). Dataset of books about Language and languages-Web-based instruction [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Language+and+languages-Web-based+instruction&j=1&j0=book_subjects
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is about books. It has 4 rows and is filtered to rows where the book subject is Language and languages-Web-based instruction. It features 9 columns including author, publication date, language, and book publisher.

  13. Most common sources of language errors on the internet in Poland 2023

    • statista.com
    Updated Feb 28, 2025
    Cite
    Statista (2025). Most common sources of language errors on the internet in Poland 2023 [Dataset]. https://www.statista.com/statistics/1098947/poland-most-common-places-for-language-errors-online/
    Explore at:
    Dataset updated
    Feb 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Poland
    Description

    According to the source, 9,154 language errors were published each day on the internet in Poland in 2023. Over 38 percent of mistakes were found on Facebook, and 20.21 percent on Twitter.

  14. Population aged 15 and over in the Basque Country Internet user by place of...

    • gimi9.com
    + more versions
    Cite
    Population aged 15 and over in the Basque Country Internet user by place of access and languages used, according to Historical Territory (%). | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_7be3bfe48c6f9f255cb7c58bedb4c2c05a4341a3/
    Explore at:
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Basque Country
    Description

    The statistical operation Information Society Survey-ESI-Familias, provides periodic information on the implementation of the new Information and Communication Technologies -ICT- in the population of the Basque Country. In particular, it computes and describes the ICT equipment of the population both in the home and in the study center or in the workplace, and measures the level of use that is made of them, especially those related to the Internet. It allows us to compare the level of implementation of these ICT technologies in Basque society in relation to other countries in its environment.

  15. Language: Introductory Web Quest

    • library.ncge.org
    Updated Jul 27, 2021
    Cite
    NCGE (2021). Language: Introductory Web Quest [Dataset]. https://library.ncge.org/documents/956087ccd96b491583c06d0a0ef2cb99
    Explore at:
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    National Council for Geographic Education (http://www.ncge.org/)
    Authors
    NCGE
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Author: Lisa Sanders
    Grade/Audience: high school, ap human geography
    Resource type: activity
    Subject topic(s): culture, human geography
    Region: world
    Standards: APHG CED Unit 3
    Objectives: See the APHG CED Unit 3 for specific objectives covered in this activity.
    Summary: This webquest gives students links to follow to learn about the concepts associated with language in AP Human Geography. Can be adapted for World Geography classes.

  16. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Explore at:
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, there were around 1.53 billion people worldwide who spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish accounted for the third and fourth most widespread languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken in the United States vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage of population speaking a language other than English is California. About 45 percent of its population was speaking a language other than English at home in 2023.

  17. Population aged 15 and over of the Basque Country who are Internet users by...

    • euskadi.eus
    csv, xlsx
    Updated Oct 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). Population aged 15 and over of the Basque Country who are Internet users by place of access and languages used, according to Province (%). [Dataset]. https://www.euskadi.eus/population-aged-15-and-over-of-the-basque-country-who-are-internet-users-by-place-of-access-and-languages-used-according-to-province/web01-ejeduki/en/
    Explore at:
    Available download formats: csv (0.64), xlsx (16.96)
    Dataset updated
    Oct 30, 2023
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Basque Country
    Description

    The statistical operation Survey on the Information Society-ESI-Families provides regular information on the implementation of New Information and Communication Technology -ICT- among the population of the Basque Country. Specifically, it records and describes the ICT equipment of the population both in the home and in the place of study or the workplace, and measures the level of use made of it, especially as related to the Internet. It lets us compare the level of implementation of these ICT technologies in Basque society in relation to other surrounding communities.

  18. Trends in Reading and Language Arts Proficiency (2010-2022): Internet...

    • publicschoolreview.com
    Updated Feb 9, 2025
    + more versions
    Cite
    Public School Review (2025). Trends in Reading and Language Arts Proficiency (2010-2022): Internet Academy vs. Washington vs. Federal Way School District [Dataset]. https://www.publicschoolreview.com/internet-academy-profile
    Explore at:
    Dataset updated
    Feb 9, 2025
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Federal Way School District
    Description

    This dataset tracks annual reading and language arts proficiency from 2010 to 2022 for Internet Academy vs. Washington and Federal Way School District

  19. Flash Eurobarometer 313: User language preferences online

    • data.europa.eu
    zip
    Cite
    Directorate-General for Communication, Flash Eurobarometer 313: User language preferences online [Dataset]. https://data.europa.eu/data/datasets/s880_313?locale=sv
    Explore at:
    Available download formats: zip
    Dataset authored and provided by
    Directorate-General for Communication
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This Flash Eurobarometer studied how Europeans use different languages online. While 90% of European internet users prefer to surf the internet in their own language, 55% at least occasionally use a language other than their own when online according to a pan-EU Eurobarometer survey released today. However, 44% feel they are missing interesting information because web pages are not in a language that they understand.

    The results by volumes are distributed as follows:
    • Volume A: Countries
    • Volume AA: Groups of countries
    • Volume A' (AP): Trends
    • Volume AA' (AAP): Trends of groups of countries
    • Volume B: EU/socio-demographics
    • Volume B' (BP): Trends of EU/socio-demographics
    • Volume C: Country/socio-demographics

    Researchers may also contact GESIS - Leibniz Institute for the Social Sciences: https://www.gesis.org/eurobarometer

  20. Jigsaw Train Translated (Yandex API)

    • kaggle.com
    Updated May 11, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ma7555 (2020). Jigsaw Train Translated (Yandex API) [Dataset]. https://www.kaggle.com/ma7555/jigsaw-train-translated-yandex-api/code
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 11, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ma7555
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Dataset

    This dataset was created by ma7555

    Released under GPL 2

    Contents

Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/

Common languages used for web content 2025, by share of websites

Explore at:
75 scholarly articles cite this dataset (View in Google Scholar)
