29 datasets found
  1. Common languages used for web content 2025, by share of websites

    • statista.com
    • ai-chatbox.pro
    Updated Feb 11, 2025
    Cite
    Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 2025
    Area covered
    Worldwide
    Description

    As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, followed by German, with 5.6 percent.

    English as the leading online language

    The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. As of January 2023, the combined internet user base of the two countries was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

    Global internet usage by regions

    As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.

  2. The most spoken languages worldwide 2025

    • statista.com
    Updated Apr 14, 2025
    Cite
    Statista (2025). The most spoken languages worldwide 2025 [Dataset]. https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
    Dataset updated
    Apr 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2025
    Area covered
    World
    Description

    In 2025, around 1.53 billion people worldwide spoke English either natively or as a second language, slightly more than the 1.18 billion Mandarin Chinese speakers at the time of survey. Hindi and Spanish were the third and fourth most widespread languages that year.

    Languages in the United States

    The United States does not have an official language, but the country uses English, specifically American English, for legislation, regulation, and other official pronouncements. The United States is a land of immigration, and the languages spoken there vary as a result of the multicultural population. The second most common language spoken in the United States is Spanish or Spanish Creole, which more than 43 million people spoke at home in 2023. There were also 3.5 million Chinese speakers (including both Mandarin and Cantonese), 1.8 million Tagalog speakers, and 1.57 million Vietnamese speakers counted in the United States that year.

    Different languages at home

    The percentage of people in the United States speaking a language other than English at home varies from state to state. The state with the highest percentage is California, where about 45 percent of the population spoke a language other than English at home in 2023.

  3. Most common sources of language errors on the internet in Poland 2023

    • statista.com
    Updated Feb 28, 2025
    Cite
    Statista (2025). Most common sources of language errors on the internet in Poland 2023 [Dataset]. https://www.statista.com/statistics/1098947/poland-most-common-places-for-language-errors-online/
    Dataset updated
    Feb 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Poland
    Description

    According to the source, 9,154 language errors were published each day on the internet in Poland in 2023. Over 38 percent of the mistakes were found on Facebook and 20.21 percent on Twitter.

  4. Preferred language to access the internet India 2023

    • statista.com
    Cite
    Statista, Preferred language to access the internet India 2023 [Dataset]. https://www.statista.com/statistics/1459294/india-internet-access-by-language/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    According to a 2023 survey, ** percent of internet users in urban India preferred using the internet in English. Meanwhile, ** percent of users accessed the internet in Indian languages, with Hindi being the most preferred language among them. Over *** million internet users reside in the urban areas of India.

  5. Digital Spanish Language Learning Market Report | Global Forecast From 2025...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Digital Spanish Language Learning Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-digital-spanish-language-learning-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Digital Spanish Language Learning Market Outlook




    The global market size for digital Spanish language learning was valued at approximately USD 1.2 billion in 2023 and is projected to reach around USD 3.8 billion by 2032, growing at a robust CAGR of 13.6% from 2024 to 2032. This impressive growth is driven by numerous factors, including the increasing globalization and cultural exchange, technological advancements in digital learning platforms, and the rising demand for multilingual proficiency in the professional world. These growth factors are collectively contributing to the substantial expansion of the digital Spanish language learning market.
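The growth figures in the paragraph above can be cross-checked with the standard compound annual growth rate formula. The following is a minimal sketch, not the publisher's method: it assumes simple annual compounding with 2023 as the base year and 2032 as the end year (nine periods), and the function names `cagr` and `project` are illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.136 means 13.6%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` annually for `years` periods."""
    return start_value * (1 + rate) ** years


# Figures quoted in the description, in USD billions.
# Assumption: 2023 is the base year and 2032 the end year (9 periods).
periods = 2032 - 2023
implied_rate = cagr(1.2, 3.8, periods)
projected_2032 = project(1.2, 0.136, periods)

print(f"implied CAGR: {implied_rate:.2%}")
print(f"2032 value at 13.6%: USD {projected_2032:.2f} billion")
```

Under these assumptions the implied rate and the projected 2032 value land close to the quoted 13.6% and USD 3.8 billion; small differences are expected because the published inputs are rounded.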




    One of the primary growth drivers for this market is the increasing globalization of business and the growing importance of Spanish as a global language. With over 580 million speakers worldwide, Spanish ranks as the second most spoken native language, following Mandarin. Businesses, educational institutions, and individuals are increasingly recognizing the value of Spanish proficiency, leading to a surge in demand for effective and accessible language learning solutions. This trend is particularly pronounced in the corporate sector, where organizations are looking to enhance their workforce's language skills to facilitate better communication with Spanish-speaking clients and partners.




    Technological advancements have also played a crucial role in propelling the market forward. The proliferation of smartphones, high-speed internet connections, and advanced software applications has made digital language learning more accessible and engaging. Innovative features such as artificial intelligence, machine learning, and immersive virtual reality experiences are being integrated into language learning platforms, providing users with personalized and interactive learning experiences. These technological innovations are not only enhancing the effectiveness of language learning but also making it more appealing to a broader audience.




    Furthermore, the COVID-19 pandemic has acted as a catalyst for the growth of the digital Spanish language learning market. With traditional classroom-based learning disrupted, there has been a significant shift towards online education, including language learning. The convenience, flexibility, and accessibility offered by digital platforms have attracted a diverse range of learners, from individual enthusiasts to educational institutions and corporate entities. This shift is expected to have a lasting impact, with online and digital learning becoming an integral part of the education landscape even in the post-pandemic era.




    Regionally, North America and Europe have been at the forefront of adopting digital Spanish language learning solutions, driven by a combination of high internet penetration, a strong emphasis on education, and a multicultural population. However, the Asia Pacific region is emerging as a significant growth market, fueled by increasing interest in language learning, rapid digitalization, and the growing presence of global businesses requiring multilingual capabilities. Latin America, with its native Spanish-speaking population, also presents substantial opportunities for market expansion, particularly in the educational and corporate sectors.



    The rise of language learning apps has significantly contributed to the accessibility and convenience of acquiring new languages. These apps offer a variety of features, such as interactive exercises, real-time feedback, and community engagement, which make learning more engaging and effective. The ability to learn anytime and anywhere has made language learning apps particularly popular among busy professionals and students who seek to integrate language acquisition into their daily routines. As technology continues to evolve, these apps are incorporating advanced features like speech recognition and AI-driven personalized learning paths, further enhancing the user experience and effectiveness of language learning.



    Product Type Analysis




    The digital Spanish language learning market is segmented by product type into software, apps, online courses, and tutoring services. Each segment caters to different preferences and needs of learners, offering a diverse range of options for acquiring Spanish language skills. Software solutions, including comprehensive language learning programs, h

  6. Pinyin Input Method Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Dec 3, 2024
    Cite
    Dataintelo (2024). Pinyin Input Method Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-pinyin-input-method-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Pinyin Input Method Market Outlook



    The Pinyin Input Method Market has grown significantly, with global market size estimated at $1.5 billion in 2023 and projected to reach approximately $2.8 billion by 2032, a compound annual growth rate (CAGR) of 7%. This robust growth can be attributed to several key factors, including the increasing digitalization across various sectors, the proliferation of smartphones, and the growing demand for efficient input methods that cater to Mandarin-speaking populations worldwide. The escalation of internet usage and the need for seamless communication in one of the most spoken languages globally are further propelling the market's upward trend.



    One of the primary growth factors driving the Pinyin Input Method Market is the rapid digital transformation across industries. As businesses and educational institutions increasingly adopt digital platforms, there is a heightened need for effective input methods that can cater to Chinese-speaking users. The Pinyin input method, being one of the most efficient and widely used systems for Chinese character input, aligns perfectly with the needs of this growing user base. Additionally, the rise of e-learning platforms and remote work has necessitated reliable input methods, further contributing to market growth. The integration of Pinyin input across multiple devices and platforms, such as smartphones, tablets, and computers, has broadened its accessibility and usability, making it indispensable in the digital age.



    Another significant growth factor is the increasing penetration of smartphones and mobile internet services. With Asia, particularly China, witnessing a surge in smartphone adoption, the demand for user-friendly and efficient input methods like Pinyin has soared. Mobile users require quick and intuitive typing solutions that can seamlessly integrate with their devices and applications. The Pinyin input method, with its ease of use and compatibility, perfectly meets these demands, thereby driving market expansion. Moreover, ongoing technological advancements in natural language processing and machine learning have enhanced the accuracy and predictive capabilities of Pinyin input systems, further boosting their adoption across diverse user segments.



    The expansion of the Pinyin Input Method Market is also fueled by globalization and the growing significance of the Chinese language in international business, education, and cultural exchanges. As more non-native speakers seek to learn Mandarin for professional and personal reasons, the demand for effective learning tools, including Pinyin input methods, has surged. Educational institutions and language learning platforms are increasingly incorporating Pinyin input systems to facilitate the learning process and improve user engagement. This trend is expected to continue as the Chinese language gains prominence on the global stage, contributing to sustained market growth.



    Regionally, Asia Pacific dominates the Pinyin Input Method Market due to the high concentration of Mandarin speakers and the widespread adoption of digital technologies. North America and Europe are also witnessing growth, driven by the increasing interest in Mandarin language learning and cross-cultural communications. In Latin America and the Middle East & Africa, the market is gradually expanding as more educational and business entities recognize the value of integrating Chinese language capabilities. The regional outlook highlights the global significance of the Pinyin input method in facilitating communication and bridging linguistic gaps in an increasingly interconnected world.



    Product Type Analysis



    The Pinyin Input Method Market can be segmented by product type into software and hardware. Software solutions dominate this market segment, primarily due to their versatility and wide applicability across various devices and platforms. These solutions can be easily installed and integrated into existing systems, making them a preferred choice for both individual users and organizations. Software-based Pinyin input methods offer extensive customization options, allowing users to tailor their typing experience to their preferences, which enhances user satisfaction and drives market growth. The continuous development of advanced features, such as predictive text and voice recognition, further elevates the value proposition of software solutions in this market.



    On the other hand, hardware solutions, although a smaller segment, play a crucial role in specific applications. Dedicated Pinyin input hardware, such as keyboards

  7. R2 & NE: County Level 2006-2010 ACS Languages Spoken Summary.

    • datadiscoverystudio.org
    • cloud.csiss.gmu.edu
    Updated Jan 9, 2018
    + more versions
    Cite
    (2018). R2 & NE: County Level 2006-2010 ACS Languages Spoken Summary. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/67bb9d50a74c4aa1aed14510f63462b8/html
    Dataset updated
    Jan 9, 2018
    Description

    The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The primary legal divisions of most States are termed counties. In Louisiana, these divisions are known as parishes. In Alaska, which has no counties, the equivalent entities are the organized boroughs, city and boroughs, and municipalities, and for the unorganized area, census areas. The latter are delineated cooperatively for statistical purposes by the State of Alaska and the Census Bureau. In four States (Maryland, Missouri, Nevada, and Virginia), there are one or more incorporated places that are independent of any county organization and thus constitute primary divisions of their States. These incorporated places are known as independent cities and are treated as equivalent entities for purposes of data presentation. The District of Columbia and Guam have no primary divisions, and each area is considered an equivalent entity for purposes of data presentation. The Census Bureau treats the following entities as equivalents of counties for purposes of data presentation: Municipios in Puerto Rico, Districts and Islands in American Samoa, Municipalities in the Commonwealth of the Northern Mariana Islands, and Islands in the U.S. Virgin Islands. The entire area of the United States, Puerto Rico, and the Island Areas is covered by counties or equivalent entities. The 2010 Census boundaries for counties and equivalent entities are as of January 1, 2010, primarily as reported through the Census Bureau's Boundary and Annexation Survey (BAS).

    This table contains data on individual languages spoken from the American Community Survey 2006-2010 database for counties. The American Community Survey (ACS) is a household survey conducted by the U.S. Census Bureau that currently has an annual sample size of about 3.5 million addresses. ACS estimates provide communities with the current information they need to plan investments and services. Information from the survey generates estimates that help determine how more than $400 billion in federal and state funds are distributed annually. Each year the survey produces data that cover the periods of 1-year, 3-year, and 5-year estimates for geographic areas in the United States and Puerto Rico, ranging from neighborhoods to Congressional districts to the entire nation. This table also has a companion table (same table name with MOE suffix) with the margin of error (MOE) values for each estimated element. MOE is expressed as a measure value for each estimated element, so a value of 25 and an MOE of 5 means 25 +/- 5 (or statistical certainty between 20 and 30). There are also special cases of MOE. An MOE of -1 means the associated estimates do not have a measured error. An MOE of 0 means that error calculation is not appropriate for the associated value. An MOE of 109 is set whenever an estimate value is 0. The MOEs of aggregated elements and percentages must be calculated. This process means using standard error calculations as described in "American Community Survey Multiyear Accuracy of the Data (3-year 2008-2010 and 5-year 2006-2010)". Also, following Census guidelines, aggregated MOEs do not use more than one zero-estimate MOE (109), to prevent overestimation of the error. Due to the complexity of the calculations, some percentage MOEs cannot be calculated (these are set to null in the summary-level MOE tables).
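The MOE rules described above can be illustrated with a short sketch. It assumes the root-sum-of-squares approximation commonly documented for the MOE of a sum of ACS estimates; the exact procedure is the one in the accuracy document cited in the description. The function names, the sample values, and the choice to skip the special codes -1 and 0 are illustrative assumptions, not part of the dataset.

```python
import math
from typing import List, Optional, Tuple

ZERO_ESTIMATE_MOE = 109  # MOE assigned whenever an estimate is 0 (per the description)


def interval(estimate: float, moe: float) -> Optional[Tuple[float, float]]:
    """Return (low, high) for estimate +/- MOE, or None for the special codes."""
    if moe in (-1, 0):
        # -1: no measured error; 0: error calculation not appropriate
        return None
    return (estimate - moe, estimate + moe)


def aggregate_moe(pairs: List[Tuple[float, float]]) -> float:
    """Approximate MOE of a sum of estimates (root sum of squared MOEs).

    `pairs` holds (estimate, moe) tuples. Only one MOE belonging to a zero
    estimate is counted, so the combined error is not overstated.
    Special codes -1 and 0 are skipped (an assumption of this sketch).
    """
    total = 0.0
    zero_moe_used = False
    for estimate, moe in pairs:
        if moe in (-1, 0):
            continue
        if estimate == 0:
            if zero_moe_used:
                continue  # a zero-estimate MOE was already counted once
            zero_moe_used = True
        total += moe ** 2
    return math.sqrt(total)


print(interval(25, 5))
combined = aggregate_moe([(25, 5), (40, 12), (0, ZERO_ESTIMATE_MOE), (0, ZERO_ESTIMATE_MOE)])
print(round(combined, 2))
```

In the sample call, the second zero-valued element is ignored when combining, which is how the sketch interprets the guideline about not using more than one zero-estimate MOE.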

    The name for table 'ACS10LSPCNTYMOE' was added as a prefix to all field names imported from that table. Be sure to turn off 'Show Field Aliases' to see complete field names in the Attribute Table of this feature layer. This can be done in the 'Table Options' drop-down menu in the Attribute Table or with key sequence '[CTRL]+[SHIFT]+N'. Due to database restrictions, the prefix may have been abbreviated if the field name exceeded the maximum allowed characters.

  8. Online English Learning Platform market was valued at USD 4.15 Billion in...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Apr 7, 2025
    Cite
    Cognitive Market Research (2025). Online English Learning Platform market was valued at USD 4.15 Billion in 2022 [Dataset]. https://www.cognitivemarketresearch.com/online-english-learning-platform-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Apr 7, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    The online English learning platform market was valued at USD 4.15 Billion in 2022 and will reach USD 17.73 Billion, with a CAGR of 15.63% from 2023 to 2030.

    Market Dynamics of the Online English Learning Platform Market:

    Adoption of English as a world language will propel the market: 
    

    The market is expanding as a result of the acceptance of English as a global language. English is spoken by 20% of the world's population. It is used as a standard form of training in many nations. Additionally, basic English proficiency is a requirement for admission to many overseas colleges. International firms like Airbus, Daimler-Chrysler, Renault, and Samsung have made English their primary language of commerce. As a result, cross-border commercial and corporate communications are frequently held in English. Over the course of the projected period, these factors will support the expansion of the worldwide market for online English learning platforms.

    Advancement in technology is predicted to boost the market:
    

    Ongoing improvements in technology have driven the growing popularity of online English learning platforms. The on-premise and cloud-based technologies offered by digital English learning trainers include online portals, CDs, DVDs, and app-based learning. Students may complete all aspects of their education, from entrance to certification, over the Internet without physically visiting a college or other institution. High-speed internet enables students to download the required materials and to view online courses continuously. These factors are boosting the online English learning platform market.

    Technical issues may obstruct the market growth:
    

    The primary obstacle anticipated to limit the growth of the online English learning platform market is the technical challenges related to digital learning platforms. Unreliable internet connections, issues with video conferencing platforms, and restricted access to essential hardware or software are a few of the technical hurdles that might arise in online English learning platforms.

    Impact of the COVID-19 Pandemic on the Online English Learning Platform Market:

    Due to the outbreak of COVID-19, the education system shifted to online platforms. Demand for online English learning platforms rose sharply due to restrictions like lockdowns and social distancing. Many students, teachers, and other users chose online platforms to learn many things, including the English language. Numerous projects were launched by different companies to expand the worldwide market for online language instruction. After the COVID-19 pandemic, the market is still growing due to its convenience and cost-effectiveness.

    Introduction of Online English Learning Platform

    More than one million individuals use English as a first language, and it is widely accepted. After Mandarin, English is the second most popular language to study. The demand for studying the English language is consistently rising as a result of growing urbanization and digitization, as well as the need for improved educational and career prospects. Information and communications technologies (ICT) have been used to offer educational content in digital formats since the invention of the Internet. Digital English language learning refers to products and services that make it easier to learn languages through ICT.
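The valuation sentence at the start of this description gives a start value, an end value, and a growth rate, but does not state the year attached to the USD 17.73 Billion figure. A minimal sketch, assuming simple annual compounding (an assumption of this example, not a method stated by the publisher), solves for the number of compounding years those three numbers imply. The function name `implied_years` is illustrative.

```python
import math


def implied_years(start_value: float, end_value: float, annual_rate: float) -> float:
    """Number of annual compounding periods n such that
    start_value * (1 + annual_rate) ** n == end_value."""
    if start_value <= 0 or end_value <= 0:
        raise ValueError("values must be positive")
    if annual_rate <= -1 or annual_rate == 0:
        raise ValueError("annual_rate must be greater than -1 and non-zero")
    return math.log(end_value / start_value) / math.log(1 + annual_rate)


# Figures quoted in the description, in USD billions, with a CAGR of 15.63%.
years = implied_years(4.15, 17.73, 0.1563)
print(f"implied compounding periods: {years:.1f}")
```

With the quoted figures the result is roughly ten years, which can be compared against the 2021 - 2033 coverage period and the 2023 to 2030 window given for the CAGR.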

  9. Non-English digital payment internet users in India - by language 2016

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Non-English digital payment internet users in India - by language 2016 [Dataset]. https://www.statista.com/statistics/719004/digital-payments-non-english-internet-users-by-language-india/
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2016
    Area covered
    India
    Description

    This statistic represents the number of non-English digital payment internet users across India in 2016, based on language. Hindi internet users had the highest number of digital payment users amounting to about ** million, followed by Tamil internet users at about **** million during the measured time period.

  10. Barometer for Swedish-speaking Finns B5/2021

    • services.fsd.tuni.fi
    • datacatalogue.cessda.eu
    zip
    Updated Mar 12, 2025
    Cite
    Finnish Research Infrastructure for Public Opinion; Lindell, Marina (2025). Barometer for Swedish-speaking Finns B5/2021 [Dataset]. http://doi.org/10.60686/t-fsd3777
    Available download formats: zip
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Finnish Social Science Data Archive
    Authors
    Finnish Research Infrastructure for Public Opinion; Lindell, Marina
    Description

    The survey charted the consumption of news and media and the trust in different media by Swedish-speaking Finns. Views on corruption were also examined. The data was collected as part of the Citizen Panel of Swedish-speaking Finns (Barometern), which is part of The Finnish Research Infrastructure for Public Opinion (FIRIPO). Respondents were first asked about the amount of media they use, followed by more detailed questions about their use of news media and social media. Next, respondents were asked to rate their level of trust in the different news media. In this context, they were also asked about their perception of the objectivity of journalists. Respondents were also asked about their willingness to pay for online news for Swedish-speaking Finns. They were asked about the reasons why they would pay for online news and how much they would be prepared to pay for it. Similarly, they were asked about the reasons for not being willing to pay for online news and the reasons for cancelling a subscription. Next, respondents were asked about their use of other digital services and changes in their use since the COVID-19 pandemic. Finally, respondents were asked about their views on the occurrence of corruption and their trust in different institutions. Background variables included the respondent's NUTS3 region of residence, age, gender, mother tongue, level of education, occupational status and political party choice.

  11. LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED...

    • data.seattle.gov
    • hub.arcgis.com
    • +1 more
    application/rdfxml +5
    Updated Oct 22, 2024
    + more versions
    Cite
    (2024). LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER IN LIMITED ENGLISH SPEAKING HOUSEHOLDS (B16003) [Dataset]. https://data.seattle.gov/d/f7be-4spp
    Explore at:
    Available download formats: csv, application/rssxml, tsv, application/rdfxml, json, xml
    Dataset updated
    Oct 22, 2024
    Description

    Table from the American Community Survey (ACS) B16003 of age by language spoken at home for the population 5 years and over in limited English-speaking households. These are multiple, nonoverlapping vintages of the 5-year ACS estimates of population and housing attributes starting in 2010 shown by the corresponding census tract vintage. Also includes the most recent release annually.


    King County, Washington census tracts with nonoverlapping vintages of the 5-year American Community Survey (ACS) estimates starting in 2010. Vintage identified in the "ACS Vintage" field.

    The census tract boundaries match the vintage of the ACS data (currently 2010 and 2020) so please note the geographic changes between the decades.

    Tracts have been coded as being within the City of Seattle as well as assigned to neighborhood groups called "Community Reporting Areas". These areas were created after the 2000 census to provide geographically consistent neighborhoods through time for reporting U.S. Census Bureau data. This is not an attempt to identify neighborhood boundaries as defined by neighborhoods themselves.

    Vintages: 2010, 2015, 2020, 2021, 2022, 2023
    ACS Table(s): B16003


    The United States Census Bureau's American Community Survey (ACS):
    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:
    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.
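
    The interval described in the data note can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset: the `Estimate` class and the sample numbers are hypothetical, and the factor 1.645 is the z-score conventionally paired with a 90 percent margin of error.

```python
from dataclasses import dataclass

Z_90 = 1.645  # z-score conventionally used for 90 percent margins of error


@dataclass
class Estimate:
    value: float  # the ACS estimate
    moe: float    # its 90 percent margin of error

    def bounds(self):
        """Lower and upper confidence bounds: estimate -/+ margin of error."""
        return (self.value - self.moe, self.value + self.moe)

    def standard_error(self):
        """Convert the 90 percent margin of error back to a standard error."""
        return self.moe / Z_90


# Hypothetical tract-level count with a hypothetical margin of error.
est = Estimate(value=1200, moe=150)
print(est.bounds())
print(round(est.standard_error(), 1))
```

    Working from the standard error rather than the published margin makes it easier to rescale to another confidence level if one is needed.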

    Data Processing Notes:
    • Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb(year)a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as...

  12. Language Identifier

    • kaggle.com
    Updated Nov 29, 2023
    Cite
    Shrivats Sudhir (2023). Language Identifier [Dataset]. https://www.kaggle.com/datasets/shrivatssudhir/language-identifier
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shrivats Sudhir
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Common Voice (by Moz://a) is a publicly available dataset that contains speech audio in various languages. Further details can be found here. The dataset shown here was downloaded from their site, where only those languages were downloaded which had sufficient data, validated hours, and unique speakers.

    I extracted the audio from 34 languages and created this dataset for ML projects that classify audio by language. Each audio clip is an mp3 file in which the narrator records themselves saying a phrase in their respective language. For each language there is a varying number of unique speakers, hours spoken, and validated hours. Going through the previews will give a good sense of what the audio sounds like.

    Please refer back to Common Voice for any questions and concerns about the dataset. I have merely set up this Kaggle page so that more people are aware of the website and can volunteer to speak, listen, or write and increase the quantity and quality of their database. There are many fruitful applications of audio and language related datasets; however, it is crucial that all ethical, moral, and intellectual rights are adhered to when using this dataset.

  13. R2 & NE: Tract Level 2006-2010 ACS Languages Spoken Summary

    • data.wu.ac.at
    • cloud.csiss.gmu.edu
    tgrshp (compressed)
    Updated Jan 13, 2018
    + more versions
    Cite
    U.S. Environmental Protection Agency (2018). R2 & NE: Tract Level 2006-2010 ACS Languages Spoken Summary [Dataset]. https://data.wu.ac.at/schema/data_gov/MjdhMDBlMWUtM2RjMi00ODA5LWI4OGItNTI3MzIyZmQ1M2Iw
    Explore at:
    Available download formats: tgrshp (compressed)
    Dataset updated
    Jan 13, 2018
    Dataset provided by
    U.S. Environmental Protection Agency
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features. 
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.

    This table contains data on individual languages spoken from the American Community Survey 2006-2010 database for tracts. The American Community Survey (ACS) is a household survey conducted by the U.S. Census Bureau that currently has an annual sample size of about 3.5 million addresses. ACS estimates provide communities with the current information they need to plan investments and services. Information from the survey generates estimates that help determine how more than $400 billion in federal and state funds are distributed annually. Each year the survey produces data that cover the periods of 1-year, 3-year, and 5-year estimates for geographic areas in the United States and Puerto Rico, ranging from neighborhoods to Congressional districts to the entire nation. This table also has a companion table (same table name with an MOE suffix) with the margin of error (MOE) values for each estimated element. MOE is expressed as a measure value for each estimated element. So a value of 25 and an MOE of 5 means 25 +/- 5 (or statistical certainty between 20 and 30). There are also special cases of MOE. An MOE of -1 means the associated estimates do not have a measured error. An MOE of 0 means that error calculation is not appropriate for the associated value. An MOE of 109 is set whenever an estimate value is 0. The MOEs of aggregated elements and percentages must be calculated. This process means using standard error calculations as described in "American Community Survey Multiyear Accuracy of the Data (3-year 2008-2010 and 5-year 2006-2010)". Also, following Census guidelines, aggregated MOEs do not use more than one zero-estimate MOE (109) to prevent overestimation of the error. Due to the complexity of the calculations, some percentage MOEs cannot be calculated (these are set to null in the summary-level MOE tables).
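
    The aggregation rule described above can be sketched as follows. This is an illustration under stated assumptions, not the EPA's actual processing code: the function name `aggregate_moe` and the sample values are hypothetical, the root-sum-of-squares formula is the usual Census approximation for the MOE of a sum, and treating the special codes -1 and 0 as contributing no error is an assumption made for this sketch.

```python
import math


def aggregate_moe(pairs):
    """Sum (estimate, MOE) pairs and approximate the MOE of the total."""
    total = sum(est for est, _ in pairs)
    # Assumption: the special MOE codes -1 and 0 contribute no sampling error.
    squares = [moe ** 2 for est, moe in pairs if est != 0 and moe > 0]
    # Count the MOE attached to zero estimates (109 in this table) at most once.
    zero_moes = [moe for est, moe in pairs if est == 0 and moe > 0]
    if zero_moes:
        squares.append(max(zero_moes) ** 2)
    return total, math.sqrt(sum(squares))


# Hypothetical tract: two nonzero language counts and two zero counts.
total, moe = aggregate_moe([(25, 5), (40, 12), (0, 109), (0, 109)])
print(total, round(moe, 1))
```

    Including both zero-estimate margins would inflate the combined error, which is the overestimation the guideline is meant to prevent.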

    The name for table 'ACS10LSPTRMOE' was added as a prefix to all field names imported from that table. Be sure to turn off 'Show Field Aliases' to see complete field names in the Attribute Table of this feature layer. This can be done in the 'Table Options' drop-down menu in the Attribute Table or with key sequence '[CTRL]+[SHIFT]+N'. Due to database restrictions, the prefix may have been abbreviated if the field name exceeded the maximum allowed characters.

  14. Leading internet activities among Indian language users India 2023

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Leading internet activities among Indian language users India 2023 [Dataset]. https://www.statista.com/statistics/1459306/india-internet-activities-of-indian-language-users/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    According to a 2023 survey, the leading activity carried out on the internet by users in Indian languages was watching videos as reported by ** percent of the respondents. Listening to music was the second most popular activity within this demographic. Over *** million internet users reside in the urban areas of India.

  15. Skit-S2I Dataset

    • paperswithcode.com
    Updated Dec 25, 2022
    Cite
    Shangeth Rajaa; Swaraj Dalmia; Kumarmanas Nethil (2022). Skit-S2I Dataset [Dataset]. https://paperswithcode.com/dataset/skit-s2i
    Explore at:
    Dataset updated
    Dec 25, 2022
    Authors
    Shangeth Rajaa; Swaraj Dalmia; Kumarmanas Nethil
    Description

    This dataset for Intent classification from human speech covers 14 coarse-grained intents from the Banking domain. This work is inspired by a similar release in the Minds-14 dataset - here, we restrict ourselves to Indian English but with a much larger training set. The data was generated by 11 (Indian English) speakers and recorded over a telephony line. We also provide access to anonymized speaker information - like gender, languages spoken, and native language - to allow more structured discussions around robustness and bias in the models you train.

  16. Languages and English Ability - Seattle Neighborhoods

    • data.seattle.gov
    • gimi9.com
    • +3 more
    application/rdfxml +5
    Updated Oct 22, 2024
    + more versions
    Cite
    (2024). Languages and English Ability - Seattle Neighborhoods [Dataset]. https://data.seattle.gov/dataset/Languages-and-English-Ability-Seattle-Neighborhood/d2c7-tkpy
    Explore at:
    Available download formats: json, csv, tsv, xml, application/rssxml, application/rdfxml
    Dataset updated
    Oct 22, 2024
    Area covered
    Seattle
    Description

    Table from the American Community Survey (ACS) 5-year series on languages spoken and English ability related topics for City of Seattle Council Districts, Comprehensive Plan Growth Areas and Community Reporting Areas. Table includes B16004 Age by Language Spoken at Home by Ability to Speak English, C16002 Household Language by Household Limited English-Speaking Status. Data is pulled from block group tables for the most recent ACS vintage and summarized to the neighborhoods based on block group assignment.


    Table created for and used in the Neighborhood Profiles application.

    Vintages: 2023
    ACS Table(s): B16004, C16002


    The United States Census Bureau's American Community Survey (ACS):
    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:
    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:
    • Boundaries come from the US Census TIGER geodatabases, specifically, the National Sub-State Geography Database (named tlgdb(year)a_us_substategeo.gdb). Boundaries are updated at the same time as the data updates (annually), and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For census tracts, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract level boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
    • The States layer contains 52 records - all US states, Washington D.C., and Puerto Rico
    • Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
    • Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications (https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf)...

  17. List of country code (CC), countries as birth places of historical figures,...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky (2023). List of country code (CC), countries as birth places of historical figures, and language code (LC) for each country. [Dataset]. http://doi.org/10.1371/journal.pone.0114825.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Young-Ho Eom; Pablo Aragón; David Laniado; Andreas Kaltenbrunner; Sebastiano Vigna; Dima L. Shepelyansky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LC is determined by the most spoken language in the given country. Country codes are based on country codes of Internet top-level domains and language codes are based on language edition codes of Wikipedia; WR represents all languages other than the considered 24 languages.

  18. HinDialect 1.1: 26 Hindi-related languages and dialects of the Indic...

    • live.european-language-grid.eu
    binary format
    Updated Jul 13, 2022
    + more versions
    Cite
    (2022). HinDialect 1.1: 26 Hindi-related languages and dialects of the Indic Continuum in North India [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/20494
    Explore at:
    Available download formats: binary format
    Dataset updated
    Jul 13, 2022
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    North India, India
    Description

    HinDialect: 26 Hindi-related languages and dialects of the Indic Continuum in North India

    Languages:
    This is a collection of folksongs for 26 languages that form a dialect continuum in North India and nearby regions.

    Namely Angika, Awadhi, Baiga, Bengali, Bhadrawahi, Bhili, Bhojpuri, Braj, Bundeli, Chhattisgarhi, Garhwali, Gujarati, Haryanvi, Himachali, Hindi, Kanauji, Khadi Boli, Korku, Kumaoni, Magahi, Malvi, Marathi, Nimadi, Panjabi, Rajasthani, Sanskrit.

    This data was originally collected by the Kavita Kosh Project at http://www.kavitakosh.org/ . Here are the main characteristics of the languages in this collection:
    • They are all Indic languages except for Korku.
    • The majority of them are closely related to the standard Hindi dialect genealogically (such as Haryanvi and Bhojpuri), although the collection also contains languages such as Bengali and Gujarati which are more distant relatives.
    • They are all primarily spoken in (North) India (Bengali is also spoken in Bangladesh).
    • All except Sanskrit are living languages.

    Data:
    Categorising them by pre-existing available NLP resources, we have:
    • Band 1 languages: Hindi, Panjabi, Gujarati, Bengali, Nepali. These languages already have other large standard datasets available. Kavita Kosh may have very little data for these languages.
    • Band 2 languages: Bhojpuri, Magahi, Awadhi, Braj. These languages have growing interest and some datasets of a relatively small size as compared to Band 1 language resources.
    • Band 3 languages: All other languages in the collection are previously zero-resource languages. These are the languages for which this dataset is the most relevant.

    Script:
    This dataset is entirely in Devanagari. Content in the case of languages not written in Devanagari (such as Bengali and Gujarati) has been transliterated by the Kavita Kosh Project.

    Format:
    The dataset contains a single text file containing folksongs per language. Folksongs are separated from each other by an empty line. The first line of a new piece is the title of the folksong, and line separation within folksongs is preserved.
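
    The file layout described above lends itself to a small parser. The sketch below is illustrative only: the function name `parse_folksongs`, the dictionary keys, and the sample text are hypothetical and do not come from the dataset's own tooling. It assumes blank lines occur only between songs.

```python
def parse_folksongs(text):
    """Split one language file into songs with a title and verse lines."""
    songs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # An empty line closes the song collected so far.
            songs.append({"title": current[0], "lines": current[1:]})
            current = []
    if current:  # the last song may not be followed by an empty line
        songs.append({"title": current[0], "lines": current[1:]})
    return songs


# Hypothetical two-song sample in the described layout.
sample = "Title A\nline 1\nline 2\n\nTitle B\nline 3\n"
songs = parse_folksongs(sample)
print(len(songs), songs[0]["title"])
```

    To read a real file, open it with `encoding="utf-8"` so the Devanagari text decodes correctly, then pass its contents to the function.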

  19. Parliamentary Elections 2007: Swedish-speaking Finns

    • services.fsd.tuni.fi
    • datasearch.gesis.org
    zip
    Updated Jan 9, 2025
    + more versions
    Cite
    Grönlund, Kimmo (2025). Parliamentary Elections 2007: Swedish-speaking Finns [Dataset]. http://doi.org/10.60686/t-fsd2431
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    Finnish Social Science Data Archive
    Authors
    Grönlund, Kimmo
    Description

    The election survey was conducted in Swedish and collected after the 2007 parliamentary elections. It is part of a four-year research project exploring politics in Swedish-speaking Finland (Svensk politik i Finland - beteende, opinion, framtid (2007-2010)). First, the respondents were asked how interested they were in politics and how much they followed the parliamentary elections in different media. The respondents' identification with various groups, self-perceived social class, and the stability of their political party preference were investigated, as well as what kind of measures they would be willing to take to promote things they considered important. The respondents were also presented with a set of attitudinal statements on voting, participating in parliamentary elections, political participation, political power, institutions, the Swedish People's Party in Finland, and services provided in Swedish. Some questions pertained to the Internet. The respondents were asked whether they had signed online petitions, contacted political decision-makers, discussed politics, or participated in other discussions. Municipal elected offices and organisational participation were also charted in the survey. The respondents were also asked to indicate the performance of Matti Vanhanen's Government, and to place different political parties as well as themselves on a left-right scale. Some questions examined how well political parties had distinguished themselves during the election campaign and the functioning of democracy in Finland. In addition, the respondents were asked whether they had voted in the 2007 parliamentary elections, and if not, what the reason for abstaining from voting was. Those who had voted were asked which party they had voted for, which factors had affected their party choice, and whether the candidate or the party was a more important selection criterion.
The respondents also evaluated how the Government had succeeded in managing the issues related to Swedish-speaking Finns and the status of Åland. Additionally, the respondents indicated which language was spoken at their home, how well they could speak Finnish, and how satisfied they were with their financial situation and life in general. Background variables included the respondent's year of birth, gender, education, marital status, economic activity, occupational group, household size, household income, region of residence, language spoken at home, language proportion in municipality of residence, religiosity, and electoral district.

  20. E-retail adoption among non-English internet users in India - by language...

    • statista.com
    Updated Mar 17, 2022
    Cite
    Statista (2022). E-retail adoption among non-English internet users in India - by language 2016-2021 [Dataset]. https://www.statista.com/statistics/718960/e-retail-adoption-among-non-english-internet-users-by-language-india/
    Explore at:
    Dataset updated
    Mar 17, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2016
    Area covered
    India
    Description

    This statistic represents the share of e-retail adoption among non-English internet users across India in 2016, based on language, with a forecast for 2021. The adoption rate among Tamil users was the highest in 2016 with about 33 percent, and was projected to reach about 53 percent in 2021.

Cite
Statista (2025). Common languages used for web content 2025, by share of websites [Dataset]. https://www.statista.com/statistics/262946/most-common-languages-on-the-internet/

Common languages used for web content 2025, by share of websites

Explore at:
69 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 2025
Area covered
Worldwide
Description

As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while content in German followed, with 5.6 percent.

English as the leading online language
The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

Global internet usage by regions
As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America led in internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.
