Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This assessment evaluates 34 internet activity data sources for use in disease surveillance. Our goals are to (a) understand the available data on internet usage and activity well enough to (b) identify real-world internet data sources that can be used both for evaluating our theories of disease surveillance and for building operational internet data-based disease surveillance systems. The assessment (PDF) and raw data (Excel spreadsheet) are attached.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset provides a comprehensive overview of internet usage across countries as of 2024. It includes data on the percentage of the population using the internet, sourced from multiple organizations such as the World Bank (WB), International Telecommunication Union (ITU), and the CIA. The dataset covers all United Nations member states, excluding North Korea, and provides insights into internet penetration rates, user counts, and trends over recent years. The data is derived from household surveys and internet subscription statistics, offering a reliable snapshot of global digital connectivity.
This dataset can be used in various data science applications, including:
- Digital Divide Analysis: Evaluate disparities in internet access between developed and developing nations.
- Trend Analysis: Study the growth of internet penetration over time across different regions.
- Policy Recommendations: Assist policymakers in identifying underserved areas and strategizing for improved connectivity.
- Market Research: Help businesses identify potential markets for digital products or services.
- Correlation Studies: Analyze relationships between internet penetration and socioeconomic indicators like GDP, education levels, or urbanization.
The dataset contains the following columns:
1. Location: Country or region name.
2. Rate (WB): Percentage of the population using the internet (World Bank data).
3. Year (WB): Year corresponding to the World Bank data.
4. Rate (ITU): Percentage of the population using the internet (ITU data).
5. Year (ITU): Year corresponding to the ITU data.
6. Users (CIA): Estimated number of internet users in absolute terms (CIA data).
7. Year (CIA): Year corresponding to the CIA data.
8. Notes: Additional notes or observations about specific entries.
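Because each country can carry two penetration rates with different reference years, a common first step is to pick the most recent one. The following is a minimal sketch of that idea, not part of the dataset itself: the `CountryRecord` class, its field names, the `latest_rate` helper, and the example values are all hypothetical and only mirror the column list above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CountryRecord:
    """One row of the dataset; field names are illustrative, not official."""
    location: str
    rate_wb: Optional[float]   # Rate (WB), percent of population
    year_wb: Optional[int]     # Year (WB)
    rate_itu: Optional[float]  # Rate (ITU), percent of population
    year_itu: Optional[int]    # Year (ITU)
    users_cia: Optional[int]   # Users (CIA), absolute count
    year_cia: Optional[int]    # Year (CIA)
    notes: str = ""


def latest_rate(record: CountryRecord) -> Optional[Tuple[str, float, int]]:
    """Return (source, rate, year) for the most recent available rate.

    If both sources report the same year, the World Bank figure is
    returned because it is listed first. Returns None if neither
    source has both a rate and a year.
    """
    candidates = []
    if record.rate_wb is not None and record.year_wb is not None:
        candidates.append(("WB", record.rate_wb, record.year_wb))
    if record.rate_itu is not None and record.year_itu is not None:
        candidates.append(("ITU", record.rate_itu, record.year_itu))
    if not candidates:
        return None
    # max() keeps the first maximal element, so ties go to the WB entry.
    return max(candidates, key=lambda c: c[2])


# Made-up placeholder values, not taken from the dataset.
example = CountryRecord("Exampleland", 80.0, 2022, 82.5, 2023, 1_000_000, 2021)
print(latest_rate(example))
```

Whether the newer figure is actually the better one depends on the analysis; the two organizations may use different survey methods, so this choice is an assumption worth revisiting.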
The data has been sourced from publicly available and reputable organizations such as the World Bank, ITU, and CIA. These sources ensure transparency and ethical collection methods through household surveys and official statistics. The dataset excludes North Korea due to limited reliable information on its internet usage.
This dataset is based on information compiled from:
- World Bank
- International Telecommunication Union
- CIA World Factbook
- Wikipedia's "List of countries by number of Internet users" page
Special thanks to these organizations for providing open access to this valuable information, enabling deeper insights into global digital connectivity trends.
Citations: [1] https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users
AI training draws heavily on the whole web, the largest data source with trillions of tokens, followed by sources like the indexed web and Common Crawl. These figures represent the estimated finite stock of tokens available in 2025, which could become a bottleneck for AI models that train on them.
According to our latest research, the global Internet Data Center market size stood at USD 68.3 billion in 2024, registering a robust growth trajectory. The market is forecasted to reach USD 165.7 billion by 2033, expanding at a healthy CAGR of 10.4% during the 2025-2033 period. The key growth factor driving this surge is the exponential rise in data generation, cloud computing adoption, and the proliferation of digital transformation initiatives across industries worldwide. As organizations increasingly prioritize business continuity, security, and scalability, the demand for advanced data center infrastructure is at an all-time high, shaping the future of the Internet Data Center market.
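The quoted growth rate can be sanity-checked with the standard compound annual growth rate formula. The sketch below assumes nine compounding periods (a 2024 base year through 2033); that reading is an interpretation of the text, not something the report states, and the function name `cagr` is illustrative.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Figures quoted in the text, in USD billions.
MARKET_2024 = 68.3
MARKET_2033 = 165.7

# Assumption: growth compounds over the nine years from 2024 to 2033.
implied_rate = cagr(MARKET_2024, MARKET_2033, periods=2033 - 2024)
print(f"Implied CAGR: {implied_rate:.2%}")
```

Under that assumption the implied rate lands close to the quoted 10.4%; a small gap could come from rounding or from the report using a different base year.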
One of the primary drivers fueling the growth of the Internet Data Center market is the rapid expansion of digital services and applications, which has led to an unprecedented surge in global data traffic. The proliferation of Internet of Things (IoT) devices, video streaming, e-commerce, and social media platforms has necessitated the deployment of high-capacity, low-latency data centers capable of handling massive workloads. Enterprises and service providers are investing heavily in data center modernization, focusing on energy efficiency, automation, and robust connectivity to support these evolving digital ecosystems. The growing emphasis on hybrid and multi-cloud strategies further amplifies the need for flexible and scalable data center solutions, propelling market growth.
Another significant growth factor is the increasing adoption of artificial intelligence (AI), machine learning, and big data analytics across various sectors, including healthcare, finance, and retail. These technologies require substantial computational power and storage capabilities, driving demand for advanced data center infrastructure. Modern data centers are being designed to support high-density computing, GPU acceleration, and edge computing, enabling real-time data processing and analytics at scale. Additionally, the shift toward software-defined data centers (SDDC) and virtualization is transforming traditional data center architectures, enabling greater agility, cost-efficiency, and operational resilience. This evolution is further supported by advancements in network technologies such as 5G, which facilitate faster data transmission and improved user experiences.
Sustainability and energy efficiency have emerged as crucial considerations in the Internet Data Center market, as organizations and governments worldwide prioritize environmental responsibility. Data centers are significant consumers of electricity, prompting the adoption of green technologies, renewable energy sources, and innovative cooling solutions to minimize carbon footprints. Regulatory mandates and industry standards are driving investments in energy-efficient hardware, intelligent power management, and sustainable building practices. Leading market players are increasingly focusing on achieving carbon neutrality and leveraging circular economy principles, which not only reduce operational costs but also enhance brand reputation and stakeholder trust. This sustainable approach is expected to shape investment decisions and technological advancements in the coming years.
As the demand for data processing and storage continues to grow, the concept of a Hyperscale Data Center has emerged as a pivotal solution to meet these needs. Hyperscale data centers are designed to efficiently scale up resources, accommodating the vast amounts of data generated by modern digital activities. These facilities are characterized by their ability to support thousands of servers and millions of virtual machines, ensuring seamless performance and reliability. The architecture of hyperscale data centers focuses on maximizing energy efficiency and optimizing cooling systems, making them a sustainable choice for large-scale operations. As businesses increasingly rely on cloud services and big data analytics, the role of hyperscale data centers becomes ever more critical in providing the necessary infrastructure to support these advanced technologies.
Regionally, the Asia Pacific market is witnessing remarkable growth, outpacing other regions due to rapid digitalization, government initiatives, and increasing internet penetration. Countries such as China, India, and Singapo
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Digital data sources have become ubiquitous in modern culture in the era of digital technology but often tend to be under-researched because of restricted access to data sources due to fragmentation, privacy issues, or industry ownership, and the methodological complexity of demonstrating their measurable impact on human health. Even though new big data sources have shown unprecedented potential for disease diagnosis and outbreak detection, we need to investigate results in the existing literature to gain a comprehensive understanding of their impact on and benefits to human health.
Objective: A systematic review of systematic reviews on identifying digital data sources and their impact area on people's health, including challenges, opportunities, and good practices.
Methods: A multidatabase search was performed. Peer-reviewed papers published between January 2010 and November 2020 relevant to digital data sources on health were extracted, assessed, and reviewed.
Results: The 64 reviews are covered by three domains, that is, universal health coverage (UHC), public health emergencies, and healthier populations, defined in WHO's General Programme of Work, 2019–2023, and the European Programme of Work, 2020–2025. In all three categories, social media platforms are the most popular digital data source, accounting for 47% (N = 8), 84% (N = 11), and 76% (N = 26) of studies, respectively. The second most utilized data source is electronic health records (EHRs) (N = 13), followed by websites (N = 7) and mass media (N = 5). In all three categories, the most studied impact of digital data sources is on prevention, management, and intervention of diseases (N = 40), and as a tool, there are also many studies (N = 10) on early warning systems for infectious diseases. However, they could also pose health hazards (N = 13), for instance, by exacerbating mental health issues and promoting smoking and drinking behavior among young people.
Conclusions: The digital data sources presented are essential for collecting and mining information about human health. The key impact of social media, electronic health records, and websites is in the area of infectious diseases and early warning systems, and in the area of personal health, that is, on mental health and smoking and drinking prevention. However, further research is required to address privacy, trust, transparency, and interoperability to leverage the potential of data held in multiple datastores and systems. This study also identified the apparent gap in systematic reviews investigating the novel big data streams, Internet of Things (IoT) data streams, and sensor, mobile, and GPS data researched using artificial intelligence, complex network, and other computer science methods, as in this domain systematic reviews are not common.
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
A dataset of broadband subscriptions and GDP per capita statistics, for 217 countries, between the years 2000-2020. The data is in long format, suitable for time series analysis.
The variables in the dataset are:
- year: Year of the observation, between 2000-2020.
- country: Country of the observation, 217 in total.
- broadband_subs: Number of broadband subscriptions per 100 people.
- GDPPC: GDP per capita, in 2022 US$
There are some missing values (NAs) for broadband_subs and GDPPC, especially in the earlier years.
The data source is World Bank Open Data. The original data was retrieved as two separate datasets in wide format, and converted into long format, in May 2022.
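The wide-to-long conversion described here can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's actual script: the input shape (one mapping per country, keyed by year), the `to_long` helper, and the sample values are hypothetical; only the output column names come from the variable list above.

```python
from typing import Dict, List, Optional

# Hypothetical wide format: {country: {year: value}}, one table per indicator.
WideTable = Dict[str, Dict[int, Optional[float]]]


def to_long(broadband_wide: WideTable, gdppc_wide: WideTable) -> List[dict]:
    """Combine two wide tables into one long table.

    Each output row is one (country, year) observation. A value missing
    from either input is kept as None, mirroring the NAs in the dataset.
    """
    keys = set()
    for table in (broadband_wide, gdppc_wide):
        for country, by_year in table.items():
            for year in by_year:
                keys.add((country, year))

    rows = []
    for country, year in sorted(keys):
        rows.append({
            "year": year,
            "country": country,
            "broadband_subs": broadband_wide.get(country, {}).get(year),
            "GDPPC": gdppc_wide.get(country, {}).get(year),
        })
    return rows


# Made-up sample values for illustration only.
broadband = {"Exampleland": {2000: None, 2001: 1.5}}
gdppc = {"Exampleland": {2000: 1000.0, 2001: 1100.0}}

long_rows = to_long(broadband, gdppc)
for row in long_rows:
    print(row)
```

Keeping one row per country and year is what makes the result convenient for time series work: filtering by country yields an ordered series without any reshaping.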
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.
Data content areas include:
According to a survey on the state of digital literacy in 2022, over ** percent of respondents in Indonesia stated that social media was their main source of information consumption. In comparison, online news was preferred by about **** percent of respondents. The same survey found that the majority of Indonesians used mobile internet data to access the internet everywhere.
As of October 2025, 6.04 billion individuals worldwide were internet users, which amounted to 73.2 percent of the global population. Of this total, 5.66 billion, or 68.7 percent of the world's population, were social media users.
Global internet usage
Connecting billions of people worldwide, the internet is a core pillar of the modern information society. Northern Europe ranked first among worldwide regions by the share of the population using the internet in 2025. In the Netherlands, Norway, and Saudi Arabia, 99 percent of the population used the internet as of February 2025. North Korea was at the opposite end of the spectrum, with virtually no internet usage penetration among the general population, ranking last worldwide. Eastern Asia was home to the largest number of online users worldwide, with over 1.34 billion at the latest count. Southern Asia ranked second, with around 1.2 billion internet users. China, India, and the United States rank ahead of other countries worldwide by the number of internet users.
Worldwide internet user demographics
As of 2024, the share of female internet users worldwide was 65 percent, five percentage points lower than that of men. Gender disparity in internet usage was bigger in African countries, with a difference of around 10 percentage points. Worldwide regions, like the Commonwealth of Independent States and Europe, showed a smaller usage gap between these two genders. As of 2024, global internet usage was higher among individuals between 15 and 24 years old across all regions, with young people in Europe representing the most considerable usage penetration, 98 percent. In comparison, the worldwide average for the age group of 15 to 24 years was 79 percent. The income level of the countries was also an essential factor for internet access, as 93 percent of the population of the countries with high income reportedly used the internet, as opposed to only 27 percent of the low-income markets.
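The headline counts and percentages can be cross-checked against each other: dividing each user count by its share should give roughly the same world population. The short sketch below does that arithmetic with the October 2025 figures quoted above; the helper name `implied_total` is illustrative.

```python
def implied_total(count_billions: float, share_percent: float) -> float:
    """Population (in billions) implied by a count and its percentage share."""
    if not 0 < share_percent <= 100:
        raise ValueError("share_percent must be in (0, 100]")
    return count_billions / (share_percent / 100)


# Figures quoted in the text (October 2025).
INTERNET_USERS_BN = 6.04
INTERNET_SHARE_PCT = 73.2
SOCIAL_USERS_BN = 5.66
SOCIAL_SHARE_PCT = 68.7

pop_from_internet = implied_total(INTERNET_USERS_BN, INTERNET_SHARE_PCT)
pop_from_social = implied_total(SOCIAL_USERS_BN, SOCIAL_SHARE_PCT)

# Ratio of the two user counts, expressed as a share of internet users.
social_per_internet_user = SOCIAL_USERS_BN / INTERNET_USERS_BN

print(f"Population implied by internet figures: {pop_from_internet:.2f} bn")
print(f"Population implied by social media figures: {pop_from_social:.2f} bn")
print(f"Social media users per internet user: {social_per_internet_user:.1%}")
```

The two implied totals agree to within rounding of the published percentages, which suggests both shares were computed against the same population base.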
Tempe census tracts and internet access by household. Data source: U.S. Census Bureau, 2013-2017 American Community Survey 5-Year Estimates, table BD28011 (Internet Subscription in Household). Also includes "low response scores" from the Census Bureau's 2018 Planning Database (PDB), which was established to prepare for the 2020 Census. For more information on the low response score, see the United States Census Bureau 2018 Planning Database: https://www.census.gov/topics/research/guidance/planning-databases.html
This layer generally supports the 2020 Census story map Ensuring a Complete Count in the 2020 Census.
For the original data source, see https://data.census.gov/table/ACSST5Y2023.S2801. Layer published for the Equity Explorer, a web experience developed by the LA County CEO Anti-Racism, Diversity, and Inclusion (ARDI) initiative in collaboration with eGIS and ISD. Visit the Equity Explorer to explore internet access and other equity-related datasets and indices, including the COVID Vulnerability and Recovery Index. Internet and computer access for census tracts in LA County from the US Census American Community Survey (ACS), 2023. Estimates are based on 2020 census tract boundaries, and tracts are joined to 2021 Supervisorial Districts, Service Planning Areas (SPA), and Countywide Statistical Areas (CSA). For more information about this dataset, please contact egis@isd.lacounty.gov.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Following the identification of Local Area Energy Planning (LAEP) use cases, this dataset lists the data sources and/or information that could help facilitate this research. View our dedicated page to find out how we derived this list: Local Area Energy Plan – UK Power Networks (opendatasoft.com)
Methodological Approach
Data upload: a list of datasets and ancillary details is entered into a static Excel file before being uploaded to the Open Data Portal.
Quality Control Statement
Quality control measures include:
- Manual review and correction of data inconsistencies
- Use of additional verification steps to ensure accuracy in the methodology
Assurance Statement
The Open Data Team and Local Net Zero Team worked together to ensure data accuracy and consistency.
Other
Download dataset information: Metadata (JSON)
Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/
Please note that "number of records" in the top left corner is higher than the number of datasets available as many datasets are indexed against multiple use cases leading to them being counted as multiple records.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains country-level internet usage data from 2000 to 2023. It provides the percentage of the population using the internet in different countries over time. This data can be useful for analyzing global internet penetration, digital adoption trends, and technological growth across regions.
Dataset Information:
Potential Use Cases:
Source:
Modified from this source: World Bank Group data
This dataset is valuable for data visualization, time-series analysis, and policy-making research related to digital growth.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Attached are the attendant data sources to the paper "Predicting Dengue Incidence Leveraging Internet-Based Data Sources. A Case Study in 20 cities in Brazil."
https://creativecommons.org/publicdomain/zero/1.0/
Twitter and Facebook statistics from various NYC agencies and organizations.
Update Frequency: As required
This is a dataset hosted by the City of New York. The city has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore New York City using Kaggle and all of the data sources available through the City of New York organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Key indicators of the availability of internet service choice and speed based on publicly available data from the Federal Communications Commission.
Data Limitations: Data accuracy is limited as of the date of publication and by the methodology and accuracy of the original sources. The City shall not be liable for any costs related to, or in reliance of, the data contained in these datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about countries. It has 194 rows. It features 3 columns: electricity production from nuclear sources, and individuals using the Internet. It is 90% filled with non-null values.
This web map application was created to show the number of households with no internet connection in the City of Dallas by census tract boundaries. Additionally, the City of Dallas "Neighborhood Associations" layer has been added to show the neighborhoods of the areas of interest. The map is symbolized to show the percentage of households with no internet connection.
The ACS layer that feeds this application was created by Esri and is updated automatically when the most current vintage of ACS data is released each year, usually in December. The layer always contains the latest available ACS 5-year estimates. It is updated annually within days of the Census Bureau's release schedule. Click here to learn more about ACS data releases.
This application was created using this web map: Households with No Internet Access.
Data Source: ACS Internet Connectivity Variables - Boundaries
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Sources of Revenue: Internet Access Services for Cellular and Other Wireless Telecommunications, All Establishments, Employer Firms was 113132.00000 Mil. of $ in January of 2022, according to the United States Federal Reserve. Historically, this series reached a record high of 113132.00000 in January of 2022 and a record low of 62878.00000 in January of 2012. Trading Economics provides the current actual value, a historical data chart, and related indicators for United States - Sources of Revenue: Internet Access Services for Cellular and Other Wireless Telecommunications, All Establishments, Employer Firms, last updated from the United States Federal Reserve in November of 2025.