The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often. Storage capacity also growing Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
By Amber Thomas [source]
This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This data set aims to address this contrast between those with estimated availability but no actual use by providing more accurate usage numbers downscaled to county and zip code levels. Who gets counted as having access is vastly important -- it determines who gets included in public funding opportunities dedicated solely toward closing this digital divide gap. The implications can be huge: millions around this country could remain invisible if these number aren't accurately reported or used properly in decision-making processes.
This dataset includes aggregated information about these locations with less than 20 devices for increased accuracy when estimating Broadband Usage in the United States-- allowing others to use it for developing solutions that improve internet access or label problem areas accurately where no real or reliable connectivity exists among citizens within communities large and small throughout the US mainland.. Please review the license terms before using these data so that you may adhere appropriately with stipulations set forth under Microsoft's Open Use Of Data Agreement v1.0 agreement prior to utilizing this dataset for your needs-- both professional and educational endeavors alike!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns: - County – The name of the county for which usage statistics are provided. - Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected from within that county or metropolitan area/micro area/divisions within states as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps)- Average Mbps download speed derived from a combination of data collected anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps)- Percentage of machines with throughput greater than 25 Mbps calculated using [6]. 6) Percent Slow (< 3 Mbps)- Percentage of machines with throughput less than 3Mbps calculated using [7].
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships between local governments and telecom providers need better data about gaps in service coverage and usage levels in order to make decisions about investments into new infrastructure buildouts for better connectivity options for rural communities
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: broadband_data_2020October.csv
If you use this dataset in your research,...
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual data on internet usage in Great Britain, including frequency of internet use, internet activities and internet purchasing.
As of February 2025, 5.56 billion individuals worldwide were internet users, which amounted to 67.9 percent of the global population. Of this total, 5.24 billion, or 63.9 percent of the world's population, were social media users. Global internet usage Connecting billions of people worldwide, the internet is a core pillar of the modern information society. Northern Europe ranked first among worldwide regions by the share of the population using the internet in 20254. In The Netherlands, Norway and Saudi Arabia, 99 percent of the population used the internet as of February 2025. North Korea was at the opposite end of the spectrum, with virtually no internet usage penetration among the general population, ranking last worldwide. Eastern Asia was home to the largest number of online users worldwide – over 1.34 billion at the latest count. Southern Asia ranked second, with around 1.2 billion internet users. China, India, and the United States rank ahead of other countries worldwide by the number of internet users. Worldwide internet user demographics As of 2024, the share of female internet users worldwide was 65 percent, five percent less than that of men. Gender disparity in internet usage was bigger in African countries, with around a ten percent difference. Worldwide regions, like the Commonwealth of Independent States and Europe, showed a smaller usage gap between these two genders. As of 2024, global internet usage was higher among individuals between 15 and 24 years old across all regions, with young people in Europe representing the most significant usage penetration, 98 percent. In comparison, the worldwide average for the age group 15–24 years was 79 percent. The income level of the countries was also an essential factor for internet access, as 93 percent of the population of the countries with high income reportedly used the internet, as opposed to only 27 percent of the low-income markets.
As of 2024, the estimated number of internet users worldwide was 5.5 billion, up from 5.3 billion in the previous year. This share represents 68 percent of the global population. Internet access around the world Easier access to computers, the modernization of countries worldwide, and increased utilization of smartphones have allowed people to use the internet more frequently and conveniently. However, internet penetration often pertains to the current state of development regarding communications networks. As of January 2023, there were approximately 1.05 billion total internet users in China and 692 million total internet users in the United States. Online activities Social networking is one of the most popular online activities worldwide, and Facebook is the most popular online network based on active usage. As of the fourth quarter of 2023, there were over 3.07 billion monthly active Facebook users, accounting for well more than half of the internet users worldwide. Connecting with family and friends, expressing opinions, entertainment, and online shopping are amongst the most popular reasons for internet usage.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset contains comprehensive performance data of National Basketball Association (NBA) players during the 2019-20 season. It includes all the crucial performance metrics crucial to assess a player’s quality of play. Here, you can compare players across teams, positions and categories and gain deeper insight into their overall performance. This dataset includes useful statistics such as GP (Games Played), Player name, Position, Assists Turnovers Ratio, Blocks per Game, Fouls per Minutes Played, Rebounds per Game and more. Dive in to this detailed overview of NBA player performance and take your understanding of athletes within the organization to another level!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset provides an in-depth look into the performance of NBA Players throughout the 2019-20 season, allowing an informed analysis of various important statistics. There are a number of ways to use this dataset to both observe and compare players, teams and positions.
By looking at the data you can get an idea of how players are performing across all metrics. The “Points Per Game” metric is particularly useful as it allows quick comparison between different players and teams on their offensive ability. Additionally, exploratory analysis can be conducted by looking at metrics like rebounds or assists per game which allows one to make interesting observations within the game itself such as ball movement being a significant factor for team success.
This dataset also enables further comparison between players from different positions on particular metrics that might be position orientated or generic across all positions such as points per game (ppg). This includes adjusting for positional skill sets; For example guard’s field goal attempts might include more three point shots because it would benefit them more than larger forwards or centres who rely more heavily on in close shot attempts due to their size advantage over their opponents.
This dataset also allows for simple visualisation of player performance with respect to each other; For example one can view points scored against assists ratio when comparing multiple point guards etc., providing further insight into individual performances on certain metrics which otherwise could not be analysed quickly with traditional methods like statistical analysis only within similarly situated groups (e.g.: same position). Furthermore this data set could aid further research in emerging areas such as targeted marketing analytics where identify potential customers based off publically available data regarding factors like ppg et cetera which may highly affect team success orotemode profitability dynamicsincreasedancefficiencyoftheirownopponentteams etcet
- Develop an AI-powered recommendation system that can suggest optimal players to fill out a team based on their performances in the past season.
- Examine trends in player performance across teams and positions, allowing coaches and scouts to make informed decisions when evaluating talent.
- Create a web or mobile app that can compare the performances of multiple players, allowing users to explore different performance metrics head-to-head
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: assists-turnovers.csv | Column name | Description | |:--------------|:----------------------------------| | GP | Number of games played. (Integer) | | Player | Player name. (String) | | Position | Player position. (String) |
File: blocks.csv | Column name | Description | |:--------------|:----------------------------------| | GP | Number of games played. (Integer) | | Player | Player name. (String) | | Position | Player position. (String) |
File: fouls-minutes.csv | Column name | Description | |:--------------|:----------------------...
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Internet use in the UK annual estimates by age, sex, disability, ethnic group, economic activity and geographical location, including confidence intervals.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Facebook is fast approaching 3 billion monthly active users. That’s about 36% of the world’s entire population that log in and use Facebook at least once a month.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Like other sub-sectors in the education app market, skills and online training courses experienced significant growth at the beginning of the coronavirus pandemic, as many people lost jobs or were...
As of February 2025, around 322 million people in the United States accessed the internet, making it one of the largest online markets worldwide. The country currently ranks third after China and India by the online audience size. Overview of internet usage in the United States The digital population in the United States has constantly increased in recent years. Among the most common reasons is the growing accessibility of broadband internet. A big part of the country's digital audience accesses the web via mobile phones. In 2024, the country saw an estimated 97.1 percent mobile internet user penetration. According to a 2024 survey, over 51 percent of U.S. women and 43 percent of men said it is important to them to have mobile internet access anywhere, at any time. Another 41 percent of respondents could not imagine their everyday life without the internet. Google and YouTube are the most visited websites in the country, while music, food, and drinks were the most discussed online topics. Internet usage demographics in the United States While some users can no longer imagine their life without the internet, others do not use it at all. According to 2021 data, 25 percent of U.S. adults 65 and older reported not using the internet. Despite this, online usage was strong across other age groups, especially young adults aged 18 to 49. This age group also reported the highest percentage of smartphone usage in the country as of 2023. Due to a persistent lack of connectivity in rural areas, more online users were based in urban areas of the U.S. than in the countryside.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Any work using this dataset should cite the following paper:
https://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Africa - Population and Internet users statistics
What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.
Source: https://data.humdata.org/dataset/africa-population-and-internet-users-statistics Last updated at https://data.humdata.org/organization/openafrica : 2019-09-11
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research:
Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation, it part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey.Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Iran IR: Internet Users: Individuals: % of Population data was reported at 60.416 % in 2017. This records an increase from the previous number of 53.227 % for 2016. Iran IR: Internet Users: Individuals: % of Population data is updated yearly, averaging 8.100 % from Dec 1990 (Median) to 2017, with 25 observations. The data reached an all-time high of 60.416 % in 2017 and a record low of 0.000 % in 1990. Iran IR: Internet Users: Individuals: % of Population data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Iran – Table IR.World Bank: Telecommunication. Internet users are individuals who have used the Internet (from any location) in the last 3 months. The Internet can be used via a computer, mobile phone, personal digital assistant, games machine, digital TV etc.; ; International Telecommunication Union, World Telecommunication/ICT Development Report and database.; Weighted average; Please cite the International Telecommunication Union for third-party use of these data.
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common. As a result of all three incidents, the sensitive data is accessed by an unauthorized threat actor. Industries most vulnerable to data breaches Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information organizations of these sectors store. In 2024 the financial services, healthcare, and professional services were the three industry sectors that recorded most data breaches. Overall, the number of healthcare data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw decrease. Largest data exposures worldwide In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This, by far, is the most extensive reported data leakage. This case, though, is unique because cyber security researchers found the vulnerability before the cyber criminals. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, came up with an updated number of leaked records, which was three billion. In March 2018, the third biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SMRT27 - Individuals who used any internet connected devices or systems for private purposes and problems encountered. Published by Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0).Individuals who used any internet connected devices or systems for private purposes and problems encountered...
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
App Download Key StatisticsApp and Game DownloadsiOS App and Game DownloadsGoogle Play App and Game DownloadsGame DownloadsiOS Game DownloadsGoogle Play Game DownloadsApp DownloadsiOS App...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains counts of live births for California as a whole based on information entered on birth certificates. Final counts are derived from static data and include out of state births to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all births that occurred during the time period.
The final data tables include both births that occurred in California regardless of the place of residence (by occurrence) and births to California residents (by residence), whereas the provisional data table only includes births that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by parent giving birth's age, parent giving birth's race-ethnicity, and birth place type. See temporal coverage for more information on which strata are available for which years.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Individuals who used any internet connected devices or... Details Download XLSX Individuals who used any internet connected devices or...
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often. Storage capacity also growing Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.