64 datasets found
  1. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista: http://statista.com/
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.

  2. Data from: Internet users

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Apr 6, 2021
    Cite
    Office for National Statistics (2021). Internet users [Dataset]. https://www.ons.gov.uk/businessindustryandtrade/itandinternetindustry/datasets/internetusers
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Apr 6, 2021
    Dataset provided by
    Office for National Statistics: http://www.ons.gov.uk/
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Internet use in the UK annual estimates by age, sex, disability, ethnic group, economic activity and geographical location, including confidence intervals.

  3. Africa - Population and Internet users statistics

    • kaggle.com
    Updated Dec 17, 2020
    Cite
    Ishmeet singh (2020). Africa - Population and Internet users statistics [Dataset]. https://www.kaggle.com/datasets/ishmeet/africa-population-and-internet-users-statistics
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 17, 2020
    Dataset provided by
    Kaggle
    Authors
    Ishmeet singh
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Area covered
    Africa
    Description

    Context

    Africa - Population and Internet users statistics

    Content

    What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

    Acknowledgements

    Source: https://data.humdata.org/dataset/africa-population-and-internet-users-statistics Last updated at https://data.humdata.org/organization/openafrica : 2019-09-11

  4. Data from: WikiReddit: Tracing Information and Attention Flows Between...

    • zenodo.org
    bin
    Updated May 4, 2025
    Cite
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms [Dataset]. http://doi.org/10.5281/zenodo.14653265
    Explore at:
    Available download formats: bin
    Dataset updated
    May 4, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 15, 2025
    Description

    Preprint

    Gildersleve, P., Beers, A., Ito, V., Orozco, A., & Tripodi, F. (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms. arXiv [Cs.CY]. https://doi.org/10.48550/arXiv.2502.04942
    Accepted at the International AAAI Conference on Web and Social Media (ICWSM) 2025

    Abstract

    The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

    Datasheet

    Motivation

    The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.

    Composition

    WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.

    Collection Process

    Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.

    Preprocessing/cleaning/labeling

    Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
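
The extraction and anonymization steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the regular expression, the function names, the trailing-punctuation rule, and the sample comment are all assumptions made for illustration. Only the use of regex matching and SHA-256 hashing comes from the description.

```python
import hashlib
import re

# Hypothetical pattern (an assumption, not the authors' regex): matches
# http(s) links to any Wikipedia language subdomain, including mobile links.
WIKI_URL = re.compile(
    r"https?://([a-z\-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s)\]>]+)",
    re.IGNORECASE,
)


def extract_wikipedia_links(text):
    """Return (language_subdomain, article_title) pairs found in text."""
    links = []
    for match in WIKI_URL.finditer(text):
        # Simplification: drop trailing sentence punctuation, which would
        # wrongly trim the few article titles that really end in a period.
        title = match.group(2).rstrip(".,;:!?")
        links.append((match.group(1).lower(), title))
    return links


def anonymize_id(raw_id):
    """Hash an identifier with SHA-256 and return the hex digest."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


# Synthetic comment text and a made-up identifier, for illustration only.
comment = "See https://en.wikipedia.org/wiki/Zettabyte and https://pt.wikipedia.org/wiki/Internet."
links = extract_wikipedia_links(comment)
hashed = anonymize_id("t3_example")
```

Because the hash is deterministic, the same raw ID always maps to the same digest, which is what allows hashed IDs to be joined across tables.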

    Uses

    We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.

    Distribution

    The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942

    Maintenance

    Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.


    SQL Database Schema

    Table: posts

    Column Name         | Type      | Description
    subreddit_id        | TEXT      | The unique identifier for the subreddit.
    crosspost_parent_id | TEXT      | The ID of the original Reddit post if this post is a crosspost.
    post_id             | TEXT      | Unique identifier for the Reddit post.
    created_at          | TIMESTAMP | The timestamp when the post was created.
    updated_at          | TIMESTAMP | The timestamp when the post was last updated.
    language_code       | TEXT      | The language code of the post.
    score               | INTEGER   | The score (upvotes minus downvotes) of the post.
    upvote_ratio        | REAL      | The ratio of upvotes to total votes.
    gildings            | INTEGER   | Number of awards (gildings) received by the post.
    num_comments        | INTEGER   | Number of comments on the post.

    Table: comments

    Column Name      | Type      | Description
    subreddit_id     | TEXT      | The unique identifier for the subreddit.
    post_id          | TEXT      | The ID of the Reddit post the comment belongs to.
    parent_id        | TEXT      | The ID of the parent comment (if a reply).
    comment_id       | TEXT      | Unique identifier for the comment.
    created_at       | TIMESTAMP | The timestamp when the comment was created.
    last_modified_at | TIMESTAMP | The timestamp when the comment was last modified.
    score            | INTEGER   | The score (upvotes minus downvotes) of the comment.
    upvote_ratio     | REAL      | The ratio of upvotes to total votes for the comment.
    gilded           | INTEGER   | Number of awards (gildings) received by the comment.

    Table: postlinks

    Column Name         | Type    | Description
    post_id             | TEXT    | Unique identifier for the Reddit post.
    end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL.
    end_processed_url   | TEXT    | The extracted URL from the Reddit post.
    final_valid         | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections.
    final_status        | INTEGER | HTTP status code of the final URL.
    final_url           | TEXT    | The final URL after redirections.
    redirected          | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0).
    in_title            | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0).
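
As a hedged illustration of how the posts and postlinks tables might be queried together, the sketch below builds a tiny in-memory SQLite database containing a subset of the columns listed above and counts posts per valid final URL. The choice of SQLite, the sample rows, and the query itself are assumptions. The listing only states that the data ships as an SQL database with these columns.

```python
import sqlite3

# Assumption: SQLite is used here only because it needs no server; the
# listing does not name the database engine.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        post_id TEXT, subreddit_id TEXT, score INTEGER, num_comments INTEGER
    );
    CREATE TABLE postlinks (
        post_id TEXT, final_url TEXT, final_valid INTEGER, in_title INTEGER
    );
    """
)

# Synthetic rows for illustration only; they are not taken from the dataset.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [("p1", "s1", 10, 3), ("p2", "s1", 5, 0)],
)
conn.executemany(
    "INSERT INTO postlinks VALUES (?, ?, ?, ?)",
    [
        ("p1", "https://en.wikipedia.org/wiki/Example", 1, 0),
        ("p2", "https://en.wikipedia.org/wiki/Example", 1, 1),
        ("p2", "https://en.wikipedia.org/wiki/Broken", 0, 0),
    ],
)

# Count distinct posts and sum their scores for each valid final URL.
rows = conn.execute(
    """
    SELECT l.final_url,
           COUNT(DISTINCT p.post_id) AS n_posts,
           SUM(p.score) AS total_score
    FROM postlinks AS l
    JOIN posts AS p ON p.post_id = l.post_id
    WHERE l.final_valid = 1
    GROUP BY l.final_url
    ORDER BY total_score DESC
    """
).fetchall()
```

Joining on post_id relies on the IDs being hashed consistently across tables, as the preprocessing notes describe.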

    Table: commentlinks

    Column Name         | Type    | Description
    comment_id          | TEXT    | Unique identifier for the Reddit comment.
    end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL.
    end_processed_url   | TEXT    | The extracted URL from the comment.
    final_valid         | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections.
    final_status        | INTEGER | HTTP status code of the final

  5. Attitudes towards the internet in Mexico 2025

    • statista.com
    Updated Apr 11, 2025
    Cite
    Umair Bashir (2025). Attitudes towards the internet in Mexico 2025 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statista: http://statista.com/
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Mexican respondents pick "It is important to me to have mobile internet access in any place" as an answer. 56 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.

  6. Anonymized Internet Traces 2016

    • catalog.caida.org
    Cite
    CAIDA, Anonymized Internet Traces 2016 [Dataset]. https://catalog.caida.org/dataset/passive_2016_pcap
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/

    Time period covered
    Jan 2016 - Dec 2016
    Description

    Packet headers (up to and including the transport layer) for the Anonymized Internet Traces 2016 dataset. Derived from OC192 traces on the Equinix San Jose and Chicago monitors.

  7. Cary Broadband Internet Access

    • catalog.data.gov
    • data.townofcary.org
    • +2 more
    Updated Oct 19, 2024
    Cite
    U.S. Census Bureau (2024). Cary Broadband Internet Access [Dataset]. https://catalog.data.gov/dataset/cary-broadband-internet-access-american-community-survey
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    United States Census Bureau: http://census.gov/
    Area covered
    Cary
    Description

    As part of the What Works Cities criteria for achieving Certification, we need to meet the industry standard of at least 75% of our households having subscriptions or access to high-speed broadband services.

    Part of the American Community Survey (ACS) asks about the levels of internet access residents have. We use the 5-Year Estimates for greater precision in our data, according to the "Distinguishing features of ACS 1-year, 1-year supplemental, 3-year, and 5-year estimates" table.

    We query attributes of the DP02 (Selected Social Characteristics in the United States) group of questions for the years available.

    This dataset has been narrowed down to Cary township using the following geography codes supported for the ACS dataset:

    state: 37
    county: 183
    county subdivision: 90536
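
As a rough sketch of how the geography codes above could be turned into a request against the Census Bureau's ACS 5-year Data Profile API, the Python snippet below only builds the query URL and does not fetch it. The function name, the example year (2022), and the choice to request the whole DP02 group are assumptions. The listing does not say which endpoint, year, or variables the publisher actually used.

```python
from urllib.parse import urlencode


def acs_profile_url(year, state, county, county_subdivision, group="DP02"):
    """Build an ACS 5-year Data Profile query URL for one county subdivision.

    Assumption: the request targets the public api.census.gov Data Profile
    endpoint and asks for every variable in the given group.
    """
    base = f"https://api.census.gov/data/{year}/acs/acs5/profile"
    params = {
        "get": f"group({group})",
        "for": f"county subdivision:{county_subdivision}",
        "in": f"state:{state} county:{county}",
    }
    return f"{base}?{urlencode(params)}"


# Geography codes from the description; the year is a hypothetical choice.
url = acs_profile_url(2022, "37", "183", "90536")
```

The codes are passed as strings so that any leading zeros in other geographies would be preserved.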

  8. Internet Traffic Data Set

    • kaggle.com
    Updated May 10, 2023
    Cite
    Asfand Yar (2023). Internet Traffic Data Set [Dataset]. http://doi.org/10.34740/kaggle/dsv/5658579
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 10, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Asfand Yar
    Description

    This data set contains internet traffic data captured by an Internet Service Provider (ISP) using Mikrotik SDN Controller and packet sniffer tools. The data set includes traffic from over 2000 customers who use Fibre to the Home (FTTH) and Gpon internet connections. The data was collected over a period of several months and contains all traffic in its original format with headers and packets.

    The data set contains information on inbound and outbound traffic, including web browsing, email, file transfers, and more. The data set can be used for research in areas such as network security, traffic analysis, and machine learning.

    **Data Collection Method:** The data was captured using Mikrotik SDN Controller and packet sniffer tools. These tools capture traffic data by monitoring network traffic in real-time. The data set contains all traffic data in its original format, including headers and packets.

    **Data Set Content:** The data set is provided in a CSV format and includes the following fields:

    1. Timestamp: The date and time the traffic was captured
    2. Source IP Address: The IP address of the device that sent the traffic
    3. Destination IP Address: The IP address of the device that received the traffic
    4. Protocol: The network protocol used for the traffic (e.g. TCP, UDP)
    5. Source Port: The port used by the source device for the traffic
    6. Destination Port: The port used by the destination device for the traffic
    7. Packet Size: The size of the packet in bytes
    8. Payload: The payload data of the packet

    The data set contains a large volume of traffic data from over 2000 customers. The data is organized by timestamp and includes all traffic data in its original format, including headers and packets. The data set contains both inbound and outbound traffic, and covers various types of internet traffic, including web browsing, email, file transfers, and more.

    One of the listed protocols:

    ipsec-ah - IPsec AH protocol
    ipsec-esp - IPsec ESP protocol
    ddp - datagram delivery protocol
    egp - exterior gateway protocol
    ggp - gateway-gateway protocol
    gre - general routing encapsulation
    hmp - host monitoring protocol
    idpr-cmtp - idpr control message transport
    icmp - internet control message protocol
    icmpv6 - internet control message protocol v6
    igmp - internet group management protocol
    ipencap - ip encapsulated in ip
    ipip - ip encapsulation
    encap - ip encapsulation
    iso-tp4 - iso transport protocol class 4
    ospf - open shortest path first
    pup - parc universal packet protocol
    pim - protocol independent multicast
    rspf - radio shortest path first
    rdp - reliable datagram protocol
    st - st datagram mode
    tcp - transmission control protocol
    udp - user datagram protocol
    vmtp - versatile message transport
    vrrp - virtual router redundancy protocol
    xns-idp - xerox xns idp
    xtp - xpress transfer protocol
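
To make the field list concrete, here is a small Python sketch that reads CSV rows with those fields and totals the packet bytes per protocol. The exact column header spellings, the timestamp format, and the sample rows are assumptions. The description names the fields but does not show the file's header line.

```python
import csv
import io
from collections import Counter

# Synthetic sample (not real traffic); header names are assumed to match
# the field names given in the description.
sample = io.StringIO(
    "Timestamp,Source IP Address,Destination IP Address,Protocol,"
    "Source Port,Destination Port,Packet Size,Payload\n"
    "2023-01-01 00:00:00,10.0.0.1,10.0.0.2,tcp,51000,443,1500,\n"
    "2023-01-01 00:00:01,10.0.0.2,10.0.0.1,udp,53,40000,200,\n"
    "2023-01-01 00:00:02,10.0.0.1,10.0.0.3,TCP,51001,80,40,\n"
)


def bytes_by_protocol(handle):
    """Sum the Packet Size column per protocol (case-insensitive)."""
    totals = Counter()
    for row in csv.DictReader(handle):
        protocol = row["Protocol"].strip().lower()
        totals[protocol] += int(row["Packet Size"])
    return totals


totals = bytes_by_protocol(sample)
```

Lower-casing the protocol name guards against mixed spellings such as "TCP" and "tcp" being counted separately.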

    MAC Protocol Examples

    802.2 - 802.2 Frames (0x0004)
    arp - Address Resolution Protocol (0x0806)
    homeplug-av - HomePlug AV MME (0x88E1)
    ip - Internet Protocol version 4 (0x0800)
    ipv6 - Internet Protocol Version 6 (0x86DD)
    ipx - Internetwork Packet Exchange (0x8137)
    lldp - Link Layer Discovery Protocol (0x88CC)
    loop-protect - Loop Protect Protocol (0x9003)
    mpls-multicast - MPLS multicast (0x8848)
    mpls-unicast - MPLS unicast (0x8847)
    packing-compr - Encapsulated packets with compressed IP packing (0x9001)
    packing-simple - Encapsulated packets with simple IP packing (0x9000)
    pppoe - PPPoE Session Stage (0x8864)
    pppoe-discovery - PPPoE Discovery Stage (0x8863)
    rarp - Reverse Address Resolution Protocol (0x8035)
    service-vlan - Provider Bridging (IEEE 802.1ad) & Shortest Path Bridging IEEE 802.1aq (0x88A8)
    vlan - VLAN-tagged frame (IEEE 802.1Q) and Shortest Path Bridging IEEE 802.1aq with NNI compatibility (0x8100)
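
The hexadecimal codes in the list above are the values carried in the EtherType field of an Ethernet II frame (bytes 12 and 13, big-endian). The Python sketch below shows one way to look a frame's code up by name. The lookup table holds only a few entries copied from the list, and the function name and sample frames are hypothetical.

```python
import struct

# A few name/code pairs copied from the list above; not exhaustive.
ETHERTYPES = {
    0x0800: "ip",
    0x0806: "arp",
    0x86DD: "ipv6",
    0x8100: "vlan",
    0x8863: "pppoe-discovery",
    0x8864: "pppoe",
    0x88CC: "lldp",
}


def ethertype_name(frame):
    """Return the protocol name for the EtherType of an Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is at least 14 bytes long")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ETHERTYPES.get(ethertype, f"unknown (0x{ethertype:04X})")


# Synthetic frames: zeroed MAC addresses followed by an EtherType.
arp_name = ethertype_name(bytes(12) + b"\x08\x06")
other_name = ethertype_name(bytes(12) + b"\x12\x34")
```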

    **Data Usage:** The data set can be used for research in areas such as network security, traffic analysis, and machine learning. Researchers can use the data to develop new algorithms for detecting and preventing cyber attacks, analyzing internet traffic patterns, and more.

    **Data Availability:** If you are interested in using this data set for research purposes, please contact us at asfandyar250@gmail.com for more information and references. The data set is available for download on Kaggle and can be accessed by researchers who have obtained permission from the ISP.

    We hope this data set will be useful for researchers in the field of network security and traffic analysis. If you have any questions or need further information, please do not hesitate to contact us. You can use Wireshark or other software to view the files.

  9. Internet Verification File (IVF)

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Aug 11, 2025
    + more versions
    Cite
    Social Security Administration (2025). Internet Verification File (IVF) [Dataset]. https://catalog.data.gov/dataset/internet-verification-file-ivf
    Dataset updated
    Aug 11, 2025
    Dataset provided by
    Social Security Administration: http://ssa.gov/
    Description

    Internal listing of current employees and authorized users who can access SSA applications.

  10. Data from: #PraCegoVer dataset

    • data.niaid.nih.gov
    Updated Jan 19, 2023
    + more versions
    Cite
    Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Esther Luna Colombini
    Gabriel Oliveira dos Santos
    Sandra Avila
    Description

    Automatically describing images using natural sentences is essential for the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

    PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.

    Dataset Structure

    The PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the attributes:

    user: anonymized user that made the post;

    filename: image file name;

    raw_caption: raw caption;

    caption: clean caption;

    date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory, whose filename is given by the attribute filename. We also provide a sample with five instances, so users can download the sample to get an overview of the dataset before downloading it completely.
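
A minimal Python sketch of reading records shaped like those in dataset.json and pairing each with its image path might look as follows. The sample records, the images directory name, and the helper names are hypothetical, and the caption statistics use the population standard deviation, which may differ from how the authors computed their reported figures.

```python
import json
from pathlib import Path
from statistics import mean, pstdev

# Synthetic records that mirror the documented attributes; not real data.
sample_json = json.dumps(
    [
        {"user": "u1", "filename": "a.jpg", "raw_caption": "#PraCegoVer foto de gato",
         "caption": "foto de gato", "date": "2020-01-01"},
        {"user": "u2", "filename": "b.jpg", "raw_caption": "#PraCegoVer um cachorro na praia",
         "caption": "um cachorro na praia hoje", "date": "2020-01-02"},
    ]
)


def load_instances(json_text, images_dir):
    """Parse the JSON list and attach the image path for each record."""
    records = json.loads(json_text)
    return [
        {**record, "image_path": Path(images_dir) / record["filename"]}
        for record in records
    ]


def caption_length_stats(instances):
    """Return (mean, population standard deviation) of caption word counts."""
    lengths = [len(instance["caption"].split()) for instance in instances]
    return mean(lengths), pstdev(lengths)


instances = load_instances(sample_json, "images")
avg_len, std_len = caption_length_stats(instances)
```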

    Download Instructions

    If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

    cat images.tar.gz.part* > images.tar.gz
    tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=

  11. Attitudes towards the internet in Australia 2025

    • statista.com
    Updated Apr 11, 2025
    Cite
    Umair Bashir (2025). Attitudes towards the internet in Australia 2025 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statista: http://statista.com/
    Authors
    Umair Bashir
    Description

    When asked about "Attitudes towards the internet", most Australian respondents pick "It is important to me to have mobile internet access in any place" as an answer. 55 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.

  12. National Broadband Data

    • open.canada.ca
    • gimi9.com
    • +1 more
    csv, gpkg, kmz, shp +2
    Updated Jun 24, 2025
    + more versions
    Cite
    Innovation, Science and Economic Development Canada (2025). National Broadband Data [Dataset]. https://open.canada.ca/data/en/dataset/00a331db-121b-445d-b119-35dbbe3eedd9
    Explore at:
    Available download formats: txt, kmz, tab, csv, gpkg, shp
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    Innovation, Science and Economic Development Canada: http://www.ic.gc.ca/
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    The National Broadband Data represents coverage information across Canada for existing broadband service providers with their associated technology types. The coverage information is aggregated and deployed over a grid of hexagons, which cover areas of roughly 25 square km each. Broadband Internet service availability is provided for download/upload speed markers (5/1, 10/2, 25/5 and 50/10 Mbps) where more than 75% of total dwellings covered within the hexagon have access to broadband service offerings meeting these markers. In order to improve the granularity of the broadband data, ISED and the CRTC are providing aggregated and anonymous broadband services data based on the pseudo-household statistical model, hence achieving higher precision in depicting the broadband Internet service availability. This information is available below under the "NBD PHH Speeds" resource. For more information on the pseudo-household statistical model, refer to the Pseudo-Household Demographic Distribution dataset. A representation of broadband services per 250m road segments is now available for download under the “NBD Roads” resource. To generate this dataset, the NBD PHH Speeds information was projected over the nearest road arc from Statistics Canada’s Road Network File, and those roads were spliced in approximately 250m segments. NEW: The data has been augmented to include new presentation layers as published on the National Broadband Map.
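
The availability rule described above (a speed marker counts as available in a hexagon when more than 75% of its dwellings have access to service meeting that marker) can be sketched in Python as follows. The function name, the input shape, and the sample counts are assumptions, not part of the published data model.

```python
# Download/upload speed markers in Mbps, as listed in the description.
SPEED_MARKERS = ["5/1", "10/2", "25/5", "50/10"]


def available_markers(total_dwellings, covered_by_marker, threshold=0.75):
    """Return the markers met by strictly more than `threshold` of dwellings.

    `covered_by_marker` is a hypothetical mapping from marker to the number
    of dwellings in the hexagon with access to service at that marker.
    """
    if total_dwellings <= 0:
        return []
    return [
        marker
        for marker in SPEED_MARKERS
        if covered_by_marker.get(marker, 0) / total_dwellings > threshold
    ]


# Synthetic hexagon with 100 dwellings, for illustration only.
markers = available_markers(100, {"5/1": 90, "10/2": 76, "25/5": 75, "50/10": 10})
```

The comparison is strict because the description says "more than 75%", so a hexagon at exactly 75% coverage does not qualify for that marker.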

  13. Broadband Data by Town - 2023

    • broadbandmaps.ct.gov
    • data.ct.gov
    • +4 more
    Updated Nov 25, 2023
    + more versions
    Cite
    State of Connecticut (2023). Broadband Data by Town - 2023 [Dataset]. https://broadbandmaps.ct.gov/datasets/broadband-data-by-town-2023
    Dataset updated
    Nov 25, 2023
    Dataset authored and provided by
    State of Connecticut
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Description

    This feature layer includes all OPM collected data at the town level.

    The Connecticut Broadband Availability and Adoption Maps were created to help citizens and policymakers understand the strengths and weaknesses of broadband infrastructure in the state. Data is aggregated to the block, tract, and town (county subdivision) levels and includes counts of locations classified as unserved, underserved, and served as well as whether they meet the state goal of 1000Mbps/100Mbps. This application splits its visualizations into block, tract, and town layers for both unserved locations and progress to the state goal.

    This map uses OPM collected availability and adoption data.

    As of 2023, OPM collected availability data was submitted by internet service providers pursuant to PA 21-159 and processed by the GIS Office in the Office of Policy and Management, cleaned, and matched to the CostQuest location fabric.

    Metadata:

    All feature layers, maps, and datasets, including OPM's internal broadband availability data, follow the same basic schema, with additional fields added in some cases for convenience.

    Fields named no service, unserved, underserved, served, and GigC are counts of locations where a particular level of broadband service is provided. No service locations are those where there is no reported service at all. Unserved locations are locations where there is a provider offering wireline service, but not at or above 25 Mbps download and 3 Mbps upload. Underserved locations are locations where at least one provider offers wireline service of 25 Mbps download and 3 Mbps upload, but there is no provider offering wireline service of 100 Mbps download and 20 Mbps upload. Served locations are locations where there is wireline service of at least 100 Mbps download and 20 Mbps upload. GigC denotes the count of locations that have service at 1000 Mbps download and 100 Mbps upload. Accordingly, total locations is equal to the sum of no service, unserved, underserved, served, and "GigC" locations. Availability also includes fields for average download and upload speeds. These are calculated at the relevant level of census geography based on the maximum for all locations.
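
The availability categories defined above can be expressed as a small classification function. The sketch below is a simplified reading of those definitions: it takes a single best download/upload pair per location and ignores the per-provider and wireline details. Treating the categories as mutually exclusive (so a GigC location is not also counted as served) and treating the GigC speeds as a minimum are assumptions drawn from the statement that total locations equal the sum of all five fields. The function and parameter names are hypothetical.

```python
def classify_location(max_down_mbps, max_up_mbps):
    """Classify one location by its best reported wireline speeds (Mbps).

    Pass None for both speeds when no service is reported at all.
    """
    if max_down_mbps is None or max_up_mbps is None:
        return "no service"
    if max_down_mbps >= 1000 and max_up_mbps >= 100:
        return "GigC"
    if max_down_mbps >= 100 and max_up_mbps >= 20:
        return "served"
    if max_down_mbps >= 25 and max_up_mbps >= 3:
        return "underserved"
    return "unserved"


# Hypothetical speed pairs, one per category.
examples = {
    "no service": classify_location(None, None),
    "unserved": classify_location(10, 1),
    "underserved": classify_location(25, 3),
    "served": classify_location(100, 20),
    "GigC": classify_location(1000, 100),
}
```

Checking the highest tier first keeps each location in exactly one category, which is what makes the per-category counts add up to the total.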

    The final field included in all availability data is the provider list.

    OPM collected adoption data:

    OPM collected adoption data uses many of the same naming conventions as the availability data, but there are some notable differences.

    Fields named unserved_Sub, underserved_Sub, served_Sub, and GigC_Sub are counts of subscriptions where a particular level of broadband service is currently subscribed to. Unserved subscriptions are subscriptions that do not meet the standard of 25 Mbps download and 3 Mbps upload. Underserved subscriptions are subscriptions with speeds of 25 Mbps download and 3 Mbps upload, but not meeting 100 Mbps download and 20 Mbps upload. Served subscriptions are subscriptions where speeds are between 100 Mbps download and 20 Mbps upload and 1000 Mbps download and 100 Mbps upload. GigC denotes the count of locations that have a subscription at 1000 Mbps download and 100 Mbps upload or higher. For subscription data these locations are NOT included in the "served" field as this does not directly apply to FCC use of the terms.

  14. Data from: Dataset for Cyber-Physical Anomaly Detection in Smart Homes

    • research-data.cardiff.ac.uk
    Updated Sep 19, 2024
    Cite
    Yasar Majib; Mohammed Alosaimi; Andre Asaturyan; Charith Perera (2024). Dataset for Cyber-Physical Anomaly Detection in Smart Homes [Dataset]. http://doi.org/10.17035/d.2023.0259651425
    Dataset updated
    Sep 19, 2024
    Dataset provided by
    Cardiff University
    Authors
    Yasar Majib; Mohammed Alosaimi; Andre Asaturyan; Charith Perera
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Smart homes contain programmable electronic devices (mostly IoT) that enable home automation. People who live in smart homes benefit from interconnected devices by controlling them either remotely or manually/autonomously. However, high interconnectivity comes with an increased attack surface, making the smart home an attractive target for adversaries. NCC Group and the Global Cyber Alliance recorded over 12,000 malicious attempts to log into smart home devices. Recent statistics show that over 200 million smart homes can be subjected to these attacks. Conventional security systems are focused either on network traffic (e.g., firewalls) or on the physical environment (e.g., CCTV or basic motion sensors), but not both. A key challenge in developing cyber-physical security systems is the lack of datasets and test beds. For cyber-physical datasets to be meaningful, they need to be collected in real smart home environments. Due to the inherent difficulties and challenges (e.g., effort, costs, test-bed availability), such cyber-physical smart home datasets are quite rare. This paper aims to fill this gap by contributing a dataset we collected in a real smart home with annotated labels. This paper explains the process we followed to collect the data and how we organised them to facilitate wider use within research communities. A related article can be found at https://doi.org/10.3389/friot.2023.1275080

  15. Geography of digital inequality - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Jun 27, 2023
    Cite
    (2023). Geography of digital inequality - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/aac47d03-1fdf-5f48-8021-627e02f643e9
    Dataset updated
    Jun 27, 2023
    Description

    These data consist of measures of Internet use estimated using small area estimation. The small area estimation is based on census Output Areas (OAs) using the 2013 Oxford Internet Survey (OxIS) and the 2011 British census. There is an estimate for each OA in Great Britain. By combining the 2013 OxIS survey data with the comprehensive small area coverage of the 2011 British census we can use the strengths of one to offset the gaps in the other. Specifically, we follow a two-step process. First, we use the information that is reliably available in OxIS to create a model that estimates the proportion of Internet users in OAs. Second, we use the parameters from this model combined with census data to estimate the proportion of Internet users in each OA in Britain. Once these estimates are available, we aggregate the estimates up to higher levels of geography. In this way we can estimate Internet use in Glasgow, Manchester and Cardiff as well as other small areas in Britain. This procedure is referred to as indirect, model-based or synthetic estimation. In recent years such SAE techniques have been widely used throughout Europe and North America. See the project website for more details.

    The objective of the Geography of Digital Inequality project was to explore the geographical contours of Internet use and penetration in Britain. Specifically, the project assembled from existing datasets a new dataset which contains Internet information at fine-grained geographic levels, census output areas (OAs). From OAs we were able to aggregate to higher geographic levels such as counties, Welsh and Scottish Councils, metropolitan areas, or others. Through this unique dataset we explored digital divides and the geography of the Internet, a capability possessed by no other dataset. Specifically, we explored the extent of use versus non-use of the Internet.

    There were 2 datasets used to assemble this dataset. First, the 2013 Oxford Internet Survey (OxIS) is a random sample of 2,657 people aged 14+ from the British population (England, Scotland & Wales). Interviews were conducted face-to-face by an independent survey research company. The response rate for 2013 was 51%. The data collection was a two-stage sample. A random sample of census output areas (OAs) was selected and respondents were randomly sampled within each selected OA. For details, see "Data collection technical report.pdf" which has been uploaded. We use six variables from OxIS: Internet use, region, age, lifestage, gender and education. The questionnaire for OxIS contains about 300 variables and it is available from the OxIS website, see the URL in the "related resources" section. Second, the 2011 British Census. For information on how the census was conducted, see the census website. The URL for the 2011 census is given below in "related resources".
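    The two-step synthetic estimation described above, followed by aggregation, can be illustrated with a deliberately simplified sketch. It is not the project's model: instead of a fitted regression it uses plain per-group usage rates as the "parameters", and the group labels, survey rows, and census counts are all invented for illustration.

```python
from collections import defaultdict


def fit_group_rates(survey_rows):
    """Step 1: estimate the Internet-use rate for each demographic group
    from survey rows of the form (group, uses_internet)."""
    totals = defaultdict(int)
    users = defaultdict(int)
    for group, uses_internet in survey_rows:
        totals[group] += 1
        if uses_internet:
            users[group] += 1
    return {group: users[group] / totals[group] for group in totals}


def expected_users(rates, census_counts):
    """Step 2: apply the survey-based rates to one area's census counts
    ({group: population}) to get the expected number of Internet users."""
    return sum(rates[group] * count for group, count in census_counts.items())


def estimate_proportion(rates, areas):
    """Estimate the proportion of Internet users for one or more areas.
    Passing several areas aggregates them, weighted by population."""
    users = sum(expected_users(rates, counts) for counts in areas)
    population = sum(sum(counts.values()) for counts in areas)
    return users / population


# Invented survey sample: 9 of 10 younger and 4 of 10 older respondents use the Internet.
survey = ([("under 55", True)] * 9 + [("under 55", False)]
          + [("55+", True)] * 4 + [("55+", False)] * 6)
rates = fit_group_rates(survey)

# Invented census counts for two output areas.
oa_a = {"under 55": 200, "55+": 100}
oa_b = {"under 55": 50, "55+": 150}

estimate_a = estimate_proportion(rates, [oa_a])
estimate_b = estimate_proportion(rates, [oa_b])
estimate_combined = estimate_proportion(rates, [oa_a, oa_b])
```

    The point of the sketch is the data flow: the survey supplies the relationship between demographics and Internet use, the census supplies the demographic make-up of every small area, and higher-level estimates follow by population-weighted aggregation.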

  16. Available Wireless Sensor Network and Internet of Things testbed facilities:...

    • data.europa.eu
    unknown
    Updated Oct 7, 2022
    Cite
    Zenodo (2022). Available Wireless Sensor Network and Internet of Things testbed facilities: dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-7157221?locale=cs
    Explore at:
    Available download formats: unknown (2365963)
    Dataset updated
    Oct 7, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this data set, we present data collected for the purpose of carrying out a systematic review of the available Wireless Sensor Network and Internet of Things testbed facilities. The data was collected through multiple stages and in each stage the pre-defined criteria were applied. We provide a dataset describing the hardware and software aspects of Wireless Sensor Network and Internet of Things testbed facilities available in the market and scientific community. The data were gathered through an extensive systematic review process of scientific articles published between the years 2011 and 2021. The review aims to obtain good quality data for people who are actively researching the Internet of Things facilities or anyone who is interested in that field.

  17. Data from: MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022...

    • data.mendeley.com
    Updated Jul 25, 2022
    Cite
    Nirmalya Thakur (2022). MonkeyPox2022Tweets: The First Public Twitter Dataset on the 2022 MonkeyPox Outbreak [Dataset]. http://doi.org/10.17632/xmcg82mx9k.3
    Dataset updated
    Jul 25, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: The first public Twitter dataset on the 2022 MonkeyPox outbreak,” Preprints, 2022, DOI: 10.20944/preprints202206.0172.v2

    Abstract: The world is currently facing an outbreak of the monkeypox virus, and confirmed cases have been reported from 28 countries. Following a recent “emergency meeting”, the World Health Organization just declared monkeypox a global health emergency. As a result, people from all over the world are using social media platforms, such as Twitter, for information seeking and sharing related to the outbreak, as well as for familiarizing themselves with the guidelines and protocols that are being recommended by various policy-making bodies to reduce the spread of the virus. This is resulting in the generation of tremendous amounts of Big Data related to such paradigms of social media behavior. Mining this Big Data and compiling it in the form of a dataset can serve a wide range of use-cases and applications such as analysis of public opinions, interests, views, perspectives, attitudes, and sentiment towards this outbreak. Therefore, this work presents MonkeyPox2022Tweets, an open-access dataset of Tweets related to the 2022 monkeypox outbreak that were posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.

    Data Description: The dataset consists of a total of 255,363 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 23rd July 2022 (the most recent date at the time of dataset upload). The Tweet IDs are presented in 6 different .txt files based on the timelines of the associated tweets. The following provides the details of these dataset files.

    • Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the Tweet IDs: May 7, 2022 to May 21, 2022)
    • Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the Tweet IDs: May 21, 2022 to May 27, 2022)
    • Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the Tweet IDs: May 27, 2022 to June 5, 2022)
    • Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the Tweet IDs: June 5, 2022 to June 11, 2022)
    • Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 47718, Date Range of the Tweet IDs: June 12, 2022 to June 30, 2022)
    • Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the Tweet IDs: July 1, 2022 to July 23, 2022)

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used.
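    Before hydrating, it can be useful to confirm that the downloaded files match the counts listed in the description. The following is a minimal sketch under one assumption that the description does not state: that each .txt file holds one Tweet ID per line. The helper names `read_tweet_ids` and `find_count_mismatches` are hypothetical, and hydration itself (which requires Twitter API access) is not shown.

```python
from pathlib import Path

# File names and Tweet ID counts exactly as listed in the data description.
EXPECTED_COUNTS = {
    "TweetIDs_Part1.txt": 13926,
    "TweetIDs_Part2.txt": 17705,
    "TweetIDs_Part3.txt": 17585,
    "TweetIDs_Part4.txt": 19718,
    "TweetIDs_Part5.txt": 47718,
    "TweetIDs_Part6.txt": 138711,
}


def read_tweet_ids(path):
    """Read Tweet IDs from a text file, assuming one ID per line
    and skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def find_count_mismatches(directory):
    """Return {filename: (found, expected)} for every listed file whose
    number of IDs differs from the count in the description."""
    mismatches = {}
    for name, expected in EXPECTED_COUNTS.items():
        found = len(read_tweet_ids(Path(directory) / name))
        if found != expected:
            mismatches[name] = (found, expected)
    return mismatches


# The six per-file counts add up to the stated dataset total of 255,363.
expected_total = sum(EXPECTED_COUNTS.values())
```

    Calling `find_count_mismatches("path/to/download")` on a complete download should return an empty dictionary; the directory path here is a placeholder.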

  18. ICA243 - Percentage of Internet users who purchased Travel/Culture related...

    • datasalsa.com
    csv, json-stat, px +1
    Updated Jan 4, 2025
    Cite
    Central Statistics Office (2025). ICA243 - Percentage of Internet users who purchased Travel/Culture related services online in the previous 3 months [Dataset]. https://datasalsa.com/dataset/?catalogue=data.gov.ie&name=ica243-rnet-users-who-purchased-travelculture-related-services-online-in-the-previous-3-months-8219
    Explore at:
    Available download formats: csv, px, json-stat, xlsx
    Dataset updated
    Jan 4, 2025
    Dataset authored and provided by
    Central Statistics Office
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 8, 2025
    Description

    ICA243 - Percentage of Internet users who purchased Travel/Culture related services online in the previous 3 months. Published by Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0). Percentage of Internet users who purchased Travel/Culture related services online in the previous 3 months...

  19. Internet Access Technology Options

    • data.ccrpc.org
    csv
    Updated Jun 3, 2022
    Cite
    Champaign County Regional Planning Commission (2022). Internet Access Technology Options [Dataset]. https://data.ccrpc.org/dataset/internet-access-options
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 3, 2022
    Dataset authored and provided by
    Champaign County Regional Planning Commission
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The Internet access indicator measures the prevalence of different Internet technology options available in Champaign County, Illinois, and the U.S., at two different speeds: 4/1 Mbps and 25/3 Mbps.

    Seven types of connection options are evaluated: ADSL, cable, fiber, fixed wireless, satellite, "other" technology, and "any" technology, which includes the previous six options.

    Satellite internet, at both speeds, is the most widely available technology in all three areas. One hundred percent of Champaign County residents have access to satellite internet at both speeds. Cable internet is also widely available across all three areas, and over 90 percent of Champaign County residents have access to cable internet. Fiber internet is the least widely available type of technology, aside from "other" technology. However, fiber internet was available to almost 38 percent of Champaign County residents as of December 2020, an increase from approximately 25 percent in June 2020.

    The ability of Champaign County residents to access the Internet has become key in many facets of life, especially during the COVID-19 pandemic. Internet access provides economic, educational, and social opportunities; having or not having Internet access has become not only a technological issue, but an equity issue.

    This data was retrieved from the Federal Communications Commission’s Fixed Broadband Deployment Area Comparison, and dates from December 2020.

    Source: Federal Communications Commission. (2020). Fixed Broadband Deployment. Area Comparison. https://broadbandmap.fcc.gov/#/. (Accessed 3 June 2022).

  20. Job Accommodation Network Datasets

    • catalog.data.gov
    • datasets.ai
    Updated Aug 12, 2023
    Cite
    Office of Disability Employment Policy (2023). Job Accommodation Network Datasets [Dataset]. https://catalog.data.gov/dataset/job-accommodation-network-datasets-72ca8
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    Office of Disability Employment Policy
    Description

    Data collected from interviews with employers, professionals, self-employed individuals, and individual workers who have been assisted by the Job Accommodation Network (JAN).
