100+ datasets found
  1. Data generation volume worldwide 2010-2029

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.

  2. Data from: SQL Injection Attack Netflow

    • data.niaid.nih.gov
    • portalcienciaytecnologia.jcyl.es
    • +3more
    Updated Sep 28, 2022
    + more versions
    Cite
    Ignacio Crespo; Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6907251
    Explore at:
    Dataset updated
    Sep 28, 2022
    Authors
    Ignacio Crespo; Adrián Campazas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    These datasets contain SQL injection attacks (SQLIA) as malicious NetFlow data. The attacks carried out are Union Query SQL injection and Blind SQL injection. The SQLMAP tool was used to perform the attacks.

    NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.
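
    The flow definition above can be illustrated with a minimal sketch. It assumes the "common properties" are the usual five-tuple (source and destination address, source and destination port, protocol); NetFlow v5 also keys on the input interface and the type of service, which are left out here. The `Packet` type and `aggregate_flows` function are hypothetical names, not part of DOROTHEA.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP
    size: int      # packet size in bytes


def aggregate_flows(packets):
    """Group packets into unidirectional flows keyed by their shared properties."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # Direction matters: swapping source and destination gives a new key.
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)


packets = [
    Packet("10.0.0.1", "10.0.0.2", 40000, 80, 6, 100),
    Packet("10.0.0.1", "10.0.0.2", 40000, 80, 6, 300),
    Packet("10.0.0.2", "10.0.0.1", 80, 40000, 6, 1500),  # reply: a separate flow
]
flows = aggregate_flows(packets)
```

    Because flows are unidirectional, the request and the reply of one connection end up as two separate records.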

    Datasets

    The first dataset (D1) was collected to train the detection models. The other (D2) was collected using attacks different from those used in training, in order to test the models and check their generalization.

    The datasets contain both benign and malicious traffic. All collected datasets are balanced.

    The version of NetFlow used to build the datasets is 5.

    Dataset | Aim      | Samples | Benign-malicious traffic ratio
    D1      | Training | 400,003 | 50%
    D2      | Test     | 57,239  | 50%

    Infrastructure and implementation

    Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has a sensor ipt_netflow installed. The sensor consists of a module for the Linux kernel using Iptables, which processes the packets and converts them to NetFlow flows.

    DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1800 seconds (30 minutes).
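
    The two export conditions can be sketched as a single predicate. This is only an illustration of the timeout rule stated above, not the ipt_netflow implementation; the function name and the use of plain timestamps in seconds are assumptions.

```python
INACTIVE_TIMEOUT_S = 15   # export after 15 seconds without new packets
ACTIVE_TIMEOUT_S = 1800   # export after the flow has lasted 30 minutes


def should_export(first_seen: float, last_seen: float, now: float) -> bool:
    """Return True when a flow has hit the inactive or the active timeout."""
    idle_time = now - last_seen    # time since the last packet of the flow
    lifetime = now - first_seen    # time since the first packet of the flow
    return idle_time >= INACTIVE_TIMEOUT_S or lifetime >= ACTIVE_TIMEOUT_S
```

    A long-lived connection is therefore split into several flow records, one per 30-minute window, while a short burst is exported 15 seconds after its last packet.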

    Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts. Users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks: it routes packets to the Internet, and it sends them to a NetFlow data generation node (packets received from the Internet are handled in the same way).

    The malicious traffic collected (SQLI attacks) was generated using SQLMAP. SQLMAP is a penetration testing tool used to automate the process of detecting and exploiting SQL injection vulnerabilities.

    The attacks were executed on 16 nodes, each launching SQLMAP with the parameters in the following table.

    Parameters | Description
    '--banner','--current-user','--current-db','--hostname','--is-dba','--users','--passwords','--privileges','--roles','--dbs','--tables','--columns','--schema','--count','--dump','--comments','--schema' | Enumerate users, password hashes, privileges, roles, databases, tables and columns
    --level=5 | Increase the probability of a false positive identification
    --risk=3 | Increase the probability of extracting data
    --random-agent | Select the User-Agent randomly
    --batch | Never ask for user input, use the default behavior
    --answers="follow=Y" | Predefined answers to yes

    Every node executed SQLIA on 200 victim nodes. The victim nodes deployed a web form vulnerable to Union-type injection attacks, connected to either the MySQL or the SQL Server database engine (50% of the victim nodes deployed MySQL and the other 50% deployed SQL Server).

    The web service was accessible from ports 443 and 80, which are the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes. For victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions. For D1, SQLIA was performed using Union attacks on the MySQL and SQLServer databases.

    However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.
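
    The address spaces above make it possible to label a flow endpoint by dataset and role. The sketch below uses Python's standard `ipaddress` module; the `NETWORKS` table and `classify` function are hypothetical helpers, and `strict=False` is needed because several of the published prefixes have host bits set.

```python
import ipaddress

# Address spaces as published in the description; strict=False accepts
# prefixes such as 182.168.1.1/24 whose host bits are not zero.
NETWORKS = {
    ("D1", "generator"): ipaddress.ip_network("182.168.1.1/24", strict=False),
    ("D1", "victim"): ipaddress.ip_network("126.52.30.0/24", strict=False),
    ("D2", "generator"): ipaddress.ip_network("152.148.48.1/24", strict=False),
    ("D2", "victim"): ipaddress.ip_network("140.30.20.1/24", strict=False),
}


def classify(ip: str):
    """Return (dataset, role) for an address, or None if it is in no listed range."""
    address = ipaddress.ip_address(ip)
    for label, network in NETWORKS.items():
        if address in network:
            return label
    return None
```

    Note that the generator ranges cover both benign and malicious traffic-generating nodes, so the address alone does not separate the two classes.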

    The MySQL server ran MariaDB version 10.4.12. Microsoft SQL Server 2017 Express and PostgreSQL version 13 were also used.

  3. Produced Water DNA Database (PW-DNA): Utilizing KBase to generate an...

    • osti.gov
    Updated Sep 19, 2025
    Cite
    US Geological Survey, Energy Resources and Environmental Health Programs (2025). Produced Water DNA Database (PW-DNA): Utilizing KBase to generate an environmental specific curated molecular database [Dataset]. http://doi.org/10.25982/156785.278/2588866
    Explore at:
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    National Energy Technology Laboratory (https://netl.doe.gov/)
    Office of Science (http://www.er.doe.gov/)
    US Geological Survey, Energy Resources and Environmental Health Programs
    Description

    The deep subsurface is estimated to host the majority of Earth’s microbial biomass yet remains one of the most challenging environments to access and study. One common approach to investigate these microbial communities is through the analysis of produced water from subsurface reservoirs, where researchers can assess water and gas chemistry along with molecular (DNA/RNA) sequence data. Advances in high-throughput sequencing have greatly expanded our understanding of these environments and their biotechnological potential. However, further progress requires large-scale, integrative meta-analyses across diverse datasets. To address this need, we developed the Produced Water-DNA (PW-DNA) Database, a curated, publicly available resource that consolidates microbial DNA/RNA sequences, geochemical data, and relevant metadata from in situ hydrocarbon environments such as coal beds, oil reservoirs, and natural gas systems. The PW-DNA database delivers three core benefits to the research community: (1) it improves data sharing by linking environmental microbial datasets with corresponding geochemical parameters, enabling more robust filtering and analysis; (2) it connects with complementary research databases to promote broader dissemination and interoperability; and (3) it supports technological innovation by serving as a resource for identifying microbial trends and exploring genetic potential. While individual studies have highlighted basin-specific microbial communities and functional redundancy in biogeochemical cycling, a comprehensive, system-wide perspective is needed to better understand connectivity and novelty across subsurface ecosystems. By designing the PW-DNA in the KBase platform, we provide a reproducible, visual framework for integrating large-scale genomic and geochemical data, enabling researchers to perform more informed analyses and experimental design. 
    Ultimately, this resource enhances the ability to identify, characterize, and interpret microbial functions across diverse subsurface environments, thereby accelerating discovery in subsurface microbiology and biotechnology.

  4. Bike Company Database

    • kaggle.com
    zip
    Updated Jul 18, 2023
    Cite
    ABOLARIN DAMILARE MATTHEW (2023). Bike Company Database [Dataset]. https://www.kaggle.com/datasets/abolarindam/bike-company-database
    Explore at:
    Available download formats: zip (188165 bytes)
    Dataset updated
    Jul 18, 2023
    Authors
    ABOLARIN DAMILARE MATTHEW
    Description

    Dataset

    This dataset was created by ABOLARIN DAMILARE MATTHEW


  5. Data used by EPA researchers to generate illustrative figures for overview...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Nov 14, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Data used by EPA researchers to generate illustrative figures for overview article "Multiscale Modeling of Background Ozone: Research Needs to Inform and Improve Air Quality Management" [Dataset]. https://catalog.data.gov/dataset/data-used-by-epa-researchers-to-generate-illustrative-figures-for-overview-article-multisc
    Explore at:
    Dataset updated
    Nov 14, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Data sets used to prepare illustrative figures for the overview article “Multiscale Modeling of Background Ozone”

    Overview

    The CMAQ model output datasets used to create illustrative figures for this overview article were generated by scientists in EPA/ORD/CEMM and EPA/OAR/OAQPS.

    The EPA/ORD/CEMM-generated dataset consisted of hourly CMAQ output from two simulations. The first simulation was performed for July 1 – 31 over a 12 km modeling domain covering the Western U.S. The simulation was configured with the Integrated Source Apportionment Method (ISAM) to estimate the contributions from 9 source categories to modeled ozone. ISAM source contributions for July 17 – 31 averaged over all grid cells located in Colorado were used to generate the illustrative pie chart in the overview article. The second simulation was performed for October 1, 2013 – August 31, 2014 over a 108 km modeling domain covering the northern hemisphere. This simulation was also configured with ISAM to estimate the contributions from non-US anthropogenic sources, natural sources, stratospheric ozone, and other sources on ozone concentrations. Ozone ISAM results from this simulation were extracted along a boundary curtain of the 12 km modeling domain specified over the Western U.S. for the time period January 1, 2014 – July 31, 2014 and used to generate the illustrative time-height cross-sections in the overview article.

    The EPA/OAR/OAQPS-generated dataset consisted of hourly gridded CMAQ output for surface ozone concentrations for the year 2016. The CMAQ simulations were performed over the northern hemisphere at a horizontal resolution of 108 km. NO2 and O3 data for July 2016 were extracted from these simulations to generate the vertically-integrated column densities shown in the illustrative comparison to satellite-derived column densities.

    CMAQ Model Data

    The data from the CMAQ model simulations used in this research effort are very large (several terabytes) and cannot be uploaded to ScienceHub due to size restrictions. The model simulations are stored on the /asm archival system accessible through the atmos high-performance computing (HPC) system. Due to data management policies, files on /asm are subject to expiry depending on the template of the project. Files not requested for extension after the expiry date are deleted permanently from the system. The format of the files used in this analysis and listed below is ioapi/netcdf. Documentation of this format, including definitions of the geographical projection attributes contained in the file headers, is available at https://www.cmascenter.org/ioapi/. Documentation on the CMAQ model, including a description of the output file format and output model species, can be found in the CMAQ documentation on the CMAQ GitHub site at https://github.com/USEPA/CMAQ.

    This dataset is associated with the following publication: Hogrefe, C., B. Henderson, G. Tonnesen, R. Mathur, and R. Matichuk. Multiscale Modeling of Background Ozone: Research Needs to Inform and Improve Air Quality Management. EM Magazine. Air and Waste Management Association, Pittsburgh, PA, USA, 1-6, (2020).

  6. 16S V4-V5 metabarcoding reference databases and weighted naive-bayes...

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Aug 31, 2023
    + more versions
    Cite
    Katherine Silliman; Luke Thompson (2023). 16S V4-V5 metabarcoding reference databases and weighted naive-bayes classifiers [Dataset]. http://doi.org/10.5281/zenodo.8301740
    Explore at:
    Available download formats: bin
    Dataset updated
    Aug 31, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katherine Silliman; Luke Thompson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    16S metabarcoding databases and naive-bayes classifiers specific to the V4-V5 region. Built from the Silva 138.1 SSU Ref NR 99 database using QIIME 2 (versions 2023.2 and 2023.5) and the q2-clawback plugin. Includes weighted classifiers for two Earth Microbiome Project Ontology (EMPO) 3 habitat types, "sediment (saline)" and "water (saline)", with data downloaded from Qiita. Sequences were not dereplicated.

    Primers used:

    EMP 16S 515f: GTGYCAGCMGCCGCGGTAA

    EMP 16S 926r: CCGYCAATTYMTTTRAGTTT
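
    Both primers contain IUPAC degenerate bases (Y, M, R), so each stands for a small set of concrete sequences. A minimal sketch of expanding a primer into a regular expression follows; the `primer_to_regex` helper is a hypothetical name and is not part of the dataset's own code. In practice the reverse primer is usually matched as its reverse complement, which this sketch does not handle.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a degenerate primer into a regex matching every sequence it covers."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


FORWARD_515F = primer_to_regex("GTGYCAGCMGCCGCGGTAA")
REVERSE_926R = primer_to_regex("CCGYCAATTYMTTTRAGTTT")
```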

    Stats

    286,948 unique sequences

    388,496 total sequences

    46,254 unique taxa (Level 7)

    File description
    File | Description
    make new 16S silva V4-V5 database.md | Markdown with code used to generate databases
    silva-138-99-seqs.qza | Full length Silva 138.1 SSU 99 sequences
    silva-138-99-tax.qza | Taxa for full length Silva 138.1 SSU 99 database
    silva-138_1-99-515f_926r-seqs.qza | Sequences for 16S V4-V5 (primers 515f, 926r), extracted from Silva 138.1 SSU 99, generated by qiime2-2023.2 (forward compatible)
    silva-138_1-99-515f_926r-taxa.qza | Taxa for silva-138_1-99-515f_926r-seqs.qza database
    uniform-silva-138_1-99-515f_926r-classifier.qza | Unweighted (uniform) naive-bayes classifier for 16S V4-V5 (primers 515f, 926r) extracted from Silva 138.1 SSU 99, generated by qiime2-2023.2 (forward compatible)
    silva-138_1-99-515f_926r-q2_2023_2-sediment-saline-classifier.qza | Weighted naive-bayes classifier for 16S V4-V5 (primers 515f, 926r) extracted from Silva 138.1 SSU 99, weighted for sediment-saline, generated by qiime2-2023.2 (forward compatible)
    silva-138_1-99-515f_926r-q2_2023_2-sediment-saline-weights.qza | Weights used to generate silva-138_1-99-515f_926r-q2_2023_2-sediment-saline-classifier.qza
    silva-138_1-99-515f_926r-q2_2023_5-sediment-saline-classifier.qza | Weighted naive-bayes classifier for 16S V4-V5 (primers 515f, 926r) extracted from Silva 138.1 SSU 99, weighted for sediment-saline, generated by qiime2-2023.5, NOT backwards compatible with older qiime2 versions
    silva-138_1-99-515f_926r-water-saline-classifier.qza | Weighted naive-bayes classifier for 16S V4-V5 (primers 515f, 926r) extracted from Silva 138.1 SSU 99, weighted for water-saline, generated by qiime2-2023.2 (forward compatible)
    silva-138_1-99-515f_926r-water-saline-weights.qza | Weights used to generate silva-138_1-99-515f_926r-water-saline-classifier.qza

  7. Data from: Creating Database and tables

    • kaggle.com
    zip
    Updated Sep 27, 2023
    Cite
    Nikita Kumari (2023). Creating Database and tables [Dataset]. https://www.kaggle.com/datasets/nikitabhardwaj029/creating-database-and-tables
    Explore at:
    Available download formats: zip (7027 bytes)
    Dataset updated
    Sep 27, 2023
    Authors
    Nikita Kumari
    Description

    Dataset

    This dataset was created by Nikita Kumari


  8. ASTER Global Water Bodies Database V001 - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). ASTER Global Water Bodies Database V001 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/aster-global-water-bodies-database-v001-7ff0b
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Water Bodies Database (ASTWBD) Version 1 data product provides global coverage of water bodies larger than 0.2 square kilometers at a spatial resolution of 1 arc second (approximately 30 meters) at the equator, along with associated elevation information. The ASTWBD data product was created in conjunction with the ASTER Global Digital Elevation Model (ASTER GDEM) Version 3 data product by the Sensor Information Laboratory Corporation (SILC) in Tokyo. The ASTER GDEM Version 3 data product was generated using ASTER Level 1A scenes acquired between March 1, 2000, and November 30, 2013. The ASTWBD data product was then generated to correct elevation values of water body surfaces.

    To generate the ASTWBD data product, water bodies were separated from land areas and then classified into three categories: ocean, river, or lake. Oceans and lakes have a flattened, constant elevation value. The effects of sea ice were manually removed from areas classified as oceans to better delineate ocean shorelines in high latitude areas. For lake water bodies, the elevation for each lake was calculated from the perimeter elevation data using the mosaic image that covers the entire area of the lake. Rivers presented a unique challenge given that their elevations gradually step down from upstream to downstream; therefore, visual inspection and other manual detection methods were required. The geographic coverage of the ASTWBD extends from 83°N to 83°S. Each tile is distributed in GeoTIFF format and referenced to the 1984 World Geodetic System (WGS84)/1996 Earth Gravitational Model (EGM96) geoid. Each data product is provided as a zipped file that contains an attribute file with the water body classification information and a DEM file, which provides elevation information in meters.
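
    The "1 arc second, approximately 30 meters at the equator" figure can be checked with a short calculation. The sketch below assumes a spherical Earth with the WGS84 equatorial radius, which is a simplification; the function name is hypothetical. It also shows why the qualifier "at the equator" matters: the east-west width of a pixel shrinks with the cosine of the latitude.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6_378_137.0
ARCSECONDS_PER_CIRCLE = 360 * 3600


def arcsecond_width_m(latitude_deg: float = 0.0) -> float:
    """Approximate east-west ground width of one arc second at a given latitude."""
    equator_width = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / ARCSECONDS_PER_CIRCLE
    # Spherical approximation: circles of latitude shrink by cos(latitude).
    return equator_width * math.cos(math.radians(latitude_deg))


width_equator = arcsecond_width_m(0.0)    # roughly 31 m
width_60 = arcsecond_width_m(60.0)        # about half the equatorial width
```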

  9. Data from: Correlated RNN Framework to Quickly Generate Molecules with...

    • acs.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Chuan Li; Chenghui Wang; Ming Sun; Yan Zeng; Yuan Yuan; Qiaolin Gou; Guangchuan Wang; Yanzhi Guo; Xuemei Pu (2023). Correlated RNN Framework to Quickly Generate Molecules with Desired Properties for Energetic Materials in the Low Data Regime [Dataset]. http://doi.org/10.1021/acs.jcim.2c00997.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Chuan Li; Chenghui Wang; Ming Sun; Yan Zeng; Yuan Yuan; Qiaolin Gou; Guangchuan Wang; Yanzhi Guo; Xuemei Pu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Motivated by the challenges of deep learning in the low data regime and the urgent demand for intelligent design of highly energetic materials, we explore a correlated deep learning framework, which consists of three recurrent neural networks (RNNs) correlated by the transfer learning strategy, to efficiently generate new energetic molecules with a high detonation velocity when very limited data are available. To avoid dependence on an external big data set, data augmentation by fragment shuffling of 303 energetic compounds is utilized to produce 500,000 molecules to pretrain the RNN, through which the model can learn sufficient structure knowledge. Then the pretrained RNN is fine-tuned by focusing on the 303 energetic compounds to generate 7153 molecules similar to the energetic compounds. In order to more reliably screen the molecules with a high detonation velocity, SMILES enumeration augmentation coupled with the pretrained knowledge is utilized to build an RNN-based prediction model, through which R² is boosted from 0.4446 to 0.9572. The comparable performance with the transfer learning strategy based on an existing big database (ChEMBL) to produce the energetic molecules and drug-like ones further supports the effectiveness and generality of our strategy in the low data regime. High-precision quantum mechanics calculations further confirm that 35 new molecules present a higher detonation velocity and lower synthetic accessibility than the classic explosive RDX, along with good thermal stability. In particular, three new molecules are comparable to caged CL-20 in the detonation velocity. All the source codes and the data set are freely available at https://github.com/wangchenghuidream/RNNMGM.

  10. generate-data

    • kaggle.com
    zip
    Updated May 11, 2025
    Cite
    vinhnguyen010111 (2025). generate-data [Dataset]. https://www.kaggle.com/datasets/vinhnguyen010111/generate-data/versions/1
    Explore at:
    Available download formats: zip (287890334 bytes)
    Dataset updated
    May 11, 2025
    Authors
    vinhnguyen010111
    Description

    Dataset

    This dataset was created by vinhnguyen010111


  11. A complementary EsMeCaTa precomputed database for phyla with fewer sequenced...

    • zenodo.org
    zip
    Updated Sep 30, 2025
    Cite
    Arnaud Belcour; Hidde de Jong; Delphine Ropers (2025). A complementary EsMeCaTa precomputed database for phyla with fewer sequenced genomes [Dataset]. http://doi.org/10.5281/zenodo.17224194
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Arnaud Belcour; Hidde de Jong; Delphine Ropers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presentation

    This is a secondary precomputed database of EsMeCaTa for phyla with fewer sequenced genomes. It complements the EsMeCaTa precomputed database. EsMeCaTa's default parameters ignored these phyla. This database has been generated using lower threshold values for esmecata proteomes in order to include these phyla: busco_percentage_keep/--busco to 55 and minimal_number_proteomes/--minimal-nb-proteomes to 3.

    This repository contains two files:

    • esmecata_database_phyla.zip: zip file containing the files of the precomputed database for the different phyla. It is the file to use with esmecata precomputed command.
    • database_phyla_proteomes_folder.zip: zip file containing several files/folders:
      • scripts to generate input files for esmecata (0_find_phyla_proteomes.py and 1_extract_phyla_poorly_characterised.py). They require the first precomputed database for esmecata (esmecata_database.zip) and generate two files (esmecata_input_phyla.tsv and phylum_uniprot_proteomes.tsv).
      • proteomes_phyla: a folder containing results from esmecata proteomes command with the following parameters: input file esmecata_input_phyla.tsv and with option "--busco 55 --minimal-nb-proteomes 3".

    Usage

    Since EsMeCaTa version 0.6.6, it can be used in conjunction with the first precomputed database:

    esmecata precomputed -i input_file.tsv -o output_folder -d "esmecata_database.zip esmecata_database_phyla.zip"

    Dependencies used to create the database

    Dependencies | Version
    UniProt | 2025_02
    Date | May 2025
    NCBI Taxonomy database | 2025-05-01
    esmecata | 0.6.5
    mmseqs2 | 15.6f452
    eggnog database | 5.0.2
    eggnog-mapper | 2.1.12
    ete4 | 4.3.0
    pandas | 2.2.2
    biopython | 1.83
    requests | 2.32.3
    SPARQLWrapper | 2.0.0

    Acknowledgements

    Most of the computations presented in this work were performed using the GRICAD infrastructure (https://gricad.univ-grenoble-alpes.fr), which is supported by the Grenoble research community.

    The work was funded by the ANR project HyLife (ANR-23-CETP-0002) associated with the CETP project HyLife.

  12. All original values used to generate graphical data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 17, 2024
    Cite
    Kim, Jisun; Parker, Dane; Peignier, Adeline; Lemenze, Alexander (2024). All original values used to generate graphical data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001332051
    Explore at:
    Dataset updated
    Oct 17, 2024
    Authors
    Kim, Jisun; Parker, Dane; Peignier, Adeline; Lemenze, Alexander
    Description

    All original values used to generate graphical data.

  13. SQL Generation AI Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). SQL Generation AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/sql-generation-ai-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    SQL Generation AI Market Outlook

    According to our latest research, the global SQL Generation AI market size reached USD 1.42 billion in 2024, reflecting a robust expansion driven by the rapid adoption of artificial intelligence technologies in database management and analytics. The market is set to grow at a compelling CAGR of 27.6% from 2025 to 2033, with the total market size forecasted to reach USD 13.18 billion by 2033. This remarkable growth trajectory is primarily fueled by advancements in natural language processing, the increasing complexity of enterprise data environments, and the demand for automation in SQL query generation to enhance productivity and reduce operational costs.
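
    The relationship between the quoted figures can be explored with the standard compound-growth formula. This is a sketch only: the helper names are hypothetical, and the choice of nine compounding years (2024 to 2033) is an assumption, since the report gives a 2024 base value but states the CAGR for 2025 to 2033.

```python
def project(start_value: float, annual_rate: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return start_value * (1 + annual_rate) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns start_value into end_value over the given years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the report, in USD billions; nine periods is an assumption.
projected_2033 = project(1.42, 0.276, 9)
rate_2024_to_2033 = implied_cagr(1.42, 13.18, 9)
```

    Under this assumption the projected value and the implied rate land close to, but not exactly on, the report's 13.18 billion and 27.6%, which suggests the report compounds from a different base year.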

    The primary growth factors propelling the SQL Generation AI market revolve around the escalating need for data-driven decision-making and the democratization of data access across organizations. As enterprises generate and store vast amounts of data, the ability to quickly and accurately extract actionable insights becomes critical. SQL Generation AI solutions, leveraging advanced machine learning and natural language processing algorithms, enable non-technical users to generate complex SQL queries using simple natural language instructions. This not only reduces the dependency on specialized database administrators but also accelerates the pace of business intelligence and analytics initiatives. The proliferation of self-service analytics and the integration of AI-powered query generation into popular business intelligence platforms further amplify market growth, making it easier for organizations to unlock the value of their data assets.

    Another significant driver is the ongoing digital transformation across various industries, which has led to the modernization of legacy IT infrastructures and the adoption of cloud-based data management solutions. Organizations are increasingly migrating their databases to the cloud to benefit from scalability, flexibility, and cost-efficiency. SQL Generation AI tools are being integrated with cloud data warehouses and analytics platforms, allowing for seamless query generation and real-time data analysis. This shift not only optimizes data workflows but also supports hybrid and multi-cloud strategies, enabling enterprises to manage and analyze data across diverse environments. The rising volume and diversity of data, coupled with the need for real-time insights, are compelling organizations to invest in AI-powered SQL generation to maintain a competitive edge.

    Additionally, the COVID-19 pandemic has accelerated the adoption of digital technologies, including AI-driven SQL generation, as organizations seek to automate routine tasks and enhance operational resilience. The growing emphasis on remote work and distributed teams has highlighted the importance of intuitive data access and collaboration tools. SQL Generation AI solutions facilitate seamless collaboration between business users and data teams, bridging the gap between technical and non-technical stakeholders. This has led to increased demand across sectors such as BFSI, healthcare, retail, and manufacturing, where timely data insights are crucial for strategic decision-making. The market is also witnessing heightened interest from small and medium enterprises, which are leveraging AI-powered SQL generation to level the playing field with larger competitors.

    Regionally, North America continues to dominate the SQL Generation AI market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The presence of major technology vendors, early adoption of AI and cloud technologies, and a strong focus on data-driven innovation contribute to North America's leadership position. Europe is witnessing rapid growth, driven by stringent data regulations and increasing investments in digital transformation initiatives. Meanwhile, Asia Pacific is emerging as a high-growth region, fueled by expanding IT infrastructure, a burgeoning startup ecosystem, and rising demand for advanced analytics solutions in countries such as China, India, and Japan. Latin America and the Middle East & Africa are also showing promising growth potential as organizations in these regions accelerate their digital journeys.



    Component Analysis



    The SQL Generation AI market by component is broadly segmented into Software and Services. The software segment commands the majority market share, as organizations increasingly dep

  14. Data from: On-farm wildflower plantings generate opposing reproductive...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Jul 11, 2025
    Cite
    Agricultural Research Service (2025). Data from: On-farm wildflower plantings generate opposing reproductive outcomes for solitary and bumble bee species [Dataset]. https://catalog.data.gov/dataset/data-from-on-farm-wildflower-plantings-generate-opposing-reproductive-outcomes-for-solitar
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Pollinator habitat can be planted on farms to enhance floral and nesting resources, and subsequently, pollinator populations. There is ample evidence linking such plantings to greater pollinator abundance on farms, but less is known about their effects on pollinator reproduction. We placed Bombus impatiens Cresson (Hymenoptera: Apidae) and Megachile rotundata (F.) (Hymenoptera: Megachilidae) nests out on 19 Mid-Atlantic farms in 2018, where half (n=10) the farms had established wildflower plantings and half (n=9) did not. Bombus impatiens nests were placed at each farm in spring and mid-summer and repeatedly weighed to capture colony growth. We quantified the relative production of reproductive castes and assessed parasitism rates by screening for conopid fly parasitism and Nosema spores within female workers. We also released M. rotundata cocoons at each farm in spring and collected new nests and emergent adult offspring over the next year, recording female weight as an indicator of reproductive potential and quantifying Nosema parasitism and parasitoid infection rates. Bombus impatiens nests gained less weight and contained female workers with Nosema spore loads over 150x greater on farms with wildflower plantings. In contrast, M. rotundata female offspring weighed more on farms with wildflower plantings and marginally less on farms with honey bee hives. We conclude that wildflower plantings likely enhance reproduction in some species, but that they could also enhance microsporidian parasitism rates in susceptible bee species. It will be important to determine how wildflower planting benefits can be harnessed while minimizing parasitism in wild and managed bee species.

  15. Data used to produce figures and tables

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated May 15, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Data used to produce figures and tables [Dataset]. https://catalog.data.gov/dataset/data-used-to-produce-figures-and-tables-c6864
    Dataset updated
    May 15, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The data set was used to produce the tables and figures in the associated papers. This dataset is associated with the following publications:

    Lytle, D., S. Pfaller, C. Muhlen, I. Struewing, S. Triantafyllidou, C. White, S. Hayes, D. King, and J. Lu. A Comprehensive Evaluation of Monochloramine Disinfection on Water Quality, Legionella and Other Important Microorganisms in a Hospital. WATER RESEARCH. Elsevier Science Ltd, New York, NY, USA, 189: 116656, (2021).

    Lytle, D., C. Formal, K. Cahalan, C. Muhlen, and S. Triantafyllidou. The Impact of Sampling Approach and Daily Water Usage on Lead Levels Measured at the Tap. WATER RESEARCH. Elsevier Science Ltd, New York, NY, USA, 197: 117071, (2021).

  16. Lake

    • hub.arcgis.com
    • data-floridaswater.opendata.arcgis.com
    • +1more
    Updated May 13, 2016
    + more versions
    Cite
    SJRWMDGeospatialSolutions (2016). Lake [Dataset]. https://hub.arcgis.com/datasets/49d2f409705045dd96e441f0b5463d18
    Dataset updated
    May 13, 2016
    Dataset authored and provided by
    SJRWMDGeospatialSolutions
    Description

    Note: This description is taken from a draft report entitled "Creation of a Database of Lakes in the St. Johns River Water Management District of Northeast Florida" by Palmer Kinser.

    Introduction

    “Lakes are among the District’s most valued resources. Their aesthetic appeal adds substantially to waterfront property values, which in turn generate tax revenues for local governments. Fish camps and other businesses, that provide lake visitors with supplies and services, benefit local economies directly. Commercial fishing on the District’s larger lakes produces some income, but far greater economic benefits are produced from sport fishing. Some of the best bass fishing lakes in the world occur in the District. Trophy fishing, guide services and high-stakes fishing tournaments, which they support, also generate substantial revenues for local economies. In addition, the high quality of District lakes has allowed swimming, fishing, and boating to become among the most popular outdoor activities for many District residents and attracts many visitors. Others frequently take advantage of the abundant opportunities afforded for duck hunting, bird watching, photography, and other nature related activities.” (from likelihood of harm to lakes report).

    Objective

    The objective of this work was to create a consistent database of natural lake polygon features for the St. Johns River Water Management District. Other databases examined contained point features only, polygons representing a wide range of dates, water bodies not separated or coded adequately by feature type (i.e. no distinctions were made between lakes, rivers, excavations, etc.), or were incomplete. This new database will allow users to better characterize and measure the lakes resource of the District, allowing comparisons to be made and trends detected, thereby facilitating better protection and management of the resource.

    Background

    Prior to creation of this database, the District had 2 waterbody databases. The first of these, the 2002 FDEP Primary Lake Location database, contained 3859 lake point features, state-wide, 1418 of which were in SJRWMD. Only named lakes were included. Data sources were the Geographic Names Information System (GNIS), USGS 1:24000 hydrography data, 1994 Digital orthophoto quarter quadrangles (DOQQs), and USGS digital raster graphics (DRGs). The second was the SJRWMD Hydrologic Network (Lake / Pond and Reservoir classes). This database contained 42,002 lake / pond and reservoir features for the SJRWMD. Lakes with multiple pools of open water were often mapped as multiple features and many man-made features (borrow pits, reservoirs, etc.) were included. This dataset was developed from USGS map data of varying dates.

    Methods

    Polygons in this new lakes dataset were derived from a "wet period" landcover map (SJRWMD, 1999), in which most lake levels were relatively high. Polygons from other dates, mostly 2009, were used for lakes in regionally dry locations or for lakes that were uncharacteristically wet in 1999, e.g. Alachua Sink. Our intention was to capture lakes in a basin-full condition, neither unusually high nor low. To build the data set, a selection was made of polygons coded as lakes (5200), marshy lakes (5250), enclosed saltwater ponds in salt marsh (5430), slough waters (5600), and emergent aquatic vegetation (6440). Some large, regionally significant or named man-made reservoirs were also included, as well as a small number of named excavations. All polygons were inspected and edited, where appropriate, to correct lake shores and merge adjacent lake basin features. Water polygons separated by marshes or other low-ground features were grouped and merged to form multipart features when clearly associated within a single lake basin. The initial set of lake names was captured from the Florida Primary Lake Location database. Labels were then moved where needed to ensure that they fell within the water bodies referenced. Additional lake names were hand entered using data from USGS 7.5 minute quads, Google Maps, MapQuest, Florida Department of Transportation (FDOT) county maps, and other sources. The final dataset contains 4892 polygons, many of which are multi-part.

    Operationally, lakes, as captured in this database, are those features that were identified and mapped using the District’s landuse/landcover scheme in the 5200, 5250, 5430, 5600 classes referenced above, in addition to some areas mapped in the 6440 class. Some additional features named as lakes, ponds, or reservoirs were also included, even when not currently appearing to be lakes. Some are now very marshy or even dry, but apparently held deeper pools of water in the past. A size limit of 1 acre or more was enforced, except for named features, 30 of which were smaller. The smallest lake was Fox Lake, a doline of 0.04 acres in Orange County. The largest lake, Lake George, covered 43,212.8 acres.

    The lakes of the SJRWMD are a diverse set of features that may be classified in many ways. These include: by surrounding landforms or landcover, by successional stage (lacustrine to palustrine gradient), by hydrology (presence of inflows and/or outflows, groundwater linkages, permanence, etc.), by water quality (trophic state, water color, dissolved solids, etc.), and by origin. We chose to classify the lakes in this set by origin, based on the lake type concepts of Hutchinson (1957). These types are listed in the table below (Table 1). We added some additional types and modified the descriptions to better reflect Florida’s geological conditions (Table 2). Some types were readily identified; others are admittedly conjectural or were of mixed origins, making it difficult to pick a primary mechanism. Geological map layers, particularly total thickness of overburden above the Floridan aquifer system and thickness of the intermediate confining unit, were used to estimate the likelihood of sinkhole formation. Wind sculpting appears to be common and sometimes is a primary mechanism but can be difficult to judge from remotely sensed imagery. For these and others, the classification should be considered provisional.

    Many District lakes appear to have been formed by several processes; for instance, sinkholes may occur within lakes which lie between sand dunes. Here these would be classified as dune / karst. Mixtures of dunes, deflation and karst are common. Saltmarsh ponds vary in origin and were not further classified. In the northern coastal area they are generally small, circular in outline and appear to have been formed by the collapse and breakdown of a peat substrate, Hutchinson type 70. Further south along the coast additional ponds have been formed by the blockage of tidal creeks, a fluvial process, perhaps of Hutchinson’s Type 52, lateral lakes, in which sediments deposited by a main stream back up the waters of a tributary. In the area of Cape Canaveral, many salt marsh ponds clearly occupy dune swales flooded by rising ocean levels. A complete listing of lake types and combinations is in Table 3.

    Type | Sub-Type | Secondary Type
    Tectonic Basins | Marine Basin |
    Tectonic Basins | Marine Basin | Compound doline
    Tectonic Basins | Marine Basin | karst
    Tectonic Basins | Marine Basin | Phytogenic dam
    Tectonic Basins | Marine Basin | Abandoned channel
    Tectonic Basins | Marine Basin | Karst
    Solution Lakes | Compound doline |
    Solution Lakes | Compound doline | Fluvial
    Solution Lakes | Compound doline | Phytogenic
    Solution Lakes | Doline |
    Solution Lakes | Doline | Deflation
    Solution Lakes | Doline | Dredged
    Solution Lakes | Doline | Excavated
    Solution Lakes | Doline | Excavation
    Solution Lakes | Doline | Fluvial
    Solution Lakes | Karst | Karst / Excavation
    Solution Lakes | Karst | Karst / Fluvial
    Solution Lakes | Karst | Deflation
    Solution Lakes | Karst | Deflation / excavation
    Solution Lakes | Karst | Excavation
    Solution Lakes | Karst | Fluvial
    Solution Lakes | Polje |
    Solution Lakes | Spring pool |
    Solution Lakes | Spring pool | Fluvial
    Fluvial | Abandoned channel |
    Fluvial | Fluvial |
    Fluvial | Fluvial | Phytogenic
    Fluvial | Levee |
    Fluvial | Oxbow lake |
    Fluvial | Strath |
    Fluvial | Strath | Phytogenic
    Aeolian | Deflation |
    Aeolian | Deflation | Dune
    Aeolian | Deflation | Excavation
    Aeolian | Deflation | Karst
    Aeolian | Dune |
    Aeolian | Dune | Deflation
    Aeolian | Dune | Excavation
    Aeolian | Dune |
    Aeolian | Dune | Karst
    Shoreline lakes | Maritime coastal | Karst / Excavation
    Organic accumulation | Phytogenic dam |
    Salt Marsh Ponds | |
    Man made | Excavation |
    Man made | Dam |

  17. generated-usa-passeports-dataset

    • huggingface.co
    Updated Jul 15, 2023
    Cite
    Unique Data (2023). generated-usa-passeports-dataset [Dataset]. https://huggingface.co/datasets/UniqueData/generated-usa-passeports-dataset
    Dataset updated
    Jul 15, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Data generation in machine learning involves creating or manipulating data to train and evaluate machine learning models. The purpose of data generation is to provide diverse and representative examples that cover a wide range of scenarios, ensuring the model's robustness and generalization. Data augmentation techniques involve applying various transformations to existing data samples to create new ones. These transformations include: random rotations, translations, scaling, flips, and more. Augmentation helps in increasing the dataset size, introducing natural variations, and improving model performance by making it more invariant to specific transformations. The dataset contains GENERATED USA passports, which are replicas of official passports but with randomly generated details, such as name, date of birth etc. The primary intention of generating these fake passports is to demonstrate the structure and content of a typical passport document and to train the neural network to identify this type of document. Generated passports can assist in conducting research without accessing or compromising real user data that is often sensitive and subject to privacy regulations. Synthetic data generation allows researchers to develop and refine models using simulated passport data without risking privacy leaks.
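
    The augmentation transformations named above (flips, rotations, translations) can be sketched on a tiny grayscale image stored as a list of rows. This is only an illustration of the general technique, not the code used to build this dataset; the function names and the 50% flip probability are assumptions chosen for the example.

```python
import random


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def translate_right(img, shift, fill=0):
    """Shift pixels `shift` columns to the right, padding with `fill`."""
    width = len(img[0])
    shift = max(0, min(shift, width))
    return [[fill] * shift + row[: width - shift] for row in img]


def augment(img, rng):
    """Apply a random flip, a random number of quarter turns, and a small shift."""
    out = img
    if rng.random() < 0.5:  # hypothetical flip probability
        out = flip_horizontal(out)
    for _ in range(rng.randrange(4)):
        out = rotate_90_clockwise(out)
    return translate_right(out, rng.randrange(2))


if __name__ == "__main__":
    sample = [[1, 2], [3, 4]]
    print(augment(sample, random.Random(0)))
```

    Each call to `augment` yields a new variant of the same image, which is how augmentation increases dataset size without collecting new samples.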

  18. Data from: Database Creator for Mass Analysis of Peptides and Proteins,...

    • figshare.com
    • acs.figshare.com
    txt
    Updated Aug 1, 2023
    Cite
    Pandi Boomathi Pandeswari; Arnold Emerson Isaac; Varatharajan Sabareesh (2023). Database Creator for Mass Analysis of Peptides and Proteins, DC-MAPP: A Standalone Tool for Simplifying Manual Analysis of Mass Spectral Data to Identify Peptide/Protein Sequences [Dataset]. http://doi.org/10.1021/jasms.3c00030.s005
    Available download formats: txt
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Pandi Boomathi Pandeswari; Arnold Emerson Isaac; Varatharajan Sabareesh
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Proteomic studies typically involve the use of different types of software for annotating experimental tandem mass spectrometric data (MS/MS) and thereby simplifying the process of peptide and protein identification. For such annotations, these software tools calculate the m/z values of the peptide/protein precursor and fragment ions, for which a database of protein sequences must be provided as an input file. The calculated m/z values are stored as another database, which the user usually cannot view. Database Creator for Mass Analysis of Peptides and Proteins (DC-MAPP) is a novel standalone software tool that can create custom databases for “viewing” the calculated m/z values of precursor and fragment ions, prior to the database search. It contains three modules. Peptide/protein sequences of the user’s choice can be entered as input to the first module for creating a custom database. In the second module, m/z values are entered as queries and searched within the custom database to identify protein/peptide sequences. The third module is suited for peptide mass fingerprinting, which can be used to analyze both ESI and MALDI mass spectral data. The feature of “viewing” the custom database can be helpful not only for better understanding the search engine processes, but also for designing multiple reaction monitoring (MRM) methods. Post-translational modifications and protein isoforms can also be analyzed. Since DC-MAPP relies on the protein/peptide “sequences” for creating custom databases, it may not be applicable for searches involving spectral libraries. The tool was implemented in Python, and the graphical user interface was built with Page/Tcl, making this tool more user-friendly. It is freely available at https://vit.ac.in/DC-MAPP/.
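
    The idea of building a viewable database of calculated m/z values and then querying it can be illustrated with a minimal sketch. This is not DC-MAPP's code: the function names, the small residue table, and the 0.01 tolerance are assumptions for illustration, and only precursor ions (not fragment ions or modifications) are covered. The residue values are standard monoisotopic masses.

```python
# Monoisotopic residue masses (Da) for a few amino acids; a real tool would cover all 20.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of the charge-carrying proton


def peptide_mass(sequence):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge):
    """m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def build_database(sequences, charges=(1, 2)):
    """Map each sequence to its calculated m/z per charge state (viewable by the user)."""
    return {seq: {z: precursor_mz(seq, z) for z in charges} for seq in sequences}


def search(database, observed_mz, tolerance=0.01):
    """Return (sequence, charge) pairs whose calculated m/z matches the query."""
    return [
        (seq, z)
        for seq, by_charge in database.items()
        for z, mz in by_charge.items()
        if abs(mz - observed_mz) <= tolerance
    ]


if __name__ == "__main__":
    db = build_database(["GA", "PEGS", "VLK"])
    for seq, values in db.items():
        print(seq, values)
    print(search(db, precursor_mz("VLK", 2)))
```

    Printing the dictionary corresponds to "viewing" the custom database, and `search` mirrors the query step of the second module in spirit only.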

  19. ai generated faces

    • kaggle.com
    zip
    Updated Sep 20, 2022
    Cite
    misteick (2022). ai generated faces [Dataset]. https://www.kaggle.com/datasets/chelove4draste/ai-generated-faces
    Available download formats: zip (105847789285 bytes)
    Dataset updated
    Sep 20, 2022
    Authors
    misteick
    Description

    Fully AI-generated human faces. GitHub page of the dataset

  20. Synthetic Data Generation Market Analysis, Size, and Forecast 2025-2029:...

    • technavio.com
    pdf
    Updated May 3, 2025
    Cite
    Technavio (2025). Synthetic Data Generation Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Italy, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/synthetic-data-generation-market-analysis
    Available download formats: pdf
    Dataset updated
    May 3, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Description


    Synthetic Data Generation Market Size 2025-2029

    The synthetic data generation market size is forecast to increase by USD 4.39 billion, at a CAGR of 61.1% between 2024 and 2029.
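
    As a rough reading of these two figures, the arithmetic behind a compound annual growth rate (CAGR) can be sketched as follows. The assumptions are ours, not the report's: the USD 4.39 billion increase is taken as the difference between the 2029 and 2024 market sizes, and the 61.1% rate is applied as a constant rate over five annual steps. The implied base and end sizes are back-of-the-envelope values, not numbers published by Technavio.

```python
def growth_multiple(cagr, years):
    """Factor by which a quantity grows at a constant annual rate."""
    return (1.0 + cagr) ** years


def implied_base(increase, cagr, years):
    """Starting size implied by an absolute increase and a CAGR.

    From end = base * multiple and increase = end - base:
    base = increase / (multiple - 1).
    """
    return increase / (growth_multiple(cagr, years) - 1.0)


if __name__ == "__main__":
    cagr = 0.611          # 61.1% per year, from the summary above
    years = 5             # 2024 -> 2029, assumed to mean five annual steps
    increase_usd_bn = 4.39

    multiple = growth_multiple(cagr, years)
    base = implied_base(increase_usd_bn, cagr, years)
    print(f"growth multiple over {years} years: {multiple:.2f}x")
    print(f"implied 2024 size: USD {base:.2f} billion")
    print(f"implied 2029 size: USD {base + increase_usd_bn:.2f} billion")
```

    The point of the sketch is that a 61.1% CAGR compounds to roughly an elevenfold increase over five years, so most of the 2029 market size would be new growth under these assumptions.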

    The market is experiencing significant growth, driven by the escalating demand for data privacy protection. With increasing concerns over data security and the potential risks associated with using real data, synthetic data is gaining traction as a viable alternative. Furthermore, the deployment of large language models is fueling market expansion, as these models can generate vast amounts of realistic and diverse data, reducing the reliance on real-world data sources. However, high costs associated with high-end generative models pose a challenge for market participants. These models require substantial computational resources and expertise to develop and implement effectively. Companies seeking to capitalize on market opportunities must navigate these challenges by investing in research and development to create more cost-effective solutions or partnering with specialists in the field. Overall, the market presents significant potential for innovation and growth, particularly in industries where data privacy is a priority and large language models can be effectively utilized.

    What will be the Size of the Synthetic Data Generation Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
    The market continues to evolve, driven by the increasing demand for data-driven insights across various sectors. Data processing is a crucial aspect of this market, with a focus on ensuring data integrity, privacy, and security. Data privacy-preserving techniques, such as data masking and anonymization, are essential in maintaining confidentiality while enabling data sharing. Real-time data processing and data simulation are key applications of synthetic data, enabling predictive modeling and data consistency. Data management and workflow automation are integral components of synthetic data platforms, with cloud computing and model deployment facilitating scalability and flexibility. Data governance frameworks and compliance regulations play a significant role in ensuring data quality and security. Deep learning models, variational autoencoders (VAEs), and neural networks are essential tools for model training and optimization, while API integration and batch data processing streamline the data pipeline. Machine learning models and data visualization provide valuable insights, while edge computing enables data processing at the source. Data augmentation and data transformation are essential techniques for enhancing the quality and quantity of synthetic data. Data warehousing and data analytics provide a centralized platform for managing and deriving insights from large datasets. Synthetic data generation continues to unfold, with ongoing research and development in areas such as federated learning, homomorphic encryption, statistical modeling, and software development. The market's dynamic nature reflects the evolving needs of businesses and the continuous advancements in data technology.

    How is this Synthetic Data Generation Industry segmented?

    The synthetic data generation industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    End-user: Healthcare and life sciences; Retail and e-commerce; Transportation and logistics; IT and telecommunication; BFSI and others
    Type: Agent-based modelling; Direct modelling
    Application: AI and ML Model Training; Data privacy; Simulation and testing; Others
    Product: Tabular data; Text data; Image and video data; Others
    Geography: North America (US, Canada, Mexico); Europe (France, Germany, Italy, UK); APAC (China, India, Japan); Rest of World (ROW)

    By End-user Insights

    The healthcare and life sciences segment is estimated to witness significant growth during the forecast period.

    In the rapidly evolving data landscape, the market is gaining significant traction, particularly in the healthcare and life sciences sector. With a growing emphasis on data-driven decision-making and stringent data privacy regulations, synthetic data has emerged as a viable alternative to real data for various applications. This includes data processing, data preprocessing, data cleaning, data labeling, data augmentation, and predictive modeling, among others. Medical imaging data, such as MRI scans and X-rays, are essential for diagnosis and treatment planning. However, sharing real patient data for research purposes or training machine learning algorithms can pose significant privacy risks. Synthetic data generation addresses this challenge by producing realistic medical imaging data, ensuring data privacy while enabling research and development. Moreover
