45 datasets found
  1. CSE-CIC-IDS2018

    • kaggle.com
    • huggingface.co
    Updated Aug 11, 2022
    Cite
    StrGenIx | Laurens D'hooge (2022). CSE-CIC-IDS2018 [Dataset]. http://doi.org/10.34740/kaggle/dsv/4059899
    Dataset updated
    Aug 11, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    StrGenIx | Laurens D'hooge
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is an academic intrusion detection dataset. All the credit goes to the original authors: Dr. Iman Sharafaldin, Dr. Arash Habibi Lashkari, and Dr. Ali Ghorbani. Please cite their original paper.

    It was published by the Canadian Institute for Cybersecurity and is the successor to CIC-IDS2017. The biggest difference is the move away from on-premise infrastructure to AWS to generate the dataset. It also vastly increased the representation of 'Infiltration' traffic compared to CIC-IDS2017.

    V1: Base dataset in CSV format as downloaded from here
    V2: Cleaning -> parquet files
    V3: Reorganize to save storage, only keep original CSVs in V1/V2

    In the parquet files, all data types are already set correctly, and this clean version contains 0 records with missing information and 0 duplicate records. Baseline classification scores with simple models will be available shortly.
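
    The "0 missing, 0 duplicates" claim can be re-checked once the rows have been loaded by whatever Parquet reader is at hand (loading is not shown here). The following is only a minimal sketch: the function name, the row-as-dictionary representation, and the rule that None, empty strings and NaN count as missing are all assumptions, not part of the dataset's documentation.

```python
import math
from typing import Iterable, Mapping, Tuple


def _is_missing(value: object) -> bool:
    # Assumption: None, "" and float NaN are the only "missing" markers.
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def count_missing_and_duplicates(
    rows: Iterable[Mapping[str, object]],
) -> Tuple[int, int]:
    """Return (rows with at least one missing value, duplicate rows)."""
    seen = set()
    missing = 0
    duplicates = 0
    for row in rows:
        if any(_is_missing(v) for v in row.values()):
            missing += 1
        # Build a hashable, order-independent fingerprint of the row.
        key = tuple(sorted((name, repr(value)) for name, value in row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return missing, duplicates
```

    For a clean file, both counts would be expected to be zero.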

  2. CSE-CIC-IDS2018

    • data.mendeley.com
    Updated Feb 5, 2024
    Cite
    Abdisalam Mohamed (2024). CSE-CIC-IDS2018 [Dataset]. http://doi.org/10.17632/29hdbdzx2r.1
    Dataset updated
    Feb 5, 2024
    Authors
    Abdisalam Mohamed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A cleaned version of the CSE-CIC-IDS2018 dataset.

  3. CSE-CIC-IDS2018-V2

    • huggingface.co
    Cite
    Abluva Inc, CSE-CIC-IDS2018-V2 [Dataset]. https://huggingface.co/datasets/abluva/CSE-CIC-IDS2018-V2
    Dataset authored and provided by
    Abluva Inc
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is an updated version of the CSE-CIC-IDS 2018 dataset. The data is normalised, and one new class, "Comb", which is a combination of existing attacks, has been added. To cite the dataset, please reference the original paper with DOI: 10.1109/SmartNets61466.2024.10577645. The paper is published in IEEE SmartNets and can be accessed here: https://www.researchgate.net/publication/382034618_Blender-GAN_Multi-Target_Conditional_Generative_Adversarial_Network_for_Novel_Class_Synthetic_Data_Generation. Citation… See the full description on the dataset page: https://huggingface.co/datasets/abluva/CSE-CIC-IDS2018-V2.

  4. CIC-IDS-2018-parquet

    • kaggle.com
    Updated Jul 16, 2024
    Cite
    lima mateus (2024). CIC-IDS-2018-parquet [Dataset]. https://www.kaggle.com/datasets/limamateus/cic-ids-2018-parquet/code
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    lima mateus
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by lima mateus

    Released under Apache 2.0


  5. Citation Trends for "Optimizing Intrusion Detection Systems in Three Phases...

    • shibatadb.com
    Updated Nov 24, 2023
    Cite
    Yubetsu (2023). Citation Trends for "Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset" [Dataset]. https://www.shibatadb.com/article/7VjrMiHC
    Dataset updated
    Nov 24, 2023
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2024 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset".

  6. cic-ids-2018-alldata-textual

    • huggingface.co
    Cite
    Shiva Prasad Gyawali, cic-ids-2018-alldata-textual [Dataset]. https://huggingface.co/datasets/gyawalishiva/cic-ids-2018-alldata-textual
    Authors
    Shiva Prasad Gyawali
    Description

    gyawalishiva/cic-ids-2018-alldata-textual dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. Citation Network Graph

    • shibatadb.com
    Updated Nov 24, 2023
    Cite
    Yubetsu (2023). Citation Network Graph [Dataset]. https://www.shibatadb.com/article/7VjrMiHC
    Dataset updated
    Nov 24, 2023
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Description

    Network of 45 papers and 67 citation links related to "Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset".

  8. Intrusion Detection System Market Analysis North America, APAC, Europe,...

    • technavio.com
    pdf
    Updated Oct 23, 2024
    Cite
    Technavio (2024). Intrusion Detection System Market Analysis North America, APAC, Europe, Middle East and Africa, South America - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/intrusion-detection-system-market-industry-analysis
    Available download formats: pdf
    Dataset updated
    Oct 23, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Area covered
    United Kingdom, United States
    Description


    Intrusion Detection System Market Size 2024-2028

    The intrusion detection system market size is forecast to increase by USD 4.65 billion at a CAGR of 14% between 2023 and 2028.

    The market is witnessing significant growth due to the escalating number of cyberattacks and the need to secure IT service infrastructure, particularly in the banking and financial services industry (BFSI). IDS solutions employ two primary identification techniques: signature-based and anomaly detection. Signature-based identification relies on known attack patterns, while anomaly detection identifies deviations from normal behavior.
    Additionally, with the rise in digital transactions, there is a growing emphasis on securing security architecture through traffic monitoring and intrusion detection. The market is driven by the increasing demand for BFSI applications and the subsequent need to protect against cyber threats. However, the high cost of maintaining IDS solutions remains a challenge. In conclusion, the IDS market is expected to continue growing as organizations prioritize securing their IT infrastructure against cyber threats.
    

    What will be the Size of the Market During the Forecast Period?


    The Intrusion Detection System (IDS) market is a significant segment of the cybersecurity industry, playing a crucial role in safeguarding IT infrastructure against various cyber threats. IDS solutions help identify and prevent unauthorized access, malicious activities, and potential security breaches. These systems can be categorized into Network Intrusion Detection Systems (NIDS) and Host-based Intrusion Detection Systems (HIDS). IDS and Intrusion Prevention Systems (IPS) are essential components of an organization's cybersecurity strategy. IPS goes beyond simple identification and provides real-time prevention of attacks. Both IDS and IPS are instrumental in mitigating risks from phishing incidents, cyberattacks, and other malicious threats.
    Additionally, cybersecurity is a major concern for various sectors, including BFSI applications, telecom, defense, and cloud computing. With the increasing reliance on IT infrastructure and work from home arrangements, cybersecurity expenditure has seen a significant rise. IDS and IPS solutions are integral to securing data and maintaining information security. Cybercrimes are on the rise, with malicious threat actors constantly evolving their tactics. Traditional signature-based identification methods may not be sufficient to detect advanced threats. Anomaly detection, a key feature of modern IDS and IPS solutions, can help identify unusual patterns and potential threats. IDS and IPS solutions are not limited to protecting traditional IT infrastructure.
    Simultaneously, they also play a vital role in securing cloud computing environments. IDS and IPS as part of IDP (Intrusion Detection and Prevention) systems offer advanced threat detection and prevention capabilities, ensuring comprehensive protection against cyberattacks. Ransomware attacks have emerged as a major concern, with their disruptive impact on business operations. IDS and IPS solutions can help prevent ransomware attacks by identifying and blocking malicious traffic before it can cause damage. In conclusion, IDS and IPS solutions are essential components of an effective cybersecurity strategy. They help organizations protect their IT infrastructure, data security, and information security against various cyber threats, including phishing incidents, cyberattacks, and malicious threat actors. The market for IDS and IPS solutions is expected to grow as organizations continue to invest in advanced cybersecurity solutions to mitigate risks and maintain business continuity. 
    

    How is this market segmented and which is the largest segment?

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Deployment
      On-premises
      Cloud-based
    Geography
      North America
        US
      APAC
        China
        Japan
      Europe
        Germany
        UK
      Middle East and Africa
      South America

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period.
    

    The on-premises segment is projected to dominate the market in the US, with substantial growth in terms of revenue. Large enterprises, particularly those with a global footprint, are the primary consumers of on-premises intrusion detection systems. The primary reason for this preference is the control it offers over managing software assets, including data generated and stored within business applications. This deployment model enables organizations to ensure compliance with licensing agreements and automate tasks, making it an attractive choice for many busine

  9. Network traffic datasets created by Single Flow Time Series Analysis

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 11, 2024
    Cite
    Josef Koumar (2024). Network traffic datasets created by Single Flow Time Series Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8035723
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Karel Hynek
    Josef Koumar
    Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets created by Single Flow Time Series Analysis

    Datasets were created for the paper: Network Traffic Classification based on Single Flow Time Series Analysis -- Josef Koumar, Karel Hynek, Tomáš Čejka -- which was published at The 19th International Conference on Network and Service Management (CNSM) 2023. Please cite usage of our datasets as:

    J. Koumar, K. Hynek and T. Čejka, "Network Traffic Classification Based on Single Flow Time Series Analysis," 2023 19th International Conference on Network and Service Management (CNSM), Niagara Falls, ON, Canada, 2023, pp. 1-7, doi: 10.23919/CNSM59352.2023.10327876.

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets which are cited in the table below. Each dataset contains 69 features created by Time Series Analysis of Single Flow Time Series. The detailed description of features from datasets is in the file: feature_description.pdf

    In the following table is a description of each dataset file:

    File name | Detection problem | Citation of original raw dataset
    botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
    doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
    doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
    dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
    edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
    ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
    ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP, 1:108–116, 2018.
    ids_unsw_nb_15_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    ids_unsw_nb_15_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23
    ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    tor_binary.csv | Binary detection of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    tor_multiclass.csv | Multi-class classification of TOR | Arash Habibi Lashkari et al. Characterization of Tor Traffic using Time based Features. In ICISSP 2017, pages 253–262. SciTePress, 2017.
    vpn_iscx_binary.csv | Binary detection of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_iscx_multiclass.csv | Multi-class classification of VPN | Gerard Draper-Gil et al. Characterization of Encrypted and VPN Traffic Using Time-related. In ICISSP, pages 407–414, 2016.
    vpn_vnat_binary.csv | Binary detection of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022
    vpn_vnat_multiclass.csv | Multi-class classification of VPN | Steven Jorgensen et al. Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification. CoRR, abs/2205.05628, 2022

  10. Confusion matrix.

    • plos.figshare.com
    xls
    Updated May 29, 2025
    Cite
    Wei Cui; Xiao Liao; Yang Yang; Shiying Feng; Mingyan Song (2025). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0322329.t002
    Available download formats: xls
    Dataset updated
    May 29, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Wei Cui; Xiao Liao; Yang Yang; Shiying Feng; Mingyan Song
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the rapid development of smart grids, power grid systems are becoming increasingly complex, posing significant challenges to their security. Traditional network intrusion detection systems often rely on manually engineered features, which are not only resource-intensive but also struggle to handle the diverse range of attack types. This paper aims to address these challenges by proposing an automated DDoS attack detection algorithm using the Informer model. We introduce a windowing technique to segment network traffic into manageable samples, which are then input into the Informer for feature extraction and classification. This model captures both the temporal dependencies and global attention information in the traffic data. Experimental results on the CICIDS-2018 dataset demonstrate the effectiveness of our approach, showing significant improvements in detection accuracy and efficiency. Our findings suggest that the proposed method offers a promising solution for real-time intrusion detection in complex power grid environments.
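
    The windowing step mentioned in this abstract can be pictured with a generic sliding-window sketch. The abstract does not give the window length, the stride, or whether windows overlap, so the function name and both parameters below are hypothetical; this illustrates the general technique and is not the paper's implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(records: Sequence[T], size: int, stride: int) -> List[Sequence[T]]:
    """Cut a traffic sequence into fixed-size windows.

    `size` and `stride` are illustrative parameters; a stride smaller
    than the size produces overlapping windows. A trailing remainder
    shorter than `size` is dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    last_start = len(records) - size
    return [records[i:i + size] for i in range(0, last_start + 1, stride)]
```

    Each returned window would then be one sample handed to the classifier.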

  11. VHS-22

    • kaggle.com
    Updated Apr 29, 2022
    Cite
    H2020 SIMARGL (2022). VHS-22 [Dataset]. https://www.kaggle.com/datasets/h2020simargl/vhs-22-network-traffic-dataset
    Dataset updated
    Apr 29, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    H2020 SIMARGL
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VHS-22 is a heterogeneous, flow-level dataset which combines ISOT, CICIDS-17, Booters and CTU-13 datasets, as well as traffic from Malware Traffic Analysis (MTA) site, to increase variety of malicious and legitimate traffic flows. It contains 27.7 million flows (20.3 million legitimate and 7.4 million of attacks). The flows are represented in the form of 45 features; apart from classical NetFlow features, VHS-22 contains statistical parameters and network-level features. Their detailed description and the results of initial detection experiments are presented in the paper:

    Paweł Szumełda, Natan Orzechowski, Mariusz Rawski, and Artur Janicki. 2022. VHS-22 – A Very Heterogeneous Set of Network Traffic Data for Threat Detection. In Proc. European Interdisciplinary Cybersecurity Conference (EICC 2022), June 15–16, 2022, Barcelona, Spain. ACM, New York, NY, USA, https://doi.org/10.1145/3528580.3532843

    Every day contains different attacks mixed with legitimate traffic:
    01-01-2022: Botnet attacks from ISOT dataset.
    02-01-2022: Various attacks from MTA dataset.
    03-01-2022: Web attacks from CICIDS-17 dataset.
    04-01-2022: Bruteforce attacks from CICIDS-17 dataset.
    05-01-2022: Botnet attacks from CICIDS-17 dataset.
    06-01-2022: DDoS attacks from CICIDS-17 dataset.
    07-01-2022 to 11-01-2022: DDoS attacks from Booters dataset.
    12-01-2022 to 23-01-2022: Botnet traffic from CTU-13 dataset.
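
    The day-by-day schedule above can be written as a small lookup table, which may be handy when splitting flows by attack source. This is only a sketch: the dates are read as DD-MM-YYYY, and the names `SCHEDULE` and `attack_source` are hypothetical and not part of the dataset.

```python
from datetime import date
from typing import Optional, Tuple

# (first day, last day, attack type, source dataset), transcribed from
# the description; dates are interpreted as DD-MM-YYYY.
SCHEDULE = [
    (date(2022, 1, 1), date(2022, 1, 1), "Botnet", "ISOT"),
    (date(2022, 1, 2), date(2022, 1, 2), "Various", "MTA"),
    (date(2022, 1, 3), date(2022, 1, 3), "Web", "CICIDS-17"),
    (date(2022, 1, 4), date(2022, 1, 4), "Bruteforce", "CICIDS-17"),
    (date(2022, 1, 5), date(2022, 1, 5), "Botnet", "CICIDS-17"),
    (date(2022, 1, 6), date(2022, 1, 6), "DDoS", "CICIDS-17"),
    (date(2022, 1, 7), date(2022, 1, 11), "DDoS", "Booters"),
    (date(2022, 1, 12), date(2022, 1, 23), "Botnet", "CTU-13"),
]


def attack_source(day: date) -> Optional[Tuple[str, str]]:
    """Return (attack type, source dataset) for a day, or None if uncovered."""
    for first, last, attack, source in SCHEDULE:
        if first <= day <= last:
            return attack, source
    return None
```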

    The VHS-22 dataset consists of labeled network flows and all data is publicly available for researchers in .csv format. When using VHS-22, please cite our paper which describes the VHS-22 dataset in detail, as well as the publications describing the source datasets:

    Paweł Szumełda, Natan Orzechowski, Mariusz Rawski, and Artur Janicki. 2022. VHS-22 – A Very Heterogeneous Set of Network Traffic Data for Threat Detection. In Proc. European Interdisciplinary Cybersecurity Conference (EICC 2022), June 15–16, 2022, Barcelona, Spain. ACM, New York, NY, USA, https://doi.org/10.1145/3528580.3532843

    Sherif Saad, Issa Traore, Ali Ghorbani, Bassam Sayed, David Zhao, Wei Lu, John Felix, and Payman Hakimian. 2011. Detecting P2P botnets through network behavior analysis and machine learning. In Proc. International Conference on Privacy, Security and Trust. IEEE, Montreal, Canada, 174–1

    Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani. 2018. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization, In Proc. 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Funchal, Portugal

    José Jair Santanna, Romain Durban, Anna Sperotto, and Aiko Pras. 2015. Inside booters: An analysis on operational databases. In Proc. International Symposium on Integrated Network Management (INM 2015). IFIP/IEEE, Ottawa, Canada, 432–440. https://doi.org/10.1109/INM.2015.71403

    Riaz Khan, Xiaosong Zhang, Rajesh Kumar, Abubakar Sharif, Noorbakhsh Amiri Golilarz, and Mamoun Alazab. 2019. An Adaptive Multi-Layer Botnet Detection Technique Using Machine Learning Classifiers. Applied Sciences 9 (06 2019), 2375. https://doi.org/10.3390/app91123

    The Malware Traffic Analysis data originate from https://www.malware-traffic-analysis.net, authored by Brad.

    The work has been funded by the SIMARGL Project -- Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware, with the support of the European Commission and the Horizon 2020 Program, under Grant Agreement No. 833042.

  12. Algorithm comparison.

    • plos.figshare.com
    xls
    Updated May 29, 2025
    Cite
    Wei Cui; Xiao Liao; Yang Yang; Shiying Feng; Mingyan Song (2025). Algorithm comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0322329.t003
    Available download formats: xls
    Dataset updated
    May 29, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Wei Cui; Xiao Liao; Yang Yang; Shiying Feng; Mingyan Song
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the rapid development of smart grids, power grid systems are becoming increasingly complex, posing significant challenges to their security. Traditional network intrusion detection systems often rely on manually engineered features, which are not only resource-intensive but also struggle to handle the diverse range of attack types. This paper aims to address these challenges by proposing an automated DDoS attack detection algorithm using the Informer model. We introduce a windowing technique to segment network traffic into manageable samples, which are then input into the Informer for feature extraction and classification. This model captures both the temporal dependencies and global attention information in the traffic data. Experimental results on the CICIDS-2018 dataset demonstrate the effectiveness of our approach, showing significant improvements in detection accuracy and efficiency. Our findings suggest that the proposed method offers a promising solution for real-time intrusion detection in complex power grid environments.

  13. Network traffic datasets with novel extended IP flow called NetTiSA flow

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 18, 2024
    Cite
    Tomáš Čejka (2024). Network traffic datasets with novel extended IP flow called NetTiSA flow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8301042
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    Jaroslav Pešek
    Karel Hynek
    Josef Koumar
    Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets with novel extended IP flow called NetTiSA flow

    Datasets were created for the paper: NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification -- Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka -- which is published in The International Journal of Computer and Telecommunications Networking https://doi.org/10.1016/j.comnet.2023.110147. Please cite the usage of our datasets as:

    Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286

    @article{KOUMAR2024110147,
      title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification},
      journal = {Computer Networks},
      volume = {240},
      pages = {110147},
      year = {2024},
      issn = {1389-1286},
      doi = {https://doi.org/10.1016/j.comnet.2023.110147},
      url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923},
      author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka}
    }

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.

    NetTiSA flow feature vector

    The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups by computation. The first group of features is based on classical bidirectional flow information---a number of transferred bytes, and packets. The second group contains statistical and time-based features calculated using the time-series analysis of the packet sequences. The third type of features can be computed from the previous groups (i.e., on the flow collector) and improve the classification performance without any impact on the telemetry bandwidth.

    Flow features

    The flow features are:

    Packets is the number of packets in the direction from the source to the destination IP address.

    Packets in reverse order is the number of packets in the direction from the destination to the source IP address.

    Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.

    Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.
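
    As a rough illustration of the four flow features above, the counters can be accumulated in a single pass over the packets of one bidirectional flow. The packet representation (a direction flag plus a payload length) and all names below are assumptions made for this sketch; the actual exporter is not described at this level in the text.

```python
from typing import Iterable, NamedTuple, Tuple


class FlowCounters(NamedTuple):
    packets: int            # source -> destination packets
    packets_rev: int        # destination -> source packets
    payload_bytes: int      # source -> destination payload bytes
    payload_bytes_rev: int  # destination -> source payload bytes


def count_flow(packets: Iterable[Tuple[bool, int]]) -> FlowCounters:
    """Accumulate counters from (is_forward, payload_length) pairs."""
    fwd_pkts = rev_pkts = fwd_bytes = rev_bytes = 0
    for is_forward, payload_length in packets:
        if is_forward:
            fwd_pkts += 1
            fwd_bytes += payload_length
        else:
            rev_pkts += 1
            rev_bytes += payload_length
    return FlowCounters(fwd_pkts, rev_pkts, fwd_bytes, rev_bytes)
```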

    Statistical and Time-based features

    These features are exported in the extended part of the flow. All of them can be computed (exactly or approximately) by stream-wise computation, which is necessary for keeping memory requirements low. The second feature set contains the following features:

    Mean represents mean of the payload lengths of packets

    Min is the minimal value from payload lengths of all packets in a flow

    Max is the maximum value from payload lengths of all packets in a flow

    Standard deviation is a measure of the variation of payload lengths from the mean payload length

    Root mean square is the measure of the magnitude of payload lengths of packets

    Average dispersion is the average absolute difference between each payload length of the packet and the mean value

    Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution

    Mean of relative times is the mean of the relative times, a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\}\)

    Mean of time differences is the mean of the time differences, a sequence defined as \(dt = \{ t_j - t_i \mid j = i + 1,\ i \in \{1, 2, \dots, n - 1\} \}\)

    Min from time differences is the minimal value from all time differences, i.e., min space between packets.

    Max from time differences is the maximum value from all time differences, i.e., max space between packets.

    Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \(tdist = \frac{ \frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right| }{ \frac{1}{2} \left( \max\left(dt_{n-1}\right) - \min\left(dt_{n-1}\right) \right) }\)

    Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation: \(sr = \frac{s_n}{\frac{1}{2} (n - 1)}\)

        where \(s_n\) is the number of switches.
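The stream-wise computation and the two equations above can be sketched as follows. This is a minimal illustration, not the ipfixprobe implementation: the class and function names are hypothetical, the standard deviation is taken as the population form, and a "switch" is assumed to mean that two consecutive payload lengths differ.

```python
import math


class StreamingPayloadStats:
    """Running payload-length statistics, updated one packet at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0      # sum of squared deviations from the running mean
        self._sum_sq = 0.0  # sum of squared payload lengths, used for the RMS
        self.min = math.inf
        self.max = -math.inf

    def update(self, length):
        # Welford's update keeps memory constant regardless of flow size.
        self.n += 1
        delta = length - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (length - self.mean)
        self._sum_sq += length * length
        self.min = min(self.min, length)
        self.max = max(self.max, length)

    @property
    def std(self):
        # Population standard deviation (assumption, see lead-in).
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

    @property
    def rms(self):
        return math.sqrt(self._sum_sq / self.n) if self.n else 0.0


def time_features(timestamps):
    """Time-based features of one flow; needs at least two packets."""
    if len(timestamps) < 2:
        raise ValueError("at least two packet timestamps are required")
    relative = [t - timestamps[0] for t in timestamps]
    dt = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_dt = sum(dt) / len(dt)
    half_range = 0.5 * (max(dt) - min(dt))
    avg_abs_dev = sum(abs(mean_dt - d) for d in dt) / len(dt)
    return {
        "mean_relative_time": sum(relative) / len(relative),
        "mean_time_diff": mean_dt,
        "min_time_diff": min(dt),
        "max_time_diff": max(dt),
        # Defined as 0 when all gaps are equal, to avoid dividing by zero.
        "time_distribution": avg_abs_dev / half_range if half_range else 0.0,
    }


def switching_ratio(lengths):
    """sr = s_n / (0.5 * (n - 1)), with s_n the number of value changes."""
    n = len(lengths)
    if n < 2:
        return 0.0
    switches = sum(1 for a, b in zip(lengths, lengths[1:]) if a != b)
    return switches / (0.5 * (n - 1))
```

The running class covers only the features with a simple exact streaming form; average dispersion and kurtosis need extra state or an approximation and are left out of this sketch.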
    

    Features computed at the collector

    The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size, and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:

    Max minus min is the difference between the maximum and minimum payload lengths

    Percent deviation is the dispersion of the average absolute difference to the mean value

    Variance is the spread measure of the data from its mean

    Burstiness is the degree of peakedness in the central part of the distribution

    Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement

    Directions describes the ratio of packet directions, computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is the number of packets in the direction from the source to the destination IP address and \(d_0\) is the number in the opposite direction. Both \(d_1\) and \(d_0\) belong to the same classical bidirectional flow.

    Duration is the duration of the flow
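As a rough illustration of why these features add no load at the probe, the sketch below derives several of them from values the flow already carries. The function name and argument names are hypothetical, and the formulas for percent deviation (average dispersion divided by the mean) and coefficient of variation (standard deviation divided by the mean) are assumptions based on the descriptions above. Burstiness and duration are omitted because the text does not give a formula for them.

```python
def enhanced_features(mean, minimum, maximum, std, avg_dispersion,
                      packets_fwd, packets_rev):
    """Derive collector-side features from already exported flow values."""
    total_packets = packets_fwd + packets_rev
    return {
        "max_minus_min": maximum - minimum,
        # Assumed: average absolute deviation relative to the mean.
        "percent_deviation": avg_dispersion / mean if mean else 0.0,
        "variance": std ** 2,
        # Assumed: standard deviation relative to the mean.
        "coefficient_of_variation": std / mean if mean else 0.0,
        # d_1 / (d_1 + d_0) from the Directions definition.
        "directions": packets_fwd / total_packets if total_packets else 0.0,
    }
```

For example, `enhanced_features(150, 100, 200, 50, 50, 3, 1)` describes a flow with three forward packets and one reverse packet.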

    The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.

    Description of dataset files

    The following table describes each dataset file:

    File name | Detection problem | Citation of the original raw dataset

    botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.

    botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.

    cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022

    cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022

    dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.

    doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020

    doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022

    dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.

    edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.

    edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.

    https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020

    ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.

    ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.

    unsw_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.

    unsw_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.

    iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23

    ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021

    ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets.

  14. Z

    Trace-Share Dataset for Evaluation of Statistical Characteristics...

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Cermak, Milan (2020). Trace-Share Dataset for Evaluation of Statistical Characteristics Preservation [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3553062
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Cermak, Milan
    Madeja, Tomas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains all data used during the evaluation of statistical characteristics preservation. Archives are protected by password "trace-share" to avoid false detection by antivirus software.

    For more information, see the project repository at https://github.com/Trace-Share.

    Selected Attack Traces

    We selected 72 different traces of network attacks obtained from various internet databases. File names refer to common names of contained vulnerabilities, malware, or attack tools.

    Background Traffic Data

    The publicly available dataset CSE-CIC-IDS-2018 was used as background traffic data. The evaluation uses data from the day Thursday-01-03-2018, which contains a sufficient proportion of regular traffic without any statistically significant attacks. Only traffic aimed at victim machines (range 172.31.69.0/24) is used, which filters out less significant traffic.
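The restriction to the victim range can be sketched with Python's standard `ipaddress` module. This is only an illustration of the selection rule, not the tooling used by the authors. The flow records and the `dst_ip` key are hypothetical.

```python
import ipaddress

# Victim machine range named in the dataset description.
VICTIM_NET = ipaddress.ip_network("172.31.69.0/24")


def targets_victim(dst_ip, network=VICTIM_NET):
    """Return True when the destination address lies inside the victim range."""
    return ipaddress.ip_address(dst_ip) in network


def filter_flows(flows, network=VICTIM_NET):
    """Keep only flows whose (hypothetical) 'dst_ip' field targets the range."""
    return [flow for flow in flows if targets_victim(flow["dst_ip"], network)]
```

When working on the PCAP files directly, the same rule would normally be expressed as a capture filter on the destination network instead.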

    Evaluation Results and Dataset Structure

    Traces variants (traces-normalized.zip, traces-adjusted.zip)

    ./traces-normalized/ — normalized PCAP files and details in YAML format;

    ./traces-adjusted/ — configuration files for traces combination in YAML format.

    Computed statistics (statistics.zip)

    ./statistics-background/ — background traffic statistics computed by ID2T;

    ./statistics-combination/ — combined traces statistics computed by ID2T for all adjust options (selected only combinations where ID2T provided all statistics files);

    ./statistics-difference/ — computed mean and median differences of background and combined traffic traces.

    Evaluation results

    statistics-difference.ipynb — file containing visualization of statistics differences.

  15. Z

    Trace-Share Dataset for Evaluation of Trace Meaning Preservation

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 7, 2020
    Cite
    Madeja, Tomas (2020). Trace-Share Dataset for Evaluation of Trace Meaning Preservation [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3547527
    Dataset updated
    May 7, 2020
    Dataset provided by
    Cermak, Milan
    Madeja, Tomas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains all data used during the evaluation of trace meaning preservation. Archives are protected by password "trace-share" to avoid false detection by antivirus software.

    For more information, see the project repository at https://github.com/Trace-Share.

    Selected Attack Traces

    The following list contains trace datasets used for evaluation. Each attack was chosen to have not only a different meaning but also different statistical properties.

    dos_http_flood — the capture of GET and POST requests sent to one server by one attacker (HTTP traffic);

    ftp_bruteforce — short and unsuccessful attempt to guess a user’s password for FTP service (FTP traffic);

    ponyloader_botnet — Pony Loader botnet used for stealing of credentials from 3 target devices reporting to single IP with a large number of intermediate addresses (DNS and HTTP traffic);

    scan — the capture of nmap tool that scans given subnet using ICMP echo and TCP SYN requests (consist of ARP, ICMP, and TCP traffic);

    wannacry_ransomware — the capture of WannaCry ransomware that spreads in a domain with three workstations, a domain controller, and a file-sharing server (SMB and SMBv2 traffic).

    Background Traffic Data

    The publicly available dataset CSE-CIC-IDS-2018 was used as background traffic data. The evaluation uses data from the day Thursday-01-03-2018, which contains a sufficient proportion of regular traffic without any statistically significant attacks. Only traffic aimed at victim machines (range 172.31.69.0/24) is used, which filters out less significant traffic.

    Evaluation Results and Dataset Structure

    Traces variants (traces.zip)

    ./traces-original/ — trace PCAP files and crawled details in YAML format;

    ./traces-normalized/ — normalized PCAP files and details in YAML format;

    ./traces-adjusted/ — adjusted PCAP files using various timestamp generation settings, combination configuration in YAML format, and labels provided by ID2T in XML format.

    Extracted alerts (alerts.zip)

    ./alerts-original/ — extracted Suricata alerts, Suricata log, and full Suricata output for all original trace files;

    ./alerts-normalized/ — extracted Suricata alerts, Suricata log, and full Suricata output for all normalized trace files;

    ./alerts-adjusted/ — extracted Suricata alerts, Suricata log, and full Suricata output for all adjusted trace files.

    Evaluation results

    *.csv files in the root directory — extracted alert signatures and their counts for each trace variant.

  16. f

    DDoS Detection

    • figshare.com
    zip
    Updated Feb 17, 2025
    Cite
    Wei Cui (2025). DDoS Detection [Dataset]. http://doi.org/10.6084/m9.figshare.28428494.v1
    Available download formats: zip
    Dataset updated
    Feb 17, 2025
    Dataset provided by
    figshare
    Authors
    Wei Cui
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the development of smart grids, power grid systems are gradually becoming more complex. This poses new challenges for ensuring the security of power grid systems. The prevailing approach to network intrusion detection relies heavily on manually engineered features, often requiring rigorous expertise and struggling to accommodate a diverse array of attack types. In response to this challenge, we employed a windowing technique to segment network traffic data into manageable samples. These samples are subsequently input into the Informer network for feature extraction and classification, facilitating intrusion detection. Our proposed algorithm simultaneously considers both the temporal information of sessions and overall attention information, autonomously learning features from traffic data. Experimental evaluations using CICIDS-2018 network traffic data demonstrate the algorithm’s effectiveness in DDoS attack detection, yielding promising results.

  17. f

    Data types in CICIDS2018 dataset.

    • plos.figshare.com
    xls
    Updated Aug 29, 2025
    Cite
    Congyuan Xu; Jun Yang; Panpan Li (2025). Data types in CICIDS2018 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0331065.t005
    Available download formats: xls
    Dataset updated
    Aug 29, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Congyuan Xu; Jun Yang; Panpan Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The widespread deployment of Internet of Things (IoT) devices has made them prime targets for cyberattacks. Existing intrusion detection systems (IDSs) heavily rely on large-scale labeled datasets, which limits their effectiveness in detecting novel attacks under few-shot scenarios. To address this challenge, we propose a meta-learning-based intrusion detection method called MACML (Marrying Attention and Convolution-based Meta-Learning). It integrates a self-attention mechanism to capture global dependencies and a convolutional neural network to extract local features, thereby enhancing the model’s overall perception of traffic characteristics. MACML adopts an optimization-based meta-learning framework that enables rapid adaptation to new tasks using only a small number of training samples, improving detection performance and generalization capability. We evaluate MACML on the CICIDS2018 and CICIoT2023 datasets. Experimental results show that, with only 10 training samples, MACML achieves an average accuracy of 98.75% and a detection rate of 99.17% on the CICIDS2018 dataset. On the CICIoT2023 dataset, it reaches 94.47% accuracy and a 95.32% detection rate, outperforming existing state-of-the-art methods.

  18. Parkes pulsar observations with undefined project IDs for 2018_BPSR_14

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Nov 10, 2018
    Cite
    CSIRO (2018). Parkes pulsar observations with undefined project IDs for 2018_BPSR_14 [Dataset]. http://doi.org/10.25919/5BE54E9A3F87F
    Available download formats: datadownload
    Dataset updated
    Nov 10, 2018
    Dataset authored and provided by
    CSIRO (http://www.csiro.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2018 - Dec 31, 2018
    Description

    Parkes pulsar observations with undefined project IDs for 2018.

    This collection contains observations from project P955.

  19. Perimeter Intrusion Detection Systems Market Analysis North America, Europe,...

    • technavio.com
    pdf
    Updated Jul 12, 2024
    Cite
    Technavio (2024). Perimeter Intrusion Detection Systems Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, Germany, China, UK, India - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/perimeter-intrusion-detection-systems-market-industry-analysis
    Available download formats: pdf
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Area covered
    United States
    Description


    Perimeter Intrusion Detection Systems Market Size 2024-2028

    The perimeter intrusion detection systems market size is forecast to increase by USD 6.42 billion, at a CAGR of 10.03% between 2023 and 2028.

    Perimeter intrusion detection systems (PIDS) are essential for securing critical infrastructure against unauthorized access and potential threats, particularly in sectors such as oil refineries and banking and financial institutions. The market for these systems is driven by the need to mitigate criminal activities and prevent terrorist attacks. Technological advancements, including signal processing, artificial intelligence, machine learning, data analytics, computing technologies, video analytics, and visual alarm verification, are key trends in the market. These technologies enhance the system's ability to detect and respond to intrusions effectively. Additionally, the increasing demand for real-time threat detection and response, as well as the need for surveillance data security, further boosts the market's growth. Explosions and other catastrophic events at oil refineries underscore the importance of reliable and effective perimeter security systems.
    

    What will be the Size of the Market During the Forecast Period?


    The market is witnessing significant growth due to the increasing need for advanced security solutions. These systems play a crucial role in safeguarding critical infrastructure from potential threats such as terrorism, criminal activities, burglaries, thefts, explosions, and other unauthorized intrusions. PIDS utilize various technologies, including sensors, video surveillance systems, and radar, to detect and alert security personnel of any unauthorized access or potential threats. Sensors are the backbone of these systems, with microwave sensors, infrared sensors, fiber optic sensors, and radar sensors being commonly used. Microwave sensors operate by detecting the reflection of microwave energy off intruders or objects, while infrared sensors detect heat signatures. Fiber optic sensors can detect vibrations, strain, temperature changes, and other physical disturbances. Radar sensors use electromagnetic waves to detect objects and movements within their range. Video surveillance systems are another essential component of PIDS. Cameras monitor the perimeter, and video management software processes and stores the footage. Hardware, such as servers and storage devices, ensure the efficient processing and retention of data.
    Professional services and managed services further enhance the functionality of these systems. The US oil refineries, power plants, and other critical infrastructure facilities are significant users of PIDS. These systems provide early warning of potential threats, enabling security personnel to respond promptly and effectively. The integration of PIDS with other security systems, such as Access Control Systems (ACS) and Automatic Identification Systems (AIS), further enhances the overall security of these facilities. The US market for Perimeter Intrusion Detection Systems is expected to experience steady growth due to the increasing focus on infrastructure security. The integration of advanced technologies, such as AI and machine learning, is also expected to drive market growth. As the threat landscape continues to evolve, the demand for strong and reliable PIDS solutions will continue to increase.
    

    How is this market segmented and which is the largest segment?

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Component

      Solutions
      Services

    Geography

      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        India
      Middle East and Africa
      South America

    By Component Insights

    The solutions segment is estimated to witness significant growth during the forecast period.
    

    Perimeter intrusion detection systems are essential security solutions used in various industries to identify and prevent unauthorized access or suspicious activities. These systems employ advanced technologies such as video analytics, signal processing, artificial intelligence, and machine learning for enhanced security. In industries like oil refineries, where the risk of explosions is high, perimeter intrusion detection systems play a crucial role in safeguarding assets and personnel. Video analytics is a significant component of these systems, utilizing data analytics and computing technologies to analyze video footage for motion detection and other potential threats. Machine learning algorithms are employed to improve the system's accuracy and reduce false alarms.

    Visual alarm verification is another feature that helps verify alarms before alerting security personnel, mini

  20. H

    2018 U.S. Congressional Election Tweet Ids

    • dataverse.harvard.edu
    Updated Feb 7, 2019
    Cite
    Laura Wrubel; Justin Littman; Dan Kerchner (2019). 2018 U.S. Congressional Election Tweet Ids [Dataset]. http://doi.org/10.7910/DVN/AEZPLU
    Dataset updated
    Feb 7, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Laura Wrubel; Justin Littman; Dan Kerchner
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This dataset contains the tweet ids of 171,248,476 tweets related to the 2018 U.S. Congressional Election. They were collected between January 22, 2018 and January 3, 2019 from the Twitter API using Social Feed Manager. See each collection's README for dates of collection, accounts, and hashtags used in queries.

    These tweet ids are broken up into 5 collections. Each collection was collected either from the GET statuses/user_timeline method of the Twitter REST API (retrieved on a weekly schedule) or the POST statuses/filter method of the Twitter Stream API. The collections are:

    Senate candidates (Twitter user timeline): senate_accounts.txt

    House candidates (Twitter user timeline): house_accounts.txt

    Election filter (Twitter filter): election-filter-[1-3].txt

    Partisan Democratic filter (Twitter filter): partisan-dem-[1-4].txt

    Partisan Republican filter (Twitter filter): partisan-rep-[1-11].txt

    There is a README.txt file for each collection containing additional documentation on how it was collected. There is also an accounts.csv file for those collections collected from the GET statuses/user_timeline method, listing the Twitter accounts that were collected.

    The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.

    Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to sfm@gwu.edu. George Washington University researchers should contact us for access to the tweets.
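Before hydrating, the id files usually have to be read and split into request-sized batches. The sketch below shows only that preparation step and makes no API calls. The helper names are hypothetical, the one-id-per-line layout is an assumption about the files, and the batch size of 100 is an assumed per-request limit that should be checked against the lookup method or tool actually used.

```python
def read_tweet_ids(lines):
    """Return non-empty tweet ids from an iterable of text lines."""
    return [line.strip() for line in lines if line.strip()]


def batches(ids, size=100):
    """Yield consecutive slices of at most `size` ids (size is an assumption)."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

A typical use would be `with open("senate_accounts.txt") as fh: ids = read_tweet_ids(fh)`, followed by passing each batch to a hydration tool such as Twarc or Hydrator.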

CSE-CIC-IDS2018

Follow-up to CIC-IDS2017, network intrusion detection, CIC @UNB Fredericton
