82 datasets found
  1. Mexico: registered number of malware attacks 2019-2020

    • statista.com
    Updated Jul 15, 2020
    Cite
    Statista (2020). Mexico: registered number of malware attacks 2019-2020 [Dataset]. https://www.statista.com/statistics/1179173/number-registered-malware-attacks-mexico/
    Dataset updated
    Jul 15, 2020
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Mexico
    Description

    In the first half of 2020, over *** million malware attacks were registered in Mexico. However, this represents a decrease of about three percent compared with the same period of the previous year, when approximately **** million attacks were registered. In 2020, Mexico was one of the countries with the most cyber attacks in Latin America.

  2. Development of malware worldwide 2015-2020

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Development of malware worldwide 2015-2020 [Dataset]. https://www.statista.com/statistics/680953/global-malware-volume/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    As of March 2020, the total number of new malware detections worldwide amounted to ****** million programs, up from *** million new malware detections at the end of January 2020. According to AV-TEST, the cumulative number of new malware samples is projected to surpass *** million within 2020.

  3. Development of Android malware worldwide 2016-2020

    • statista.com
    Updated Aug 15, 2020
    Cite
    Statista (2020). Development of Android malware worldwide 2016-2020 [Dataset]. https://www.statista.com/statistics/680705/global-android-malware-volume/
    Dataset updated
    Aug 15, 2020
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2013 - Mar 2020
    Area covered
    Worldwide
    Description

    As of March 2020, the total number of new Android malware samples amounted to 482,579 per month. According to AV-Test, trojans were the most common type of malware affecting Android devices.

  4. Data from: Malware Finances and Operations: a Data-Driven Study of the Value...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    Updated Jun 20, 2023
    Cite
    Nurmi, Juha; Niemelä, Mikko; Brumley, Billy (2023). Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8047204
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Cyber Intelligence House (https://cyberintelligencehouse.com/)
    Tampere University
    Authors
    Nurmi, Juha; Niemelä, Mikko; Brumley, Billy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.

    Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.

    We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.

    1. MalwareInfectionSet: We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these freely available infostealer malware logs. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.

    2. VictimAccessSet: We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.

    3. AccountAccessSet: The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collect data from Database Market, where vendors sell online credentials, and investigate it in a similar way. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
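
    The one-JSON-file-per-victim layout described above lends itself to a simple directory walk. The following is a minimal sketch, not the authors' scripts: the file names and the keys in the demo files are made up for illustration, and the actual structure is documented in the README.md files shipped with the datasets.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def iter_victim_records(root):
    """Yield (path, parsed JSON object) for every *.json file under root."""
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield path, json.load(handle)


def count_top_level_keys(root):
    """Count how often each top-level key appears across all victim files."""
    counts = Counter()
    for _path, record in iter_victim_records(root):
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


# Self-contained demo with invented files (hypothetical keys, not real dataset content).
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "victim_1.json").write_text(
        '{"country": "XX", "date": "2020-01-01"}', encoding="utf-8"
    )
    Path(demo_dir, "victim_2.json").write_text('{"country": "YY"}', encoding="utf-8")
    key_counts = count_top_level_keys(demo_dir)
    print(key_counts)
```

    Counting top-level keys first is a cheap way to learn which fields are present in most files before writing any analysis that depends on them.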

    Credits

    Authors

    Billy Bob Brumley (Tampere University, Tampere, Finland)

    Juha Nurmi (Tampere University, Tampere, Finland)

    Mikko Niemelä (Cyber Intelligence House, Singapore)

    Funding

    This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).

    Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.

  5. Healthcare Ransomware Dataset

    • kaggle.com
    zip
    Updated Feb 21, 2025
    Cite
    River | Datasets for SQL Practice (2025). Healthcare Ransomware Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/healthcare-ransomware-dataset
    Explore at:
    Available download formats: zip (221852 bytes)
    Dataset updated
    Feb 21, 2025
    Authors
    River | Datasets for SQL Practice
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    📌 Context of the Dataset

    The Healthcare Ransomware Dataset was created to simulate real-world cyberattacks in the healthcare industry. Hospitals, clinics, and research labs have become prime targets for ransomware due to their reliance on real-time patient data and legacy IT infrastructure. This dataset provides insight into attack patterns, recovery times, and cybersecurity practices across different healthcare organizations.

    Why is this important?

    Ransomware attacks on healthcare organizations can shut down entire hospitals, delay treatments, and put lives at risk. Understanding how different healthcare organizations respond to attacks can help develop better security strategies. The dataset allows cybersecurity analysts, data scientists, and researchers to study patterns in ransomware incidents and explore predictive modeling for risk mitigation.

    📌 Sources and Research Inspiration

    This simulated dataset was inspired by real-world cybersecurity reports and built using insights from official sources, including:

    1️⃣ IBM Cost of a Data Breach Report (2024)

    The healthcare sector had the highest average cost of data breaches ($10.93 million per incident). On average, organizations recovered only 64.8% of their data after paying ransom. Healthcare breaches took 277 days on average to detect and contain.

    2️⃣ Sophos State of Ransomware in Healthcare (2024)

    67% of healthcare organizations were hit by ransomware in 2024, an increase from 60% in 2023. 66% of backup compromise attempts succeeded, making data recovery significantly more difficult. The most common attack vectors included exploited vulnerabilities (34%) and compromised credentials (34%).

    3️⃣ Health & Human Services (HHS) Cybersecurity Reports

    Ransomware incidents in healthcare have doubled since 2016. Organizations that fail to monitor threats frequently experience higher infection rates.

    4️⃣ Cybersecurity & Infrastructure Security Agency (CISA) Alerts

    Identified phishing, unpatched software, and exposed RDP ports as top ransomware entry points. Only 13% of healthcare organizations monitor cyber threats more than once per day, increasing the risk of undetected attacks.

    5️⃣ Emsisoft 2020 Report on Ransomware in Healthcare

    The number of ransomware attacks in healthcare increased by 278% between 2018 and 2023. 560 healthcare facilities were affected in a single year, disrupting patient care and emergency services.

    📌 Why is This a Simulated Dataset?

    This dataset does not contain real patient data or actual ransomware cases. Instead, it was built using probabilistic modeling and structured randomness based on industry benchmarks and cybersecurity reports.

    How It Was Created:

    1️⃣ Defining the Dataset Structure

    The dataset was designed to simulate realistic attack patterns in healthcare, using actual ransomware case studies as inspiration.

    Columns were selected based on what real-world cybersecurity teams track, such as:

    • Attack methods (phishing, RDP exploits, credential theft).
    • Infection rates, recovery time, and backup compromise rates.
    • Organization type (hospitals, clinics, research labs) and monitoring frequency.

    2️⃣ Generating Realistic Data Using ChatGPT & Python

    ChatGPT assisted in defining relationships between attack factors, ensuring that key cybersecurity concepts were accurately reflected. Python’s NumPy and Pandas libraries were used to introduce randomized attack simulations based on real-world statistics. Data was validated against industry research to ensure it aligns with actual ransomware attack trends.

    3️⃣ Ensuring Logical Relationships Between Data Points

    Hospitals take longer to recover due to larger infrastructure and compliance requirements. Organizations that track more cyber threats recover faster because they detect attacks earlier. Backup security significantly impacts recovery time, reflecting the real-world risk of backup encryption attacks.
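
    The three relationships above can be sketched as structured randomness. This is only an illustration under stated assumptions: the description says NumPy and Pandas were used, whereas this sketch swaps in Python's standard `random` module to stay dependency-free, and every number, column name, and multiplier below is invented rather than taken from the dataset.

```python
import random

# All numbers below are invented for illustration; they are not the dataset's parameters.
BASE_RECOVERY_DAYS = {"hospital": 20.0, "clinic": 10.0, "research_lab": 12.0}
ATTACK_METHODS = ["phishing", "rdp_exploit", "credential_theft"]


def recovery_days(org_type, checks_per_day, backup_compromised, noise=1.0):
    """Combine the three relationships into one recovery-time estimate."""
    days = BASE_RECOVERY_DAYS[org_type]   # hospitals start from a higher base
    days /= 1.0 + 0.2 * checks_per_day    # more frequent monitoring shortens recovery
    if backup_compromised:
        days *= 1.5                       # compromised backups lengthen recovery
    return days * noise


def simulate_incident(rng):
    """Draw one synthetic incident using structured randomness."""
    org_type = rng.choice(sorted(BASE_RECOVERY_DAYS))
    checks_per_day = rng.choice([0, 1, 2, 4])
    backup_compromised = rng.random() < 0.5
    noise = rng.uniform(0.8, 1.2)         # random spread around the modelled value
    return {
        "org_type": org_type,
        "attack_method": rng.choice(ATTACK_METHODS),
        "checks_per_day": checks_per_day,
        "backup_compromised": backup_compromised,
        "recovery_days": round(
            recovery_days(org_type, checks_per_day, backup_compromised, noise), 1
        ),
    }


rng = random.Random(42)  # fixed seed so the run is repeatable
incidents = [simulate_incident(rng) for _ in range(5)]
for incident in incidents:
    print(incident)
```

    Keeping the deterministic relationship in `recovery_days` separate from the random draws makes it possible to check the intended orderings without depending on a particular seed.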

  6. Android malware detection and classification

    • kaggle.com
    zip
    Updated Feb 19, 2025
    Cite
    Meghna Dhalaria, Ekta Gandotra (JUIT,Waknaghat,HP) (2025). Android malware detection and classification [Dataset]. https://www.kaggle.com/datasets/meghnadhalaria/android-malware-detection-and-classification/data
    Explore at:
    Available download formats: zip (447035 bytes)
    Dataset updated
    Feb 19, 2025
    Authors
    Meghna Dhalaria, Ekta Gandotra (JUIT,Waknaghat,HP)
    Description

    Context

    The growing use of mobile devices has made them a popular target for malicious applications because they store sensitive personal information, including messages, photos, and other private data. To exploit this, modern malware is designed using advanced techniques that make it difficult to detect and analyze. As a result, the rapid increase in new malware variants each day has outpaced the effectiveness of traditional detection methods. Consequently, as cyber threats continue to evolve in complexity, there is an urgent need for advanced security measures to protect user privacy and safeguard sensitive information.

    Content

    Two datasets are created:

    Binary Classification (Dataset-1): Dataset-1 contains 352 static and 323 dynamic features extracted from 1800 benign and 1747 malicious apps.

    Multi-class Classification (Dataset-2): Dataset-2 contains 352 static and 323 dynamic features extracted from 1747 malicious apps with 13 malicious families.

    AndroMD Dataset: This dataset is the combination of both static and dynamic features as mentioned in Binary Classification (Dataset-1).
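
    The figures above imply the width of a combined feature vector and the class balance of Dataset-1. The sketch below works this out; treating the combination as a plain concatenation of static and dynamic features is an assumption, since the description does not state the column layout.

```python
STATIC_FEATURES = 352
DYNAMIC_FEATURES = 323
BENIGN_APPS = 1800
MALICIOUS_APPS = 1747


def combine_features(static_vec, dynamic_vec):
    """Concatenate one app's static and dynamic features (assumed layout)."""
    if len(static_vec) != STATIC_FEATURES or len(dynamic_vec) != DYNAMIC_FEATURES:
        raise ValueError("unexpected feature vector length")
    return list(static_vec) + list(dynamic_vec)


# Placeholder vectors: zeros for static features, ones for dynamic features.
combined = combine_features([0] * STATIC_FEATURES, [1] * DYNAMIC_FEATURES)
total_apps = BENIGN_APPS + MALICIOUS_APPS
malicious_share = MALICIOUS_APPS / total_apps

print(len(combined))   # 675 features per app
print(total_apps)      # 3547 apps in Dataset-1
print(f"{malicious_share:.1%} malicious")
```

    With the two classes this close in size, plain accuracy is a reasonable first metric for the binary task; the multi-class task over 13 families may be less balanced.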

    These datasets are valuable for researchers and experts in Android malware detection and for conducting experimental studies.

    Acknowledgements

    Meghna Dhalaria, Ekta Gandotra (2021). "A Hybrid Approach for Android Malware Detection and Family Classification", International Journal of Interactive Multimedia and Artificial Intelligence, vol. 6, issue Regular Issue, no. 6, pp. 174-188. https://doi.org/10.9781/ijimai.2020.09.001

    Meghna Dhalaria, and Ekta Gandotra, “MalDetect: A Classifier Fusion Approach for Detection of Android Malware,” Expert Systems with Applications, vol. 235, pp. 121155, 2023. https://doi.org/10.1016/j.eswa.2023.121155

    Meghna Dhalaria, and Ekta Gandotra, “Binary and Multi-class Classification of Android Applications using Static Features.” International Journal of Applied Management Science. vol. 15, no. 2, pp. 117-140, 2023. https://doi.org/10.1504/IJAMS.2023.131670

    Meghna Dhalaria, and Ekta Gandotra, "CSForest: an approach for imbalanced family classification of android malicious applications." International Journal of Information Technology, vol. 13 no. 3, pp. 1-13, 2021. https://doi.org/10.1007/s41870-021-00661-7

    Meghna Dhalaria, and Ekta Gandotra, “A Framework for Detection of Android Malware using Static Features,” In 2020 IEEE 17th India Council International Conference (INDICON), pp. 1-7, IEEE, 2020. https://doi.org/10.1109/INDICON49873.2020.9342511

    Meghna Dhalaria, and Ekta Gandotra, “Android Malware Detection using Chi-Square Feature Selection and Ensemble Learning Method,” In 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 36-41, IEEE, 2020. https://doi.org/10.1109/PDGC50313.2020.9315818

  7. [CIC-AndMal-2020] Static-Dynamic Malware analysis

    • kaggle.com
    zip
    Updated Dec 27, 2021
    Cite
    Alberto Zorzetto (2021). [CIC-AndMal-2020] Static-Dynamic Malware analysis [Dataset]. https://www.kaggle.com/datasets/albertozorzetto/cic-andmal-2020-dynamic-static-analysis
    Explore at:
    Available download formats: zip (59107376 bytes)
    Dataset updated
    Dec 27, 2021
    Authors
    Alberto Zorzetto
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    This dataset contains 200K Android malware apps, which are labeled and grouped into their corresponding families. Benign Android apps (200K) were collected from the AndroZoo dataset to balance the dataset. We collected 14 malware categories: adware, backdoor, file infector, no category, Potentially Unwanted Apps (PUA), ransomware, riskware, scareware, trojan, trojan-banker, trojan-dropper, trojan-sms, trojan-spy and zero-day.

    A complete taxonomy of all the malware families of captured malware apps is created by dividing them into 8 categories such as sensitive data collection, media, hardware, actions/activities, internet connection, C&C, antivirus and storage & settings.

    Dataset details

    Category | Number of families | Number of samples
    Adware | 48 | 47,210
    Backdoor | 11 | 1,538
    File Infector | 5 | 669
    No Category | - | 2,296
    PUA | 8 | 2,051
    Ransomware | 8 | 6,202
    Riskware | 21 | 97,349
    Scareware | 3 | 1,556
    Trojan | 45 | 13,559
    Trojan-Banker | 11 | 887
    Trojan-Dropper | 9 | 2,302
    Trojan-SMS | 11 | 3,125
    Trojan-Spy | 11 | 3,540
    Zero-day | - | 13,340
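
    As a quick consistency check on the table above, the per-category counts can be summed and ranked. The values are copied from the table; the two categories without a family count ("No Category" and "Zero-day") are stored as `None`.

```python
# (number of families, number of samples) per category, copied from the table.
CATEGORIES = {
    "Adware": (48, 47210),
    "Backdoor": (11, 1538),
    "File Infector": (5, 669),
    "No Category": (None, 2296),
    "PUA": (8, 2051),
    "Ransomware": (8, 6202),
    "Riskware": (21, 97349),
    "Scareware": (3, 1556),
    "Trojan": (45, 13559),
    "Trojan-Banker": (11, 887),
    "Trojan-Dropper": (9, 2302),
    "Trojan-SMS": (11, 3125),
    "Trojan-Spy": (11, 3540),
    "Zero-day": (None, 13340),
}

total_samples = sum(samples for _families, samples in CATEGORIES.values())
total_families = sum(f for f, _samples in CATEGORIES.values() if f is not None)
largest = max(CATEGORIES, key=lambda name: CATEGORIES[name][1])

print(total_samples)   # 195624
print(total_families)  # 191
print(largest, f"{CATEGORIES[largest][1] / total_samples:.1%}")
```

    The listed samples add up to 195,624, which is consistent with the rounded "200K" figure in the introduction, and Riskware alone accounts for roughly half of them.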

    Static analysis

    AndroidManifest.xml contains a lot of features that can be used for static analysis. The main extracted features include:

    • Activities: An android activity is one screen of the android app's user interface
    • Broadcast receivers and providers
    • Metadata: It is basically an additional option to store information that can be accessed through the entire project
    • The permissions requested by application: It protects the privacy of the user and is needed to access sensitive user data (such as contacts and SMS)
    • System features (such as camera and internet)
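
    The manifest features listed above can be pulled out with a standard XML parser once the manifest is available as plain text. This is a minimal sketch, not the dataset authors' extraction pipeline: the sample manifest and its package name are invented, and manifests inside real APK files are stored in a binary XML form that has to be decoded first.

```python
import xml.etree.ElementTree as ET

# Android attributes such as android:name live in this XML namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = f"{{{ANDROID_NS}}}name"

# Invented example manifest (hypothetical package and component names).
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.CALL_PHONE" />
    <uses-feature android:name="android.hardware.camera" />
    <application>
        <activity android:name="com.example.demo.MainActivity" />
        <service android:name="com.example.demo.MainService" />
        <receiver android:name="com.example.demo.ActionReceiver" />
        <meta-data android:name="android.appwidget.provider" android:value="x" />
    </application>
</manifest>
"""


def extract_static_features(manifest_xml):
    """Collect component names, permissions, and features from a decoded manifest."""
    root = ET.fromstring(manifest_xml)

    def names(tag):
        # android:name of every element with the given tag, in document order.
        return [el.get(NAME_ATTR) for el in root.iter(tag) if el.get(NAME_ATTR)]

    return {
        "package": root.get("package"),
        "activities": names("activity"),
        "services": names("service"),
        "receivers": names("receiver"),
        "providers": names("provider"),
        "permissions": names("uses-permission"),
        "features": names("uses-feature"),
        "meta_data": names("meta-data"),
    }


features = extract_static_features(SAMPLE_MANIFEST)
for key, value in features.items():
    print(key, value)
```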

    Static Features

    Feature | Values
    Package Name | "com.fb.iwidget"
    Activities | "com.fb.iwidget.OverlayActivity", "org.acra.CrashReportDialog", "com.batch.android.BatchActionActivity", "com.fb.iwidget.MainActivity", "com.fb.iwidget.PreferencesActivity", "com.fb.iwidget.PickerActivity", "com.fb.iwidget.IntroActivity"
    Services | "com.batch.android.BatchActionService", "com.fb.iwidget.MainService", "com.fb.iwidget.SnapAccessService"
    Receivers/Providers | "com.fb.iwidget.ExpandWidgetProvider", "com.fb.iwidget.ActionReceiver"
    Intents Actions | "android.accessibilityservice.AccessibilityService", "android.appwidget.action.APPWIDGET_UPDATE", "android.intent.action.BOOT_COMPLETED", "android.intent.action.CREATE_SHORTCUT", "android.intent.action.MAIN", "android.intent.action.MY_PACKAGE_REPLACED", "android.intent.action.USER_PRESENT", "android.intent.action.VIEW", "com.fb.iwidget.action.SHOULD_REVIVE"
    Intents Categories | "android.intent.category.BROWSABLE", "android.intent.category.DEFAULT", "android.intent.category.LAUNCHER"
    Permissions | "android.permission.ACCESS_NETWORK_STATE", "android.permission.CALL_PHONE", "android.permission.INTERNET", "android.permission.RECEIVE_BOOT_COMPLETED", "android.permission.SYSTEM_ALERT_WINDOW", "com.android.vending.BILLING", "android.permission.BIND_ACCESSIBILITY_SERVICE"
    Meta-Data | "android.accessibilityservice", "android.appwidget.provider"
    #Icons | 331
    #Pictures | 0
    #Videos | 0
    Audio files | 0
    Videos | 0
    Size of the App | 4.2M

    Dynamic analysis

    For understanding the behavioral changes of these malware categories and families, six categories of features are extracted after executing the malware in an emulated environment. The main extracted features include:

    • Memory: Memory features define activities performed by malware by utilizing memory.
    • API: Application Programming Interface (API) features delineate the communication between two applications.
    • Network: Network features describe the data transmitted and received between other devices in the network. It indicates foreground and background network usage.
    • Battery: Batt...
  8. Cybersecurity Incidents in India (2020–2024)

    • kaggle.com
    zip
    Updated Apr 22, 2025
    Cite
    Agile Yaswanth Sai Simha Reddy (2025). Cybersecurity Incidents in India (2020–2024) [Dataset]. https://www.kaggle.com/datasets/saisimha203/cybersecurity-cases-india
    Explore at:
    Available download formats: zip (11210 bytes)
    Dataset updated
    Apr 22, 2025
    Authors
    Agile Yaswanth Sai Simha Reddy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    The dataset "Cybersecurity Cases in India" is a comprehensive collection of real-world cybersecurity incidents reported across various cities in India. The dataset encapsulates the financial loss, incident types, and categories, providing a detailed overview of the cybercrime landscape in one of the world’s largest digital economies. With over 1000 records, it spans incidents from 2020 to 2024, covering various types of cybercrimes such as phishing, online fraud, malware attacks, ransomware, data breaches, DDoS attacks, identity theft, and more. Each record captures important attributes of the incidents, such as the year, date of occurrence, amount lost in INR, the type of incident, the city in which it occurred, and the category of the affected entity (e.g., financial, personal, corporate).

    The dataset is structured to enable analysis of the trends in cybercrime over time, the financial impact of various cyberattacks, and the geographic distribution of incidents across Indian cities. It serves as a critical resource for cybersecurity professionals, policymakers, law enforcement agencies, and academic researchers seeking to understand the challenges posed by cybercrime in India and to identify strategies to combat these challenges.

    1. Dataset Purpose and Scope

    The dataset’s primary purpose is to provide an extensive, granular view of the nature and scope of cybersecurity incidents in India. It enables the analysis of the frequency, severity, and financial impact of cybercrimes across different types of attacks, cities, and time periods. As cybercrimes continue to rise globally, including in India, this dataset serves as an important tool for understanding the evolving threats and risks in cyberspace. Cybersecurity experts and analysts can leverage this dataset to identify patterns and trends, while government and law enforcement agencies can use it to devise more targeted interventions and preventive measures.

    India, with its large and growing digital footprint, is a prime target for cybercriminals. The country's rapidly expanding internet user base, coupled with increasing digital adoption in various sectors like finance, healthcare, education, and e-commerce, makes it an attractive target for cyberattacks. This dataset allows stakeholders to understand how cybercrime evolves in response to these dynamics.

    The dataset is a rich resource for understanding the following:

    • Incident Frequency: How often different types of cybercrimes are reported in various Indian cities.
    • Financial Impact: The monetary losses associated with each type of cybercrime.
    • Geographic Distribution: The prevalence of specific types of cybercrimes in particular cities or states.
    • Trend Analysis: How cybercrime has evolved over the years in terms of volume and impact.

    2. Dataset Structure and Variables

    The dataset includes the following key variables, each contributing valuable information to the analysis:

    • Year: The year in which the cybercrime incident occurred. This variable helps track the growth or decline of cybercrime incidents over time.
    • Date: The specific date of the cybercrime incident. This allows for time-series analysis of the data.
    • Amount_Lost_INR: The financial loss associated with the cybercrime incident, expressed in Indian Rupees (INR). This variable highlights the economic impact of each cyberattack and can be used to assess the severity of different incidents.
    • Incident_Type: The type of cybercrime incident. This can include phishing, online fraud, malware attacks, ransomware, data breaches, DDoS attacks, and identity theft. This variable is crucial for understanding which types of cybercrimes are most prevalent and how they differ in their impact.
    • City: The city where the incident occurred. This allows for the geographic analysis of cybercrime, helping to identify high-risk areas and cities where certain types of cyberattacks are more common.
    • Category: The category of the entity affected by the cybercrime, such as financial institutions, government bodies, corporations, educational institutions, or individuals. This variable provides insights into which sectors are more vulnerable to specific types of cyberattacks.
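
    The variables listed above support simple group-by aggregations. The sketch below uses the column names from the list, but the three rows, the date format, and the category labels are invented for illustration and are not records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows using the documented column names (date format assumed).
SAMPLE_CSV = """Year,Date,Amount_Lost_INR,Incident_Type,City,Category
2021,2021-03-14,50000,Phishing,Mumbai,Personal
2021,2021-07-02,120000,Ransomware,Delhi,Corporate
2022,2022-01-20,30000,Phishing,Mumbai,Financial
"""


def loss_by(rows, column):
    """Sum Amount_Lost_INR for each distinct value of the given column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += float(row["Amount_Lost_INR"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
loss_by_type = loss_by(rows, "Incident_Type")
loss_by_year = loss_by(rows, "Year")

print(loss_by_type)
print(loss_by_year)
```

    The same helper covers the geographic and sector views by passing "City" or "Category" as the grouping column.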

    3. Cybersecurity Threat Landscape in India

    India's digital transformation has made it a prime target for cybercriminals. As of 2023, India is one of the largest internet markets in the world, with over 600 million active internet users. The rapid growth of e-commerce, digital banking, social media, and government services has created new opportunities for cybercriminals to exploit vulnerabilities in digital systems. According to a 2022 report by the Indian Computer Emergency Response Team (CERT-In), India witnessed a significant increase in cybersecurity incidents, with millions of cyberattacks targeting individuals, b...

  9. Share of organizations hit by ransomware attacks global 2020-2024

    • statista.com
    Updated May 28, 2025
    Cite
    Statista (2025). Share of organizations hit by ransomware attacks global 2020-2024 [Dataset]. https://www.statista.com/statistics/1614508/ransomware-global-rate/
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2024, ** percent of organizations worldwide claimed to have fallen victim to a ransomware attack in the previous year, according to a survey conducted among cybersecurity leaders of worldwide organizations. This is a decline compared to *** previous years, when ** percent of global organizations encountered ransomware attacks.

  10. Android permissions dataset, Android Malware and benign Application Data set...

    • data.mendeley.com
    Updated Mar 4, 2020
    + more versions
    Cite
    Arvind Mahindru (2020). Android permissions dataset, Android Malware and benign Application Data set (consist of permissions and API calls) [Dataset]. http://doi.org/10.17632/b4mxg7ydb7.3
    Dataset updated
    Mar 4, 2020
    Authors
    Arvind Mahindru
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of the permissions that apps require during installation and at run-time. We collected apps from three different sources: Google Play, third-party app stores, and a malware dataset. The collection contains more than 500,000 Android apps, with features extracted at installation and at execution time. One file contains the names of the features, and the others contain, for each .apk file, the extracted permissions and API calls. Benign apps were collected from Google's Play Store, hiapk, app china, Android, mumayi, gfan, slideme, and pandaapp. The .apk files were collected continuously over the last three years and contain 81 distinct malware families.

  11. API Call based Malware Dataset

    • kaggle.com
    zip
    Updated May 8, 2019
    Cite
    Ferhat Ozgur Catak (2019). API Call based Malware Dataset [Dataset]. https://www.kaggle.com/focatak/malapi2019
    Explore at:
    Available download formats: zip (5944171 bytes)
    Dataset updated
    May 8, 2019
    Authors
    Ferhat Ozgur Catak
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description


    Windows Malware Dataset with PE API Calls

    Our public malware dataset was generated with Cuckoo Sandbox from an analysis of Windows OS API calls. It is provided in CSV file format for cyber security researchers working on malware analysis and machine learning applications.

    Cite The DataSet
    If you find these results useful, please cite them:

    @article{10.7717/peerj-cs.285,
     title = {Deep learning based Sequential model for malware analysis using Windows exe API Calls},
     author = {Catak, Ferhat Ozgur and Yazı, Ahmet Faruk and Elezaj, Ogerta and Ahmed, Javed},
     year = 2020,
     month = jul,
     keywords = {Malware analysis, Sequential models, Network security, Long-short-term memory, Malware dataset},
     volume = 6,
     pages = {e285},
     journal = {PeerJ Computer Science},
     issn = {2376-5992},
     url = {https://doi.org/10.7717/peerj-cs.285},
     doi = {10.7717/peerj-cs.285}
    }
    

    Publications

    The details of the Mal-API-2019 dataset are published in the following papers:

    • AF. Yazı, FÖ Çatak, E. Gül, Classification of Metamorphic Malware with Deep Learning (LSTM), IEEE Signal Processing and Applications Conference, 2019.
    • Catak, FÖ., Yazi, AF., A Benchmark API Call Dataset for Windows PE Malware Classification, arXiv:1905.01999, 2019.

    Introduction

    This study seeks to obtain data which will help to address gaps in machine learning based malware research. The specific objective of this study is to build a benchmark dataset of Windows operating system API calls made by various malware. This is the first study to use metamorphic malware to build sequences of API calls. It is hoped that this research will contribute to a deeper understanding of how metamorphic malware change their behavior (i.e. API calls) by adding meaningless opcodes with their own disassembler/assembler parts.

    Malware Types and System Overall

    In our research, we have mapped the families reported by each piece of software to 8 main malware families: Trojan, Backdoor, Downloader, Worms, Spyware, Adware, Dropper, and Virus. Table 1 shows the number of samples per malware family in our dataset. As the table shows, the sample counts of all families except Adware are quite close to each other. The difference arises because we did not find many samples from the Adware family.

    Malware Family | Samples | Description
    Spyware | 832 | enables a user to obtain covert information about another's computer activities by transmitting data covertly from their hard drive.
    Downloader | 1001 | share the primary functionality of downloading content.
    Trojan | 1001 | misleads users of its true intent.
    Worms | 1001 | spreads copies of itself from computer to computer.
    Adware | 379 | hides on your device and serves you advertisements.
    Dropper | 891 | surreptitiously carries viruses, back doors and other malicious software so they can be executed on the compromised machine.
    Virus | 1001 | designed to spread from host to host and has the ability to replicate itself.
    Backdoor | 1001 | a technique in which a system security mechanism is bypassed undetectably to access a computer or its data.

    The figure shows the general flow of generating the malware dataset. As shown in the figure, we obtained the MD5 hash values of the malware we collected from GitHub. We searched for these hash values using the VirusTotal API and obtained the families of these malicious programs from the reports of 67 different antivirus products in VirusTotal. We observed that the malware families reported by these 67 antivirus products differ from one another.
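
    Two steps of this flow can be sketched offline: hashing a sample with MD5 and reconciling labels that disagree across engines. The text does not say how the disagreement was resolved, so the majority vote below is an assumption made for illustration, and the engine names and labels are hypothetical. No VirusTotal API call is shown.

```python
import hashlib
from collections import Counter


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a sample's bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def consensus_family(engine_labels):
    """Majority vote over per-engine family labels (assumed resolution rule).

    Returns (family, share of labelling engines that agree), ignoring
    engines that reported nothing.
    """
    votes = Counter(label for label in engine_labels.values() if label)
    if not votes:
        return None, 0.0
    family, count = votes.most_common(1)[0]
    return family, count / sum(votes.values())


# Hypothetical per-engine results for one sample.
labels = {"EngineA": "Trojan", "EngineB": "Trojan", "EngineC": "Backdoor", "EngineD": None}
family, agreement = consensus_family(labels)

print(md5_hex(b""))  # d41d8cd98f00b204e9800998ecf8427e
print(family, round(agreement, 2))
```

    Reporting the agreement share alongside the winning label makes it easy to discard samples on which the engines are too divided.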

    Screenshot
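
    Since the antivirus reports disagree on the family, some rule is needed to settle on one label per sample. The description does not say which rule was used, so the following is only a sketch under stated assumptions: raw labels are matched to the eight families by substring, and the most frequent match wins. The engine names and label strings are made up for illustration, and no VirusTotal API call is shown.

```python
from collections import Counter

# The eight target families named in the text, in lowercase singular form.
FAMILIES = ("trojan", "backdoor", "downloader", "worm",
            "spyware", "adware", "dropper", "virus")


def normalize(label):
    """Map one raw engine label to a target family, or None if no family name occurs in it.

    Assumption: the first family (in FAMILIES order) found as a substring wins.
    """
    lowered = label.lower()
    for family in FAMILIES:
        if family in lowered:
            return family
    return None


def majority_family(engine_labels):
    """Return the most common normalized family across engines, or None if nothing matched."""
    votes = Counter(
        family
        for family in map(normalize, engine_labels.values())
        if family is not None
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical report: engine names and labels are invented for this example.
example_labels = {
    "EngineA": "Trojan.Generic.123",
    "EngineB": "Win32/Worm.Agent",
    "EngineC": "TROJAN/Win32.x",
    "EngineD": "Unknown",
}
example_family = majority_family(example_labels)
```

    Substring matching is crude (a combined label such as "Trojan-Downloader" resolves to whichever family is listed first), so a real pipeline would likely need a curated mapping per engine.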

    Data Description

  12. Global malware types detected most frequently 2020-2021

    • statista.com
    Cite
    Statista, Global malware types detected most frequently 2020-2021 [Dataset]. https://www.statista.com/statistics/271037/distribution-of-most-common-malware-file-types/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2020 - Sep 2021
    Area covered
    Worldwide
    Description

    Between October 2020 and September 2021, Backdoor was the most common type of malware attack worldwide. Cyber attacks of this group amounted to 37 percent of all detected malware attacks in the measured period. Downloader ranked second, with 17 percent, while Worm followed with 16 percent among all malware attacks reported.

  13. VHS-22

    • kaggle.com
    zip
    Updated Apr 29, 2022
    Cite
    H2020 SIMARGL (2022). VHS-22 [Dataset]. https://www.kaggle.com/datasets/h2020simargl/vhs-22-network-traffic-dataset/data
    Explore at:
    Available download formats: zip (1903940704 bytes)
    Dataset updated
    Apr 29, 2022
    Authors
    H2020 SIMARGL
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    VHS-22 is a heterogeneous, flow-level dataset which combines the ISOT, CICIDS-17, Booters and CTU-13 datasets, as well as traffic from the Malware Traffic Analysis (MTA) site, to increase the variety of malicious and legitimate traffic flows. It contains 27.7 million flows (20.3 million legitimate and 7.4 million attack flows). The flows are represented by 45 features; apart from classical NetFlow features, VHS-22 contains statistical parameters and network-level features. Their detailed description and the results of initial detection experiments are presented in the paper:

    Paweł Szumełda, Natan Orzechowski, Mariusz Rawski, and Artur Janicki. 2022. VHS-22 – A Very Heterogeneous Set of Network Traffic Data for Threat Detection. In Proc. European Interdisciplinary Cybersecurity Conference (EICC 2022), June 15–16, 2022, Barcelona, Spain. ACM, New York, NY, USA, https://doi.org/10.1145/3528580.3532843

    Every day contains different attacks mixed with legitimate traffic:
    • 01-01-2022: Botnet attacks from ISOT dataset.
    • 02-01-2022: Various attacks from MTA dataset.
    • 03-01-2022: Web attacks from CICIDS-17 dataset.
    • 04-01-2022: Bruteforce attacks from CICIDS-17 dataset.
    • 05-01-2022: Botnet attacks from CICIDS-17 dataset.
    • 06-01-2022: DDoS attacks from CICIDS-17 dataset.
    • 07-01-2022 to 11-01-2022: DDoS attacks from Booters dataset.
    • 12-01-2022 to 23-01-2022: Botnet traffic from CTU-13 dataset.
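
    The day-to-source schedule above can be turned into a small lookup, for example to tag flows by their origin. This is a minimal sketch: it assumes the dates are written day-month-year (the range ending on 23-01-2022 only parses that way), and the function name is illustrative rather than part of the dataset.

```python
from datetime import datetime

# (first day, last day, attack type, source dataset), transcribed from the schedule above.
SCHEDULE = [
    ("01-01-2022", "01-01-2022", "Botnet", "ISOT"),
    ("02-01-2022", "02-01-2022", "Various", "MTA"),
    ("03-01-2022", "03-01-2022", "Web", "CICIDS-17"),
    ("04-01-2022", "04-01-2022", "Bruteforce", "CICIDS-17"),
    ("05-01-2022", "05-01-2022", "Botnet", "CICIDS-17"),
    ("06-01-2022", "06-01-2022", "DDoS", "CICIDS-17"),
    ("07-01-2022", "11-01-2022", "DDoS", "Booters"),
    ("12-01-2022", "23-01-2022", "Botnet", "CTU-13"),
]


def parse_day(text):
    """Parse a DD-MM-YYYY string into a date."""
    return datetime.strptime(text, "%d-%m-%Y").date()


def source_for(day_text):
    """Return (attack type, source dataset) for a day, or None if it is outside the schedule."""
    day = parse_day(day_text)
    for first, last, attack, dataset in SCHEDULE:
        if parse_day(first) <= day <= parse_day(last):
            return attack, dataset
    return None
```

    For instance, source_for("09-01-2022") falls in the Booters range, while a day after 23-01-2022 returns None.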

    The VHS-22 dataset consists of labeled network flows and all data is publicly available for researchers in .csv format. When using VHS-22, please cite our paper which describes the VHS-22 dataset in detail, as well as the publications describing the source datasets:

    Paweł Szumełda, Natan Orzechowski, Mariusz Rawski, and Artur Janicki. 2022. VHS-22 – A Very Heterogeneous Set of Network Traffic Data for Threat Detection. In Proc. European Interdisciplinary Cybersecurity Conference (EICC 2022), June 15–16, 2022, Barcelona, Spain. ACM, New York, NY, USA, https://doi.org/10.1145/3528580.3532843

    Sherif Saad, Issa Traore, Ali Ghorbani, Bassam Sayed, David Zhao, Wei Lu, John Felix, and Payman Hakimian. 2011. Detecting P2P botnets through network behavior analysis and machine learning. In Proc. International Conference on Privacy, Security and Trust. IEEE, Montreal, Canada, 174–1

    Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani. 2018. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proc. 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Funchal, Portugal.

    José Jair Santanna, Romain Durban, Anna Sperotto, and Aiko Pras. 2015. Inside booters: An analysis on operational databases. In Proc. International Symposium on Integrated Network Management (INM 2015). IFIP/IEEE, Ottawa, Canada, 432–440. https://doi.org/10.1109/INM.2015.71403

    Riaz Khan, Xiaosong Zhang, Rajesh Kumar, Abubakar Sharif, Noorbakhsh Amiri Golilarz, and Mamoun Alazab. 2019. An Adaptive Multi-Layer Botnet Detection Technique Using Machine Learning Classifiers. Applied Sciences 9 (06 2019), 2375. https://doi.org/10.3390/app91123

    The Malware Traffic Analysis data originate from https://www.malware-traffic-analysis.net, authored by Brad.

    The work has been funded by the SIMARGL Project -- Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware, with the support of the European Commission and the Horizon 2020 Program, under Grant Agreement No. 833042.

  14. SEPTA Ridership Statistics

    • gimi9.com
    • s.cnmilf.com
    • +1more
    Updated Jul 9, 2023
    Cite
    (2023). SEPTA Ridership Statistics [Dataset]. https://gimi9.com/dataset/data-gov_septa-ridership-statistics/
    Explore at:
    Dataset updated
    Jul 9, 2023
    Description

    Stop summary files represent average daily ridership at the stop level over the course of the relevant period. Trolley ridership data was generated using automatic passenger counters (APCs). Bus data is calculated from a variety of sources depending on the route and year. The bus data files represent average daily fall ridership from 2014 – present. Accurate weekend bus data was not available until 2017 at which point SEPTA had more widespread APC coverage. No bus data is available for Fall 2020 due to a malware attack. APC bus data was also not available for articulated vehicles and the Boulevard Direct from August 2020 through February 2022 due to the malware attack.

  15. Ransomware attacks leading to data breaches worldwide 2020-2021

    • statista.com
    Updated Dec 10, 2024
    Cite
    Statista (2024). Ransomware attacks leading to data breaches worldwide 2020-2021 [Dataset]. https://www.statista.com/statistics/1321457/ransomware-resulting-in-data-breaches/
    Explore at:
    Dataset updated
    Dec 10, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2021, the number of global ransomware attacks that resulted in data leakage saw an increase of over 80 percent year-over-year. The number of ransomware attacks leading to data exposure in 2020 was around 1,500, while in 2021 it reached approximately 2,700 cases worldwide.
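
    The year-over-year figure can be checked against the two counts quoted above. This is a small sketch using only those rounded numbers; the function name is illustrative.

```python
def year_over_year_change(previous, current):
    """Percentage change from the previous period to the current one."""
    return (current - previous) / previous * 100.0


# Rounded counts quoted in the description: about 1,500 in 2020 and 2,700 in 2021.
change_2021 = year_over_year_change(1500, 2700)

if __name__ == "__main__":
    print(f"change 2020 to 2021: {change_2021:.1f} percent")
```

    The rounded counts give roughly 80 percent, so the "over 80 percent" in the description cannot be reproduced exactly from them.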

  16. Classification of Malwares (CLaMP)

    • kaggle.com
    zip
    Updated Jan 30, 2021
    Cite
    Saurabh Shahane (2021). Classification of Malwares (CLaMP) [Dataset]. https://www.kaggle.com/saurabhshahane/classification-of-malwares
    Explore at:
    Available download formats: zip (445293 bytes)
    Dataset updated
    Jan 30, 2021
    Authors
    Saurabh Shahane
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    A Malware classifier dataset built with header fields’ values of Portable Executable files

    Content


    ClaMP_Integrated-5184.csv Total samples : 5184 (Malware () + Benign()) Features (69) : Raw Features (54) + Derived Features(15)

    ClaMP_Raw-5184.csv Total samples : 5184 (Malware ()+ Benign()) Features (55) : Raw Features(55)

    IMAGE_DOS_HEADER (19)

    "e_magic", "e_cblp", "e_cp","e_crlc","e_cparhdr", "e_minalloc","e_maxalloc","e_ss","e_sp", "e_csum","e_ip","e_cs","e_lfarlc","e_ovno","e_res", "e_oemid","e_oeminfo","e_res2","e_lfanew"

    FILE_HEADER (7)

    "Machine","NumberOfSections","CreationYear","PointerToSymbolTable", "NumberOfSymbols","SizeOfOptionalHeader","Characteristics"

    OPTIONAL_HEADER (29)

    "Magic", "MajorLinkerVersion", "MinorLinkerVersion", "SizeOfCode", "SizeOfInitializedData", "SizeOfUninitializedData", "AddressOfEntryPoint", "BaseOfCode", "BaseOfData", "ImageBase", "SectionAlignment", "FileAlignment", "MajorOperatingSystemVersion", "MinorOperatingSystemVersion", "MajorImageVersion", "MinorImageVersion", "MajorSubsystemVersion", "MinorSubsystemVersion", "SizeOfImage", "SizeOfHeaders", "CheckSum", "Subsystem", "DllCharacteristics", "SizeOfStackReserve", "SizeOfStackCommit", "SizeOfHeapReserve", "SizeOfHeapCommit", "LoaderFlags", "NumberOfRvaAndSizes"

    TARGET_VARIABLE: class - 0 (benign), 1 (malware)
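
    The three header groups listed above add up to 19 + 7 + 29 = 55 names, which matches the raw feature count given for ClaMP_Raw-5184.csv. The sketch below records the groups and splits one record by header; it assumes the CSV column names match the listed names exactly, and the sample row is made up (it holds only the common "MZ", i386 and PE32 magic values) rather than taken from the dataset.

```python
import csv
import io

# Feature names per PE header, copied from the lists above.
HEADER_GROUPS = {
    "IMAGE_DOS_HEADER": [
        "e_magic", "e_cblp", "e_cp", "e_crlc", "e_cparhdr", "e_minalloc",
        "e_maxalloc", "e_ss", "e_sp", "e_csum", "e_ip", "e_cs", "e_lfarlc",
        "e_ovno", "e_res", "e_oemid", "e_oeminfo", "e_res2", "e_lfanew",
    ],
    "FILE_HEADER": [
        "Machine", "NumberOfSections", "CreationYear", "PointerToSymbolTable",
        "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
    ],
    "OPTIONAL_HEADER": [
        "Magic", "MajorLinkerVersion", "MinorLinkerVersion", "SizeOfCode",
        "SizeOfInitializedData", "SizeOfUninitializedData", "AddressOfEntryPoint",
        "BaseOfCode", "BaseOfData", "ImageBase", "SectionAlignment", "FileAlignment",
        "MajorOperatingSystemVersion", "MinorOperatingSystemVersion",
        "MajorImageVersion", "MinorImageVersion", "MajorSubsystemVersion",
        "MinorSubsystemVersion", "SizeOfImage", "SizeOfHeaders", "CheckSum",
        "Subsystem", "DllCharacteristics", "SizeOfStackReserve", "SizeOfStackCommit",
        "SizeOfHeapReserve", "SizeOfHeapCommit", "LoaderFlags", "NumberOfRvaAndSizes",
    ],
}

RAW_FEATURES = [name for names in HEADER_GROUPS.values() for name in names]


def split_by_group(row):
    """Split one CSV record (a dict) into per-header dicts, skipping columns the row lacks."""
    return {
        group: {name: row[name] for name in names if name in row}
        for group, names in HEADER_GROUPS.items()
    }


# Made-up one-row sample with a few columns, only to show the call shape.
SAMPLE_CSV = "e_magic,Machine,Magic,class\n23117,332,267,0\n"
sample_row = next(csv.DictReader(io.StringIO(SAMPLE_CSV)))
sample_split = split_by_group(sample_row)
```

    The target column "class" is not part of any header group, so it is left out of the split and can be read from the row directly.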

    Acknowledgements

    The data is sourced from Mendeley data.

    Kumar, Ajit (2020), “ClaMP (Classification of Malware with PE headers)”, Mendeley Data, V1, doi: 10.17632/xvyv59vwvz.1

    Read Paper: "A learning model to detect maliciousness of portable executable using integrated feature set", authored by Ajit Kumar, K.S.Kuppusamy, and G.Aghila.

  17. Covid-19 Worldometers Latest Cases Data July 2020

    • kaggle.com
    zip
    Updated Jul 8, 2020
    Cite
    Sujay Sreedhar (2020). Covid-19 Worldometers Latest Cases Data July 2020 [Dataset]. https://www.kaggle.com/sujay12345/covid19-worldometers-latest-cases-data-july-2020
    Explore at:
    Available download formats: zip (24768 bytes)
    Dataset updated
    Jul 8, 2020
    Authors
    Sujay Sreedhar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I would love to see notebooks! Keep bringin' em.

    Content

    Worldometer manually analyzes, validates, and aggregates data from thousands of sources in real time and provides global COVID-19 live statistics for a wide audience of caring people around the world.

    Our data is also trusted and used by the UK Government, Johns Hopkins CSSE, the Government of Thailand, the Government of Vietnam, the Government of Pakistan, Financial Times, The New York Times, Business Insider, BBC, and many others.

    Acknowledgements

    Acknowledge Sujay S

    Inspiration

    Thanks to blogs out there on medium! That made me do this!

  18. 🌐 Global Cybersecurity Threats (2015-2024)

    • kaggle.com
    zip
    Updated Mar 16, 2025
    Cite
    Atharva Soundankar (2025). 🌐 Global Cybersecurity Threats (2015-2024) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/global-cybersecurity-threats-2015-2024
    Explore at:
    Available download formats: zip (48178 bytes)
    Dataset updated
    Mar 16, 2025
    Authors
    Atharva Soundankar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    The Global Cybersecurity Threats Dataset (2015-2024) provides extensive data on cyberattacks, malware types, targeted industries, and affected countries. It is designed for threat intelligence analysis, cybersecurity trend forecasting, and machine learning model development to enhance global digital security.

    📊 Column Descriptions

    Column Name            Description
    Country                Country where the attack occurred
    Year                   Year of the incident
    Threat Type            Type of cybersecurity threat (e.g., Malware, DDoS)
    Attack Vector          Method of attack (e.g., Phishing, SQL Injection)
    Affected Industry      Industry targeted (e.g., Finance, Healthcare)
    Data Breached (GB)     Volume of data compromised
    Financial Impact ($M)  Estimated financial loss in millions
    Severity Level         Low, Medium, High, Critical
    Response Time (Hours)  Time taken to mitigate the attack
    Mitigation Strategy    Countermeasures taken
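
    The column descriptions above can serve as a simple schema check before analysis. This is a sketch under assumptions: the CSV header names are taken to match the listed column names exactly, the three measurement columns are assumed to hold numbers, and the sample record is invented for illustration.

```python
# Column names copied from the column descriptions above.
EXPECTED_COLUMNS = [
    "Country", "Year", "Threat Type", "Attack Vector", "Affected Industry",
    "Data Breached (GB)", "Financial Impact ($M)", "Severity Level",
    "Response Time (Hours)", "Mitigation Strategy",
]
SEVERITY_LEVELS = {"Low", "Medium", "High", "Critical"}
# Assumption: these columns are numeric in the actual file.
NUMERIC_COLUMNS = ["Data Breached (GB)", "Financial Impact ($M)", "Response Time (Hours)"]


def validate_row(row):
    """Return a list of problems found in one record; an empty list means it passed."""
    problems = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in row]
    severity = row.get("Severity Level")
    if severity is not None and severity not in SEVERITY_LEVELS:
        problems.append(f"unknown severity level: {severity}")
    for column in NUMERIC_COLUMNS:
        if column in row:
            try:
                float(row[column])
            except (TypeError, ValueError):
                problems.append(f"not numeric: {column}")
    return problems


# Invented record, used only to show the call shape.
sample_record = {
    "Country": "Exampleland", "Year": "2020", "Threat Type": "Malware",
    "Attack Vector": "Phishing", "Affected Industry": "Finance",
    "Data Breached (GB)": "12.5", "Financial Impact ($M)": "3.2",
    "Severity Level": "High", "Response Time (Hours)": "48",
    "Mitigation Strategy": "Patch",
}
sample_problems = validate_row(sample_record)
```

    Rows read with csv.DictReader can be passed to validate_row one at a time, and any non-empty result points to a header mismatch or an unexpected value.
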
  19. Respiratory Virus Dashboard Metrics

    • data.chhs.ca.gov
    • healthdata.gov
    • +2more
    csv, xlsx, zip
    Updated Nov 21, 2025
    Cite
    California Department of Public Health (2025). Respiratory Virus Dashboard Metrics [Dataset]. https://data.chhs.ca.gov/dataset/respiratory-virus-dashboard-metrics
    Explore at:
    Available download formats: csv (116045), zip, xlsx (9425), csv (64958), csv (53108), xlsx (9666), xlsx (9337)
    Dataset updated
    Nov 21, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    Note: On April 30, 2024, the Federal mandate for COVID-19 and influenza associated hospitalization data to be reported to CDC’s National Healthcare Safety Network (NHSN) expired. Hospitalization data beyond April 30, 2024, will not be updated on the Open Data Portal. Hospitalization and ICU admission data collected from summer 2020 to May 10, 2023, are sourced from the California Hospital Association (CHA) Survey. Data collected on or after May 11, 2023, are sourced from CDC's National Healthcare Safety Network (NHSN).

    Data is from the California Department of Public Health (CDPH) Respiratory Virus State Dashboard at https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/Respiratory-Viruses/RespiratoryDashboard.aspx.

    Data are updated each Friday around 2 pm.

    For COVID-19 death data: As of January 1, 2023, data was sourced from the California Department of Public Health, California Comprehensive Death File (Dynamic), 2023–Present. Prior to January 1, 2023, death data was sourced from the COVID-19 case registry. The change in data source occurred in July 2023 and was applied retroactively to all 2023 data to provide a consistent source of death data for the year of 2023. Influenza death data was sourced from the California Department of Public Health, California Comprehensive Death File (Dynamic), 2020–Present.

    COVID-19 testing data represent data received by CDPH through electronic laboratory reporting of test results for COVID-19 among residents of California. Testing date is the date the test was administered, and tests have a 1-day lag (except for Los Angeles County, which has an additional 7-day lag). Influenza testing data represent data received by CDPH from clinical sentinel laboratories in California. These laboratories report the aggregate number of laboratory-confirmed influenza virus detections and total tests performed on a weekly basis. These data do not represent all influenza testing occurring in California and are available only at the state level.

  20. IoT-23: A labeled dataset with malicious and benign IoT network traffic

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Sep 3, 2021
    Cite
    Sebastian Garcia; Agustin Parmisano; Maria Jose Erquiaga (2021). IoT-23: A labeled dataset with malicious and benign IoT network traffic [Dataset]. http://doi.org/10.5281/zenodo.4743746
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Sep 3, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Garcia; Agustin Parmisano; Maria Jose Erquiaga
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    IoT-23 is a dataset of network traffic from Internet of Things (IoT) devices. It has 20 malware captures executed in IoT devices, and 3 captures of benign IoT device traffic. It was first published in January 2020, with captures ranging from 2018 to 2019. This IoT network traffic was captured in the Stratosphere Laboratory, AIC group, FEL, CTU University, Czech Republic. Its goal is to offer a large dataset of real and labeled IoT malware infections and IoT benign traffic for researchers to develop machine learning algorithms. This dataset and its research were funded by Avast Software. The malware was allowed to connect to the Internet.
