In the first half of 2020, over *** million malware attacks were registered in Mexico. This number nonetheless represents a decrease of about three percent compared with the same period of the previous year, when approximately **** million attacks were registered. In 2020, Mexico was one of the countries with the most cyber attacks in Latin America.
As of March 2020, the total number of new malware detections worldwide amounted to ****** million programs, up from *** million new malware detections at the end of January 2020. According to AV-TEST, the cumulative number of new malware samples is projected to surpass *** million within 2020.
As of March 2020, the total number of new Android malware samples amounted to 482,579 per month. According to AV-TEST, trojans were the most common type of malware affecting Android devices.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceedings Series (ICPS).
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
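Because each victim is stored as one JSON file, a dataset can be processed file by file. The following is a minimal sketch rather than the authors' scripts: the helper names `iter_victims` and `count_victims`, the `.json` suffix, and the nested directory layout are assumptions, and no field names are assumed because the schema is described in the included README.md files.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_victims(dataset_dir: str) -> Iterator[dict]:
    """Yield one parsed record per victim JSON file under dataset_dir.

    Hypothetical helper: assumes files end in .json and may sit in
    nested subdirectories.
    """
    for path in sorted(Path(dataset_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield json.load(handle)


def count_victims(dataset_dir: str) -> int:
    """Count victim files without holding all records in memory."""
    return sum(1 for _ in iter_victims(dataset_dir))
```

Streaming the files with a generator keeps memory use flat, which matters for a collection on the order of millions of files.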
MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these freely available infostealer malware logs. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
VictimAccessSet
We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, as well as a detailed collection of information that provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
AccountAccessSet
The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We collect data from Database Market, where vendors sell online credentials, and investigate it in the same way. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits
Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
📌 Context of the Dataset
The Healthcare Ransomware Dataset was created to simulate real-world cyberattacks in the healthcare industry. Hospitals, clinics, and research labs have become prime targets for ransomware due to their reliance on real-time patient data and legacy IT infrastructure. This dataset provides insight into attack patterns, recovery times, and cybersecurity practices across different healthcare organizations.
Why is this important?
Ransomware attacks on healthcare organizations can shut down entire hospitals, delay treatments, and put lives at risk. Understanding how different healthcare organizations respond to attacks can help develop better security strategies. The dataset allows cybersecurity analysts, data scientists, and researchers to study patterns in ransomware incidents and explore predictive modeling for risk mitigation.
📌 Sources and Research Inspiration
This simulated dataset was inspired by real-world cybersecurity reports and built using insights from official sources, including:
1️⃣ IBM Cost of a Data Breach Report (2024)
The healthcare sector had the highest average cost of data breaches ($10.93 million per incident). On average, organizations recovered only 64.8% of their data after paying ransom. Healthcare breaches took 277 days on average to detect and contain.
2️⃣ Sophos State of Ransomware in Healthcare (2024)
67% of healthcare organizations were hit by ransomware in 2024, an increase from 60% in 2023. 66% of backup compromise attempts succeeded, making data recovery significantly more difficult. The most common attack vectors included exploited vulnerabilities (34%) and compromised credentials (34%).
3️⃣ Health & Human Services (HHS) Cybersecurity Reports
Ransomware incidents in healthcare have doubled since 2016. Organizations that fail to monitor threats frequently experience higher infection rates.
4️⃣ Cybersecurity & Infrastructure Security Agency (CISA) Alerts
Identified phishing, unpatched software, and exposed RDP ports as top ransomware entry points. Only 13% of healthcare organizations monitor cyber threats more than once per day, increasing the risk of undetected attacks.
5️⃣ Emsisoft 2020 Report on Ransomware in Healthcare
The number of ransomware attacks in healthcare increased by 278% between 2018 and 2023. 560 healthcare facilities were affected in a single year, disrupting patient care and emergency services.
📌 Why is This a Simulated Dataset?
This dataset does not contain real patient data or actual ransomware cases. Instead, it was built using probabilistic modeling and structured randomness based on industry benchmarks and cybersecurity reports.
How It Was Created:
1️⃣ Defining the Dataset Structure
The dataset was designed to simulate realistic attack patterns in healthcare, using actual ransomware case studies as inspiration.
Columns were selected based on what real-world cybersecurity teams track, such as:
Attack methods (phishing, RDP exploits, credential theft).
Infection rates, recovery time, and backup compromise rates.
Organization type (hospitals, clinics, research labs) and monitoring frequency.
2️⃣ Generating Realistic Data Using ChatGPT & Python
ChatGPT assisted in defining relationships between attack factors, ensuring that key cybersecurity concepts were accurately reflected. Python’s NumPy and Pandas libraries were used to introduce randomized attack simulations based on real-world statistics. Data was validated against industry research to ensure it aligns with actual ransomware attack trends.
3️⃣ Ensuring Logical Relationships Between Data Points
Hospitals take longer to recover due to larger infrastructure and compliance requirements. Organizations that track more cyber threats recover faster because they detect attacks earlier. Backup security significantly impacts recovery time, reflecting the real-world risk of backup encryption attacks.
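The three relationships above can be sketched as a small generator. This is an illustrative reconstruction under stated assumptions, not the code used to build the dataset: it uses Python's standard `random` module in place of NumPy and Pandas so that it runs without dependencies, and every column name, baseline, and multiplier is hypothetical. Only the 66% backup-compromise rate is taken from the Sophos figure quoted earlier.

```python
import random

# Illustrative baselines in days; these numbers are assumptions, not values
# taken from the dataset or from the cited reports.
BASE_RECOVERY_DAYS = {"Hospital": 14.0, "Clinic": 7.0, "Research Lab": 9.0}
ENTRY_METHODS = ["Phishing", "RDP Exploit", "Credential Theft"]
BACKUP_COMPROMISE_RATE = 0.66  # backup-compromise success rate quoted above


def expected_recovery_days(org_type, checks_per_day, backup_compromised):
    """Deterministic part of the model, before random noise is applied."""
    days = BASE_RECOVERY_DAYS[org_type]
    # More frequent monitoring detects attacks earlier, so recovery is faster.
    days *= 1.0 - 0.1 * min(checks_per_day, 5)
    if backup_compromised:
        days *= 1.5  # compromised backups slow recovery
    return days


def simulate_incidents(n, seed=0):
    """Generate n synthetic incident rows with reproducible randomness."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        org_type = rng.choice(sorted(BASE_RECOVERY_DAYS))
        checks = rng.randint(0, 4)
        backup_hit = rng.random() < BACKUP_COMPROMISE_RATE
        noise = rng.uniform(0.9, 1.1)  # +/-10% multiplicative noise
        rows.append({
            "org_type": org_type,
            "entry_method": rng.choice(ENTRY_METHODS),
            "monitoring_checks_per_day": checks,
            "backup_compromised": backup_hit,
            "recovery_days": round(
                expected_recovery_days(org_type, checks, backup_hit) * noise, 1
            ),
        })
    return rows
```

Separating the deterministic rule from the noise makes it easy to check that the intended relationships hold before any randomness is layered on top.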
Context
The growing use of mobile devices has made them a popular target for malicious applications because they store sensitive personal information, including messages, photos, and other private data. To exploit this, modern malware is designed using advanced techniques that make it difficult to detect and analyze. As a result, the rapid increase in new malware variants each day has outpaced the effectiveness of traditional detection methods. Consequently, as cyber threats continue to evolve in complexity, there is an urgent need for advanced security measures to protect user privacy and safeguard sensitive information.
Content
Two datasets are created:
Binary Classification (Dataset-1): Dataset-1 contains 352 static and 323 dynamic features extracted from 1800 benign and 1747 malicious apps.
Multi-class Classification (Dataset-2): Dataset-2 contains 352 static and 323 dynamic features extracted from 1747 malicious apps belonging to 13 malware families.
AndroMD Dataset: This dataset combines both the static and the dynamic features described under Binary Classification (Dataset-1).
These datasets are valuable for researchers and experts in Android malware detection and for conducting experimental studies.
Acknowledgements
Meghna Dhalaria, Ekta Gandotra (2021). "A Hybrid Approach for Android Malware Detection and Family Classification", International Journal of Interactive Multimedia and Artificial Intelligence, vol. 6, issue Regular Issue, no. 6, pp. 174-188. https://doi.org/10.9781/ijimai.2020.09.001
Meghna Dhalaria, and Ekta Gandotra, “MalDetect: A Classifier Fusion Approach for Detection of Android Malware,” Expert Systems with Applications, vol. 235, pp. 121155, 2023. https://doi.org/10.1016/j.eswa.2023.121155
Meghna Dhalaria, and Ekta Gandotra, “Binary and Multi-class Classification of Android Applications using Static Features.” International Journal of Applied Management Science. vol. 15, no. 2, pp. 117-140, 2023. https://doi.org/10.1504/IJAMS.2023.131670
Meghna Dhalaria, and Ekta Gandotra, "CSForest: an approach for imbalanced family classification of android malicious applications." International Journal of Information Technology, vol. 13 no. 3, pp. 1-13, 2021. https://doi.org/10.1007/s41870-021-00661-7
Meghna Dhalaria, and Ekta Gandotra, “A Framework for Detection of Android Malware using Static Features,” In 2020 IEEE 17th India Council International Conference (INDICON), pp. 1-7, IEEE, 2020. https://doi.org/10.1109/INDICON49873.2020.9342511
Meghna Dhalaria, and Ekta Gandotra, “Android Malware Detection using Chi-Square Feature Selection and Ensemble Learning Method,” In 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 36-41, IEEE, 2020. https://doi.org/10.1109/PDGC50313.2020.9315818
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 200K Android malware apps, each labeled and assigned to its corresponding family. Benign Android apps (200K) were collected from the AndroZoo dataset to balance this large dataset. We collected 14 malware categories: adware, backdoor, file infector, no category, Potentially Unwanted Apps (PUA), ransomware, riskware, scareware, trojan, trojan-banker, trojan-dropper, trojan-sms, trojan-spy, and zero-day.
A complete taxonomy of all the malware families of the captured malware apps is created by dividing them into 8 categories: sensitive data collection, media, hardware, actions/activities, internet connection, C&C, antivirus, and storage & settings.
| Category | Number of families | Number of samples |
|---|---|---|
| Adware | 48 | 47,210 |
| Backdoor | 11 | 1,538 |
| File Infector | 5 | 669 |
| No Category | - | 2,296 |
| PUA | 8 | 2,051 |
| Ransomware | 8 | 6,202 |
| Riskware | 21 | 97,349 |
| Scareware | 3 | 1,556 |
| Trojan | 45 | 13,559 |
| Trojan-Banker | 11 | 887 |
| Trojan-Dropper | 9 | 2,302 |
| Trojan-SMS | 11 | 3,125 |
| Trojan-Spy | 11 | 3,540 |
| Zero-day | - | 13,340 |
AndroidManifest.xml contains a lot of features that can be used for static analysis. The main extracted features include:
| Feature | Values |
|---|---|
| Package Name | "com.fb.iwidget" |
| Activities | "com.fb.iwidget.OverlayActivity" "org.acra.CrashReportDialog" "com.batch.android.BatchActionActivity" "com.fb.iwidget.MainActivity" "com.fb.iwidget.PreferencesActivity" "com.fb.iwidget.PickerActivity" "com.fb.iwidget.IntroActivity" |
| Services | "com.batch.android.BatchActionService" "com.fb.iwidget.MainService" "com.fb.iwidget.SnapAccessService" |
| Receivers/Providers | "com.fb.iwidget.ExpandWidgetProvider" "com.fb.iwidget.ActionReceiver" |
| Intents Actions | "android.accessibilityservice.AccessibilityService" "android.appwidget.action.APPWIDGET_UPDATE" "android.intent.action.BOOT_COMPLETED" "android.intent.action.CREATE_SHORTCUT" "android.intent.action.MAIN" "android.intent.action.MY_PACKAGE_REPLACED" "android.intent.action.USER_PRESENT" "android.intent.action.VIEW" "com.fb.iwidget.action.SHOULD_REVIVE" |
| Intents Categories | "android.intent.category.BROWSABLE" "android.intent.category.DEFAULT" "android.intent.category.LAUNCHER" |
| Permissions | "android.permission.ACCESS_NETWORK_STATE" "android.permission.CALL_PHONE" "android.permission.INTERNET" "android.permission.RECEIVE_BOOT_COMPLETED" "android.permission.SYSTEM_ALERT_WINDOW" "com.android.vending.BILLING" "android.permission.BIND_ACCESSIBILITY_SERVICE" |
| Meta-Data | "android.accessibilityservice" "android.appwidget.provider" |
| #Icons | 331 |
| #Pictures | 0 |
| #Videos | 0 |
| Audio files | 0 |
| Videos | 0 |
| Size of the App | 4.2M |
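
Several of the manifest features in the table can be read with a standard XML parser. The sketch below is one possible approach, not the dataset authors' extraction pipeline: it assumes the manifest has already been decoded from the binary form stored inside an APK into plain-text XML, and the function name `extract_manifest_features` and the output keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_manifest_features(manifest_xml: str) -> dict:
    """Collect component names, intents, and permissions from manifest text."""
    root = ET.fromstring(manifest_xml)

    def names(tag):
        # android:name of every element with this tag, at any nesting depth.
        return [el.get(ANDROID_NS + "name") for el in root.iter(tag)]

    return {
        "package": root.get("package"),
        "activities": names("activity"),
        "services": names("service"),
        "receivers": names("receiver"),
        "providers": names("provider"),
        "intent_actions": sorted(set(names("action"))),
        "intent_categories": sorted(set(names("category"))),
        "permissions": names("uses-permission"),
    }
```

Intent actions and categories are de-duplicated because the same action often appears in several intent filters of one app.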
For understanding the behavioral changes of these malware categories and families, six categories of features are extracted after executing the malware in an emulated environment. The main extracted features include:
https://creativecommons.org/publicdomain/zero/1.0/
The dataset "Cybersecurity Cases in India" is a comprehensive collection of real-world cybersecurity incidents reported across various cities in India. The dataset encapsulates the financial loss, incident types, and categories, providing a detailed overview of the cybercrime landscape in one of the world’s largest digital economies. With over 1000 records, it spans incidents from 2020 to 2024, covering various types of cybercrimes such as phishing, online fraud, malware attacks, ransomware, data breaches, DDoS attacks, identity theft, and more. Each record captures important attributes of the incidents, such as the year, date of occurrence, amount lost in INR, the type of incident, the city in which it occurred, and the category of the affected entity (e.g., financial, personal, corporate).
The dataset is structured to enable analysis of the trends in cybercrime over time, the financial impact of various cyberattacks, and the geographic distribution of incidents across Indian cities. It serves as a critical resource for cybersecurity professionals, policymakers, law enforcement agencies, and academic researchers seeking to understand the challenges posed by cybercrime in India and to identify strategies to combat these challenges.
The dataset’s primary purpose is to provide an extensive, granular view of the nature and scope of cybersecurity incidents in India. It enables the analysis of the frequency, severity, and financial impact of cybercrimes across different types of attacks, cities, and time periods. As cybercrimes continue to rise globally, including in India, this dataset serves as an important tool for understanding the evolving threats and risks in cyberspace. Cybersecurity experts and analysts can leverage this dataset to identify patterns and trends, while government and law enforcement agencies can use it to devise more targeted interventions and preventive measures.
India, with its large and growing digital footprint, is a prime target for cybercriminals. The country's rapidly expanding internet user base, coupled with increasing digital adoption in various sectors like finance, healthcare, education, and e-commerce, makes it an attractive target for cyberattacks. This dataset allows stakeholders to understand how cybercrime evolves in response to these dynamics.
The dataset is a rich resource for understanding the following:
The dataset includes the following key variables, each contributing valuable information to the analysis:
India's digital transformation has made it a prime target for cybercriminals. As of 2023, India is one of the largest internet markets in the world, with over 600 million active internet users. The rapid growth of e-commerce, digital banking, social media, and government services has created new opportunities for cybercriminals to exploit vulnerabilities in digital systems. According to a 2022 report by the Indian Computer Emergency Response Team (CERT-In), India witnessed a significant increase in cybersecurity incidents, with millions of cyberattacks targeting individuals, b...
In 2024, ** percent of organizations worldwide claimed to have fallen victim to a ransomware attack in the previous year, according to a survey conducted among cybersecurity leaders of worldwide organizations. This is a decline compared to *** previous years, when ** percent of global organizations encountered ransomware attacks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset lists the permissions that apps require at installation and at run time. We collected apps from three kinds of sources: Google Play, third-party app stores, and malware datasets. The dataset covers more than 500,000 (5,00,000) Android apps, with features extracted at installation and during execution. One file contains the feature names; the others contain, for each .apk file, its extracted permissions and API calls. Benign apps were collected from Google's Play Store, hiapk, app china, Android, mumayi, gfan, slideme, and pandaapp. The .apk files were collected continuously over the last three years and include 81 distinct malware families.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Our public malware dataset was generated with Cuckoo Sandbox from an analysis of Windows OS API calls. It is provided in CSV format for cyber security researchers working on malware analysis and machine learning applications.
Cite The DataSet
If you find these results useful, please cite them:
@article{10.7717/peerj-cs.285,
title = {Deep learning based Sequential model for malware analysis using Windows exe API Calls},
author = {Catak, Ferhat Ozgur and Yazı, Ahmet Faruk and Elezaj, Ogerta and Ahmed, Javed},
year = 2020,
month = jul,
keywords = {Malware analysis, Sequential models, Network security, Long-short-term memory, Malware dataset},
volume = 6,
pages = {e285},
journal = {PeerJ Computer Science},
issn = {2376-5992},
url = {https://doi.org/10.7717/peerj-cs.285},
doi = {10.7717/peerj-cs.285}
}
The details of the Mal-API-2019 dataset are published in the following papers:
AF. Yazı, FÖ Çatak, E. Gül, Classification of Metamorphic Malware with Deep Learning (LSTM), IEEE Signal Processing and Applications Conference, 2019.
Catak, FÖ., Yazi, AF., A Benchmark API Call Dataset for Windows PE Malware Classification, arXiv:1905.01999, 2019.
This study seeks to obtain data that will help address gaps in machine-learning-based malware research. The specific objective is to build a benchmark dataset of Windows operating system API calls made by various malware. This is the first study to use metamorphic malware to build sequences of API calls. It is hoped that this research will contribute to a deeper understanding of how metamorphic malware changes its behavior (i.e. API calls) by adding meaningless opcodes with its own disassembler/assembler parts.
In our research, we translated the families reported by each antivirus product into 8 main malware families: Trojan, Backdoor, Downloader, Worms, Spyware, Adware, Dropper, and Virus. Table 1 shows the number of samples per malware family in our dataset. As the table shows, the sample counts of all families except Adware are fairly close to one another. The difference arises because we found comparatively few samples from the Adware family.
| Malware Family | Samples | Description |
|---|---|---|
| Spyware | 832 | enables a user to obtain covert information about another's computer activities by transmitting data covertly from their hard drive. |
| Downloader | 1001 | share the primary functionality of downloading content. |
| Trojan | 1001 | misleads users of its true intent. |
| Worms | 1001 | spreads copies of itself from computer to computer. |
| Adware | 379 | hides on your device and serves you advertisements. |
| Dropper | 891 | surreptitiously carries viruses, back doors and other malicious software so they can be executed on the compromised machine. |
| Virus | 1001 | designed to spread from host to host and has the ability to replicate itself. |
| Backdoor | 1001 | a technique in which a system security mechanism is bypassed undetectably to access a computer or its data. |
The figure shows the general flow of generating the malware dataset. As shown in the figure, we obtained the MD5 hash values of the malware samples we collected from GitHub. We searched for these hash values using the VirusTotal API and obtained the family labels of these samples from the reports of 67 different antivirus engines in VirusTotal. We observed that the family labels reported by these 67 engines differ from one another.
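One way to reconcile differing antivirus labels into the 8 families is keyword matching followed by a majority vote. The source does not state which rule was used, so the sketch below is an assumption for illustration only: the keyword table, the function names, and the input shape (a mapping from engine name to label string, with `None` for engines that did not flag the sample) are all hypothetical, and no VirusTotal API calls are shown.

```python
from collections import Counter
from typing import Optional

# Hypothetical keyword hints, ordered so that more specific terms are checked
# before the generic "trojan" (e.g. "Trojan-Downloader" maps to Downloader).
KEYWORD_TO_FAMILY = [
    ("downloader", "Downloader"),
    ("dropper", "Dropper"),
    ("backdoor", "Backdoor"),
    ("adware", "Adware"),
    ("spy", "Spyware"),
    ("worm", "Worms"),
    ("virus", "Virus"),
    ("trojan", "Trojan"),
]


def normalise_label(label: str) -> Optional[str]:
    """Map one engine-specific label to one of the 8 families, if possible."""
    lowered = label.lower()
    for keyword, family in KEYWORD_TO_FAMILY:
        if keyword in lowered:
            return family
    return None


def vote_family(engine_labels: dict) -> Optional[str]:
    """Return the family named by the most engines, or None if none match."""
    votes = Counter()
    for label in engine_labels.values():
        if label:  # skip engines that did not detect the sample
            family = normalise_label(label)
            if family:
                votes[family] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Ties are resolved by whichever family was counted first, so a production pipeline would need an explicit tie-breaking rule.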
Between October 2020 and September 2021, Backdoor was the most common type of malware attack worldwide. Cyber attacks of this group amounted to 37 percent of all detected malware attacks in the measured period. Downloader ranked second, with 17 percent, while Worm followed with 16 percent among all malware attacks reported.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VHS-22 is a heterogeneous, flow-level dataset which combines ISOT, CICIDS-17, Booters and CTU-13 datasets, as well as traffic from Malware Traffic Analysis (MTA) site, to increase variety of malicious and legitimate traffic flows. It contains 27.7 million flows (20.3 million legitimate and 7.4 million of attacks). The flows are represented in the form of 45 features; apart from classical NetFlow features, VHS-22 contains statistical parameters and network-level features. Their detailed description and the results of initial detection experiments are presented in the paper:
Paweł Szumełda, Natan Orzechowski, Mariusz Rawski, and Artur Janicki. 2022. VHS-22 – A Very Heterogeneous Set of Network Traffic Data for Threat Detection. In Proc. European Interdisciplinary Cybersecurity Conference (EICC 2022), June 15–16, 2022, Barcelona, Spain. ACM, New York, NY, USA, https://doi.org/10.1145/3528580.3532843
Every day contains different attacks mixed with legitimate traffic.
01-01-2022 Botnet attacks from ISOT dataset.
02-01-2022 Various attacks from MTA dataset.
03-01-2022 Web attacks from CICIDS-17 dataset.
04-01-2022 Bruteforce attacks from CICIDS-17 dataset.
05-01-2022 Botnet attacks from CICIDS-17 dataset.
06-01-2022 DDoS attacks from CICIDS-17 dataset.
07-01-2022 to 11-01-2022 DDoS attacks from Booters dataset.
12-01-2022 to 23-01-2022 Botnet traffic from CTU-13 dataset.
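The day-to-source schedule can be encoded as a lookup table, which is handy when tagging flows by their origin. This is a minimal sketch: the function name `attack_source` is hypothetical, and it assumes dates are written day-first (DD-MM-YYYY), which the range ending on 23-01-2022 implies.

```python
from datetime import date, datetime
from typing import Optional, Tuple

# (first day, last day, source dataset, attack type), as listed in the schedule.
SCHEDULE = [
    (date(2022, 1, 1), date(2022, 1, 1), "ISOT", "Botnet"),
    (date(2022, 1, 2), date(2022, 1, 2), "MTA", "Various"),
    (date(2022, 1, 3), date(2022, 1, 3), "CICIDS-17", "Web"),
    (date(2022, 1, 4), date(2022, 1, 4), "CICIDS-17", "Bruteforce"),
    (date(2022, 1, 5), date(2022, 1, 5), "CICIDS-17", "Botnet"),
    (date(2022, 1, 6), date(2022, 1, 6), "CICIDS-17", "DDoS"),
    (date(2022, 1, 7), date(2022, 1, 11), "Booters", "DDoS"),
    (date(2022, 1, 12), date(2022, 1, 23), "CTU-13", "Botnet"),
]


def attack_source(day: str) -> Optional[Tuple[str, str]]:
    """Return (source dataset, attack type) for a DD-MM-YYYY date string."""
    parsed = datetime.strptime(day, "%d-%m-%Y").date()
    for first, last, dataset, attack in SCHEDULE:
        if first <= parsed <= last:
            return dataset, attack
    return None  # date falls outside the schedule
```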
The VHS-22 dataset consists of labeled network flows and all data is publicly available for researchers in .csv format. When using VHS-22, please cite our paper which describes the VHS-22 dataset in detail, as well as the publications describing the source datasets:
Paweł Szumełda, Natan Orzechowski, Mariusz Rawski, and Artur Janicki. 2022. VHS-22 – A Very Heterogeneous Set of Network Traffic Data for Threat Detection. In Proc. European Interdisciplinary Cybersecurity Conference (EICC 2022), June 15–16, 2022, Barcelona, Spain. ACM, New York, NY, USA, https://doi.org/10.1145/3528580.3532843
Sherif Saad, Issa Traore, Ali Ghorbani, Bassam Sayed, David Zhao, Wei Lu, John Felix, and Payman Hakimian. 2011. Detecting P2P botnets through network behavior analysis and machine learning. In Proc. International Conference on Privacy, Security and Trust. IEEE, Montreal, Canada, 174–1
Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani. 2018. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization, In Proc. 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Funchal, Portugal
José Jair Santanna, Romain Durban, Anna Sperotto, and Aiko Pras. 2015. Inside booters: An analysis on operational databases. In Proc. International Symposium on Integrated Network Management (INM 2015). IFIP/IEEE, Ottawa, Canada, 432–440. https://doi.org/10.1109/INM.2015.71403
Riaz Khan, Xiaosong Zhang, Rajesh Kumar, Abubakar Sharif, Noorbakhsh Amiri Golilarz, and Mamoun Alazab. 2019. An Adaptive Multi-Layer Botnet Detection Technique Using Machine Learning Classifiers. Applied Sciences 9 (06 2019), 2375. https://doi.org/10.3390/app91123
The Malware Traffic Analysis data originate from https://www.malware-traffic-analysis.net, authored by Brad.
The work has been funded by the SIMARGL Project -- Secure Intelligent Methods for Advanced RecoGnition of malware and stegomalware, with the support of the European Commission and the Horizon 2020 Program, under Grant Agreement No. 833042.
Stop summary files represent average daily ridership at the stop level over the course of the relevant period. Trolley ridership data was generated using automatic passenger counters (APCs). Bus data is calculated from a variety of sources depending on the route and year. The bus data files represent average daily fall ridership from 2014 – present. Accurate weekend bus data was not available until 2017 at which point SEPTA had more widespread APC coverage. No bus data is available for Fall 2020 due to a malware attack. APC bus data was also not available for articulated vehicles and the Boulevard Direct from August 2020 through February 2022 due to the malware attack.
In 2021, the number of global ransomware attacks that resulted in data leakage saw an increase of over 80 percent year-over-year. The number of ransomware attacks leading to data exposure in 2020 was around 1,500, while in 2021 it reached approximately 2,700 cases worldwide.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Malware classifier dataset built with header fields’ values of Portable Executable files
ClaMP_Integrated-5184.csv Total samples : 5184 (Malware () + Benign()) Features (69) : Raw Features (54) + Derived Features(15)
ClaMP_Raw-5184.csv Total samples : 5184 (Malware ()+ Benign()) Features (55) : Raw Features(55)
IMAGE_DOS_HEADER (19)
"e_magic", "e_cblp", "e_cp","e_crlc","e_cparhdr", "e_minalloc","e_maxalloc","e_ss","e_sp", "e_csum","e_ip","e_cs","e_lfarlc","e_ovno","e_res", "e_oemid","e_oeminfo","e_res2","e_lfanew"
FILE_HEADER (7)
"Machine","NumberOfSections","CreationYear","PointerToSymbolTable", "NumberOfSymbols","SizeOfOptionalHeader","Characteristics"
OPTIONAL_HEADER (29)
"Magic", "MajorLinkerVersion", "MinorLinkerVersion", "SizeOfCode", "SizeOfInitializedData", "SizeOfUninitializedData", "AddressOfEntryPoint", "BaseOfCode", "BaseOfData", "ImageBase", "SectionAlignment", "FileAlignment", "MajorOperatingSystemVersion", "MinorOperatingSystemVersion", "MajorImageVersion", "MinorImageVersion", "MajorSubsystemVersion", "MinorSubsystemVersion", "SizeOfImage", "SizeOfHeaders", "CheckSum", "Subsystem", "DllCharacteristics", "SizeOfStackReserve", "SizeOfStackCommit", "SizeOfHeapReserve", "SizeOfHeapCommit", "LoaderFlags", "NumberOfRvaAndSizes"
TARGET_VARIABLE: class - 0 (benign), 1 (malware)
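Fields such as `e_magic`, `e_lfanew`, `Machine`, and `NumberOfSections` sit at fixed offsets in a Portable Executable file, so a few of them can be read with the standard `struct` module. The sketch below is not the ClaMP extraction code: the function name is hypothetical, only a handful of the listed fields are read, and deriving `CreationYear` from the file header's `TimeDateStamp` is an assumption about how that feature was computed.

```python
import struct
from datetime import datetime, timezone


def read_pe_header_fields(data: bytes) -> dict:
    """Read a few IMAGE_DOS_HEADER and FILE_HEADER fields from PE bytes."""
    if len(data) < 64:
        raise ValueError("too short to hold an IMAGE_DOS_HEADER")
    (e_magic,) = struct.unpack_from("<H", data, 0x00)
    if e_magic != 0x5A4D:  # the bytes "MZ" read as a little-endian word
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", data, 0x3C)  # offset of "PE\0\0"
    if e_lfanew < 0 or data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte PE signature.
    (machine, n_sections, timestamp, symtab_ptr,
     n_symbols, opt_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    return {
        "e_magic": e_magic,
        "e_lfanew": e_lfanew,
        "Machine": machine,
        "NumberOfSections": n_sections,
        # Assumption: CreationYear is the year of TimeDateStamp in UTC.
        "CreationYear": datetime.fromtimestamp(timestamp, tz=timezone.utc).year,
        "PointerToSymbolTable": symtab_ptr,
        "NumberOfSymbols": n_symbols,
        "SizeOfOptionalHeader": opt_size,
        "Characteristics": characteristics,
    }
```

Malformed or truncated samples are common in malware corpora, so a real pipeline would catch `ValueError` and `struct.error` per file rather than stopping.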
The data is sourced from Mendeley data.
Kumar, Ajit (2020), “ClaMP (Classification of Malware with PE headers)”, Mendeley Data, V1, doi: 10.17632/xvyv59vwvz.1
Read Paper: "A learning model to detect maliciousness of portable executable using integrated feature set", authored by Ajit Kumar, K.S.Kuppusamy, and G.Aghila.
https://creativecommons.org/publicdomain/zero/1.0/
I would love to see notebooks! Keep bringin' em.
Worldometer manually analyzes, validates, and aggregates data from thousands of sources in real time and provides global COVID-19 live statistics for a wide audience of caring people around the world.
Our data is also trusted and used by the UK Government, Johns Hopkins CSSE, the Government of Thailand, the Government of Vietnam, the Government of Pakistan, Financial Times, The New York Times, Business Insider, BBC, and many others.
Acknowledge Sujay S
Thanks to blogs out there on medium! That made me do this!
https://creativecommons.org/publicdomain/zero/1.0/
The Global Cybersecurity Threats Dataset (2015-2024) provides extensive data on cyberattacks, malware types, targeted industries, and affected countries. It is designed for threat intelligence analysis, cybersecurity trend forecasting, and machine learning model development to enhance global digital security.
| Column Name | Description |
|---|---|
| Country | Country where the attack occurred |
| Year | Year of the incident |
| Threat Type | Type of cybersecurity threat (e.g., Malware, DDoS) |
| Attack Vector | Method of attack (e.g., Phishing, SQL Injection) |
| Affected Industry | Industry targeted (e.g., Finance, Healthcare) |
| Data Breached (GB) | Volume of data compromised |
| Financial Impact ($M) | Estimated financial loss in millions |
| Severity Level | Low, Medium, High, Critical |
| Response Time (Hours) | Time taken to mitigate the attack |
| Mitigation Strategy | Countermeasures taken |
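
As an example of the trend analysis the columns support, the sketch below averages the financial impact per threat type. It is a minimal illustration that assumes the CSV header names match the column names in the table exactly; the function name `mean_impact_by_threat` is hypothetical, and only the standard library is used.

```python
import csv
import io
from collections import defaultdict


def mean_impact_by_threat(csv_text: str) -> dict:
    """Average 'Financial Impact ($M)' for each 'Threat Type' value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        threat = row["Threat Type"]
        totals[threat] += float(row["Financial Impact ($M)"])
        counts[threat] += 1
    return {threat: totals[threat] / counts[threat] for threat in totals}
```

To run it on a file, read the file contents into a string first, or adapt the function to pass an open file handle straight to `csv.DictReader`.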
Note: On April 30, 2024, the Federal mandate for COVID-19 and influenza associated hospitalization data to be reported to CDC’s National Healthcare Safety Network (NHSN) expired. Hospitalization data beyond April 30, 2024, will not be updated on the Open Data Portal. Hospitalization and ICU admission data collected from summer 2020 to May 10, 2023, are sourced from the California Hospital Association (CHA) Survey. Data collected on or after May 11, 2023, are sourced from CDC's National Healthcare Safety Network (NHSN).
Data is from the California Department of Public Health (CDPH) Respiratory Virus State Dashboard at https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/Respiratory-Viruses/RespiratoryDashboard.aspx.
Data are updated each Friday around 2 pm.
For COVID-19 death data: As of January 1, 2023, data was sourced from the California Department of Public Health, California Comprehensive Death File (Dynamic), 2023–Present. Prior to January 1, 2023, death data was sourced from the COVID-19 case registry. The change in data source occurred in July 2023 and was applied retroactively to all 2023 data to provide a consistent source of death data for the year of 2023. Influenza death data was sourced from the California Department of Public Health, California Comprehensive Death File (Dynamic), 2020–Present.
COVID-19 testing data represent data received by CDPH through electronic laboratory reporting of test results for COVID-19 among residents of California. Testing date is the date the test was administered, and tests have a 1-day lag (except for the Los Angeles County, which has an additional 7-day lag). Influenza testing data represent data received by CDPH from clinical sentinel laboratories in California. These laboratories report the aggregate number of laboratory-confirmed influenza virus detections and total tests performed on a weekly basis. These data do not represent all influenza testing occurring in California and are available only at the state level.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IoT-23 is a dataset of network traffic from Internet of Things (IoT) devices. It has 20 malware captures executed on IoT devices and 3 captures of benign IoT device traffic. It was first published in January 2020, with captures ranging from 2018 to 2019. This IoT network traffic was captured in the Stratosphere Laboratory, AIC group, FEL, CTU University, Czech Republic. Its goal is to offer a large dataset of real and labeled IoT malware infections and benign IoT traffic for researchers to develop machine learning algorithms. This dataset and its research were funded by Avast Software. The malware was allowed to connect to the Internet.