Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Cyber threat intelligence (CTI) strategies involve gathering several data attributes, building profiles, using intelligent algorithms, and developing optimized threat detection and mitigation techniques. Windows-based exe file malware can be detected through its Portable Executable (PE) file header features. Researchers require good datasets to develop efficient anti-malware technology. A new dataset called SOMLAP (Swarm Optimization and Machine Learning Applied to PE Malware Detection), which adds value to the existing benchmark dataset, has been developed. The SOMLAP data contains 51,409 samples that include both benign and malware files, with a total of 108 pure PE file header attributes. The data contains 19,809 (38.54%) malware file features gathered from VirusShare and 31,600 (61.46%) benign executables and DLLs gathered from Windows 10 OS.
For more details, please refer to our research article: https://doi.org/10.3390/electronics12020342
If you use this data in your work, please cite the paper:
Kattamuri, Santosh Jhansi, Ravi Kiran Varma Penmatsa, Sujata Chakravarty, and Venkata Sai Pavan Madabathula. 2023. "Swarm Optimization and Machine Learning Applied to PE Malware Detection towards Cyber Threat Intelligence" Electronics 12, no. 2: 342. https://doi.org/10.3390/electronics12020342
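To make the idea of "PE file header features" concrete, the following is a minimal sketch of reading a few fields from the COFF file header of a PE file using only the standard library. It is not the SOMLAP extraction pipeline: the function name and the handful of fields returned are illustrative assumptions, and the 108 attributes in the dataset cover far more of the header than this.

```python
import struct

def read_coff_header(data):
    """Extract a few COFF file-header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature
    (machine, n_sections, timestamp, _symtab_ptr, _n_symbols,
     optional_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "Machine": machine,
        "NumberOfSections": n_sections,
        "TimeDateStamp": timestamp,
        "SizeOfOptionalHeader": optional_header_size,
        "Characteristics": characteristics,
    }
```

A typical use would be `read_coff_header(open(path, "rb").read())` for each sample, collecting the resulting dictionaries into rows of a feature table.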
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is useful for practicing data analysis or data science skills. It contains information about indicators of compromise found in MalwareBazaar's database.
The dataset was retrieved from MalwareBazaar's database (full dump CSV), then curated, formatted and cleaned by myself.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The "Android Malware Detection Dataset" is a comprehensive collection of data designed to facilitate research in the detection and analysis of malware targeting the Android platform. This dataset encompasses a wide range of features extracted from Android applications, providing valuable insights into their behaviors and functionalities.
Key features of the dataset include:
This dataset provides researchers with a rich source of information to develop and evaluate effective malware detection and analysis techniques, ultimately contributing to the enhancement of mobile security on the Android platform.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Malware and benign image representation. The dataset was collected from several malware repositories, including TekDefense, TheZoo, The Malware-Repo, Malware Database and MalwareBazaar. The benign samples were collected from Microsoft Windows 10 and 11 system apps and several open-source software repositories, including CNET, Sourceforge, FileForum and PortableFreeware. The samples were validated by scanning them with the VirusTotal malware scanning service. The samples were preprocessed by converting each malware binary into a grayscale image following the rules from Nataraj (2011). Nataraj paper: https://vision.ece.ucsb.edu/research/signal-processing-malware-analysis. The Maldeb dataset was collected by Debi Amalia Septiyani and Halimul Hakim Khairul. D. A. Septiyani, “Generating Grayscale and RGB Images dataset for windows PE malware using Gist Features extraction method,” Institut Teknologi Bandung, 2022, and Dani Agung Prastiyo, "Design and implementation of a machine learning-based malware classification system with an audio signal feature Analysis Approach," Institut Teknologi Bandung, 2023. The complete dataset can be accessed at https://ieee-dataport.org/documents/maldeb-dataset and https://github.com/julismail/Self-Supervised
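The binary-to-grayscale step mentioned above can be sketched as follows: each byte becomes one pixel with intensity 0 to 255, and rows wrap at a width chosen from the file size. The width table below follows the one commonly attributed to Nataraj et al. (2011); whether Maldeb used exactly these thresholds is an assumption, and the function names are illustrative.

```python
import math

# (upper size limit in KiB, image width in pixels); assumed from the table
# commonly attributed to Nataraj et al. (2011), not confirmed for Maldeb.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256),
               (200, 384), (500, 512), (1000, 768)]

def pick_width(size_bytes):
    """Choose the image width from the file size."""
    size_kib = size_bytes / 1024
    for limit_kib, width in WIDTH_TABLE:
        if size_kib < limit_kib:
            return width
    return 1024  # files of 1000 KiB and above

def bytes_to_grayscale(data):
    """Return a list of pixel rows; every byte is one 8-bit grayscale pixel."""
    if not data:
        return []
    width = pick_width(len(data))
    height = math.ceil(len(data) / width)
    # Zero-pad so the last row is complete
    padded = data + bytes(width * height - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]
```

The resulting rows can be handed to an imaging library of choice to write an 8-bit grayscale file; that step is left out here to keep the sketch dependency-free.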
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets demonstrate the malware economy and the value chain published in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, published by the International Conference Proceedings Series of the ACM ICPS.
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
1. MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
2. VictimAccessSet
We demonstrate how Infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
3. AccountAccessSet
The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits
Authors
Billy Bob Brumley (Tampere University, Tampere, Finland)
Juha Nurmi (Tampere University, Tampere, Finland)
Mikko Niemelä (Cyber Intelligence House, Singapore)
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
https://www.comodo.com/home/internet-security/updates/vdp/database.php
The complete Comodo Internet Security database is available for download...
According to our latest research, the global ransomware recovery services market size reached USD 22.6 billion in 2024, driven by the exponential rise in sophisticated ransomware attacks targeting enterprises across diverse sectors. The market is growing at a robust CAGR of 17.3% and is forecasted to achieve a value of USD 62.5 billion by 2033. This remarkable growth is primarily attributed to the escalating frequency and severity of ransomware incidents, increasing awareness about cybersecurity resilience, and the growing regulatory emphasis on data protection and business continuity.
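The growth figures can be cross-checked with the standard compound annual growth rate (CAGR) formula, rate = (end / start)^(1 / years) - 1. The sketch below assumes 2024 as the base year and 2033 as the end year, i.e. nine compounding periods; the text implies this window but does not state it outright, and a different window would give a different rate. The function names are illustrative.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoints."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

def project(start_value, rate, years):
    """Value reached after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years

# Figures quoted in the text (USD billion); the 9-year horizon is an assumption.
rate_from_endpoints = implied_cagr(22.6, 62.5, 9)
value_from_stated_rate = project(22.6, 0.173, 9)
```

Comparing `rate_from_endpoints` with the stated rate, or `value_from_stated_rate` with the stated 2033 value, is a quick way to see whether the rate and the endpoints describe the same forecast window.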
One of the primary growth factors fueling the ransomware recovery services market is the surge in cybercrime sophistication and frequency. As threat actors increasingly employ advanced tactics such as double extortion, fileless malware, and targeted attacks on critical infrastructure, organizations are compelled to invest in specialized ransomware recovery services. These services not only facilitate rapid data restoration and system recovery but also minimize downtime and financial losses, ensuring business continuity. The proliferation of remote work and the expansion of digital operations have further widened the attack surface, making organizations more vulnerable and heightening the demand for robust recovery solutions.
Another significant driver is the evolving regulatory landscape, which mandates stringent data protection and incident response protocols. Governments and regulatory bodies worldwide are enacting comprehensive cybersecurity frameworks that require organizations to implement effective recovery strategies as part of their overall risk management approach. Non-compliance with these regulations can result in hefty penalties and reputational damage, prompting organizations to seek expert ransomware recovery services. The increasing adoption of cyber insurance policies, which often stipulate the engagement of professional recovery providers, also contributes to the market’s expansion.
Additionally, the growing adoption of digital transformation initiatives across industries is accelerating the need for ransomware recovery services. As organizations migrate critical workloads to cloud environments and integrate IoT devices, they face new vulnerabilities that can be exploited by ransomware operators. The complexity of hybrid and multi-cloud infrastructures necessitates comprehensive recovery strategies that encompass data recovery, forensic analysis, and incident response. Moreover, the rising awareness among small and medium enterprises (SMEs) about the existential threat posed by ransomware is leading to increased investments in specialized recovery solutions, further propelling market growth.
Regionally, North America dominates the ransomware recovery services market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The high incidence of ransomware attacks on critical sectors such as healthcare, BFSI, and government agencies in these regions has spurred the adoption of advanced recovery solutions. Meanwhile, Asia Pacific is witnessing the fastest growth, driven by rapid digitalization, increasing cyber threats, and expanding regulatory mandates. Latin America and the Middle East & Africa are also experiencing steady growth as organizations in these regions enhance their cybersecurity resilience in response to rising attack volumes.
The ransomware recovery services market is segmented by service type into data recovery, system restoration, incident response, forensic analysis, and others. Data recovery remains the cornerstone of ransomware recovery services, as organizations prioritize the restoration of mission-critical files and databases compromised during an attack. Advanced data recovery services employ a combination of backup management, decryption tools, and data integrity checks to ensure seamless restoration without reinfection. The growin
Comprehensive database of website malware threats, vulnerabilities, and security risks detected by Quttera's malware scanner.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 18,850 normal Android application packages and 10,000 malware Android packages, which are used to identify the behaviour of malware applications based on the permissions they need at run-time.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains from actual CESNET network traffic, 164,425 phishing domains from the PhishTank and OpenPhish services, and 100,809 malware domains from various sources such as ThreatFox, The Firebog, and the MISP threat intelligence platform. The ground truth for the phishing dataset was double-checked with the VirusTotal (VT) service. Domain names not considered malicious by VT have been removed from the phishing and malware datasets. Similarly, benign domain names that were considered risky by VT have been removed from the benign datasets. The data was collected between March 2023 and July 2024. The final assessment of the data was conducted in August 2024.
The dataset is useful for cybersecurity research, such as statistical analysis of domain data or feature extraction for training machine learning-based classifiers, e.g. for phishing and malware website detection.
The data is located in the following individual files:
Both files contain a JSON array of records generated using mongoexport. The following table documents the structure of a record. Please note that:
| Field name | Field type | Nullable | Description |
|---|---|---|---|
| domain_name | String | No | The evaluated domain name |
| url | String | No | The source URL for the domain name |
| evaluated_on | Date | No | Date of last collection attempt |
| source | String | No | An identifier of the source |
| sourced_on | Date | No | Date of ingestion of the domain name |
| dns | Object | Yes | Data from DNS scan |
| rdap | Object | Yes | Data from RDAP or WHOIS |
| tls | Object | Yes | Data from TLS handshake |
| ip_data | Array of Objects | Yes | Array of data objects capturing the IP addresses related to the domain name |
| DNS data (dns field) | | | |
| A | Array of Strings | No | Array of IPv4 addresses |
| AAAA | Array of Strings | No | Array of IPv6 addresses |
| TXT | Array of Strings | No | Array of raw TXT values |
| CNAME | Object | No | The CNAME target and related IPs |
| MX | Array of Objects | No | Array of objects with the MX target hostname, priority and related IPs |
| NS | Array of Objects | No | Array of objects with the NS target hostname and related IPs |
| SOA | Object | No | All the SOA fields, present if found at the target domain name |
| zone_SOA | Object | No | The SOA fields of the target’s zone (closest point of delegation), present if found and not a record in the target domain directly |
| dnssec | Object | No | Flags describing the DNSSEC validation result for each record type |
| ttls | Object | No | The TTL values for each record type |
| remarks | Object | No | The zone domain name and DNSSEC flags |
| RDAP data (rdap field) | | | |
| copyright_notice | String | No | RDAP/WHOIS data usage copyright notice |
| dnssec | Bool | No | DNSSEC presence flag |
| entitites | Object | No | An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant). The arrays contain objects describing the individual entities. |
| expiration_date | Date | Yes | The current date of expiration |
| handle | String | No | RDAP handle |
| last_changed_date | Date | Yes | The date when the domain was last changed |
| name | String | No | The target domain name for which the data in this object are stored |
| nameservers | Array of Strings | No | Nameserver hostnames provided by RDAP or WHOIS |
| registration_date | Date | Yes | First registration date |
| status | Array of Strings | | |
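The record structure documented in the table can be turned into flat features with a few lines of code. The following is a minimal sketch, assuming each file is a single JSON array as described and that nullable objects (dns, tls, ip_data) appear as null when missing; the function names and the chosen features are illustrative, not part of the dataset's tooling.

```python
import json

def load_records(path):
    """Load one dataset file, assumed to hold a single JSON array of records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def summarize_record(record):
    """Derive a few simple counts from one record, tolerating null objects."""
    dns = record.get("dns") or {}
    return {
        "domain_name": record["domain_name"],
        "has_dns": record.get("dns") is not None,
        "has_tls": record.get("tls") is not None,
        "ipv4_count": len(dns.get("A") or []),
        "ipv6_count": len(dns.get("AAAA") or []),
        "mx_count": len(dns.get("MX") or []),
        "related_ip_count": len(record.get("ip_data") or []),
    }
```

Because `json.load` reads the whole array into memory, a streaming JSON parser may be a better fit for the larger files; that choice is left open here.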
Abstract: Global business offerings and services are steadily expanding each year, with more registered businesses and an ever-increasing global population. An increase in population and consumers creates ever more data for companies to store, increasing the incentive for cybercriminals to target these databases. One preferred type of malware deployed by cybercriminals is ransomware, which locks the data away from the user until payment terms are met. Companies are not integrating sufficient protection protocols and fail-safe measures to successfully protect themselves against ransomware. The resulting attack negatively affects core business components such as reputation, revenue, and customer safety. This investigation of mitigation methods is crucial for the future of cyber protection, and without these recommendations attacks like ransomware will continue to thrive. Globally, there are concerning rates of successful ransomware attacks, each having a varying negative impact on the victims. Furthermore, we have proposed four methods: Network Controls, User Training, Two-Factor Authentication, and Machine Learning. After a thorough investigation of these mitigation methods, it has been found that the most cost-effective for any business size is User Training. Implementing our recommendation will significantly reduce the likelihood of becoming a victim of ransomware.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Real-time database of ransomware victims and active ransomware groups. Includes victim names, industries, countries, attack dates, and group activity trends. Updated twice daily from ransomware leak sites.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are the data set and source code related to the paper: "A Framework for Developing Strategic Cyber Threat Intelligence from Advanced Persistent Threat Analysis Reports Using Graph-Based Algorithms"
1- aptnotes-downloader.zip : contains source code that downloads all APT reports listed in https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections
2- apt-groups.zip : contains all APT group names gathered from https://docs.google.com/spreadsheets/d/1H9_xaxQHpWaa4O_Son4Gx0YOIzlcBWMsdvePFX68EKU/edit?gid=1864660085#gid=1864660085 and https://malpedia.caad.fkie.fraunhofer.de/actors
3- apt-reports.zip : contains all deduplicated APT reports gathered from https://github.com/aptnotes/data and https://github.com/CyberMonitor/APT_CyberCriminal_Campagin_Collections
4- countries.zip : contains country name list.
5- ttps.zip : contains all MITRE techniques gathered from https://attack.mitre.org/resources/attack-data-and-tools/
6- malware-families.zip : contains all malware family names gathered from https://malpedia.caad.fkie.fraunhofer.de/families
7- ioc-searcher-app.zip : contains source code that extracts IoCs from APT reports. Extracted IoC files are provided in report-analyser.zip. Original code repo can be found at https://github.com/malicialab/iocsearcher
8- extracted-iocs.zip : contains extracted IoCs by ioc-searcher-app.zip
9- report-analyser.zip : contains source code that searches APT reports, malware families, countries and TTPs. In case of a match, it updates files in extracted-iocs.zip.
10- cti-transformation-app.zip : contains source code that transforms files in extracted-iocs.zip to CTI triples and saves into Neo4j graph database.
11- graph-db-backup.zip : contains the volume folder of a Neo4j Docker container. When it is mounted to a Docker container, the entire CTI database becomes reachable from the Neo4j web interface. Here is how to run a Neo4j Docker container that mounts the folder in the zip:
docker run -d \
  --publish=7474:7474 --publish=7687:7687 \
  --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/data:/data \
  --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/plugins:/plugins \
  --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/logs:/logs \
  --volume={PATH_TO_VOLUME}/DEVIL_NEO4J_VOLUME/neo4j/conf:/conf \
  --env 'NEO4J_PLUGINS=["apoc","graph-data-science"]' \
  --env NEO4J_apoc_export_file_enabled=true \
  --env NEO4J_apoc_import_file_enabled=true \
  --env NEO4J_apoc_import_file_use_neo4j_config=true \
  --env=NEO4J_AUTH=none \
  neo4j:5.13.0
web interface: http://localhost:7474
username: neo4j
password: neo4j
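Once the container is running, the graph can also be queried programmatically. The sketch below uses only the standard library and the transactional HTTP endpoint that Neo4j 5 exposes on the published port 7474. It assumes the default database name neo4j and sends no credentials, because the command above starts the container with NEO4J_AUTH=none. The query is deliberately generic, since the node labels of the CTI graph are not described here.

```python
import json
import urllib.request

# Generic Cypher query: count nodes per label set (no schema assumptions)
LABEL_COUNTS = ("MATCH (n) RETURN labels(n) AS labels, count(*) AS total "
                "ORDER BY total DESC")

def build_cypher_request(statement, base_url="http://localhost:7474", database="neo4j"):
    """Build a POST request for Neo4j's transactional HTTP endpoint."""
    payload = json.dumps({"statements": [{"statement": statement}]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/db/{database}/tx/commit",
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

def run_cypher(statement):
    """Send the statement and return the decoded JSON response."""
    with urllib.request.urlopen(build_cypher_request(statement)) as response:
        return json.load(response)
```

Calling `run_cypher(LABEL_COUNTS)` against the running container should give a first overview of what the database contains; the official Neo4j Python driver over the Bolt port 7687 is an alternative for heavier use.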
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by malwareTBugs
Released under Database: Open Database, Contents: Database Contents
Objective: The rapid adoption of health information technology (IT), coupled with growing reports of ransomware and hacking, has made cybersecurity a priority in health care. This study leverages federal data in order to better understand current cybersecurity threats in the context of health IT.
Materials and Methods: Retrospective observational study of all available reported data breaches in the United States from 2013 to 2017, downloaded from a publicly available federal regulatory database.
Results: There were 1512 data breaches affecting 154 415 257 patient records from a heterogeneous distribution of covered entities (P < .001). There were 128 electronic medical record-related breaches of 4 867 920 patient records, while 363 hacking incidents affected 130 702 378 records.
Discussion and Conclusion: Despite making up less than 25% of all breaches, hacking was responsible for nearly 85% of all affected patient records. As medicine becomes increasingly interconnected and inform...
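The two shares quoted in the discussion follow directly from the counts in the results. A short check, using only the figures reported above (the helper name is illustrative):

```python
# Counts reported in the Results paragraph
TOTAL_BREACHES = 1512
TOTAL_RECORDS = 154_415_257
HACKING_BREACHES = 363
HACKING_RECORDS = 130_702_378

def share(part, whole):
    """Percentage of `whole` accounted for by `part`."""
    return 100.0 * part / whole

breach_share = share(HACKING_BREACHES, TOTAL_BREACHES)  # just over 24% of breaches
record_share = share(HACKING_RECORDS, TOTAL_RECORDS)    # roughly 84.6% of records
```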
In 2023, organizations all around the world detected 317.59 million ransomware attempts. Overall, this number increased significantly between the third and fourth quarters of 2022, going from around 102 million to nearly 155 million cases, respectively. Ransomware attacks usually target organizations that collect large amounts of data and are critically important. In case of an attack, these organizations prefer paying the ransom to restore stolen data rather than reporting the attack immediately. Incidents of data loss also damage companies’ reputation, which is one of the reasons why ransomware attacks are not reported.
Most targeted industries and regions
As a part of critical infrastructure, the manufacturing industry is usually targeted by ransomware attacks. In 2022, manufacturing organizations worldwide saw 437 such attacks. The food and beverage industry ranked second, with over 50 ransomware attacks. By the share of ransomware attacks on critical infrastructure, North America ranked first among worldwide regions, followed by Europe. Healthcare and public health sector organizations filed the highest number of complaints to U.S. law enforcement in 2022 about ransomware attacks.
Ransomware as a service (RaaS)
The Ransomware as a Service (RaaS) business model has existed for over a decade. The model involves hackers and affiliates. Hackers develop ransomware attack models and sell them to affiliates. The latter then use them independently to attack targets. According to the business model, the hacker who created the RaaS receives a service fee per collected ransom. In the first quarter of 2022, there were 31 Ransomware as a Service (RaaS) extortion groups worldwide, compared to 19 such groups in the same quarter of 2021.
https://www.archivemarketresearch.com/privacy-policy
The Database Security Service market is booming, projected to reach $15 billion in 2025 and grow at a CAGR of 15%. This comprehensive analysis explores market drivers, trends, and key players, providing insights into the evolving landscape of database protection against cyber threats. Learn about the impact of AI, cloud computing, and data privacy regulations on this rapidly expanding sector.
https://www.promarketreports.com/privacy-policy
The size of the Data Encryption Market was valued at USD 14.5 Billion in 2024 and is projected to reach USD 40.98 Billion by 2033, with an expected CAGR of 16% during the forecast period. Recent developments include:
On Apr. 11, 2023, Menlo Security, a leading provider of browser security solutions, published the results of the 10th Annual Cyberthreat Defense Report (CDR) by the CyberEdge Group. The report, partially sponsored by Menlo Security, highlights the growing importance of browser isolation technologies to combat ransomware and other malicious threats. The research revealed that most ransomware attacks include threats beyond data encryption. According to the report, around 51% of respondents confirmed that they have been using at least one type of browser or Internet isolation to protect their organizational data, while another 40% are about to deploy data encryption technology. Furthermore, around 33% of respondents noted that browser isolation is a key cybersecurity strategy to protect against sophisticated attacks, including ransomware, phishing, and zero-day attacks.
On Feb. 14, 2023, EnterpriseDB, a relational database provider, announced the addition of Transparent Data Encryption (TDE) based on open-source PostgreSQL to its databases. The new TDE feature will be shipped along with the firm's enterprise version of its database. TDE is a method of encrypting database files to ensure data security while at rest and in motion. The company added that most enterprises use TDE for compliance reasons, as it helps ensure encryption of data on the hard drive and of files on a backup. Before the development of built-in TDE, enterprises relied on either full-disk encryption or stackable cryptographic file system encryption.
On Jan. 25, 2023, researchers from the Tokyo University of Science, Japan, announced the development of a faster and cheaper method for handling encrypted data while improving security. The new data encryption method combines the best of homomorphic encryption and secret sharing to handle encrypted data. Homomorphic encryption and secret sharing are key methods to compute on sensitive data while preserving privacy. Homomorphic encryption is computationally intensive and involves performing computations on encrypted data on a single server, while secret sharing is fast and computationally efficient. In this method, the encrypted data/secret input is divided and distributed across multiple servers, each performing a computation, such as multiplication, on its data. The results of the computations are then used to reconstruct the original data.
September 2022: Convergence Technology Solutions Corp., a supplier of software-enabled IT and cloud solutions, declared that it has obtained certification in Canada to sell and deploy IBM zSystems and LinuxONE.
November 2019: Penta Security Systems announced that it has been selected as a finalist for the 2020 SC Magazine Awards, which are given by SC Media and celebrated in the United States. As a result, MyDiamo from Penta Security has been named the Best Database Security Solution of 2020. Additionally, this will result in the expansion of common-level encryption and improve the open-source DBMS installation procedure.
Potential restraints include: issue regarding security and data breach; high implementation costs and complexity; issue with respect to data consistency and interoperability across different edge platforms.
https://www.gnu.org/licenses/gpl-3.0-standalone.html
WinMET (Windows Malware Execution Traces) Dataset
The WinMET dataset contains the reports generated with the CAPE sandbox after analyzing several malware samples. The reports are valid JSON files that contain the spawned processes, the sequence of WinAPI and system calls invoked by each process, their parameters, their return values, and the OS resources accessed, amongst many others.
Please use this DOI reference that always points to the latest WinMET version: https://doi.org/10.5281/zenodo.12647555
This dataset was generated using the MALVADA framework, which you can read more about in our publication https://doi.org/10.1016/j.softx.2025.102082. The article also provides insights about the contents of this dataset.
Razvan Raducu, Alain Villagrasa-Labrador, Ricardo J. Rodríguez, Pedro Álvarez, MALVADA: A framework for generating datasets of malware execution traces, SoftwareX, Volume 30, 2025, 102082, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2025.102082. (https://www.sciencedirect.com/science/article/pii/S2352711025000494)
How to use the dataset
The 7z file is password protected. The password is: infected.
Compressed size on disk: ~2.5GiB.
Decompressed size on disk: ~105GiB.
Total decompressed .json files: 9889.
The name of each .json file is irrelevant. It corresponds to its analysis ID.
cape_report_to_label_mapping.json and avclass_report_to_label_mapping.json contain the mappings of each report to its corresponding consensus label, sorted in descending order (by the number of reports belonging to each label/family).
Integrity checks for WinMET.7z:
MD5: 75b3354fb186ae5a47c320e253bd96ee
SHA256: 00faac011f4938a29ba9afbd9f0b50d89ede342d1d0d6877cb90b46eabd92c72
SHA512: 038ca9303623cadaa72eab680221e81e1d335449d08f6395b39eb99baad4092e02c00955089fba31ce1a9dd04260ae80b622491f754774331bced18e8e3be1c4
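The published digests can be checked before extracting the archive. A minimal sketch using the standard library follows; it reads the file in chunks so the ~2.5GiB archive does not need to fit in memory. The function names are illustrative, and the expected value is the SHA256 listed above.

```python
import hashlib

# SHA256 of WinMET.7z as published in the integrity checks above
EXPECTED_SHA256 = "00faac011f4938a29ba9afbd9f0b50d89ede342d1d0d6877cb90b46eabd92c72"

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file incrementally and return the hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected=EXPECTED_SHA256, algorithm="sha256"):
    """Return True when the file's digest matches the expected hex string."""
    return file_digest(path, algorithm) == expected.lower()
```

For example, `verify("WinMET.7z")` should return True for an intact download; passing `algorithm="md5"` or `algorithm="sha512"` together with the matching published value checks the other digests.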
Citation
If you use this dataset, cite it as follows:
TBA.
Statistics
The following statistics (and many more) can be obtained by analyzing the WinMET dataset with the MALVADA framework.
Total reports: 9889.
Average VT (VirusTotal) detections: ~53.
There are 268 benign or undetected reports, that is, reports with 10 or fewer VT detections (default threshold).
There are 2584 reports with no CAPE consensus label.
There are 695 reports with no AVClass consensus label.
Top 20 CAPE consensus labels (there are many more):
"(n/a)": 2584
"Redline": 1227
"Agenttesla": 1010
"Crifi": 622
"Amadey": 606
"Smokeloader": 538
"Virlock": 471
"Msilheracles": 408
"Tedy": 364
"Disabler": 343
"Xorstringsnet": 321
"Snake": 252
"Autorun": 252
"Metastealer": 246
"Formbook": 244
"Lokibot": 202
"Strab": 188
"Loki": 185
"Mint": 179
"Taskun": 178
Top 20 AVClass consensus labels (there are many more)
"Reline": 2187
"Disabler": 732
"(n/a)": 695
"Amadey": 575
"Agenttesla": 478
"Taskun": 382
"Virlock": 293
"Equationdrug": 270
"Stop": 268
"Strab": 260
"Noon": 259
"Gamarue": 181
"Dofoil": 135
"Makoob": 113
"Mokes": 110
"Snakelogger": 110
"Bladabindi": 98
"Zard": 84
"Gcleaner": 83
"Deyma": 80
Changelog
Version 2.0: Added cape and avclass label mappings.
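Label counts like the top-20 lists above can be recomputed from the mapping files with a small tally. This is only a sketch: it assumes a mapping of the form {report ID: label}, with a missing label stored as null or an empty string, which may not match the actual layout of cape_report_to_label_mapping.json and avclass_report_to_label_mapping.json. The function names are illustrative, and the threshold helper simply restates the default rule of 10 or fewer VT detections.

```python
from collections import Counter

def top_labels(report_to_label, n=20, missing="(n/a)"):
    """Count reports per consensus label; unlabeled reports go under `missing`.

    Assumes a {report_id: label} mapping (hypothetical layout).
    """
    counts = Counter(label if label else missing
                     for label in report_to_label.values())
    return counts.most_common(n)

def is_benign_or_undetected(vt_detections, threshold=10):
    """Apply the default rule: at most `threshold` VirusTotal detections."""
    return vt_detections <= threshold
```

If the real files map labels to lists of reports instead, the same counts follow from the length of each list.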
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project provides a comprehensive collection of open-source datasets focused on cybersecurity threats and AI security vulnerabilities. The datasets are carefully selected to align with specific security threats, such as:
Each dataset includes a detailed description, source type, purpose, and direct access links for easy retrieval.