MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Context: Malicious URLs, or malicious websites, are a serious threat to cybersecurity. Malicious URLs host unsolicited content (spam, phishing, drive-by downloads, etc.), lure unsuspecting users into scams (monetary loss, theft of private information, and malware installation), and cause losses of billions of dollars every year. We collected this dataset to provide a large number of examples of malicious URLs so that a machine-learning-based model can be developed to identify malicious URLs and stop them in advance, before they infect computer systems or spread through the internet.
Content: We have collected a large dataset of 651,191 URLs: 428,103 benign (safe) URLs, 96,457 defacement URLs, 94,111 phishing URLs, and 32,520 malware URLs. Figure 2 depicts their distribution in terms of percentage. Curating the dataset is one of the most crucial tasks in a machine learning project; we curated this dataset from five different sources.
For collecting benign, phishing, malware, and defacement URLs we used the URL dataset (ISCX-URL-2016). To increase the number of phishing and malware URLs, we used the Malware Domain Blacklist dataset. We added more benign URLs from the faizan git repo, and finally added more phishing URLs from the PhishTank and PhishStorm datasets. Since the dataset is collected from different sources, we first gathered the URLs from each source into a separate data frame and then merged them, retaining only the URLs and their class type.
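The merge step described above can be sketched with pandas; the frame names and example URLs below are illustrative stand-ins, not the actual source data:

```python
import pandas as pd

# Hypothetical per-source frames; the real source collections may
# use different column names before unification.
iscx = pd.DataFrame({"url": ["http://a.example", "http://b.example"],
                     "type": ["benign", "phishing"]})
phishtank = pd.DataFrame({"url": ["http://c.example"], "type": ["phishing"]})

# Merge all sources, keep only the URL and its class type, drop duplicates.
merged = (pd.concat([iscx, phishtank], ignore_index=True)
            [["url", "type"]]
            .drop_duplicates(subset="url")
            .reset_index(drop=True))
print(len(merged))  # 3
```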
This dataset was created to form a balanced URL dataset with the same number of unique benign and malicious URLs: 632,508 unique URLs in total.
The creation of the dataset has involved 2 different datasets from Kaggle which are as follows:
First dataset: 450,176 URLs, of which 77% are benign and 23% malicious. Can be found here: https://www.kaggle.com/datasets/siddharthkumar25/malicious-and-benign-urls
Second dataset: 651,191 URLs, of which 428,103 are benign (safe), 96,457 defacement, 94,111 phishing, and 32,520 malware URLs. Can be found here: https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset
To create the balanced dataset, the first dataset served as the base; malicious URLs from the second dataset were added, and the extra benign URLs were then removed to keep the classes balanced. The columns were unified and duplicates removed so that only unique instances remain.
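A minimal pandas sketch of that balancing procedure, using toy stand-ins for the two Kaggle datasets (the URLs and counts here are invented for illustration):

```python
import pandas as pd

# Toy stand-ins for the two datasets after unifying the columns.
main = pd.DataFrame({"url": [f"http://b{i}.example" for i in range(6)] +
                            ["http://m0.example"],
                     "label": ["benign"] * 6 + ["malicious"]})
extra_malicious = pd.DataFrame({"url": ["http://m1.example", "http://m2.example"],
                                "label": ["malicious"] * 2})

# 1) add the extra malicious URLs, 2) drop duplicate URLs,
# 3) downsample the benign class to match the malicious count.
df = pd.concat([main, extra_malicious], ignore_index=True).drop_duplicates("url")
n_mal = (df["label"] == "malicious").sum()
balanced = pd.concat([df[df["label"] == "benign"].sample(n=n_mal, random_state=0),
                      df[df["label"] == "malicious"]], ignore_index=True)
print(balanced["label"].value_counts().to_dict())  # {'benign': 3, 'malicious': 3}
```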
For more information about the collection of the URLs themselves, please refer to the mentioned datasets above.
All the URLs are in one .csv file with 3 columns: 1. 'url' — the URL itself; 2. 'label' — the class of the URL, whether 'benign' or 'malicious'; 3. 'result' — the same class encoded numerically (0 is benign and 1 is malicious).
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The dataset contains DNS records, IP-related features, WHOIS/RDAP information, information from TLS handshakes and certificates, and GeoIP information for 368,956 benign domains from Cisco Umbrella, 461,338 benign domains from actual CESNET network traffic, 164,425 phishing domains from the PhishTank and OpenPhish services, and 100,809 malware domains from various sources, including ThreatFox, The Firebog, and the MISP threat intelligence platform. The ground truth for the phishing dataset was double-checked with the VirusTotal (VT) service: domain names not considered malicious by VT were removed from the phishing and malware datasets, and benign domain names that VT considered risky were removed from the benign datasets. The data was collected between March 2023 and July 2024; the final assessment of the data was conducted in August 2024.
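The VirusTotal cross-check amounts to a simple filtering step; a sketch follows, where the `vt_hits` verdict counts and the domain names are invented for illustration (the actual assessment used the VT API):

```python
# Hypothetical VT verdicts: domain -> number of engines flagging it.
vt_hits = {"bad1.example": 7, "bad2.example": 0,
           "good1.example": 0, "good2.example": 3}

phishing = ["bad1.example", "bad2.example"]
benign = ["good1.example", "good2.example"]

# Keep only phishing/malware domains that VT also considers malicious,
# and drop benign domains that VT flags as risky.
phishing_clean = [d for d in phishing if vt_hits.get(d, 0) > 0]
benign_clean = [d for d in benign if vt_hits.get(d, 0) == 0]
print(phishing_clean, benign_clean)  # ['bad1.example'] ['good1.example']
```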
The dataset is useful for cybersecurity research, e.g. statistical analysis of domain data or feature extraction for training machine-learning-based classifiers for phishing and malware website detection.
The data is located in the following individual files:
Both files contain a JSON array of records generated using mongoexport. The following table documents the structure of a record. Please note that:
| Field name | Field type | Nullable | Description |
|---|---|---|---|
| domain_name | String | No | The evaluated domain name |
| url | String | No | The source URL for the domain name |
| evaluated_on | Date | No | Date of last collection attempt |
| source | String | No | An identifier of the source |
| sourced_on | Date | No | Date of ingestion of the domain name |
| dns | Object | Yes | Data from DNS scan |
| rdap | Object | Yes | Data from RDAP or WHOIS |
| tls | Object | Yes | Data from TLS handshake |
| ip_data | Array of Objects | Yes | Array of data objects capturing the IP addresses related to the domain name |
| DNS data (dns field) | | | |
| A | Array of Strings | No | Array of IPv4 addresses |
| AAAA | Array of Strings | No | Array of IPv6 addresses |
| TXT | Array of Strings | No | Array of raw TXT values |
| CNAME | Object | No | The CNAME target and related IPs |
| MX | Array of Objects | No | Array of objects with the MX target hostname, priority and related IPs |
| NS | Array of Objects | No | Array of objects with the NS target hostname and related IPs |
| SOA | Object | No | All the SOA fields, present if found at the target domain name |
| zone_SOA | Object | No | The SOA fields of the target's zone (closest point of delegation), present if found and not a record in the target domain directly |
| dnssec | Object | No | Flags describing the DNSSEC validation result for each record type |
| ttls | Object | No | The TTL values for each record type |
| remarks | Object | No | The zone domain name and DNSSEC flags |
| RDAP data (rdap field) | | | |
| copyright_notice | String | No | RDAP/WHOIS data usage copyright notice |
| dnssec | Bool | No | DNSSEC presence flag |
| entitites | Object | No | An object with various arrays representing the found related entity types (e.g. abuse, admin, registrant); the arrays contain objects describing the individual entities |
| expiration_date | Date | Yes | The current date of expiration |
| handle | String | No | RDAP handle |
| last_changed_date | Date | Yes | The date when the domain was last changed |
| name | String | No | The target domain name for which the data in this object are stored |
| nameservers | Array of Strings | No | Nameserver hostnames provided by RDAP or WHOIS |
| registration_date | Date | Yes | First registration date |
| status | Array of Strings | | |
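A record exported this way can be read with the standard `json` module. The sample record below is a minimal, hand-written illustration of the structure documented above (most fields elided, `source` value invented); nullable fields such as `dns`, `rdap`, and `tls` may be null and should be guarded before access:

```python
import json

# A minimal record in the documented shape (most fields elided).
raw = '''[
  {"domain_name": "example.org",
   "url": "http://example.org/",
   "source": "benign_umbrella",
   "dns": {"A": ["93.184.216.34"], "AAAA": []},
   "rdap": null,
   "tls": null,
   "ip_data": []}
]'''

records = json.loads(raw)
for rec in records:
    # Nullable fields may be null, so guard before accessing sub-keys.
    a_records = rec["dns"]["A"] if rec["dns"] else []
    print(rec["domain_name"], a_records)
```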
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Description: This Kaggle Dataset focuses on Malicious URL detection, aiming to facilitate research and development in the field of cybersecurity and threat detection.
The dataset includes a diverse set of URLs, both benign and malicious, accompanied by an extensive list of calculated features. These features cover various aspects of URLs, providing a holistic view to aid in the identification of potential threats.
Key feature types:
Researchers and data scientists can leverage this dataset to develop and benchmark URL detection models, employing various machine learning and deep learning techniques. The abundance of calculated features offers a rich resource for exploring novel approaches to enhance the accuracy and robustness of URL classification systems. Additionally, the dataset encourages collaboration and the sharing of insights within the cybersecurity community.
Malicious websites are of great concern because it is impractical to analyze them one by one and index each URL in a blacklist. Unfortunately, there is a lack of datasets with malicious and benign web characteristics. This dataset is a research product of my bachelor students, which aims to fill this gap.
This is the first version of the dataset from our web security project; we are working to improve its results.
The project consisted of evaluating different classification models to predict malicious and benign websites based on application-layer and network characteristics. The data were obtained from different verified sources of benign and malicious URLs, using a low-interaction client honeypot to isolate network traffic. We used additional tools, such as Whois, to obtain other information such as the server country.
This is the first version, and we have some initial results from applying machine learning classifiers in a bachelor thesis. Further details on the data-processing methodology and the data description can be found in the article below.
This is an important topic and one of the most difficult to process. Following other articles and open resources, we used three blacklists:
+ machinelearning.inginf.units.it/data-andtools/hidden-fraudulent-urls-dataset
+ malwaredomainlist.com
+ zeuztacker.abuse.ch
From them we obtained around 185,181 URLs, which we assume to be malicious based on the sources' information; as a next research step, we recommend verifying them through another security tool, such as VirusTotal.
We obtained the benign URLs (345,000) from https://github.com/faizann24/Using-machinelearning-to-detect-malicious-URLs.git; as in the previous step, verification through other security systems is also recommended.
First, we wrote several Python scripts to systematically analyze each URL and generate its information (in the coming months we will release them to the open source community on GitHub).
We first verified that each URL was reachable using Python libraries (such as requests). We started with around 530,181 samples; after this filtering step, 63,191 URLs remained.
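That availability check can be sketched as follows. The checker is made injectable so the filtering logic can be demonstrated without network access; the URLs shown are placeholders, and a real run of `is_alive` of course needs network connectivity:

```python
from urllib import request, error

def is_alive(url, timeout=5):
    """Return True if the URL responds; a real run needs network access."""
    try:
        with request.urlopen(url, timeout=timeout):
            return True
    except (error.URLError, ValueError):
        return False

def filter_alive(urls, check=is_alive):
    # `check` is injectable so the filtering step can be tested offline.
    return [u for u in urls if check(u)]

# Offline demonstration with a stubbed checker:
sample = ["http://up.example", "http://down.example"]
kept = filter_alive(sample, check=lambda u: "up" in u)
print(kept)  # ['http://up.example']
```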
Figure: Framework to detect malicious websites
During the research we found that one way to study a malicious website is to analyze features from its application layer and network layer; to obtain them, the idea is to apply both dynamic and static analysis.
For the dynamic analysis, some articles used high-interaction web application honeypots, but these resources have not been updated in recent months, so some important vulnerabilities may not have been mapped.
If your papers or other works use our dataset, please cite our pap...
https://creativecommons.org/publicdomain/zero/1.0/
Terms: https://webtechsurvey.com/terms
A complete list of live websites affected by CVE-2020-11008, compiled through global website indexing conducted by WebTechSurvey.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Wget provenance data in edge-list format, parsed from CamFlow provenance data. This dataset contains attack wget base-graph data. Experiments ran for over an hour, with recurrent wget commands issued throughout (one every 120 seconds). Background activity was also captured, as CamFlow whole-system provenance was turned on. Several malicious URLs were fetched during each experimental session. Five attack experiments were recorded, each with a different mixture of normal benign wget operations. The provenance data was in JSON format and converted into edge-list format for the Unicorn IDS research project. Conversion date was Sept. 26th, 2018. Each experiment consists of a base and a streaming graph component.
This dataset was created using Python code to generate QR codes from the REAL list of URLs provided in the following Kaggle dataset: https://www.kaggle.com/datasets/samahsadiq/benign-and-malicious-urls
The mentioned dataset consists of over 600,000 URLs; however, only the first 100,000 URLs from each class (benign and malicious) were used to generate the QR codes. In total, there are 200,000 QR code images in the dataset, each encoding a REAL URL.
This is a balanced dataset of version-2 QR codes. The 100,000 benign QR codes were generated in a single Python loop, and likewise for the malicious QR codes.
QR code images belonging to malicious URLs are under the 'malicious' folder, with 'malicious' in their file names; QR codes belonging to benign URLs are under the 'benign' folder, with 'benign' in their file names.
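Given that layout, the class label can be recovered from the parent folder name alone; a small sketch (the file names below are hypothetical examples, not actual files from the dataset):

```python
from pathlib import Path

def label_for(path):
    """Derive the class label from the folder a QR image sits in."""
    parent = Path(path).parent.name
    if parent not in ("benign", "malicious"):
        raise ValueError(f"unexpected folder: {parent}")
    return parent

print(label_for("malicious/malicious_00042.png"))  # malicious
print(label_for("benign/benign_00007.png"))        # benign
```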
NOTE: Keep in mind that the malicious QR codes encode REAL malicious URLs; it is not recommended to scan them manually or visit the encoded websites.
For more information about the encoded URLs, please refer to the Kaggle dataset mentioned above.
Terms: https://webtechsurvey.com/terms
A complete list of live websites affected by CVE-2020-5260, compiled through global website indexing conducted by WebTechSurvey.
The purpose of building a DGA classifier isn't specifically to take down botnets, but to discover and detect DGA use on our own networks or services. If you have a list of domains resolved and accessed at your organization, it is now possible to see which of those were potentially generated and used by malware.
The dataset consists of three sources (as described in the Data-Driven Security blog):
Alexa: For samples of legitimate domains, an obvious choice is the Alexa list of top web sites. But it's not ready for our use as-is. If you grab the top 1 million Alexa domains and parse the list, you'll find just over 11 thousand entries are full URLs rather than domains, and there are thousands of domains with subdomains that don't help us (we are only classifying domains here). After removing the URLs, de-duplicating the domains, and cleaning up, I ended up with the Alexa top 965,843.
"Real World" data from OpenDNS: After reading the post from Frank Denis at OpenDNS titled "Why Using Real World Data Matters For Building Effective Security Models", I grabbed their 10,000 top domains and their 10,000 random samples. Comparing these to the top Alexa domains, 6,901 of the top ten thousand and 893 of the random domains also appear in the Alexa data; I will clean that up when making the final training dataset.
DGA domains: The Click Security version wasn't very clear about where they got their bad domains, so I decided to collect my own, and this was rather fun. Because I work with some interesting characters (who know interesting characters), I was able to collect several data sets from recent botnets: "Cryptolocker", two separate "GameOver Zeus" algorithms, and an anonymous collection of malicious (and algorithmically generated) domains. In the end, I collected 73,598 algorithmically generated domains.
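The kind of clean-up applied to the Alexa list (stripping full URLs down to host names and de-duplicating) can be sketched as follows; the entries are invented for illustration, and note that collapsing subdomains down to registered domains would additionally require a public-suffix list, which this sketch omits:

```python
from urllib.parse import urlparse

# Illustrative mix of bare domains, a full URL, and a duplicate.
raw_entries = ["google.com", "http://example.com/path?q=1",
               "news.example.com", "google.com"]

def to_domain(entry):
    # Entries may be bare domains or full URLs; normalise both forms.
    host = urlparse(entry).netloc if "//" in entry else entry
    return host.lower()

# De-duplicate while preserving order (dict keys keep insertion order).
domains = list(dict.fromkeys(to_domain(e) for e in raw_entries))
print(domains)  # ['google.com', 'example.com', 'news.example.com']
```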
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains 44,000 512x512 pixels images containing different malicious payloads, i.e., JavaScript, HTML, PowerShell, URLs, and ethereum addresses, embedded via the Least Significant Bit (LSB) technique. The payloads are selected to fit in the first bit of each color channel, i.e., max 512x512x3 bits.
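The LSB technique itself can be sketched on a flattened list of channel values. This toy example is not the LSB-Steganography tool used to build the dataset, just an illustration of the bit manipulation it performs (one payload bit written into the least significant bit of each channel value):

```python
def embed_lsb(pixels, payload_bits):
    """Write each payload bit into the least significant bit of a
    channel value; the payload must fit in len(pixels) bits."""
    assert len(payload_bits) <= len(pixels)
    out = list(pixels)
    for i, bit in enumerate(payload_bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_lsb(pixels, n_bits):
    return [p & 1 for p in pixels[:n_bits]]

# Flattened channel values of a tiny "image" and a 4-bit payload.
channels = [200, 201, 202, 203, 204]
bits = [1, 0, 1, 1]
stego = embed_lsb(channels, bits)
print(extract_lsb(stego, 4))  # [1, 0, 1, 1]
```

Because only the lowest bit changes, each channel value moves by at most 1, which is why the embedding is visually imperceptible.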
Dataset information:
- the tool used for the LSB steganographic technique is LSB-Steganography;
- the 8,000 original 512x512 pixel images are borrowed from different open source repositories (and released under the GPL3 license);
- the malicious JavaScripts are borrowed from the JavaScript Malware Collection;
- the malicious JavaScripts obfuscated in HTML are borrowed from the Malicious Javascript Dataset;
- the malicious PowerShell scripts are borrowed from the PowerDrive repository;
- the malicious URLs are borrowed from the URLhaus database;
- the ethereum addresses are borrowed from the Ethereum-lists repository.
The train set is composed of 16,000 images containing the different payloads; the test and validation sets contain 8,000 images each. The test folder contains two additional test sets:
- stego_b64: 8,000 images containing the different payloads "obfuscated" with the base64 algorithm;
- stego_zip: 8,000 images containing the different payloads "obfuscated" with the zip algorithm.
More information on how the dataset is composed can be found in the corresponding "dataset_info.csv" file.
If you use this dataset, please cite our work (doi: 10.22667/JOWUA.2022.09.30.050): N. Cassavia, L. Caviglione, M. Guarascio, G. Manco, M. Zuppelli, “Detection of Steganographic Threats Targeting Digital Images in Heterogeneous Ecosystems Through Machine Learning”, Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications, Vol. 13, No. 3, pp. 50-67, September 2022.
Machine learning dataset created as part of my 4th-year dissertation at Abertay University.
Dataset consists of:
20,175 phishing websites from PhishTank and PhishStats.
49,524 benign websites from Alexa top 1 million websites.
The dataset comes in two parts: an Excel spreadsheet and an accompanying txt file of the first page scraped from each URL. Additional information for each website, such as the number of redirects a request made and the site's WHOIS information, was also gathered.
The dataset was collected over a 38-day period. LightGBM was found to work best with the dataset; with a larger dataset, models on this framework should be very accurate.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Distributed Denial of Service (DDoS) attack is a menace to network security that aims at exhausting the target networks with malicious traffic. Although many statistical methods have been designed for DDoS attack detection, designing a real-time detector with low computational overhead is still one of the main concerns. On the other hand, the evaluation of new detection algorithms and techniques heavily relies on the existence of well-designed datasets. In this paper, first, we review the existing datasets comprehensively and propose a new taxonomy for DDoS attacks. Secondly, we generate a new dataset, namely CICDDoS2019, which remedies all current shortcomings. Thirdly, using the generated dataset, we propose a new detection and family classification approach based on a set of network flow features. Finally, we provide the most important feature sets to detect different types of DDoS attacks with their corresponding weights.
The dataset offers an extended set of Distributed Denial of Service attacks, most of which employ some form of amplification through reflection. The dataset shares its feature set with the other CIC NIDS datasets: IDS2017, IDS2018, and DoS2017.
Original paper: https://ieeexplore.ieee.org/abstract/document/8888419
Kaggle dataset: https://www.kaggle.com/datasets/dhoogla/cicddos2019
These datasets (DREBIN and AndroZoo) are contributions to the paper "Fast & Furious: On the Modelling of Malware Detection as an Evolving Data Stream". If you use them in your work, please cite our paper using the BibTeX below:
@article{CESCHIN2022118590,
title = {Fast & Furious: On the modelling of malware detection as an evolving data stream},
journal = {Expert Systems with Applications},
pages = {118590},
year = {2022},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2022.118590},
url = {https://www.sciencedirect.com/science/article/pii/S0957417422016463},
author = {Fabrício Ceschin and Marcus Botacin and Heitor Murilo Gomes and Felipe Pinagé and Luiz S. Oliveira and André Grégio},
keywords = {Machine learning, Data streams, Concept drift, Malware detection, Android}
}
Both datasets are saved in the Parquet file format. To read them, use the following code:
import pandas as pd
data_drebin = pd.read_parquet("drebin_drift.parquet.zip")
data_androzoo = pd.read_parquet("androbin.parquet.zip")
Note that these datasets are different from their original versions. The original DREBIN dataset does not contain the samples' timestamps, which we collected using VirusTotal API. Our version of the AndroZoo dataset is a subset of reports from their dataset previously available in their APK Analysis API, which was discontinued.
The DREBIN dataset is composed of ten textual attributes from Android APKs (list of API calls, permissions, URLs, etc), which are publicly available to download and contain 123,453 benign and 5,560 malicious Android applications. Their distribution over time is shown below.
![DREBIN dataset distribution by month](https://i.imgur.com/IGKOMtE.png)
The AndroZoo dataset is a subset of Android application reports provided by the AndroZoo API, composed of eight textual attributes (resource names, source code classes and methods, manifest permissions, etc.), and contains 213,928 benign and 70,340 malicious applications. The distribution over time of our AndroZoo subset, which keeps the same goodware and malware distribution as the original dataset (composed of almost 10 million apps), is shown below.
![AndroZoo dataset distribution by month](https://i.imgur.com/8zxH3M4.png)
The source code for all the experiments shown in the paper is also available here on Kaggle (note that the experiments using the AndroZoo dataset did not run in the Kaggle environment due to high memory usage).
Experiment 1 (The Best-Case Scenario for AVs - ML Cross-Validation)
Here we classify all samples together to compare which feature extraction algorithm is the best and to report baseline results. We tested several parameters for both algorithms and fixed the vocabulary size at 100 for TF-IDF (top-100 features ordered by term frequency) and created projections with 100 dimensions for Word2Vec, resulting in 1,000 and 800 features per app for DREBIN and AndroZoo, respectively. All results are reported after 10-fold cross-validation, a method commonly used in ML to evaluate models because its results are less prone to bias (note that we train new classifiers and feature extractors at every iteration of the cross-validation process). In practice, folding the dataset implies that the AV company has a mixed view of both past and future threats, despite temporal effects, which is the best scenario for AV operation and ML evaluation.
Source Codes: DREBIN TFIDF | DREBIN W2V | ANDROZOO TFIDF | ANDROZOO W2V
Experiment 2 (On Classification Failure - Temporal Classification)
Although the currently used classification methodology helps reduce dataset biases, it would demand knowledge about future threats to work properly. AV companies train their classifiers using data from past samples and leverage them to predict future threats, expecting them to present the same characteristics as past ones. However, malware samples are very dynamic, so this strategy is the worst-case scenario for AV companies. To demonstrate the effects of predicting future threats based on past data, we split our datasets in two: we used the first half (oldest samples) to train our classifiers, which were then used to predict the newest samples from the second half. The results in the paper indicate a drop in all metrics when compared to the 10-fold experiment on both the DREBIN and AndroZoo dataset...
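The temporal split described above amounts to sorting by timestamp and cutting the stream in half; a toy pandas sketch (the column names and timestamps here are illustrative, not the actual dataset schema):

```python
import pandas as pd

# Toy stream of samples with collection timestamps.
df = pd.DataFrame({
    "sample_id": list("abcdef"),
    "first_seen": pd.to_datetime(["2015-01-01", "2015-06-01", "2016-01-01",
                                  "2016-06-01", "2017-01-01", "2017-06-01"]),
    "label": [0, 1, 0, 1, 0, 1],
})

# Train on the oldest half, test on the newest half.
df = df.sort_values("first_seen").reset_index(drop=True)
half = len(df) // 2
train_df, test_df = df.iloc[:half], df.iloc[half:]
print(len(train_df), len(test_df))  # 3 3
```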
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains network traffic logs captured by Burp-Suite, aimed at classifying web requests as either good or bad based on their characteristics. By analyzing patterns in the network logs, this dataset helps in identifying web requests that could be categorized as legitimate or malicious. The main goal is to assist in the detection and prevention of web-based attacks, contributing to a more secure online environment.
The dataset was gathered using Burp-Suite, a tool widely recognized for web vulnerability scanning and traffic monitoring. Burp-Suite captures network traffic, providing detailed records of the interactions between clients and servers. This includes information such as URL paths, headers, and parameters. Using this information, it is possible to develop classification models that assess the characteristics of each web request, determining whether they should be considered "good" or "bad."
Characteristics of the Dataset Traffic Log Format: The dataset contains various HTTP(S) requests, including headers, URLs, and request bodies. Each request is labeled based on its perceived legitimacy. The legitimate requests ("good") are standard, benign user activity, while the malicious requests ("bad") are crafted to exploit server vulnerabilities.
Feature Analysis: The characteristics of each web request are crucial to classifying it correctly. This includes analyzing the structure of the URL, the parameters included in the query string, and the content of request headers. By evaluating these elements, we can detect suspicious patterns that may indicate a malicious request, such as SQL injection attempts, cross-site scripting (XSS), or other forms of attack.
Detection Strategy: The classification is based on identifying certain key patterns and terms that are indicative of an attack. This approach is particularly useful for detecting known types of attacks, such as SQL injection, command injection, and XSS. The dataset includes specific keywords that have been identified as potential indicators of malicious intent.
List of Bad Words in the URL Path The dataset specifically highlights the importance of monitoring the URL path for certain keywords that are commonly used in attacks. The list of "bad words" that should be checked in the URL path includes:
- sleep: used in SQL injection attacks to delay server response.
- uid: frequently targeted in attempts to exploit user identifier vulnerabilities.
- select: a common SQL keyword that might indicate a potential SQL injection attempt.
- waitfor: used in SQL Server to introduce delays, often seen in timing attacks.
- delay: similar to "sleep", used to manipulate response times.
- system: can indicate an attempt to execute system commands through vulnerabilities like command injection.
- union: often used in SQL injection to combine results from different tables, which may lead to data exposure.
- order by: another SQL term that may be exploited to modify query results.
- group by: similar to "order by", used in SQL queries that might be manipulated for malicious purposes.
- admin: an attempt to access administrative sections of a website, possibly leading to privilege escalation.
- drop: indicates an attempt to delete database tables or other critical components.
- script: often used in cross-site scripting (XSS) attacks to inject malicious code into webpages.

These "bad words" serve as potential red flags in the dataset and play an important role in differentiating between legitimate and malicious requests. If any of these keywords appear in the URL path, it significantly increases the likelihood that the request is malicious and warrants further investigation or immediate blocking.
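A minimal sketch of this keyword check on the URL path, decoding percent-encoding first and deliberately ignoring the query string, since the text focuses on the path (the example URLs are invented):

```python
from urllib.parse import urlparse, unquote

BAD_WORDS = ["sleep", "uid", "select", "waitfor", "delay", "system",
             "union", "order by", "group by", "admin", "drop", "script"]

def suspicious_words(url):
    """Return the flagged keywords found in the (decoded) URL path."""
    path = unquote(urlparse(url).path).lower()
    return [w for w in BAD_WORDS if w in path]

print(suspicious_words("https://x.example/login"))                 # []
print(suspicious_words("https://x.example/q/1%20union%20select"))  # ['select', 'union']
```

A real detector would also inspect the query string, headers, and body; substring matching alone over-flags benign paths that happen to contain these words.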
Importance for Web Security The ability to classify web requests as good or bad is a fundamental aspect of web security. Many cyber attacks begin with an attempt to interact with web servers in unintended ways, exploiting vulnerabilities to gain unauthorized access, disrupt services, or steal sensitive information. By training models using this dataset, security teams can create intelligent systems that automatically recognize patterns associated with attacks.
SQL Injection Prevention: By identifying keywords such as "select," "union," "drop," and others, the model can flag requests that appear to contain SQL code, suggesting an attempt to execute an unauthorized database query.
Command Injection: Keywords like "system" may indicate attempts to execute shell commands, which can be highly damaging if successful.
Access Control: Requests that include "admin" may signal attempts to access restricted areas, possibly representing privilege escalation attacks.
Practical Usage In practice, this dataset could be used to build a machine learning model for real-time web traffic analysis. The model could be integrated into web application firewalls (WAFs) to detect and block suspicious requests be...