https://www.caida.org/about/legal/aua/
Packet headers (up to and including the transport layer) for the Anonymized Internet Traces 2019 Dataset. Derived from 10G traces on the Equinix NYC monitor.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) that maps the host's IP address to the csv/pcap filename and the activity label.
Activities:
• Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in Google Docs and remote CLI sessions over SSH.
• Bulk data transfer: applications that transfer large data files over the network. Examples are SCP/FTP applications and direct downloads of large files from web servers such as Mediafire, Dropbox, or the university repository.
• Web browsing: all the traffic generated while searching and consuming different web pages. Examples of those pages are several blogs and news sites and the university's Moodle.
• Video playback: traffic from applications that consume video in streaming or pseudo-streaming. The best-known services used are Twitch and YouTube, but the university online classroom has also been used.
• Idle behaviour: the background traffic generated by the user's computer when the user is idle. This traffic has been captured with every application closed and with some pages open, such as Google Docs, YouTube, and several web pages, but always without user interaction.
The capture is performed in a network probe, attached through a SPAN port to the router that forwards the user's network traffic. The traffic is stored in pcap format with the full packet payload. In the csv files, every non-TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP address, source and destination UDP/TCP port. The fields are also included as a header in every csv file.
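As a rough illustration of how the per-packet csv files could be consumed, the sketch below sums the payload bytes per protocol and measures the trace duration. The column names (`timestamp`, `protocol`, `payload_size`, and so on) and the sample rows are assumptions made for this example; the real names are given by the header line of each csv file.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows with hypothetical column names; check the header
# of the real csv files and adjust the keys used below accordingly.
SAMPLE = """timestamp,protocol,payload_size,ip_src,ip_dst,port_src,port_dst
0.000,TCP,1200,192.0.2.10,198.51.100.7,50000,443
0.050,UDP,300,192.0.2.10,198.51.100.53,50001,53
1.500,TCP,800,198.51.100.7,192.0.2.10,443,50000
"""

def summarize(lines):
    """Return (duration_seconds, payload_bytes_per_protocol) for one csv trace."""
    reader = csv.DictReader(lines)
    first = last = None
    per_protocol = defaultdict(int)
    for row in reader:
        ts = float(row["timestamp"])
        if first is None:
            first = ts
        last = ts
        per_protocol[row["protocol"]] += int(row["payload_size"])
    duration = 0.0 if first is None else last - first
    return duration, dict(per_protocol)

duration, per_protocol = summarize(io.StringIO(SAMPLE))
print(duration, per_protocol)
```

To process a real file, pass an open file object instead of the `io.StringIO` wrapper.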
The amount of data is stated as follows:
• Bulk: 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
• Video: 23 traces, 4496 s, 1405 MBytes
• Web: 23 traces, 4203 s, 148 MBytes
• Interactive: 42 traces, 8934 s, 30.5 MBytes
• Idle: 52 traces, 6341 s, 0.69 MBytes
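The figures above give a feel for how different the classes are. The following sketch derives the mean trace duration and the mean data rate per class; it assumes that 1 MByte means 10^6 bytes and that the pcap file size is a fair proxy for the traffic volume, neither of which is stated in the description.

```python
# (traces, total seconds, total MBytes of pcap) copied from the list above.
TRACES = {
    "Bulk": (19, 3599, 8704),
    "Video": (23, 4496, 1405),
    "Web": (23, 4203, 148),
    "Interactive": (42, 8934, 30.5),
    "Idle": (52, 6341, 0.69),
}

def mean_duration_s(label):
    """Average duration of one trace of the given class, in seconds."""
    count, seconds, _ = TRACES[label]
    return seconds / count

def mean_rate_mbit(label):
    """Average rate in Mbit/s, assuming 1 MByte = 10**6 bytes."""
    _, seconds, mbytes = TRACES[label]
    return mbytes * 8 / seconds

for label in TRACES:
    print(f"{label:12s} {mean_duration_s(label):7.1f} s/trace "
          f"{mean_rate_mbit(label):9.4f} Mbit/s")
```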
The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.
https://www.caida.org/about/legal/aua/public_aua/
Metadata for all passive monthly traces, including the Chicago and San Jose monitors. This includes the files used to generate the public trace stats.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at the network, transport, and application layers with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents network traffic trace data from 14 D-Link IoT devices of different types, including camera, network camera, smart-plug, door-window sensor, and home-hub. It consists of:
• Network packet traces (inbound and outbound traffic) and
• IEEE 802.11 MAC frame traces.
The experimental testbed, which included an access point running on a laptop, was set up in the Network Systems and Signal Processing (NSSP) laboratory at Universiti Brunei Darussalam (UBD). The network traffic traces were collected from 9th September 2020 to 10th January 2021 by passively observing the Ethernet interface and the WiFi interface at the access point.
The packet traces capture data from the typical communication protocols that IoT devices use to communicate on the Internet, such as TCP, UDP, IP, ICMP, ARP, DNS, SSDP, and TLS/SSL. The probe request frame (a subtype of management frames) traces record the data that IoT devices use to connect to the access point on the local area network.
The authors would like to thank the Faculty of Integrated Technologies, Universiti Brunei Darussalam, for the support to conduct this research experiment in the Network Systems and Signal Processing laboratory.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
out of which 20 used plaintext HTTP browsing
https://www.caida.org/about/legal/aua/
This dataset contains anonymized layer 1-4 packet headers of two-way passive traces captured on a 100 Gb/s link between Los Angeles and San Jose. These data are useful for research on the characteristics of Internet traffic, including application breakdown, security events, geographic and topological distribution, and flow volume and duration.
The passive 100G sampler is offered to researchers at commercial organizations when they request Anonymized Internet Traces. These data are part of the 2024 Anonymized Traces 100G dataset. The files consist of 5-second snapshots of a bidirectional capture taken in November 2024.
The traffic data is collected at a backbone link of a Tier1 ISP, aiming to estimate the number of packets for each network flow identified by IP addresses and application ports.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
YouTube flows
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains all data used during the evaluation of statistical characteristics preservation. Archives are protected by password "trace-share" to avoid false detection by antivirus software.
For more information, see the project repository at https://github.com/Trace-Share.
Selected Attack Traces
We selected 72 different traces of network attacks obtained from various internet databases. File names refer to common names of contained vulnerabilities, malware, or attack tools.
Background Traffic Data
The publicly available CSE-CIC-IDS-2018 dataset was used as background traffic data. The evaluation uses data from the day Thursday-01-03-2018, which contains a sufficient proportion of regular traffic without any statistically significant attacks. Only traffic aimed at victim machines (range 172.31.69.0/24) is used, to reduce less significant traffic.
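Restricting traffic to the victim range amounts to a destination-address test against 172.31.69.0/24. The sketch below shows one way to express that test with the Python standard library; the packet tuples are made up for illustration, and the tooling the dataset authors actually used for this filtering is not stated here.

```python
import ipaddress

# Victim range taken from the description above.
VICTIM_NET = ipaddress.ip_network("172.31.69.0/24")

def targets_victim(dst_ip):
    """True if the destination address lies inside the victim range."""
    return ipaddress.ip_address(dst_ip) in VICTIM_NET

# Made-up (source, destination) pairs standing in for parsed packets.
packets = [
    ("192.0.2.5", "172.31.69.25"),   # towards a victim: kept
    ("172.31.69.25", "192.0.2.5"),   # reply from a victim: dropped
    ("192.0.2.5", "172.31.70.1"),    # outside the /24: dropped
]

kept = [pkt for pkt in packets if targets_victim(pkt[1])]
print(kept)
```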
Evaluation Results and Dataset Structure
Traces variants (traces-normalized.zip, traces-adjusted.zip)
./traces-normalized/ — normalized PCAP files and details in YAML format;
./traces-adjusted/ — configuration files for traces combination in YAML format.
Computed statistics (statistics.zip)
./statistics-background/ — background traffic statistics computed by ID2T;
./statistics-combination/ — combined traces statistics computed by ID2T for all adjust options (selected only combinations where ID2T provided all statistics files);
./statistics-difference/ — computed mean and median differences of background and combined traffic traces.
Evaluation results
statistics-difference.ipynb — file containing visualization of statistics differences.
https://www.caida.org/about/legal/aua/
The UCSD Network Telescope consists of a globally routed, but lightly utilized /9 and /10 network prefix, that is, 3/1024 (roughly 1/341) of the whole IPv4 address space. It contains few legitimate hosts; inbound traffic to non-existent machines, so-called Internet Background Radiation (IBR), is unsolicited and results from a wide range of events, including misconfiguration (e.g. mistyping an IP address), scanning of address space by attackers or malware looking for vulnerable targets, backscatter from randomly spoofed denial-of-service attacks, and the automated spread of malware. CAIDA continuously captures this anomalous traffic, discarding the legitimate traffic packets destined to the few reachable IP addresses in this prefix. We archive and aggregate these data, and provide this valuable resource to network security researchers. This dataset represents raw traffic traces captured by the Telescope instrumentation and made available in near-real time as one-hour long compressed pcap files. We collect more than 3 TB of uncompressed IBR traffic trace data per day. The most recent 14 days of data are stored locally at CAIDA. Once data slides out of this near-real-time window, the pcap files are off-loaded to tape storage. Historical Telescope data starting from 2008 are available by additional request.
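The share of the IPv4 space covered by a /9 plus a /10, and the average rate implied by 3 TB per day, can be checked with a few lines of arithmetic. The sketch assumes the two prefixes do not overlap and that 1 TB means 10^12 bytes; since the description says "more than 3 TB", the rate is a lower bound.

```python
from fractions import Fraction

IPV4_BITS = 32

def prefix_size(length):
    """Number of addresses in an IPv4 prefix of the given length."""
    return 2 ** (IPV4_BITS - length)

# Assumes the /9 and the /10 are disjoint address blocks.
telescope_addresses = prefix_size(9) + prefix_size(10)
fraction = Fraction(telescope_addresses, 2 ** IPV4_BITS)

# Lower bound on the mean capture rate, assuming 1 TB = 10**12 bytes.
avg_mbit_per_s = 3 * 10**12 * 8 / 86_400 / 10**6

print(telescope_addresses, fraction, float(fraction))
print(f"{avg_mbit_per_s:.1f} Mbit/s")
```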
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains cloud network performance data related to the Amazon S3 storage service. The dataset refers to experimental campaigns conducted in May 2016. The dataset was collected leveraging 77 Bismark VPs, instructed as detailed in the following. Each VP performed repeated download cycles over 7 days. Each cycle is composed of 40 sequential download requests spaced out by 10 seconds and uniquely identified by a combination of factors, i.e. cloud region, file size, and storage class. Downloads within cycles are randomly scheduled and repeated from each VP every 2 hours. After every download, VPs run TCP-traceroute towards the IP address that served the request in order to trace the information related to the path and estimate the RTT to the S3 cloud datacenter (note that this information is not always available, due to the version of the firmware of the Bismark nodes and to the measurement tools available on them).
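The measurement schedule above implies a nominal number of downloads per vantage point. The sketch below works it out; it assumes every VP stayed online for the full 7 days and that the 10-second spacing separates consecutive requests within a cycle, so the real dataset may contain fewer samples.

```python
# Parameters taken from the description above.
VANTAGE_POINTS = 77
CAMPAIGN_DAYS = 7
CYCLE_PERIOD_HOURS = 2
DOWNLOADS_PER_CYCLE = 40
SPACING_SECONDS = 10

# Nominal counts, assuming no VP downtime and no failed downloads.
cycles_per_vp = CAMPAIGN_DAYS * 24 // CYCLE_PERIOD_HOURS
downloads_per_vp = cycles_per_vp * DOWNLOADS_PER_CYCLE
total_downloads = downloads_per_vp * VANTAGE_POINTS

# Time spent just on the gaps between the 40 requests of one cycle.
min_cycle_seconds = (DOWNLOADS_PER_CYCLE - 1) * SPACING_SECONDS

print(cycles_per_vp, downloads_per_vp, total_downloads, min_cycle_seconds)
```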
When referring to our Traffic Traces, please cite the following reference: Valerio Persico, Antonio Montieri, Antonio Pescapè: On the Network Performance of Amazon S3 Cloud-Storage Service. CloudNet 2016: 113-118
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time Series for Anomaly Detection
The file is a Matlab data file. It contains 3 time series, representing the packet rate of 3 different traffic traces, related to inbound traffic of the UNINA Network. The traces were collected in year 2004. The packet rate was sampled with a period of 2 seconds and each trace lasts 2 hours. These data have been used for studies on volume-based anomaly detection and are related to time intervals during which no anomalies were observed on the UNINA network by the NOC operators. In other words, they can be considered anomaly-free.
When referring to our Anomaly Detection Dataset, please cite the following reference:
A. Dainotti, A. Pescapè, G. Ventre, "A cascade architecture for DoS attacks detection based on the wavelet transform", Journal of Computer Security, Volume 17, Number 6/2009, Pages 945-968.
This dataset was created by DNS_dataset
Description
This dataset consists of 18 days' worth of HTTP traces gathered from the Home IP service offered by UC Berkeley to its students, faculty, and staff. Home IP provides dial-up PPP/SLIP IP connectivity using 2.4 kb/s, 9.6 kb/s, 14.4 kb/s, or 28.8 kb/s wireline modems, or Metricom Ricochet (approximately 20-30 kb/s) wireless modems. These client traces were unobtrusively gathered through the use of a packet sniffing machine placed at the head-end of the Home IP modem bank; the tracing program used was a custom module written on top of the Internet Protocol Scanning Engine (IPSE) created by Ian Goldberg. Only traffic destined for port 80 was traced; all non-HTTP protocols and HTTP connections for other ports were excluded from these traces.
The traces contain the following information:
• the no-cache, keep-alive, cache-control, if-modified-since, and unless client headers;
• the no-cache, cache-control, expires, and last-modified server headers;
• the if-modified-since, the server expires, and the server last-modified headers, if present.
Format
For the sake of storage efficiency, the (gzipped) traces are stored in a binary representation. This archive of tools includes code to parse and manipulate the archives: the traces can be decompressed with gzcat, and showtrace.c shows how you can use logparse.[ch] to write code that parses and manipulates the traces. All times displayed are as reported by the gettimeofday() system call.
The showtrace tool will display lines in the following format:
848278028:829593 848278028:893670 848278028:895350 23.240.8.98:1462 207.36.205.194:80 2 8 4294967295 4294967295 835418853 170 844 37 GET 9168504434183313441..gif HTTP/1.0
The interpretation of the client and server header bitfields is as defined in the logparse.h header in the tools code.
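For readers who prefer to post-process the showtrace output rather than the binary traces, the sketch below splits one output line into named fields. The field names and their order (three `seconds:microseconds` timestamps, client and server endpoints, two header bitfields, three header timestamps, header length, data length, URL length, request) are assumptions inferred from the sample line and the surrounding description; logparse.h remains the authoritative definition.

```python
UNKNOWN = 0xFFFFFFFF  # value the traces use for unknown timestamps

SAMPLE = ("848278028:829593 848278028:893670 848278028:895350 "
          "23.240.8.98:1462 207.36.205.194:80 2 8 4294967295 4294967295 "
          "835418853 170 844 37 GET 9168504434183313441..gif HTTP/1.0")

def known(value):
    """Map the 0xFFFFFFFF marker to None."""
    return None if value == UNKNOWN else value

def to_micros(stamp):
    """Convert 'seconds:microseconds' to integer microseconds (None if unknown)."""
    sec, usec = (int(part) for part in stamp.split(":"))
    return None if sec == UNKNOWN else sec * 1_000_000 + usec

def parse_showtrace_line(line):
    """Split one showtrace line into a dict; field names are hypothetical."""
    parts = line.strip().split(maxsplit=13)
    if len(parts) != 14:
        raise ValueError("unexpected number of fields")
    client_ip, client_port = parts[3].rsplit(":", 1)
    server_ip, server_port = parts[4].rsplit(":", 1)
    nums = [int(p) for p in parts[5:13]]
    return {
        "request_us": to_micros(parts[0]),
        "first_byte_us": to_micros(parts[1]),
        "last_byte_us": to_micros(parts[2]),
        "client": (client_ip, int(client_port)),  # anonymized address
        "server": (server_ip, int(server_port)),  # anonymized address
        "client_flags": nums[0],
        "server_flags": nums[1],
        "if_modified_since": known(nums[2]),
        "expires": known(nums[3]),
        "last_modified": known(nums[4]),
        "header_len": nums[5],
        "data_len": nums[6],
        "url_len": nums[7],
        "request": parts[13],
    }

record = parse_showtrace_line(SAMPLE)
print(record["last_byte_us"] - record["request_us"], record["request"])
```

In the sample line the assumed URL length field (37) equals the length of the request string, which lends some support to this reading of the columns.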
The tools code has been tested on both Linux and Solaris. The provided Makefile assumes Solaris - you may have to play with the LIBS definition for other platforms. HPUX is a mess; I didn't even try, but it should be possible to get these tools to work with little effort. If you do, please let me know what you did so that I can make your changes available to the world.
Measurement
The Home IP population gains IP connectivity using PPP or SLIP across their 2.4 kb/s, 9.6 kb/s, 14.4kb/s or 28.8kb/s wireline modem, or their (approximately) 20-30kb/s wireless Metricom Ricochet modem. There are a total of roughly 600 modems available via the Home IP bank. All traffic from these modems ends up feeding over a single 10Mb/s shared Ethernet segment, on which we placed a network monitoring computer (a Pentium Pro 200Mhz running Linux 2.0.27). The monitor was running the IPSE user-level packet scanning engine and a custom-written HTTP module that reconstructed HTTP connections from the gathered IP packets on-the-fly and emitted an unanonymized trace file. Each trace file was then anonymized and transmitted to our research workstations for further postprocessing and analysis.
The trace gathering engine was brought down and restarted approximately every 4 hours (for administrative and address-space-growth reasons). This implies that there are two weaknesses in these traces that you should be aware of:
The packet capture tool reported no packet drops. Considering that a Pentium Pro 200MHz was used to capture the traces on a 10 Mb/s Ethernet segment, it is virtually certain that no trace drops besides those mentioned above occurred. There may be periods of uncharacteristically low activity in the traces - these correspond to network outages from Berkeley's ISP, rather than trace failures.
The traces do contain entries for requests that were issued by the client but weren't completed (because, for instance, the user pressed the STOP button and the TCP connection was shut down before the request completed). Unknown timestamps in the traces contain the value 0xFFFFFFFF (reported by showtrace as 4294967295), and incomplete requests contain header and data length values that report as much of the header/data as was seen.
The trace data is sorted by completion time (i.e. the time at which the last byte of the server response was seen, or the time at which the connection was dropped). However, because of inaccuracies and apparent time travel in the Linux system clock, some trace entries appear slightly out of order.
All timestamps within the traces are as reported by the gettimeofday() system call, so these timestamps ostensibly have microsecond resolution.
Privacy
To maintain the privacy of each individual Home IP user, we have stripped identity information out of the traces through a post-processing phase. Because it is trivial to identify a user based solely on the pages that the user has visited, we were forced to anonymize the URL and destination IP address of each web request as well as the source IP address. All anonymization was done using a keyed MD5 hash of the data (32 bits for client and server IP addresses, 64 bits for URLs). We ourselves do not know the key used to salt the MD5 hash, so don't bother asking us for it. Similarly, don't bother asking us for unanonymized traces.
In order to preserve some information about the URLs, the post-processed URLs have the following format:
COMMAND URLHASH.[flags][.suffix] [HTTPVERS]
where:
• COMMAND is one of GET, HEAD, POST, or PUT,
• URLHASH is the string representation of the 64-bit MD5 hash of the URL,
• flags contains the character q to indicate that a question mark was seen in the URL, and the character c to indicate that the string CGI or cgi was seen in the URL,
• suffix is the filename suffix, if present, and
• HTTPVERS is the HTTP version field of the HTTP command issued by the client.
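The anonymized request format described above can be decomposed with a regular expression. The following is a minimal sketch, assuming the hash is printed as a decimal number (as in the sample showtrace line) and that the suffix contains no whitespace; the function and key names are made up for this example.

```python
import re

# COMMAND URLHASH.[flags][.suffix] [HTTPVERS]
REQUEST_RE = re.compile(
    r"^(GET|HEAD|POST|PUT) "   # COMMAND
    r"(\d+)\."                 # URLHASH (assumed decimal) and the dot after it
    r"([qc]*)"                 # flags: q = question mark seen, c = CGI/cgi seen
    r"(?:\.(\S+))?"            # optional .suffix
    r"(?: (\S+))?$"            # optional HTTP version
)

def parse_request(text):
    """Return the parts of an anonymized request, or None if it does not match."""
    match = REQUEST_RE.match(text)
    if match is None:
        return None
    command, url_hash, flags, suffix, version = match.groups()
    return {
        "command": command,
        "url_hash": int(url_hash),
        "has_query": "q" in flags,
        "is_cgi": "c" in flags,
        "suffix": suffix,
        "http_version": version,
    }

print(parse_request("GET 9168504434183313441..gif HTTP/1.0"))
```

Grouping requests by `suffix` or by the two flags is then a simple way to separate, for instance, images from dynamically generated pages without access to the original URLs.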
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All algorithm throughput values (in Gbps) using Defcon and web traffic traces.
This backscatter from victims was collected by the UCSD Network Telescope.
Quarterly data collection took place for one week in May, August and
November in 2004, and February, May, August and November in 2005. Possible
uses of this data include modeling DoS attacks, understanding victim
populations, and using real packet traces to validate algorithms for
detecting or classifying malicious traffic. This last use is particularly
valuable because it is extremely challenging to artificially generate the
kind of real-world noise present on the Internet.
This dataset contains two full days of trace data from the UCSD Network Telescope:
2008-11-12 and 2008-11-19. These dates precede our detection of the Conficker A Worm
on 2008-11-21. The dataset consists of 48 compressed pcap files each containing one
hour of traffic observed by the Network Telescope.