19 datasets found
  1. Anonymized Internet Traces 2019

    • catalog.caida.org
    Updated Jan 15, 2019
    CAIDA (2019). Anonymized Internet Traces 2019 [Dataset]. https://catalog.caida.org/dataset/passive_2019_pcap
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/

    Time period covered
    Jan 2019
    Description

    Packet headers (up to and including the transport layer) for the Anonymized Internet Traces 2019 dataset, derived from 10G traces on the Equinix NYC monitor.
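
    The catalog entry for these traces is `passive_2019_pcap`, so the data come as pcap files. A minimal sketch of walking such a file with only the Python standard library follows. It assumes the classic libpcap format (not pcapng) and a raw-IP link type, where each packet starts at the IP header; both are assumptions to verify against the downloaded files. `read_pcap` and `ip_protocol` are hypothetical helper names, not CAIDA tools.

```python
import struct
from typing import List, Optional, Tuple

PCAP_MAGIC_US = 0xA1B2C3D4  # classic pcap, microsecond timestamps
PCAP_MAGIC_NS = 0xA1B23C4D  # classic pcap, nanosecond timestamps
GLOBAL_HEADER_LEN = 24
RECORD_HEADER_LEN = 16


def read_pcap(data: bytes) -> Tuple[int, List[Tuple[float, int, bytes]]]:
    """Parse a classic pcap buffer into (linktype, [(timestamp, orig_len, bytes)])."""
    for endian in ("<", ">"):
        magic = struct.unpack_from(endian + "I", data, 0)[0]
        if magic in (PCAP_MAGIC_US, PCAP_MAGIC_NS):
            break
    else:
        raise ValueError("not a classic pcap file")
    scale = 1e-9 if magic == PCAP_MAGIC_NS else 1e-6
    linktype = struct.unpack_from(endian + "I", data, 20)[0]  # e.g. 101 = raw IP
    records = []
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        sec, frac, incl_len, orig_len = struct.unpack_from(endian + "4I", data, offset)
        offset += RECORD_HEADER_LEN
        records.append((sec + frac * scale, orig_len, data[offset:offset + incl_len]))
        offset += incl_len
    return linktype, records


def ip_protocol(packet: bytes) -> Optional[int]:
    """Transport protocol number of a raw IPv4/IPv6 packet (6 = TCP, 17 = UDP)."""
    if not packet:
        return None
    version = packet[0] >> 4
    if version == 4 and len(packet) >= 20:
        return packet[9]  # IPv4 Protocol field
    if version == 6 and len(packet) >= 40:
        return packet[6]  # IPv6 Next Header (extension headers are not followed)
    return None
```

    If the files are compressed, `gzip.open(path, "rb").read()` can supply the buffer; reading a whole trace into memory like this is only reasonable for small samples.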

  2. Network traffic and code for machine learning classification

    • data.mendeley.com
    Updated Feb 20, 2020
    + more versions
    Víctor Labayen (2020). Network traffic and code for machine learning classification [Dataset]. http://doi.org/10.17632/5pmnkshffm.2
    Authors
    Víctor Labayen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) that maps the host's IP address, the csv/pcap filename, and the activity label.

    Activities:

    • Interactive: applications that perform real-time interactions to provide a suitable user experience, such as editing a file in Google Docs and remote CLI sessions over SSH.
    • Bulk data transfer: applications that transfer large data files over the network. Examples are SCP/FTP applications and direct downloads of large files from web servers such as Mediafire, Dropbox, or the university repository.
    • Web browsing: all traffic generated while searching and consuming different web pages, such as several blogs, news sites, and the university's Moodle.
    • Video playback: traffic from applications that consume video via streaming or pseudo-streaming. The most-used services are Twitch and YouTube, but the university's online classroom was also used.
    • Idle behaviour: background traffic generated by the user's computer while the user is idle. This traffic was captured both with every application closed and with some pages open (such as Google Docs, YouTube, and several web pages), but always without user interaction.

    Capture is performed by a network probe attached, via a SPAN port, to the router that forwards the user's network traffic. The traffic is stored in pcap format with full packet payloads. In the csv files, every non-TCP/UDP packet is filtered out, as is every packet with no payload. The csv files contain one line per packet with the following fields: timestamp, protocol, payload size, source and destination IP address, and source and destination UDP/TCP port. The field names are also included as a header in every csv file.
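
    Because every csv file carries a header row and one line per packet, per-flow features can be derived with the standard `csv` module. A minimal sketch, assuming the columns appear in the order listed above (the actual header names are not given here, so columns are read by position) and that the payload size is an integer byte count; `flow_stats` is a hypothetical name, not part of the published code.

```python
import csv
from collections import defaultdict
from typing import Dict, List, Tuple

FlowKey = Tuple[str, str, str, int, int]  # (protocol, src IP, dst IP, src port, dst port)


def flow_stats(text: str) -> Dict[FlowKey, Tuple[int, int]]:
    """Aggregate per-packet csv rows into {flow: (packet_count, payload_bytes)}."""
    reader = csv.reader(text.splitlines())
    next(reader)  # every csv file starts with a header row
    totals: Dict[FlowKey, List[int]] = defaultdict(lambda: [0, 0])
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        # Assumed column order: timestamp, protocol, payload size,
        # source IP, destination IP, source port, destination port.
        _ts, proto, size, src, dst, sport, dport = row[:7]
        entry = totals[(proto, src, dst, int(sport), int(dport))]
        entry[0] += 1
        entry[1] += int(size)
    return {key: (count, nbytes) for key, (count, nbytes) in totals.items()}
```

    The keys are directional, so the two halves of a TCP connection are counted separately; merging them is a design choice left to the classifier.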

    The amount of data is stated as follows:

    • Bulk: 19 traces, 3599 s total duration, 8704 MBytes of pcap files
    • Video: 23 traces, 4496 s, 1405 MBytes
    • Web: 23 traces, 4203 s, 148 MBytes
    • Interactive: 42 traces, 8934 s, 30.5 MBytes
    • Idle: 52 traces, 6341 s, 0.69 MBytes
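
    Dividing each class's pcap volume by its total duration gives a rough mean data rate, which shows how far apart the classes are in volume. A back-of-the-envelope sketch using only the figures above; it averages over whole traces (including idle gaps and full payloads), so it is a coarse indicator rather than a feature the published classifier necessarily uses.

```python
# (traces, total duration in s, pcap volume in MBytes), as listed above
CLASSES = {
    "Bulk": (19, 3599, 8704.0),
    "Video": (23, 4496, 1405.0),
    "Web": (23, 4203, 148.0),
    "Interactive": (42, 8934, 30.5),
    "Idle": (52, 6341, 0.69),
}

# Mean rate per class in MBytes/s (multiply by 8 for Mbit/s).
rates = {name: mbytes / secs for name, (_, secs, mbytes) in CLASSES.items()}
# Mean trace length per class in seconds.
mean_trace_len = {name: secs / n for name, (n, secs, _) in CLASSES.items()}

for name in CLASSES:
    print(f"{name:<12} {rates[name]:.5f} MB/s  {mean_trace_len[name]:.0f} s/trace")
```

    Bulk averages about 2.4 MB/s while Idle sits near 0.0001 MB/s, a spread of more than four orders of magnitude.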

    The code for our machine learning approach is also included, along with a README.txt file documenting how to use it.

  3. Passive Metadata

    • catalog.caida.org
    Updated Feb 25, 2021
    CAIDA (2021). Passive Metadata [Dataset]. https://catalog.caida.org/dataset/passive_metadata
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/public_aua/

    Time period covered
    Mar 2008 - Jan 2019
    Description

    Metadata for all passive monthly traces, including those from the chicago and sanjose monitors. This includes the files used to generate the public trace statistics.

  4. YouTube Dataset on Mobile Streaming for Internet Traffic Modeling, Network...

    • figshare.com
    Updated Apr 14, 2022
    Frank Loh; Florian Wamser; Fabian Poignée; Stefan Geißler; Tobias Hoßfeld (2022). YouTube Dataset on Mobile Streaming for Internet Traffic Modeling, Network Management, and Streaming Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.19096823.v2
    Available download formats: txt
    Dataset provided by
    figshare
    Authors
    Frank Loh; Florian Wamser; Fabian Poignée; Stefan Geißler; Tobias Hoßfeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at the network, transport, and application layers with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings, measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with predefined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.

  5. Data from: Packet-level and IEEE 802.11 MAC frame-level Network Traffic...

    • data.mendeley.com
    • narcis.nl
    Updated Jan 14, 2021
    Rajarshi Roy Chowdhury (2021). Packet-level and IEEE 802.11 MAC frame-level Network Traffic Traces Data of the D-Link IoT devices [Dataset]. http://doi.org/10.17632/84cc8grtkt.1
    Authors
    Rajarshi Roy Chowdhury
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset presents network traffic trace data from 14 D-Link IoT devices of different types, including cameras, network cameras, smart plugs, door-window sensors, and home hubs. It consists of:

    • Network packet traces (inbound and outbound traffic) and
    • IEEE 802.11 MAC frame traces.
    

    The experimental testbed, including an access point running on a laptop, was set up in the Network Systems and Signal Processing (NSSP) laboratory at Universiti Brunei Darussalam (UBD) to collect all the network traffic traces from 9th September 2020 to 10th January 2021. The traces were captured by passively observing the Ethernet and WiFi interfaces of the access point.

    The packet traces capture the typical protocols that IoT devices use to communicate on the Internet, such as TCP, UDP, IP, ICMP, ARP, DNS, SSDP, and TLS/SSL. The probe request frame traces (probe requests are a subtype of management frames) record the frames IoT devices send when connecting to the access point on the local area network.
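
    Probe requests can be separated from other frames using the 802.11 Frame Control field: management frames have type 0, and probe requests are subtype 4. A minimal sketch that returns the transmitter address (Address 2) of a probe request, which is what identifies the sending device. Whether the published traces carry a radiotap header in front of each frame is an assumption to check, so the header is skipped only when `has_radiotap` is set; `probe_request_source` is a hypothetical helper.

```python
import struct
from typing import Optional

MGMT_TYPE = 0               # 802.11 management frame type
PROBE_REQUEST_SUBTYPE = 4   # probe request subtype within management frames


def probe_request_source(frame: bytes, has_radiotap: bool = False) -> Optional[str]:
    """Return the transmitter MAC of an 802.11 probe request, else None."""
    if has_radiotap:
        if len(frame) < 4:
            return None
        rt_len = struct.unpack_from("<H", frame, 2)[0]  # radiotap it_len, little-endian
        frame = frame[rt_len:]
    if len(frame) < 16:
        return None
    fc0 = frame[0]                  # first Frame Control byte
    ftype = (fc0 >> 2) & 0x3        # bits 2-3: type
    subtype = (fc0 >> 4) & 0xF      # bits 4-7: subtype
    if ftype != MGMT_TYPE or subtype != PROBE_REQUEST_SUBTYPE:
        return None
    # Layout: FC(2) Duration(2) Addr1(6) Addr2(6) ..., so Address 2 is bytes 10-15.
    return ":".join(f"{b:02x}" for b in frame[10:16])
```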

    The authors would like to thank the Faculty of Integrated Technologies, Universiti Brunei Darussalam, for the support to conduct this research experiment in the Network Systems and Signal Processing laboratory.

  6. Data from: In-browser and network traffic based web response time...

    • ieee-dataport.org
    Updated May 18, 2022
    Carlos Lopez (2022). In-browser and network traffic based web response time measurements [Dataset]. https://ieee-dataport.org/open-access/browser-and-network-traffic-based-web-response-time-measurements
    Authors
    Carlos Lopez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    out of which 20 used plaintext HTTP browsing

  7. Anonymized Two-Way Traffic Packet Header Traces 100G (5 sec) sampler

    • catalog.caida.org
    Updated Feb 16, 2025
    + more versions
    CAIDA (2025). Anonymized Two-Way Traffic Packet Header Traces 100G (5 sec) sampler [Dataset]. https://catalog.caida.org/dataset/passive_100g_sampler/cite
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/

    Time period covered
    Nov 2024
    Description

    This dataset contains anonymized layer 1-4 packet headers of two-way passive traces captured on a 100 Gbps link between Los Angeles and San Jose. These data are useful for research on the characteristics of Internet traffic, including application breakdown, security events, geographic and topological distribution, and flow volume and duration.

    Passive 100G sampler is offered to researchers at commercial organizations when they request Anonymized Internet Traces. These data are part of the 2024 Anonymized Traces 100G dataset. The files consist of 5 second snapshots of a bidirectional capture taken in November 2024.
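
    Because the traces are two-way, the two directions of a connection can be merged into one bidirectional flow before measuring flow volume and duration. A minimal sketch, assuming packets have already been decoded into (timestamp, src, sport, dst, dport, protocol, length) tuples (a hypothetical layout) and using no idle timeout, so each 5-tuple becomes a single flow; `bidirectional_flows` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

# (timestamp, src IP, src port, dst IP, dst port, IP protocol, length in bytes)
Packet = Tuple[float, str, int, str, int, int, int]
FlowKey = Tuple[int, str, int, str, int]


def bidirectional_flows(packets: Iterable[Packet]) -> Dict[FlowKey, dict]:
    """Group packets into direction-independent flows with volume and duration."""
    flows: Dict[FlowKey, dict] = {}
    for ts, src, sport, dst, dport, proto, length in packets:
        a, b = (src, sport), (dst, dport)
        lo, hi = (a, b) if a <= b else (b, a)  # canonical endpoint order
        key = (proto, lo[0], lo[1], hi[0], hi[1])
        flow = flows.setdefault(key, {"first": ts, "last": ts, "packets": 0, "bytes": 0})
        flow["first"] = min(flow["first"], ts)
        flow["last"] = max(flow["last"], ts)
        flow["packets"] += 1
        flow["bytes"] += length
    for flow in flows.values():
        flow["duration"] = flow["last"] - flow["first"]
    return flows
```

    Since each file is a 5-second snapshot, durations measured within one file cannot exceed about 5 s; longer flows are truncated at the snapshot boundaries.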

  8. CAIDA Internet Traces 2016 Chicago - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    (2024). CAIDA Internet Traces 2016 Chicago - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/caida-internet-traces-2016-chicago
    Area covered
    Chicago
    Description

    The traffic data was collected on a backbone link of a Tier 1 ISP, with the aim of estimating the number of packets in each network flow, where flows are identified by IP addresses and application ports.
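
    The description does not say which estimator is used. One widely used technique for approximate per-flow packet counts in bounded memory is the Count-Min sketch, shown here as an illustration rather than as the method behind this dataset. With width w and depth d, an estimate never undercounts and, with probability at least 1 − e^(−d), overcounts by at most e·N/w, where N is the total number of packets added. The width, depth, hash construction, and `flow_key` encoding below are arbitrary choices.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Approximate per-key counters: estimates never fall below the true count."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, key: bytes, row: int) -> int:
        # A differently salted BLAKE2b hash per row approximates independent hashes.
        h = hashlib.blake2b(key, digest_size=8, salt=row.to_bytes(8, "little"))
        return int.from_bytes(h.digest(), "little") % self.width

    def add(self, key: bytes, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: bytes) -> int:
        # The minimum over rows is the least collision-inflated counter.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


def flow_key(src: str, dst: str, sport: int, dport: int, proto: int) -> bytes:
    """Hypothetical flow encoding: IP addresses, ports, and protocol."""
    return f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
```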

  9. GTT23: A 2023 Dataset of Genuine Tor Traces

    • zenodo.org
    Updated Jul 31, 2024
    + more versions
    Rob Jansen; Ryan Wails; Aaron Johnson (2024). GTT23: A 2023 Dataset of Genuine Tor Traces [Dataset]. http://doi.org/10.5281/zenodo.10869889
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rob Jansen; Ryan Wails; Aaron Johnson
    Time period covered
    2023
    Description
    The GTT23 dataset contains network metadata of encrypted traffic measured from exit relays in the Tor network over a 13-week measurement period in 2023. The metadata is suitable for analyzing and evaluating website fingerprinting attacks and defenses.
    Our dataset measurement process was designed to prioritize safety and privacy and was developed through consultation with the Tor Research Safety Board (TRSB, submission #37). Our TRSB interaction resulted in a “No Objections” score.
    The measurement process, additional safety and ethical considerations, and a statistical analysis of the dataset are presented in further detail in the article "A Measurement of Genuine Tor Traces for Realistic Website Fingerprinting", arXiv:2404.07892 [cs.CR], https://doi.org/10.48550/arXiv.2404.07892.

  10. Netflix

    • ieee-dataport.org
    Updated Oct 1, 2021
    Danil Shamsimukhametov (2021). Netflix [Dataset]. https://ieee-dataport.org/documents/youtube-netflix-web-dataset-encrypted-traffic-classification
    Authors
    Danil Shamsimukhametov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube flows

  11. Trace-Share Dataset for Evaluation of Statistical Characteristics...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Madeja, Tomas (2020). Trace-Share Dataset for Evaluation of Statistical Characteristics Preservation [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3553062
    Dataset provided by
    Cermak, Milan
    Madeja, Tomas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains all data used during the evaluation of statistical characteristics preservation. Archives are protected by password "trace-share" to avoid false detection by antivirus software.

    For more information, see the project repository at https://github.com/Trace-Share.

    Selected Attack Traces

    We selected 72 different traces of network attacks obtained from various internet databases. File names refer to common names of contained vulnerabilities, malware, or attack tools.

    Background Traffic Data

    The publicly available CSE-CIC-IDS-2018 dataset was used as background traffic. The evaluation uses data from the day Thursday-01-03-2018, which contains a sufficient proportion of regular traffic without any statistically significant attacks. Only traffic aimed at the victim machines (range 172.31.69.0/24) is used, to exclude less significant traffic.
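
    Restricting background traffic to a subnet is a one-line check with Python's `ipaddress` module. A minimal sketch, reading "aimed at victim machines" as "destination address inside 172.31.69.0/24" (an interpretation; the project may also have kept replies from those hosts). With tcpdump, the equivalent capture filter would be `dst net 172.31.69.0/24`. `aimed_at_victims` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List, Tuple

VICTIM_NET = ipaddress.ip_network("172.31.69.0/24")


def aimed_at_victims(packets: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (src, dst) address pairs whose destination lies in the victim range."""
    return [(src, dst) for src, dst in packets if ipaddress.ip_address(dst) in VICTIM_NET]
```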

    Evaluation Results and Dataset Structure

    Traces variants (traces-normalized.zip, traces-adjusted.zip)

    ./traces-normalized/ — normalized PCAP files and details in YAML format;

    ./traces-adjusted/ — configuration files for traces combination in YAML format.

    Computed statistics (statistics.zip)

    ./statistics-background/ — background traffic statistics computed by ID2T;

    ./statistics-combination/ — combined traces statistics computed by ID2T for all adjust options (selected only combinations where ID2T provided all statistics files);

    ./statistics-difference/ — computed mean and median differences of background and combined traffic traces.

    Evaluation results

    statistics-difference.ipynb — file containing visualization of statistics differences.
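
    The comparison behind statistics-difference can be pictured as the shift a statistic undergoes when attack traces are merged into the background. A minimal sketch, assuming each statistic is available as a numeric series (for example, packets per interval) for both the background and the combined traffic; which statistics ID2T actually produces, and exactly how the notebook compares them, are defined in the project repository, not here. `mean_median_difference` is a hypothetical name.

```python
import statistics
from typing import Dict, Sequence


def mean_median_difference(background: Sequence[float],
                           combined: Sequence[float]) -> Dict[str, float]:
    """Shift in mean and median of one statistic after combining traces."""
    return {
        "mean_diff": statistics.mean(combined) - statistics.mean(background),
        "median_diff": statistics.median(combined) - statistics.median(background),
    }
```

    The median is far less sensitive than the mean to a few extreme values, so reporting both shows whether an injected attack shifts the bulk of the distribution or only its tail.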

  12. UCSD Real-time Network Telescope

    • catalog.caida.org
    Updated May 17, 2018
    CAIDA (2018). UCSD Real-time Network Telescope [Dataset]. https://catalog.caida.org/dataset/telescope_live
    Dataset authored and provided by
    CAIDA
    License

    https://www.caida.org/about/legal/aua/

    Description

    The UCSD Network Telescope consists of a globally routed but lightly utilized /9 and /10 network prefix, together 3/1024 (about 1/341) of the whole IPv4 address space. It contains few legitimate hosts; inbound traffic to non-existent machines, so-called Internet Background Radiation (IBR), is unsolicited and results from a wide range of events, including misconfiguration (e.g. mistyping an IP address), scanning of address space by attackers or malware looking for vulnerable targets, backscatter from randomly spoofed denial-of-service attacks, and the automated spread of malware. CAIDA continuously captures this anomalous traffic, discarding the legitimate traffic packets destined to the few reachable IP addresses in this prefix. We archive and aggregate these data and provide this valuable resource to network security researchers.

    This dataset represents raw traffic traces captured by the Telescope instrumentation and made available in near-real time as one-hour compressed pcap files. We collect more than 3 TB of uncompressed IBR traffic trace data per day. The most recent 14 days of data are stored locally at CAIDA; once data slides out of this near-real-time window, the pcap files are offloaded to tape storage. Historical Telescope data starting from 2008 are available on additional request.
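
    The share of IPv4 space covered by a /9 plus a /10 follows from the prefix lengths alone, since a /n prefix holds 2^(32−n) addresses. A quick check with Python's `ipaddress` module; the prefixes below are placeholders chosen only for their lengths, not the telescope's actual address blocks.

```python
import ipaddress
from fractions import Fraction

IPV4_SPACE = 2 ** 32

# Placeholder prefixes: only their lengths (/9, /10, /8) matter here.
slash9 = ipaddress.ip_network("10.0.0.0/9").num_addresses      # 2**23 addresses
slash10 = ipaddress.ip_network("10.128.0.0/10").num_addresses  # 2**22 addresses
slash8 = ipaddress.ip_network("10.0.0.0/8").num_addresses      # 2**24, for comparison

telescope_share = Fraction(slash9 + slash10, IPV4_SPACE)
print(telescope_share, float(telescope_share))  # 3/1024 0.0029296875
print(Fraction(slash8, IPV4_SPACE))             # 1/256
```

    A full /8 is exactly 1/256 of the space, so a /9 plus a /10 amounts to three quarters of a /8.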

  13. Amazon S3 cloud storage service data set

    • data.mendeley.com
    • narcis.nl
    Updated Jan 21, 2017
    + more versions
    Antonio Pescape' (2017). Amazon S3 cloud storage service data set [Dataset]. http://doi.org/10.17632/99kv5x8xhr.1
    Authors
    Antonio Pescape'
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains cloud network performance data for the Amazon S3 storage service, collected during experimental campaigns conducted in May 2016. The dataset was collected using 77 Bismark vantage points (VPs), instructed as follows. Each VP performed repeated download cycles over 7 days. Each cycle consists of 40 sequential download requests spaced 10 seconds apart, each uniquely identified by a combination of factors (cloud region, file size, and storage class). Downloads within a cycle are randomly scheduled, and each VP repeats the cycle every 2 hours. After every download, the VP runs TCP-traceroute towards the IP address that served the request, to trace the path and estimate the RTT to the S3 cloud datacenter (this information is not always available, owing to the firmware version of the Bismark nodes and the measurement tools available on them).

    When referring to our Traffic Traces, please cite the following reference: Valerio Persico, Antonio Montieri, Antonio Pescapè: On the Network Performance of Amazon S3 Cloud-Storage Service. CloudNet 2016: 113-118
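
    The measurement schedule described above bounds how many downloads the campaign can contain. A back-of-the-envelope sketch, assuming 7 full days (168 h), one cycle started every 2 hours, and "spaced out by 10 seconds" meaning 10 s between consecutive request starts; missed cycles or unavailable vantage points would lower the real count.

```python
DAYS = 7
CYCLE_PERIOD_H = 2         # each VP repeats a cycle every 2 hours
REQUESTS_PER_CYCLE = 40    # sequential downloads per cycle
SPACING_S = 10             # assumed gap between consecutive request starts
VANTAGE_POINTS = 77

cycles_per_vp = DAYS * 24 // CYCLE_PERIOD_H              # 84 cycles
downloads_per_vp = cycles_per_vp * REQUESTS_PER_CYCLE    # 3,360 downloads
max_downloads = downloads_per_vp * VANTAGE_POINTS        # 258,720 downloads
min_cycle_span_s = (REQUESTS_PER_CYCLE - 1) * SPACING_S  # 390 s, first to last start

print(cycles_per_vp, downloads_per_vp, max_downloads, min_cycle_span_s)
```

    Even at its minimum span of 390 s, a cycle fits comfortably inside the 2-hour repetition period, leaving room for download and traceroute time.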

  14. Anomaly Detection

    • data.mendeley.com
    • narcis.nl
    Updated Jan 19, 2017
    Antonio Pescape' (2017). Anomaly Detection [Dataset]. http://doi.org/10.17632/dkg3b6vz65.1
    Authors
    Antonio Pescape'
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Time Series for Anomaly Detection

    The file is a MATLAB data file. It contains 3 time series representing the packet rate of 3 different traffic traces of inbound traffic on the UNINA network. The traces were collected in 2004. The packet rate was sampled every 2 seconds, and each trace lasts 2 hours. These data have been used for studies on volume-based anomaly detection and cover time intervals during which the NOC operators observed no anomalies on the UNINA network; in other words, they can be considered anomaly-free.

    When referring to our Anomaly Detection Dataset, please cite the following reference:

    A. Dainotti, A. Pescapè, G. Ventre, "A cascade architecture for DoS attacks detection based on the wavelet transform", Journal of Computer Security, Volume 17, Number 6/2009, Pages 945-968.
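
    Since the three traces are described as anomaly-free, they are natural reference data for calibrating a volume-based detector: anything a detector flags on them counts as a false positive. A minimal sketch of a generic z-score threshold on a packet-rate series follows; it is a simple baseline, not the wavelet-based cascade architecture of the cited paper. Loading the series from the .mat file (for example with `scipy.io.loadmat`) depends on variable names not documented here, so the sketch takes a plain sequence, and `flag_volume_anomalies` is a hypothetical name.

```python
import statistics
from typing import List, Sequence

SAMPLE_PERIOD_S = 2                                     # packet rate sampled every 2 s
TRACE_LENGTH_S = 2 * 3600                               # each trace lasts 2 hours
SAMPLES_PER_TRACE = TRACE_LENGTH_S // SAMPLE_PERIOD_S   # 3600 samples


def flag_volume_anomalies(rate: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Indices whose rate lies more than `threshold` std devs from the series mean."""
    mu = statistics.fmean(rate)
    sigma = statistics.pstdev(rate)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, r in enumerate(rate) if abs(r - mu) / sigma > threshold]
```

    Counting how many of the 3600 samples per trace exceed a given threshold gives a direct false-positive rate for that threshold.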

  15. Traces captured by visiting the top 1500 website

    • kaggle.com
    Updated Aug 25, 2021
    DNS_dataset (2021). Traces captured by visiting the top 1500 website [Dataset]. https://www.kaggle.com/datasets/jacksontang16/traces-captured-by-visiting-the-top-1500-website
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DNS_dataset
    Description

    Dataset

    This dataset was created by DNS_dataset

    Contents

  16. UC Berkeley Home IP Web Traces

    • zenodo.org
    Updated Sep 9, 2020
    Steven D. Gribble (2020). UC Berkeley Home IP Web Traces [Dataset]. http://doi.org/10.5281/zenodo.4020425
    Available download formats: application/gzip, bin
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Steven D. Gribble
    Area covered
    Berkeley
    Description

    This dataset consists of 18 days' worth of HTTP traces gathered from the Home IP service offered by UC Berkeley to its students, faculty, and staff. Home IP provides dial-up PPP/SLIP IP connectivity using 2.4 kb/s, 9.6 kb/s, 14.4 kb/s, or 28.8 kb/s wireline modems, or Metricom Ricochet wireless modems (approximately 20-30 kb/s). These client traces were unobtrusively gathered using a packet-sniffing machine placed at the head-end of the Home IP modem bank; the tracing program was a custom module written on top of the Internet Protocol Scanning Engine (IPSE) created by Ian Goldberg. Only traffic destined for port 80 was traced; all non-HTTP protocols and HTTP connections on other ports were excluded from these traces.

    The traces contain the following information:

    • a total of 9,244,728 references spanning the period from Friday, November 1st, 1996 at 15:18:59 PST through Tuesday, November 19th, 1996 at 05:52:03 PST. 8,377 unique clients were seen in the traces.
    • the time at which the client made the request
    • the time at which the first byte of the server response was seen
    • the time at which the last byte of the server response was seen
    • the client IP address (suitably anonymized)
    • the client port
    • the server IP address (suitably anonymized)
    • the server port (always 80 for these traces)
    • the presence of the no-cache, keep-alive, cache-control, if-modified-since, and unless client headers.
    • the presence of the no-cache, cache-control, expires, and last-modified server headers.
    • the values of the client if-modified-since, the server expires, and the server last-modified headers, if present.
    • the length of the response HTTP header
    • the length of the response data
    • the request URL (suitably anonymized)

    Format

    For the sake of storage efficiency, the (gzipped) traces are stored in a binary representation. This archive of tools includes the following code to parse and manipulate the traces:

    • showtrace: this program will print out a human readable ASCII representation of what is in the traces. To use, type:

      gzcat

      Take a look at the source file showtrace.c to see how you can use logparse.[ch] to write code that parses and manipulates the traces. All times displayed are as reported by the gettimeofday() system call.

    • anon_clients: this is the program that we used to anonymize the traces. I include this program under the principle that the anonymization used is strong enough that distributing the anonymization code cannot help anybody break the anonymization.

    • timeconvert: a program that accepts a calendar time (i.e. time in seconds since the Epoch, as reported by showtrace and as used in the trace filenames) and outputs the local time corresponding to that calendar time.

    The showtrace tool will display lines in the following format:

    848278028:829593 848278028:893670 848278028:895350 23.240.8.98:1462
    207.36.205.194:80 2 8 4294967295 4294967295 835418853 170 844
    37 GET 9168504434183313441..gif HTTP/1.0
    
    • 848278028:829593 is the time at which the client made the request
    • 848278028:893670 is the time at which the first byte of the server response was seen
    • 848278028:895350 is the time at which the last byte of the server response was seen
    • 23.240.8.98:1462 is the anonymized client IP address and the client port number
    • 207.36.205.194:80 is the anonymized server IP address and the server port number
    • 2 is the decimal representation of the client headers bitfield
    • 8 is the decimal representation of the server headers bitfield
    • the first 4294967295 is the if-modified-since client header value (note that 4294967295 is 0xFFFFFFFF, which means this header value was not present for this entry)
    • the second 4294967295 is the expires server header value (again not present)
    • 835418853 is the last-modified server header value
    • 170 is the length of the HTTP response header
    • 844 is the length of the response data
    • 37 is the length of the anonymized request URL
    • "GET 9168504434183313441..gif HTTP/1.0" is the anonymized request URL.
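
    Given the field order above, one showtrace entry can be parsed with a few splits. A minimal sketch, assuming each entry is emitted as a single whitespace-separated line (the sample above appears wrapped for display) and that timestamps always use the `seconds:microseconds` form. The meaning of individual header bits lives in `logparse.h` and is not decoded here; `parse_showtrace` and `TraceEntry` are hypothetical names, not part of the tools archive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

UNKNOWN = 0xFFFFFFFF  # showtrace prints this as 4294967295 for absent values


def _timestamp(field: str) -> float:
    sec, usec = field.split(":")  # "seconds:microseconds"
    return int(sec) + int(usec) / 1e6


def _endpoint(field: str) -> Tuple[str, int]:
    host, port = field.rsplit(":", 1)
    return host, int(port)


def _optional(field: str) -> Optional[int]:
    value = int(field)
    return None if value == UNKNOWN else value


@dataclass
class TraceEntry:
    t_request: float
    t_first_byte: float
    t_last_byte: float
    client: Tuple[str, int]
    server: Tuple[str, int]
    client_header_bits: int
    server_header_bits: int
    if_modified_since: Optional[int]
    expires: Optional[int]
    last_modified: Optional[int]
    response_header_len: int
    response_data_len: int
    url: str


def parse_showtrace(line: str) -> TraceEntry:
    """Parse one showtrace output line into its 14 documented fields."""
    f = line.split()
    url = " ".join(f[13:])  # the anonymized URL itself contains spaces
    if len(url) != int(f[12]):  # field 13 is the anonymized URL length
        raise ValueError(f"URL length mismatch: {len(url)} != {f[12]}")
    return TraceEntry(
        _timestamp(f[0]), _timestamp(f[1]), _timestamp(f[2]),
        _endpoint(f[3]), _endpoint(f[4]),
        int(f[5]), int(f[6]),
        _optional(f[7]), _optional(f[8]), _optional(f[9]),
        int(f[10]), int(f[11]), url,
    )
```

    For the sample entry, `t_first_byte - t_request` is about 0.064 s, the delay until the first response byte was seen.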

    The interpretation of the client and server header bitfields is as defined in the logparse.h header in the tools code.

    The tools code has been tested on both Linux and Solaris. The provided Makefile assumes Solaris - you may have to play with the LIBS definition for other platforms. HPUX is a mess; I didn't even try, but it should be possible to get these tools to work with little effort. If you do, please let me know what you did so that I can make your changes available to the world.

    Measurement

    The Home IP population gains IP connectivity using PPP or SLIP across 2.4 kb/s, 9.6 kb/s, 14.4 kb/s, or 28.8 kb/s wireline modems, or (approximately) 20-30 kb/s Metricom Ricochet wireless modems. There are roughly 600 modems in total available via the Home IP bank. All traffic from these modems feeds over a single 10 Mb/s shared Ethernet segment, on which we placed a network monitoring computer (a 200 MHz Pentium Pro running Linux 2.0.27). The monitor ran the IPSE user-level packet scanning engine and a custom-written HTTP module that reconstructed HTTP connections from the gathered IP packets on the fly and emitted an unanonymized trace file. Each trace file was then anonymized and transmitted to our research workstations for further postprocessing and analysis.

    The trace gathering engine was brought down and restarted approximately every 4 hours (for administrative and address-space-growth reasons). This implies that there are two weaknesses in these traces that you should be aware of:

    1. any connection active when the engine was brought down will have a possibly incorrect timestamp for the last byte seen from the server, and a possibly incorrect reported size. We estimate that no more than 150 such entries (out of roughly 90000-100000) are misreported for each 4 hour period.

    2. any connection that was established in the very small time window (about 300 milliseconds) between when the engine was shut down and restarted will not appear in the logs. We estimate that no more than 30 such drops occur for each 4-hour period.

    The packet capture tool reported no packet drops. Considering that a Pentium Pro 200MHz was used to capture the traces on a 10 Mb/s Ethernet segment, it is virtually certain that no trace drops besides those mentioned above occurred. There may be periods of uncharacteristically low activity in the traces - these correspond to network outages from Berkeley's ISP, rather than trace failures.

    The traces do contain entries for requests that were issued by the client but not completed (for instance, because the user pressed the STOP button and the TCP connection was shut down before the request completed). Unknown timestamps in the traces contain the value 0xFFFFFFFF (reported by showtrace as 4294967295), and incomplete requests contain header and data length values that report as much header/data as was seen.

    The trace data is sorted by completion time (i.e. the time at which the last byte of the server response was seen, or the time at which the connection was dropped). However, because of inaccuracies and apparent time travel in the Linux system clock, some trace entries appear slightly out of order.

    All timestamps within the traces are as reported by the gettimeofday() system call, so these timestamps ostensibly have microsecond resolution.

    Privacy

    To maintain the privacy of each individual Home IP user, we stripped identity information out of the traces in a post-processing phase. Because it is trivial to identify a user based solely on the pages that the user has visited, we were forced to anonymize the URL and destination IP address of each web request as well as the source IP address. All anonymization was done using a keyed MD5 hash of the data (32 bits for client and server IP addresses, 64 bits for URLs). We ourselves do not know the key used to salt the MD5 hash, so don't bother asking us for it. Similarly, don't bother asking us for unanonymized traces.

    In order to preserve some information about the URLs, the post-processed URLs have the following format:

    COMMAND URLHASH.[flags][.suffix] [HTTPVERS]

    where:

    • COMMAND is one of GET, HEAD, POST, or PUT,
    • URLHASH is the string representation of the 64-bit MD5 hash of the URL,
    • flags contains the character q to indicate that a question mark was seen in the URL, and the character c to indicate that the string CGI or cgi was seen in the URL,
    • suffix is the filename suffix, if present, and
    • HTTPVERS is the HTTP version field of the HTTP command issued by the client.
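
    The URL format above can be reproduced for illustration. A minimal sketch, assuming HMAC-MD5 as the keyed hash (the text says only "keyed MD5 hash"), truncated to its first 64 bits and printed in decimal as in the sample URL; the flag order (q before c) and taking the suffix from the last path segment are also assumptions. With the original key unknown, it cannot reproduce the published hashes. `url_hash64` and `anonymize_request` are hypothetical names.

```python
import hashlib
import hmac
import posixpath
from urllib.parse import urlsplit


def url_hash64(url: str, key: bytes) -> int:
    """Keyed 64-bit URL hash: HMAC-MD5 truncated to 8 bytes (assumed construction)."""
    digest = hmac.new(key, url.encode(), hashlib.md5).digest()
    return int.from_bytes(digest[:8], "big")


def anonymize_request(command: str, url: str, httpvers: str, key: bytes) -> str:
    """Build 'COMMAND URLHASH.[flags][.suffix] [HTTPVERS]' for one request."""
    flags = ""
    if "?" in url:
        flags += "q"  # a question mark was seen in the URL
    if "CGI" in url or "cgi" in url:
        flags += "c"  # the string CGI or cgi was seen in the URL
    # Filename suffix of the last path segment, e.g. ".gif" (empty if none).
    suffix = posixpath.splitext(posixpath.basename(urlsplit(url).path))[1]
    token = f"{url_hash64(url, key)}.{flags}{suffix}"
    return f"{command} {token} {httpvers}" if httpvers else f"{command} {token}"
```

    With no flags, a .gif request yields a token of the form `HASH..gif`, matching the double dot in the sample URL above.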
      
  17. All algorithm throughput values (in Gbps) using Defcon and web traffic...

    • plos.figshare.com
    Updated Jun 5, 2023
    Chun-Liang Lee; Yi-Shan Lin; Yaw-Chung Chen (2023). All algorithm throughput values (in Gbps) using Defcon and web traffic traces. [Dataset]. http://doi.org/10.1371/journal.pone.0139301.t009
    Available download formats: xls
    Dataset provided by
    PLOS ONE
    Authors
    Chun-Liang Lee; Yi-Shan Lin; Yaw-Chung Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All algorithm throughput values (in Gbps) using Defcon and web traffic traces.

  18. Backscatter-2004-2005

    • impactcybertrust.org
    Updated May 26, 2004
    UCSD - Center for Applied Internet Data Analysis (2004). Backscatter-2004-2005 [Dataset]. http://doi.org/10.23721/107/1353898
    Authors
    UCSD - Center for Applied Internet Data Analysis
    Time period covered
    May 26, 2004 - Dec 1, 2005
    Description

    This backscatter from victims was collected by the UCSD Network Telescope.
    Quarterly data collection took place for one week in May, August and
    November in 2004, and February, May, August and November in 2005. Possible
    uses of this data include modeling DoS attacks, understanding victim
    populations, and using real packet traces to validate algorithms for
    detecting or classifying malicious traffic. This last use is particularly
    valuable because it is extremely challenging to artificially generate the
    kind of real-world noise present on the Internet.
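
    Backscatter is usually recognized by packet type: a victim of a spoofed attack answers with responses such as TCP SYN/ACK or RST packets and certain ICMP replies. If the spoofed sources are uniformly random, a telescope covering a fraction f of the IPv4 space sees about that fraction of the responses, so the victim's total response rate is roughly the observed rate divided by f; for a single /n prefix, f = 2^(−n). A minimal sketch of both steps; the ICMP types listed are a subset chosen for illustration, the uniform-spoofing assumption does not hold for every attack, and the function names are hypothetical.

```python
TCP_SYN, TCP_RST, TCP_ACK = 0x02, 0x04, 0x10
# Illustrative subset: echo reply, destination unreachable, time exceeded.
ICMP_BACKSCATTER_TYPES = {0, 3, 11}


def looks_like_backscatter(proto: int, tcp_flags: int = 0, icmp_type: int = -1) -> bool:
    """Heuristic: is this packet a likely response to a spoofed request?"""
    if proto == 6:  # TCP
        syn_ack = (tcp_flags & (TCP_SYN | TCP_ACK)) == (TCP_SYN | TCP_ACK)
        return syn_ack or bool(tcp_flags & TCP_RST)
    if proto == 1:  # ICMP
        return icmp_type in ICMP_BACKSCATTER_TYPES
    return False


def extrapolate_rate(observed_pps: float, telescope_prefix_len: int) -> float:
    """Scale a rate seen by a single /n telescope to the whole IPv4 space."""
    return observed_pps * 2 ** telescope_prefix_len  # divide by f = 2**-n
```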

  19. Two-Days-in-2008 Telescope Dataset

    • impactcybertrust.org
    Updated Nov 12, 2008
    + more versions
    UCSD - Center for Applied Internet Data Analysis (2008). Two-Days-in-2008 Telescope Dataset [Dataset]. http://doi.org/10.23721/107/1353895
    Authors
    UCSD - Center for Applied Internet Data Analysis
    Time period covered
    Nov 12, 2008 - Nov 19, 2008
    Description

    This dataset contains two full days of trace data from the UCSD Network Telescope:
    2008-11-12 and 2008-11-19. These dates precede our detection of the Conficker A Worm
    on 2008-11-21. The dataset consists of 48 compressed pcap files each containing one
    hour of traffic observed by the Network Telescope.


Anonymized Internet Traces 2019 (CAIDA): 57 scholarly articles cite this dataset (per Google Scholar).
