25 datasets found
  1. OpenStack log files

    • figshare.com
    zip
    Updated Nov 16, 2021
    Cite
    Mbasa MOLO; Victor Akande; Nzanzu Vingi Patrick; Joke Badejo; Emmanuel Adetiba (2021). OpenStack log files [Dataset]. http://doi.org/10.6084/m9.figshare.17025353.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Nov 16, 2021
    Dataset provided by
    figshare
    Authors
    Mbasa MOLO; Victor Akande; Nzanzu Vingi Patrick; Joke Badejo; Emmanuel Adetiba
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains log information from a cloud computing infrastructure based on OpenStack. Three different files are available, including the nova, cinder, and glance log files. Because the data is unbalanced, a CSV file containing log information from the three OpenStack applications is also provided; it can be used for testing if the log files are used for machine learning purposes. These data were collected from the Federated Genomic (FEDGEN) cloud computing infrastructure hosted at Covenant University under the Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE) project funded by the World Bank.

  2. AIT Log Data Set V2.0

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jun 28, 2024
    + more versions
    Cite
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2024). AIT Log Data Set V2.0 [Dataset]. http://doi.org/10.5281/zenodo.5789064
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AIT Log Data Sets

    This repository contains synthetic log data suitable for evaluation of intrusion detection systems, federated learning, and alert aggregation. A detailed description of the dataset is available in [1]. The logs were collected from eight testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by [2]. Please cite these papers if the data is used for academic publications.

    In brief, each of the datasets corresponds to a testbed representing a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise over a time span of 4-6 days. At some point, a sequence of attack steps is launched against the network. Log data is collected from all hosts and includes Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs. Separate ground truth files are used to label events that are related to the attacks. Compared to AIT-LDS v1.1, a more complex network and more diverse user behavior are simulated, and logs are collected from all hosts in the network. If you are only interested in network traffic analysis, we also provide the AIT-NDS containing the labeled netflows of the testbed networks. We also provide the AIT-ADS, an alert data set derived by forensically applying open-source intrusion detection systems on the log data.

    The datasets in this repository have the following structure:

    • The gather directory contains all logs collected from the testbed. Logs collected from each host are located in gather/.
    • The labels directory contains the ground truth of the dataset that indicates which events are related to attacks. The directory mirrors the structure of the gather directory so that each label file is located at the same path and has the same name as the corresponding log file. Each line in the label files references the log event corresponding to an attack by the line number counted from the beginning of the file ("line"), the labels assigned to the line that state the respective attack step ("labels"), and the labeling rules that assigned the labels ("rules"). An example is provided below.
    • The processing directory contains the source code that was used to generate the labels.
    • The rules directory contains the labeling rules.
    • The environment directory contains the source code that was used to deploy the testbed and run the simulation using the Kyoushi Testbed Environment.
    • The dataset.yml file specifies the start and end time of the simulation.

    The following table summarizes relevant properties of the datasets:

    • fox
      • Simulation time: 2022-01-15 00:00 - 2022-01-20 00:00
      • Attack time: 2022-01-18 11:59 - 2022-01-18 13:15
      • Scan volume: High
      • Unpacked size: 26 GB
    • harrison
      • Simulation time: 2022-02-04 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-08 07:07 - 2022-02-08 08:38
      • Scan volume: High
      • Unpacked size: 27 GB
    • russellmitchell
      • Simulation time: 2022-01-21 00:00 - 2022-01-25 00:00
      • Attack time: 2022-01-24 03:01 - 2022-01-24 04:39
      • Scan volume: Low
      • Unpacked size: 14 GB
    • santos
      • Simulation time: 2022-01-14 00:00 - 2022-01-18 00:00
      • Attack time: 2022-01-17 11:15 - 2022-01-17 11:59
      • Scan volume: Low
      • Unpacked size: 17 GB
    • shaw
      • Simulation time: 2022-01-25 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-29 14:37 - 2022-01-29 15:21
      • Scan volume: Low
      • Data exfiltration is not visible in DNS logs
      • Unpacked size: 27 GB
    • wardbeck
      • Simulation time: 2022-01-19 00:00 - 2022-01-24 00:00
      • Attack time: 2022-01-23 12:10 - 2022-01-23 12:56
      • Scan volume: Low
      • Unpacked size: 26 GB
    • wheeler
      • Simulation time: 2022-01-26 00:00 - 2022-01-31 00:00
      • Attack time: 2022-01-30 07:35 - 2022-01-30 17:53
      • Scan volume: High
      • No password cracking in attack chain
      • Unpacked size: 30 GB
    • wilson
      • Simulation time: 2022-02-03 00:00 - 2022-02-09 00:00
      • Attack time: 2022-02-07 10:57 - 2022-02-07 11:49
      • Scan volume: High
      • Unpacked size: 39 GB

    The following attacks are launched in the network:

    • Scans (nmap, WPScan, dirb)
    • Webshell upload (CVE-2020-24186)
    • Password cracking (John the Ripper)
    • Privilege escalation
    • Remote command execution
    • Data exfiltration (DNSteal)

    Note that attack parameters and their execution orders vary in each dataset. Labeled log files are trimmed to the simulation time to ensure that their labels (which reference the related event by the line number in the file) are not misleading. Other log files, however, also contain log events generated before or after the simulation time and may therefore be affected by testbed setup or data collection. It is therefore recommended to only consider logs with timestamps within the simulation time for analysis.

    In the following, the structure of the labels is explained using the audit logs from the intranet server in the russellmitchell data set as an example. The first four labels in the labels/intranet_server/logs/audit/audit.log file are as follows:

    {"line": 1860, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1861, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1862, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    {"line": 1863, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

    Each JSON object in this file assigns a label to one specific log line in the corresponding log file located at gather/intranet_server/logs/audit/audit.log. The field "line" in the JSON objects specifies the line number of the respective event in the original log file, while the field "labels" comprises the corresponding labels. For example, the lines in the sample above provide the information that lines 1860-1863 in the gather/intranet_server/logs/audit/audit.log file are labeled with "attacker_change_user" and "escalate", corresponding to the attack step where the attacker receives escalated privileges. Inspecting these lines shows that they indeed correspond to the user authenticating as root:

    type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:authentication acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_ACCT msg=audit(1642999060.603:2227): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=CRED_ACQ msg=audit(1642999060.615:2228): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    type=USER_START msg=audit(1642999060.627:2229): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:session_open acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

    The same applies to all other labels for this log file and all other log files. There are no labels for logs generated by "normal" (i.e., non-attack) behavior; instead, all log events that have no corresponding JSON object in one of the files from the labels directory, such as the lines 1-1859 in the example above, can be considered to be labeled as "normal". This means that in order to figure out the labels for the log data it is necessary to store the line numbers when processing the original logs from the gather directory and see if these line numbers also appear in the corresponding file in the labels directory.
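
    A minimal sketch of this lookup in Python (the dataset root path is a placeholder; the gather/ and labels/ paths follow the example above):

    import json
    from pathlib import Path

    root = Path("russellmitchell")  # placeholder for the unpacked dataset directory
    log_path = root / "gather/intranet_server/logs/audit/audit.log"
    label_path = root / "labels/intranet_server/logs/audit/audit.log"

    # Map 1-based line numbers to their attack labels.
    line_labels = {}
    with label_path.open() as f:
        for raw in filter(str.strip, f):
            entry = json.loads(raw)
            line_labels[entry["line"]] = entry["labels"]

    # Walk the original log; lines absent from the labels file count as "normal".
    labeled = []
    with log_path.open(errors="replace") as f:
        for lineno, event in enumerate(f, start=1):
            labeled.append((lineno, line_labels.get(lineno, ["normal"]), event.rstrip("\n")))

    print(labeled[1859][:2])  # expected: (1860, ['attacker_change_user', 'escalate'])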

    Besides the attack labels, a general overview of the exact times when specific attack steps are launched is available in gather/attacker_0/logs/attacks.log. An enumeration of all hosts and their IP addresses is stated in processing/config/servers.yml. Moreover, configurations of each host are provided in gather/ and gather/.

    Version history:

    • AIT-LDS-v1.x: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.
    • AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

    Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU projects GUARD (833456) and PANDORA (SI2.835928).

    If you use the dataset, please cite the following publications:

    [1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner,

  3. Computational Log files from Gaussian09

    • data.researchdatafinder.qut.edu.au
    Updated Apr 29, 2024
    Cite
    (2024). Computational Log files from Gaussian09 [Dataset]. https://data.researchdatafinder.qut.edu.au/ar/dataset/4b78b73f-ed46-42f8-b7d9-1150d03e78d2/resources/6b5ab72b-fb48-4812-a9d0-f0472967ed4b/history/6b1d0f0c-f32a-4c52-8c1c-109a7128402d
    Explore at:
    Dataset updated
    Apr 29, 2024
    License

    http://researchdatafinder.qut.edu.au/display/n7962

    Description

    Log files, including optimised structures and transition states, from Gaussian09 calculations. QUT Research Data Repository dataset resource available for download.

  4. Utah FORGE: Well 58-32 Schlumberger FMI Logs DLIS and XML files

    • gdr.openei.org
    • data.openei.org
    • +2more
    archive, data +1
    Updated Nov 17, 2017
    + more versions
    Cite
    Greg Nash; Joe Moore (2017). Utah FORGE: Well 58-32 Schlumberger FMI Logs DLIS and XML files [Dataset]. http://doi.org/10.15121/1464529
    Explore at:
    archive, website, data (available download formats)
    Dataset updated
    Nov 17, 2017
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Geothermal Data Repository
    Energy and Geoscience Institute at the University of Utah
    Authors
    Greg Nash; Joe Moore
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This zipped data set includes Schlumberger FMI logs DLIS and XML files from Utah FORGE deep well 58-32. These include runs 1 (2226-7550 ft) and 2 (7440-7550 ft). Run 3 (7390-7527 ft) was acquired during phase 2C.

  5. Event Logs CSV

    • figshare.com
    rar
    Updated Dec 9, 2019
    Cite
    Dina Bayomie (2019). Event Logs CSV [Dataset]. http://doi.org/10.6084/m9.figshare.11342063.v1
    Explore at:
    rar (available download formats)
    Dataset updated
    Dec 9, 2019
    Dataset provided by
    figshare
    Authors
    Dina Bayomie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The event logs in CSV format. The dataset contains both correlated and uncorrelated logs.

  6. BSEE Data Center - Scanned Units Query

    • catalog.data.gov
    Updated Feb 26, 2025
    + more versions
    Cite
    Bureau of Safety and Environmental Enforcement (2025). BSEE Data Center - Scanned Units Query [Dataset]. https://catalog.data.gov/dataset/bsee-data-center-scanned-units-query-f51ac
    Explore at:
    Dataset updated
    Feb 26, 2025
    Dataset provided by
    Bureau of Safety and Environmental Enforcement (http://www.bsee.gov/)
    Description

    Scanned Units Query - You can now request these same well files, well logs, and well data as a free download through the File Request System ( https://www.data.bsee.gov/Other/FileRequestSystem/Default.aspx ). The Disc Media Store will be removed at some point in the future.

  7. Processed Datasets - Imputation in Well Log Data: A Benchmark

    • zenodo.org
    application/gzip
    Updated May 22, 2024
    Cite
    Pedro H. T. Gama; Jackson Faria; Jessica Sena; Francisco Neves; Vinícius R. Riffel; Lucas Perez; André Korenchendler; Matheus C. A. Sobreira; Alexei M. C. Machado (2024). Processed Datasets - Imputation in Well Log Data: A Benchmark [Dataset]. http://doi.org/10.5281/zenodo.10987946
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    May 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pedro H. T. Gama; Jackson Faria; Jessica Sena; Francisco Neves; Vinícius R. Riffel; Lucas Perez; André Korenchendler; Matheus C. A. Sobreira; Alexei M. C. Machado
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 17, 2024
    Description

    Imputation of well log data is a common task in the field. However, a quick review of the literature reveals a lack of standardization in how methods for this problem are evaluated. The goal of the benchmark is to introduce a standard evaluation protocol for any imputation method for well log data.

    In the proposed benchmark, three public datasets are used:

    • Geolink: The Geolink Dataset is a public dataset of wells in the Norwegian offshore. The data is provided by the company of the same name, GEOLINK, and follows the NOLD 2.0 license.
      This dataset contains a total of 223 wells. It also has lithology labels for the wells, with a total of 36 lithology classes. [download original]
    • Taranaki Basin: The Taranaki Basin Dataset is a curated set of wells and a convenient option for experimentation, especially due to its ease of accessibility and use.
      This collection, under the CDLA-Sharing-1.0 license, contains well logs extracted from the New Zealand Petroleum & Minerals Online Exploration Database and Petlab.
      There are a total of 407 wells, of which 289 are onshore and 118 are offshore exploration and production wells. [download original]
    • Teapot Dome: The Teapot Dome dataset is provided by the Rocky Mountain Oilfield Testing Center (RMOTC) and the US Department of Energy.
      It contains different types of data related to the Teapot Dome oil field, such as 2D and 3D seismic data, well logs, and GIS data. The data is licensed under the Creative Commons 4.0 license.
      In total, the dataset has 1,179 wells with available logs. The number of available logs varies across wells. There are only 91 wells with the gamma ray, bulk density, and neutron porosity logs, while only three wells have the complete basic suite. [direct download]

    Here you can download all three datasets already preprocessed to be used with our implementation, found here.

    File Description:

    There are six files for each fold partition for each dataset.

    • datasetname_fold_k_well_log_metadata_train.json: JSON file with general information about the slices of the training partition of fold k. Contains the total number of slices and the number of slices per well.
    • datasetname_fold_k_well_log_metadata_val.json: JSON file with general information about the slices of the validation partition of fold k. Contains the total number of slices and the number of slices per well.
    • datasetname_fold_k_well_log_slices_train.npy: .npy (NumPy) file, ready to be loaded, with the already-processed training slices of fold k. When loaded it should have shape (total_slices, 256, number_of_logs); see the loading sketch after this list.
    • datasetname_fold_k_well_log_slices_val.npy: .npy (NumPy) file, ready to be loaded, with the already-processed validation slices of fold k.
    • datasetname_fold_k_well_log_slices_meta_train.json: JSON file with the slice info for all slices in the training partition of fold k. For each slice, 7 data points are provided; the last four are discarded (they would contain other information that was not used). The first three are, in order: the origin well name, the starting position in that well, and the end position of the slice in that well.
    • datasetname_fold_k_well_log_slices_meta_val.json: JSON file with the slice info for all slices in the validation partition of fold k.
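
    A minimal loading sketch in Python (the dataset name "geolink" and fold 0 are placeholders for whichever files were downloaded):

    import json
    import numpy as np

    # Placeholder file names following the naming pattern above.
    slices = np.load("geolink_fold_0_well_log_slices_train.npy")
    print(slices.shape)  # expected: (total_slices, 256, number_of_logs)

    with open("geolink_fold_0_well_log_metadata_train.json") as f:
        metadata = json.load(f)    # total number of slices, slices per well
    with open("geolink_fold_0_well_log_slices_meta_train.json") as f:
        slice_meta = json.load(f)  # per-slice info: well name, start and end position, ...

    print(metadata)
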
  8. Event Log Sampling Datasets

    • figshare.com
    zip
    Updated Jul 22, 2022
    + more versions
    Cite
    CONG LIU (2022). Event Log Sampling Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.20354505.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Jul 22, 2022
    Dataset provided by
    figshare
    Authors
    CONG LIU
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes 9 event logs, which can be used to experiment with log completeness-oriented event log sampling methods; a short loading sketch follows the list of logs below.

    · exercise.xes: The dataset is a simulation log generated by the paper review process model, and each trace clearly describes the process of reviewing papers in detail.

    · training_log_1/3/8.xes: These 3 datasets are human-trained simulation logs for the 2016 Process Discovery Competition (PDC 2016). Each trace consists of two values, the name of the process model activity referenced by the event and the identifier of the case to which the event belongs.

    · Production.xes: This dataset includes process data from production processes, and each trace includes data for cases, activities, resources, timestamps, and more data fields.

    · BPIC_2012_A/O/W.xes: These 3 datasets are derived from the personal loan application process of a financial institution in the Netherlands. The process represented in the event log is the application process for a personal loan or overdraft in a global financing organization. Each trace describes the process of applying for a personal loan for a different customer.

    · CrossHospital.xes: The dataset includes the treatment process data of emergency patients in the hospital, and each trace represents the treatment process of an emergency patient in the hospital.
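
    The logs are provided as .xes files; a minimal sketch of loading one of them with the pm4py library (assuming pm4py is installed; any of the file names above can be substituted):

    import pm4py

    # Placeholder path; use any of the .xes files listed above.
    log = pm4py.read_xes("exercise.xes")

    # Depending on the pm4py version this is an EventLog object or a pandas DataFrame;
    # either way it exposes the traces/events needed for sampling experiments.
    print(type(log))
    print(len(log))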

  9. LoRaWAN Traffic Analysis Dataset

    • zenodo.org
    zip
    Updated Aug 28, 2023
    + more versions
    Cite
    Ales Povalac; Jan Kral (2023). LoRaWAN Traffic Analysis Dataset [Dataset]. http://doi.org/10.5281/zenodo.7919213
    Explore at:
    zip (available download formats)
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ales Povalac; Jan Kral
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created by a LoRaWAN sniffer and contains packets, which are thoroughly analyzed in the paper Exploring LoRaWAN Traffic: In-Depth Analysis of IoT Network Communications (not yet published). Data from the LoRaWAN sniffer was collected in four cities: Liege (Belgium), Graz (Austria), Vienna (Austria), and Brno (Czechia).

    Gateway ID: b827ebafac000001

    • Uplink reception (end-device => gateway)
    • Only packets containing CRC, inverted IQ
    • RX0: 867.1 MHz, 867.3 MHz, 867.5 MHz, 867.7 MHz, 867.9 MHz - BW 125 kHz and all SF
    • RX1: 868.1 MHz, 868.3 MHz, 868.5 MHz - BW 125 kHz and all SF

    Gateway ID: b827ebafac000002

    • Downlink reception (gateway => end-device)
    • Includes packets without CRC, non-inverted IQ
    • RX0: 867.1 MHz, 867.3 MHz, 867.5 MHz, 867.7 MHz, 867.9 MHz - BW 125 kHz and all SF
    • RX1: 868.1 MHz, 868.3 MHz, 868.5 MHz - BW 125 kHz and all SF

    Gateway ID: b827ebafac000003

    • Downlink reception (gateway => end-device) and Class-B beacon on 869.525 MHz
    • Includes packets without CRC, non-inverted IQ
    • RX0: 869.525 MHz - BW 125 kHz and all SF, BW 125 kHz and SF9 with implicit header, CR 4/5 and length 17 B

    To open the pcap files, you need Wireshark with current support for LoRaTap and LoRaWAN protocols. This support will be available in the official 4.1.0 release. A working version for Windows is accessible in the automated build system.

    The source data is available in the log.zip file, which contains the complete dataset obtained by the sniffer. A set of conversion tools for log processing is available on Github. The converted logs, available in Wireshark format, are stored in pcap.zip. For the LoRaWAN decoder, you can use the attached root and session keys. The processed outputs are stored in csv.zip, and graphical statistics are available in png.zip.
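
    For a quick look at one of the converted captures outside Wireshark, a minimal sketch using scapy (the file name is a placeholder for whichever capture is extracted from pcap.zip):

    from scapy.all import rdpcap

    # Placeholder file name; LoRaTap/LoRaWAN dissection is left to Wireshark.
    packets = rdpcap("01_Brno.pcap")

    print(len(packets))                      # total number of captured frames
    print(packets[0].time, len(packets[0]))  # timestamp and byte length of the first frame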

    This data represents a unique, geographically identifiable selection from the full log, cleaned of any errors. The records from Brno include communication between the gateway and a node with known keys.

    Test file :: 00_Test

    • short test file for parser verification
    • comparison of LoRaTap version 0 and version 1 formats

    Brno, Czech Republic :: 01_Brno

    • 49.22685N, 16.57536E, ASL 306m
    • lines 150873 to 529796
    • time 1.8.2022 15:04:28 to 17.8.2022 13:05:32
    • preliminary experiment
    • experimental device
      • Device EUI: 70b3d5cee0000042
      • Application key: d494d49a7b4053302bdcf96f1defa65a
      • Device address: 00d85395
      • Network session key: c417540b8b2afad8930c82fcf7ea54bb
      • Application session key: 421fea9bedd2cc497f63303edf5adf8e

    Liege, Belgium :: 02_Liege :: evaluated in the paper

    • 50.66445N, 5.59276E, ASL 151m
    • lines 636205 to 886868
    • time 25.8.2022 10:12:24 to 12.9.2022 06:20:48

    Brno, Czech Republic :: 03_Brno_join

    • 49.22685N, 16.57536E, ASL 306m
    • lines 947787 to 979382
    • time 30.9.2022 15:21:27 to 4.10.2022 10:46:31
    • record contains OTAA activation (Join Request / Join Accept)
    • experimental device:
      • Device EUI: 70b3d5cee0000042
      • Application key: d494d49a7b4053302bdcf96f1defa65a
      • Device address: 01e65ddc
      • Network session key: e2898779a03de59e2317b149abf00238
      • Application session key: 59ca1ac91922887093bc7b236bd1b07f

    Graz, Austria :: 04_Graz :: evaluated in the paper

    • 47.07049N, 15.44506E, ASL 364m
    • lines 1015139 to 1178855
    • time 26.10.2022 06:21:07 to 29.11.2022 10:03:00

    Vienna, Austria :: 05_Wien :: evaluated in the paper

    • 48.19666N, 16.37101E, ASL 204m
    • lines 1179308 to 3657105
    • time 1.12.2022 10:42:19 to 4.1.2023 14:00:05
    • contains a total of 14 short restarts (under 90 seconds)

    Brno, Czech Republic :: 07_Brno :: evaluated in the paper

    • 49.22685N, 16.57536E, ASL 306m
    • lines 4969648 to 6919392
    • time 16.2.2023 8:53:43 to 30.3.2023 9:00:11
  10. Ministry of Public Administration and Security_Public Data Usage (File_API)

    • data.go.kr
    csv
    Updated Jun 17, 2025
    Cite
    (2025). Ministry of Public Administration and Security_Public Data Usage (File_API) [Dataset]. https://www.data.go.kr/en/data/15076332/fileData.do
    Explore at:
    csv (available download formats)
    Dataset updated
    Jun 17, 2025
    License

    https://data.go.kr/ugs/selectPortalPolicyView.do

    Description

    It provides the number of downloads and API utilization requests, by year (2011-2023), for file data registered in the public data portal, and is useful for analyzing the growth in public data utilization. The data is provided in CSV format, and the meta items are statistical year, registration agency, list name, data name, file downloads, and API utilization requests. You can download file data from the public data portal without logging in; to use the open API, you must register as a public data portal member, log in, and apply for utilization.

  11. Scanned data logs, ship logs and reports from the ADBEX II voyage of the...

    • researchdata.edu.au
    • data.aad.gov.au
    Updated Jan 16, 2024
    + more versions
    Cite
    CONNELL, DAVE J.; Commonwealth of Australia (2024). Scanned data logs, ship logs and reports from the ADBEX II voyage of the Nella Dan, January to February, 1984 [Dataset]. https://researchdata.edu.au/scanned-logs-ship-february-1984/2839164
    Explore at:
    Dataset updated
    Jan 16, 2024
    Dataset provided by
    Australian Antarctic Division (https://www.antarctica.gov.au/)
    Australian Ocean Data Network
    Australian Antarctic Data Centre
    Authors
    CONNELL, DAVE J.; Commonwealth of Australia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 14, 1984 - Feb 6, 1984
    Area covered
    Description

    The dataset contains scanned copies of a number of logs, reports and documents from the Antarctic Division Biomass Experiment II (ADBEX II) voyage of the Nella Dan in January and February of 1984. The documents cover CTD data and reports, meteorological logs, ship logs, and other files.

    The dataset download contains the following files:

    ADBEX 2 CTD DATA 1m 2m Averages EJW 12-86.pdf
    ADBEX 2 CTD Report.txt Data Summary 2m Averages STNS 1 - 5 5-86.pdf
    ADBEX 2 CTD Stations Graphs.pdf
    ADBEX 2 RIF2.DAT.pdf
    Met Log - ADBEX 2.pdf
    ADBEX 2 - SIBEX 1 Ships Log - Nella Dan.pdf
    ADBEX 2 SIBEX 1 Log Book 1984.pdf
    ADBEX 2 Analysis Programs.pdf

  12. PIPr: A Dataset of Public Infrastructure as Code Programs

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 28, 2023
    Cite
    Salvaneschi, Guido (2023). PIPr: A Dataset of Public Infrastructure as Code Programs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8262770
    Explore at:
    Dataset updated
    Nov 28, 2023
    Dataset provided by
    Spielmann, David
    Sokolowski, Daniel
    Salvaneschi, Guido
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Programming Languages Infrastructure as Code (PL-IaC) enables IaC programs written in general-purpose programming languages like Python and TypeScript. The currently available PL-IaC solutions are Pulumi and the Cloud Development Kits (CDKs) of Amazon Web Services (AWS) and Terraform. This dataset provides metadata and initial analyses of all public GitHub repositories in August 2022 with an IaC program, including their programming languages, applied testing techniques, and licenses. Further, we provide a shallow copy of the head state of those 7104 repositories whose licenses permit redistribution. The dataset is available under the Open Data Commons Attribution License (ODC-By) v1.0. Contents:

    • metadata.zip: The dataset metadata and analysis results as CSV files.
    • scripts-and-logs.zip: Scripts and logs of the dataset creation.
    • LICENSE: The Open Data Commons Attribution License (ODC-By) v1.0 text.
    • README.md: This document.
    • redistributable-repositiories.zip: Shallow copies of the head state of all redistributable repositories with an IaC program.

    This artifact is part of the ProTI Infrastructure as Code testing project: https://proti-iac.github.io.

    Metadata

    The dataset's metadata comprises three tabular CSV files containing metadata about all analyzed repositories, IaC programs, and testing source code files.

    repositories.csv:

    • ID (integer): GitHub repository ID
    • url (string): GitHub repository URL
    • downloaded (boolean): Whether cloning the repository succeeded
    • name (string): Repository name
    • description (string): Repository description
    • licenses (string, list of strings): Repository licenses
    • redistributable (boolean): Whether the repository's licenses permit redistribution
    • created (string, date & time): Time of the repository's creation
    • updated (string, date & time): Time of the last update to the repository
    • pushed (string, date & time): Time of the last push to the repository
    • fork (boolean): Whether the repository is a fork
    • forks (integer): Number of forks
    • archive (boolean): Whether the repository is archived
    • programs (string, list of strings): Project file path of each IaC program in the repository

    programs.csv:

    • ID (string): Project file path of the IaC program
    • repository (integer): GitHub repository ID of the repository containing the IaC program
    • directory (string): Path of the directory containing the IaC program's project file
    • solution (string, enum): PL-IaC solution of the IaC program ("AWS CDK", "CDKTF", "Pulumi")
    • language (string, enum): Programming language of the IaC program (enum values: "csharp", "go", "haskell", "java", "javascript", "python", "typescript", "yaml")
    • name (string): IaC program name
    • description (string): IaC program description
    • runtime (string): Runtime string of the IaC program
    • testing (string, list of enum): Testing techniques of the IaC program (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
    • tests (string, list of strings): File paths of IaC program's tests

    testing-files.csv:

    • file (string): Testing file path
    • language (string, enum): Programming language of the testing file (enum values: "csharp", "go", "java", "javascript", "python", "typescript")
    • techniques (string, list of enum): Testing techniques used in the testing file (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
    • keywords (string, list of enum): Keywords found in the testing file (enum values: "/go/auto", "/testing/integration", "@AfterAll", "@BeforeAll", "@Test", "@aws-cdk", "@aws-cdk/assert", "@pulumi.runtime.test", "@pulumi/", "@pulumi/policy", "@pulumi/pulumi/automation", "Amazon.CDK", "Amazon.CDK.Assertions", "Assertions_", "HashiCorp.Cdktf", "IMocks", "Moq", "NUnit", "PolicyPack(", "ProgramTest", "Pulumi", "Pulumi.Automation", "PulumiTest", "ResourceValidationArgs", "ResourceValidationPolicy", "SnapshotTest()", "StackValidationPolicy", "Testing", "Testing_ToBeValidTerraform(", "ToBeValidTerraform(", "Verifier.Verify(", "WithMocks(", "[Fact]", "[TestClass]", "[TestFixture]", "[TestMethod]", "[Test]", "afterAll(", "assertions", "automation", "aws-cdk-lib", "aws-cdk-lib/assert", "aws_cdk", "aws_cdk.assertions", "awscdk", "beforeAll(", "cdktf", "com.pulumi", "def test_", "describe(", "github.com/aws/aws-cdk-go/awscdk", "github.com/hashicorp/terraform-cdk-go/cdktf", "github.com/pulumi/pulumi", "integration", "junit", "pulumi", "pulumi.runtime.setMocks(", "pulumi.runtime.set_mocks(", "pulumi_policy", "pytest", "setMocks(", "set_mocks(", "snapshot", "software.amazon.awscdk.assertions", "stretchr", "test(", "testing", "toBeValidTerraform(", "toMatchInlineSnapshot(", "toMatchSnapshot(", "to_be_valid_terraform(", "unittest", "withMocks(")
    • program (string): Project file path of the testing file's IaC program

    Dataset Creation

    scripts-and-logs.zip contains all scripts and logs of the creation of this dataset. In it, executions/executions.log documents the commands that generated this dataset in detail. On a high level, the dataset was created as follows:

    1. A list of all repositories with a PL-IaC program configuration file was created using search-repositories.py (documented below). The execution took two weeks due to the non-deterministic nature of GitHub's REST API, causing excessive retries.
    2. A shallow copy of the head of all repositories was downloaded using download-repositories.py (documented below).
    3. Using analysis.ipynb, the repositories were analyzed for the programs' metadata, including the used programming languages and licenses.
    4. Based on the analysis, all repositories with at least one IaC program and a redistributable license were packaged into redistributable-repositiories.zip, excluding any node_modules and .git directories.

    Searching Repositories

    The repositories are searched through search-repositories.py and saved in a CSV file. The script takes these arguments in the following order:

    1. Github access token.
    2. Name of the CSV output file.
    3. Filename to search for.
    4. File extensions to search for, separated by commas.
    5. Min file size for the search (for all files: 0).
    6. Max file size for the search or * for unlimited (for all files: *).

    Pulumi projects have a Pulumi.yaml or Pulumi.yml (case-sensitive file name) file in their root folder, i.e., (3) is Pulumi and (4) is yml,yaml. https://www.pulumi.com/docs/intro/concepts/project/
    AWS CDK projects have a cdk.json (case-sensitive file name) file in their root folder, i.e., (3) is cdk and (4) is json. https://docs.aws.amazon.com/cdk/v2/guide/cli.html
    CDK for Terraform (CDKTF) projects have a cdktf.json (case-sensitive file name) file in their root folder, i.e., (3) is cdktf and (4) is json. https://www.terraform.io/cdktf/create-and-deploy/project-setup

    Limitations

    The script uses the GitHub code search API and inherits its limitations:

    • Only forks with more stars than the parent repository are included.
    • Only the repositories' default branches are considered.
    • Only files smaller than 384 KB are searchable.
    • Only repositories with fewer than 500,000 files are considered.
    • Only repositories that have had activity or have been returned in search results in the last year are considered.

    More details: https://docs.github.com/en/search-github/searching-on-github/searching-code

    The results of the GitHub code search API are not stable. However, the generally more robust GraphQL API does not support searching for files in repositories: https://stackoverflow.com/questions/45382069/search-for-code-in-github-using-graphql-v4-api

    Downloading Repositories

    download-repositories.py downloads all repositories in CSV files generated through search-repositories.py and generates an overview CSV file of the downloads. The script takes these arguments in the following order:

    1. Name of the repositories CSV files generated through search-repositories.py, separated by commas.
    2. Output directory to download the repositories to.
    3. Name of the CSV output file.

    The script only downloads a shallow recursive copy of the HEAD of the repo, i.e., only the main branch's most recent state, including submodules, without the rest of the git history. Each repository is downloaded to a subfolder named by the repository's ID.
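
    As a rough illustration (not part of the artifact), the metadata can be explored with pandas once metadata.zip is extracted; the file and column names below follow the description above:

    import pandas as pd

    repos = pd.read_csv("repositories.csv")

    # Repositories that were cloned successfully and whose licenses permit redistribution
    # (boolean columns may need parsing depending on how they were serialized).
    redistributable = repos[(repos["downloaded"] == True) & (repos["redistributable"] == True)]
    print(len(redistributable))

    programs = pd.read_csv("programs.csv")
    print(programs["solution"].value_counts())  # AWS CDK vs. CDKTF vs. Pulumi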

  13. Respiration_chambers/raw_log_files and combined datasets of biomass and...

    • cmr.earthdata.nasa.gov
    • researchdata.edu.au
    • +1more
    Updated Dec 18, 2018
    Cite
    (2018). Respiration_chambers/raw_log_files and combined datasets of biomass and chamber data, and physical parameters [Dataset]. http://doi.org/10.26179/5c1827d5d6711
    Explore at:
    Dataset updated
    Dec 18, 2018
    Time period covered
    Jan 27, 2015 - Feb 23, 2015
    Area covered
    Description

    General overview: The following datasets are described by this metadata record, and are available for download from the provided URL.

    • Raw log files, physical parameters raw log files
    • Raw excel files, respiration/PAM chamber raw excel spreadsheets
    • Processed and cleaned excel files, respiration chamber biomass data
    • Raw rapid light curve excel files (this is duplicated from Raw log files), combined dataset pH, temperature, oxygen, salinity, velocity for experiment
    • Associated R script file for pump cycles of respiration chambers

    ####

    Physical parameters raw log files

    Raw log files:
    1) DATE
    2) Time = UTC+11
    3) PROG = Automated program to control sensors and collect data
    4) BAT = Amount of battery remaining
    5) STEP = check aquation manual
    6) SPIES = check aquation manual
    7) PAR = Photoactive radiation
    8) Levels = check aquation manual
    9) Pumps = program for pumps
    10) WQM = check aquation manual

    ####

    Respiration/PAM chamber raw excel spreadsheets

    Abbreviations in headers of datasets. Note: Two data sets are provided in different formats, raw and cleaned (adj). These are the same data, with the PAR column moved over to PAR.all for analysis. All headers are the same. The cleaned (adj) dataframe will work with the R syntax below; alternatively, add code to do the cleaning in R.

    Date: ISO 1986 - Check
    Time: UTC+11 unless otherwise stated
    DATETIME: UTC+11 unless otherwise stated
    ID (of instrument in respiration chambers):
    ID43 = Pulse amplitude fluorescence measurement of control
    ID44 = Pulse amplitude fluorescence measurement of acidified chamber
    ID=1 Dissolved oxygen
    ID=2 Dissolved oxygen
    ID3 = PAR
    ID4 = PAR
    PAR = Photoactive radiation (umols)
    F0 = minimal fluorescence from PAM
    Fm = Maximum fluorescence from PAM
    Yield = (F0 – Fm)/Fm
    rChl = an estimate of chlorophyll (Note: this is uncalibrated and is an estimate only)
    Temp = Temperature (degrees C)
    PAR = Photoactive radiation
    PAR2 = Photoactive radiation 2
    DO = Dissolved oxygen
    %Sat = Saturation of dissolved oxygen
    Notes = This is the program of the underwater submersible logger, with the following abbreviations:
    Notes-1) PAM=
    Notes-2) PAM = Gain level set (see aquation manual for more detail)
    Notes-3) Acclimatisation = Program of slowly introducing treatment water into chamber
    Notes-4) Shutter start up 2 sensors+sample… = Shutter PAM's automatic set up procedure (see aquation manual)
    Notes-5) Yield step 2 = PAM yield measurement and calculation of control
    Notes-6) Yield step 5 = PAM yield measurement and calculation of acidified
    Notes-7) Abatus respiration DO and PAR step 1 = Program to measure dissolved oxygen and PAR (see aquation manual). Steps 1-4 are different stages of this program, including pump cycles, DO and PAR measurements.

    8) Rapid light curve data:
    Pre LC: A yield measurement prior to the following measurement
    After 10.0 sec at 0.5% to 8%: Level of each of the 8 steps of the rapid light curve
    Odessey PAR (only in some deployments): An extra measure of PAR (umols) using an Odessey data logger
    Dataflow PAR: An extra measure of PAR (umols) using a Dataflow sensor
    PAM PAR: This is copied from the PAR or PAR2 column
    PAR all: This is the complete PAR file and should be used
    Deployment: Identifying which deployment the data came from

    ####

    Respiration chamber biomass data

    The data is chlorophyll a biomass from cores taken from the respiration chambers. The headers are:
    Depth (mm)
    Treat (Acidified or control)
    Chl a (pigment and indicator of biomass)
    Core (5 cores were collected from each chamber; three were analysed for chl a). These are pseudoreplicates/subsamples from the chambers and should not be treated as replicates.

    ####

    Associated R script file for pump cycles of respiration chambers

    Associated respiration chamber data to determine the times when respiration chamber pumps delivered treatment water to chambers. Determined from Aquation log files (see associated files). Use the chamber cut times to determine net production rates. Note: Users need to avoid the times when the respiration chambers are delivering water as this will give incorrect results. The headers that get used in the attached/associated R file are start regression and end regression. The remaining headers are not used unless called for in the associated R script. The last columns of these datasets (intercept, ElapsedTimeMincoef) are determined from the linear regressions described below.

    To determine the rate of change of net production, coefficients of the regression of oxygen consumption in discrete 180 minute data blocks were determined. R squared values for fitted regressions of these coefficients were consistently high (greater than 0.9). We make two assumptions with calculation of net production rates: the first is that heterotrophic community members do not change their metabolism under OA; and the second is that the heterotrophic communities are similar between treatments.
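
    A rough Python sketch of that block-wise regression idea (the column names time_min and DO are hypothetical stand-ins for the elapsed-time and dissolved-oxygen columns; the original analysis was done with the associated R script):

    import numpy as np
    import pandas as pd

    # Hypothetical input with elapsed time in minutes and dissolved oxygen readings.
    df = pd.read_csv("chamber_data.csv")  # assumed columns: time_min, DO

    block = 180  # minutes per regression block
    results = []
    for start in np.arange(df["time_min"].min(), df["time_min"].max(), block):
        chunk = df[(df["time_min"] >= start) & (df["time_min"] < start + block)]
        if len(chunk) < 2:
            continue
        # Slope is the rate of oxygen change over the block; intercept kept for completeness.
        slope, intercept = np.polyfit(chunk["time_min"], chunk["DO"], 1)
        r = np.corrcoef(chunk["time_min"], chunk["DO"])[0, 1]
        results.append({"start": start, "slope": slope, "intercept": intercept, "r2": r ** 2})

    print(pd.DataFrame(results))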

    ####

    Combined dataset pH, temperature, oxygen, salinity, velocity for experiment

    This data is rapid light curve data generated from a Shutter PAM fluorimeter. There are eight steps in each rapid light curve. Note: The software component of the Shutter PAM fluorimeter for sensor 44 appeared to be damaged and would not cycle through the PAR cycles. Therefore the rapid light curves and recovery curves should only be used for the control chambers (sensor ID43).

    The headers are:
    PAR: Photoactive radiation
    relETR: F0/Fm x PAR
    Notes: Stage/step of light curve
    Treatment: Acidified or control

    The associated light treatments in each stage. Each actinic light intensity is held for 10 seconds, then a saturating pulse is taken (see PAM methods).

    After 10.0 sec at 0.5% = 1 umols PAR
    After 10.0 sec at 0.7% = 1 umols PAR
    After 10.0 sec at 1.1% = 0.96 umols PAR
    After 10.0 sec at 1.6% = 4.32 umols PAR
    After 10.0 sec at 2.4% = 4.32 umols PAR
    After 10.0 sec at 3.6% = 8.31 umols PAR
    After 10.0 sec at 5.3% = 15.78 umols PAR
    After 10.0 sec at 8.0% = 25.75 umols PAR

    This dataset appears to be missing data; note that D5 rows are potentially not useable information.

    See the word document in the download file for more information.

  14. PEPS South Australia - Application - SARIG catalogue

    • pid.sarig.sa.gov.au
    Updated Jan 15, 2025
    Cite
    pid.sarig.sa.gov.au (2025). PEPS South Australia - Application - SARIG catalogue [Dataset]. https://pid.sarig.sa.gov.au/dataset/mesac197
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Government of South Australia (http://sa.gov.au/)
    Area covered
    Australia, South Australia
    Description

    PEPS South Australia is now a web based system containing a wide range of technical data relevant to the petroleum and geothermal industries. Currently accessible modules and data include: Well Details, Log Prints, Logs Digital and Production Summary - available via the Wells Modules. Production Details, Monthly Data and Charts - available in the Production Modules. Production data can be viewed in metric or imperial units. Log Files are now available for download through the Well Files page.

  15. Scanned data logs, ship logs and reports from the ADBEX III voyage of the...

    • researchdata.edu.au
    • data.aad.gov.au
    Updated Jan 17, 2024
    Cite
    CONNELL, DAVE J.; Commonwealth of Australia (2024). Scanned data logs, ship logs and reports from the ADBEX III voyage of the Nella Dan, October to December, 1985 [Dataset]. https://researchdata.edu.au/scanned-logs-ship-december-1985/3651499
    Explore at:
    Dataset updated
    Jan 17, 2024
    Dataset provided by
    Australian Antarctic Division (https://www.antarctica.gov.au/)
    Australian Antarctic Data Centre
    Authors
    CONNELL, DAVE J.; Commonwealth of Australia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 8, 1985 - Dec 28, 1985
    Area covered
    Description

    The dataset contains scanned copies of a number of logs, reports and documents from the Antarctic Division Biomass Experiment III (ADBEX III) voyage of the Nella Dan from October to December, 1985. The documents cover CTD data, cell count graphs, acoustic logs, ship logs, and other files.

    The dataset download contains the following files:

    ADBEX 3 CTD.pdf
    CTD Log ADBEX 3 Electronics Lab CTD Unit.pdf
    Master Station Log - Vol 1 ADBEX 3.pdf
    ND0018586 ADBEX 3 Acoustic Log Book 1 of 2 Copy 3.pdf
    ND0018586 ADBEX 3 Acoustic Log Book 2 of 2 Copy 2.pdf
    ADBEX 3 CTD Cell Count Graphs.pdf

  16. Sci-Hub download log 2011-2013

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 30, 2022
    Cite
    Elbakyan, Alexandra (2022). Sci-Hub download log 2011-2013 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5918541
    Explore at:
    Dataset updated
    Jan 30, 2022
    Dataset authored and provided by
    Elbakyan, Alexandra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The included file ancient.sci-hub.stats.tab contains the raw, unprocessed PDF download log from Sci-Hub, starting from 22 September 2011 and ending on 14 October 2013. Statistics from 14 October 2013 to 18 January 2014 were not recorded for technical reasons. Statistics before 22 September 2011 were not collected.

    Columns in the data file (see the loading sketch after this list):

    • Timestamp (yyyy-MM-dd HH:mm:ss)
    • DOI
    • URL
    • IP address
    • User identifier
    • Country
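
    A minimal sketch of loading the file with pandas, assuming it is tab-delimited (as the .tab extension suggests) and has no header row:

    import pandas as pd

    columns = ["timestamp", "doi", "url", "ip_address", "user_id", "country"]
    log = pd.read_csv(
        "ancient.sci-hub.stats.tab",
        sep="\t",
        names=columns,           # assumes the file has no header row
        parse_dates=["timestamp"],
    )

    print(log["country"].value_counts().head(10))  # top countries by download count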

    originally published at: https://web.archive.org/web/20200702074701/https://twitter.com/Sci_Hub/status/1221827163781058562

  17. ORIGINAL-NETWORK-TRAFFIC-Friday-02-03-2018-PACP

    • kaggle.com
    Updated Jun 5, 2023
    Cite
    Karen Pamela López (2023). ORIGINAL-NETWORK-TRAFFIC-Friday-02-03-2018-PACP [Dataset]. https://www.kaggle.com/datasets/karenp/original-network-traffic-friday-02-03-2018-pacp
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Karen Pamela López
    Description

    This data set was originally downloaded from: https://www.unb.ca/cic/datasets/ids-2018.html

    The data set has a size of 466 GB.

    When the download is complete, it contains 2 folders: "Processed Traffic Data for ML Algorithms" and "Original network traffic and log data".

    The "Processed Traffic Data for ML Algorithms" folder contains 10 csv files with the following names:

    • Friday-02-03-2018_TrafficForML_CICFlowMeter
    • Friday-16-02-2018_TrafficForML_CICFlowMeter
    • Friday-23-02-2018_TrafficForML_CICFlowMeter
    • Thuesday-20-02-2018_TrafficForML_CICFlowMeter
    • Thursday-01-03-2018_TrafficForML_CICFlowMeter
    • Thursday-15-02-2018_TrafficForML_CICFlowMeter
    • Thursday-22-02-2018_TrafficForML_CICFlowMeter
    • Wednesday-14-02-2018_TrafficForML_CICFlowMeter
    • Wednesday-21-02-2018_TrafficForML_CICFlowMeter
    • Wednesday-28-02-2018_TrafficForML_CICFlowMeter

    And the "Original Network Traffic and Log data" folder contains 10 folders, each folder is named as the previous files. Each folder contains in turn two folders logs and pcap.

    Here is the PCAP for Friday-02-03-2018

  18. Data from: Who's downloading pirated papers? Everyone

    • datadryad.org
    • search.dataone.org
    • +1more
    zip
    Updated Apr 22, 2017
    Cite
    Alexandra Elbakyan; John Bohannon (2017). Who's downloading pirated papers? Everyone [Dataset]. http://doi.org/10.5061/dryad.q447c
    Explore at:
    zip (available download formats)
    Dataset updated
    Apr 22, 2017
    Dataset provided by
    Dryad
    Authors
    Alexandra Elbakyan; John Bohannon
    Time period covered
    Apr 21, 2016
    Area covered
    global, Global
    Description

    Sci-Hub download data: These data include 28 million download request events from the server logs of Sci-Hub from 1 September 2015 through 29 February 2016. The uncompressed 2.7 gigabytes of data are separated into 6 data files, one for each month, in tab-delimited text format. (scihub_data.zip)
    IPython Notebook for Sci-Hub raw data: IPython Notebook used to process the raw server log data (processing the GIS files into CSV, scraping DOI metadata, etc.). (Sci-Hub.html, Sci-Hub.ipynb)
    Sci-Hub publisher DOI prefixes: Data scraped from the CrossRef website which can be used to replicate the analysis of downloads by publisher. (publisher_DOI_prefixes.csv)

  19. Network Traffic Dataset

    • kaggle.com
    Updated Oct 31, 2023
    Cite
    Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ravikumar Gattu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023, using Wireshark. The dataset consists of 394,137 instances, obtained and stored in a CSV (Comma Separated Values) file. This large dataset could be utilised for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

    The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

    Content :

    This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.

    The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

    Dataset columns (see the loading sketch after this list):

    • No: Number of the instance
    • Timestamp: Timestamp of the network traffic instance
    • Source IP: IP address of the source
    • Destination IP: IP address of the destination
    • Protocol: Protocol used by the instance
    • Length: Length of the instance
    • Info: Information about the traffic instance
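
    As an illustration only (the file name and the exact header spellings in the CSV are assumptions), the traffic mix could be summarized like this:

    import pandas as pd

    # Placeholder file name; columns follow the list above.
    df = pd.read_csv("network_traffic.csv")

    print(df.shape)                       # expected: (394137, 7)
    print(df["Protocol"].value_counts())  # traffic mix by protocol
    print(df["Length"].describe())        # packet length statistics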

    Acknowledgements :

    I would like to thank the University of Cincinnati for providing the infrastructure for the generation of this network traffic data set.

    Ravikumar Gattu , Susmitha Choppadandi

    Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it enables machine learning models that can identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

    Dataset License: CC0: Public Domain

    Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

    ML techniques that benefit from this dataset:

    This dataset is highly useful because it consists of 394,137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains very important features that can be used for most applications of machine learning in cybersecurity. Here are a few of the potential machine learning applications that could benefit from this dataset:

    1. Network performance monitoring: This large network traffic data set can be utilised for analysing the traffic to identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

    2. Anomaly detection: The large network traffic dataset can be utilised for training machine learning models to find irregularities in the traffic, which could help identify cyber attacks.

    3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and build models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks; a minimal classification sketch is shown below.
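
    As a rough illustration of such a classification task, the sketch below trains a baseline model that predicts the Protocol column from the packet length alone. The file name is a placeholder and the single-feature setup is an assumption for demonstration purposes; a real pipeline would engineer flow-level features from the Timestamp and IP address columns.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Placeholder file name; column names assumed from the description above.
    df = pd.read_csv("network_traffic_capture.csv")

    # Minimal feature set: packet length only.
    X = df[["Length"]]
    y = df["Protocol"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    clf = RandomForestClassifier(n_estimators=100, random_state=42)
    clf.fit(X_train, y_train)

    # Per-protocol precision/recall gives a first impression of separability.
    print(classification_report(y_test, clf.predict(X_test)))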

  20. Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...

    • zenodo.org
    • explore.openaire.eu
    bz2
    Updated Mar 15, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
    Explore at:
    bz2Available download formats
    Dataset updated
    Mar 15, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    João Felipe; João Felipe; Leonardo; Leonardo; Vanessa; Vanessa; Juliana; Juliana
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

    Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

    This repository contains two files:

    • dump.tar.bz2
    • jupyter_reproducibility.tar.bz2

    The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

    The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

    • analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.
    • archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.
    • paper: empty folder. The notebook analyses/N12.To.Paper.ipynb moves data to it.

    In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

    Reproducing the Analysis

    This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:

    Ubuntu 18.04.1 LTS
    PostgreSQL 10.6
    Conda 4.5.11
    Python 3.7.2
    PdfCrop 2012/11/02 v1.38

    First, download dump.tar.bz2 and extract it:

    tar -xjf dump.tar.bz2

    It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

    psql jupyter < db2019-03-13.dump

    It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
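
    A quick way to confirm that the connection string is picked up correctly before opening the notebooks is a minimal sqlalchemy check along these lines (this snippet is an illustration, not part of the original scripts):

    import os
    from sqlalchemy import create_engine, text

    # Reads the same environment variable the analysis notebooks rely on.
    engine = create_engine(os.environ["JUP_DB_CONNECTION"])

    with engine.connect() as conn:
        # A trivial query is enough to confirm that the restored database
        # is reachable with the credentials in the connection string.
        print(conn.execute(text("SELECT 1")).scalar())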

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Create a conda environment with Python 3.7:

    conda create -n analyses python=3.7
    conda activate analyses

    Go to the analyses folder and install all the dependencies listed in requirements.txt:

    cd jupyter_reproducibility/analyses
    pip install -r requirements.txt

    To reproduce the analyses, run jupyter in this folder:

    jupyter notebook

    Execute the notebooks in this order:

    • Index.ipynb
    • N0.Repository.ipynb
    • N1.Skip.Notebook.ipynb
    • N2.Notebook.ipynb
    • N3.Cell.ipynb
    • N4.Features.ipynb
    • N5.Modules.ipynb
    • N6.AST.ipynb
    • N7.Name.ipynb
    • N8.Execution.ipynb
    • N9.Cell.Execution.Order.ipynb
    • N10.Markdown.ipynb
    • N11.Repository.With.Notebook.Restriction.ipynb
    • N12.To.Paper.ipynb

    Reproducing or Expanding the Collection

    The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

    Requirements

    This time, we have extra requirements:

    All the analysis requirements
    lbzip2 2.5
    gcc 7.3.0
    Github account
    Gmail account

    Environment

    First, set the following environment variables:

    export JUP_MACHINE="db"; # machine identifier
    export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
    export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
    export JUP_COMPRESSION="lbzip2"; # compression program
    export JUP_VERBOSE="5"; # verbose level
    export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
    export JUP_GITHUB_USERNAME="github_username"; # your github username
    export JUP_GITHUB_PASSWORD="github_password"; # your github password
    export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
    export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
    export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
    export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
    export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
    export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
    export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
    export JUP_WITH_EXECUTION="1"; # execute python notebooks
    export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
    export JUP_EXECUTION_MODE="-1"; # run following the execution order
    export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
    export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
    export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
    export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
    export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
    
    
    # Frequency of log reports
    export JUP_ASTROID_FREQUENCY="5";
    export JUP_IPYTHON_FREQUENCY="5";
    export JUP_NOTEBOOKS_FREQUENCY="5";
    export JUP_REQUIREMENT_FREQUENCY="5";
    export JUP_CRAWLER_FREQUENCY="1";
    export JUP_CLONE_FREQUENCY="1";
    export JUP_COMPRESS_FREQUENCY="5";
    
    export JUP_DB_IP="localhost"; # postgres database IP

    Then, configure the file ~/oauth2_creds.json according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
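
    Since the collection runs for months and relies on e-mail notifications, it is worth verifying the OAuth2 credentials up front. A minimal sketch using yagmail, with the addresses taken from the environment variables above, could look like this (not part of the original scripts):

    import os
    import yagmail

    # Uses the gmail address and the oauth2 file configured above.
    yag = yagmail.SMTP(os.environ["JUP_EMAIL_LOGIN"],
                       oauth2_file=os.path.expanduser("~/oauth2_creds.json"))

    yag.send(to=os.environ["JUP_EMAIL_TO"],
             subject="jupyter_reproducibility test",
             contents="OAuth2 credentials are working.")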

    Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts blank, but this is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

    Scripts

    Download and extract jupyter_reproducibility.tar.bz2:

    tar -xjf jupyter_reproducibility.tar.bz2

    Install 5 conda environments and 5 anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

    Conda 2.7

    conda create -n raw27 python=2.7 -y
    conda activate raw27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 2.7

    conda create -n py27 python=2.7 anaconda -y
    conda activate py27
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    

    Conda 3.4

    It requires a manual jupyter and pathlib2 installation due to some incompatibilities found on the default installation.

    conda create -n raw34 python=3.4 -y
    conda activate raw34
    conda install jupyter -c conda-forge -y
    conda uninstall jupyter -y
    pip install --upgrade pip
    pip install jupyter
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
    pip install pathlib2

    Anaconda 3.4

    conda create -n py34 python=3.4 anaconda -y
    conda activate py34
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.5

    conda create -n raw35 python=3.5 -y
    conda activate raw35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.5

    It requires the manual installation of other anaconda packages.

    conda create -n py35 python=3.5 anaconda -y
    conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
    conda activate py35
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.6

    conda create -n raw36 python=3.6 -y
    conda activate raw36
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Anaconda 3.6

    conda create -n py36 python=3.6 anaconda -y
    conda activate py36
    conda install -y anaconda-navigator jupyterlab_server navigator-updater
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology

    Conda 3.7

    conda create -n raw37 python=3.7 -y
    conda activate raw37
    pip install --upgrade pip
    pip install pipenv
    pip install -e jupyter_reproducibility/archaeology
