25 datasets found

OpenStack log files
figshare.com
zip
Updated Nov 16, 2021
Cite
Mbasa MOLO; Victor Akande; Nzanzu Vingi Patrick; Joke Badejo; Emmanuel Adetiba (2021). OpenStack log files [Dataset]. http://doi.org/10.6084/m9.figshare.17025353.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.17025353.v1
Dataset updated
Nov 16, 2021
Dataset provided by
figshare
Authors
Mbasa MOLO; Victor Akande; Nzanzu Vingi Patrick; Joke Badejo; Emmanuel Adetiba
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains log information from a cloud computing infrastructure based on OpenStack. Three different files are available: the nova, cinder, and glance log files. Because the data is unbalanced, a CSV file containing log information from the three OpenStack applications is also provided. It can be used for testing when the log files are used for machine learning. The data were collected from the Federated Genomics (FEDGEN) cloud computing infrastructure hosted at Covenant University under the Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE) project, funded by the World Bank.
AIT Log Data Set V2.0
zenodo.org
data.niaid.nih.gov
+1more
zip
Updated Jun 28, 2024
Cite
Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber (2024). AIT Log Data Set V2.0 [Dataset]. http://doi.org/10.5281/zenodo.5789064
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5789064
Dataset updated
Jun 28, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Max Landauer; Florian Skopik; Maximilian Frank; Wolfgang Hotwagner; Markus Wurzenberger; Andreas Rauber
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
AIT Log Data Sets

This repository contains synthetic log data suitable for evaluation of intrusion detection systems, federated learning, and alert aggregation. A detailed description of the dataset is available in [1]. The logs were collected from eight testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by [2]. Please cite these papers if the data is used for academic publications.

In brief, each of the datasets corresponds to a testbed representing a small enterprise network including mail server, file share, WordPress server, VPN, firewall, etc. Normal user behavior is simulated to generate background noise over a time span of 4-6 days. At some point, a sequence of attack steps is launched against the network. Log data is collected from all hosts and includes Apache access and error logs, authentication logs, DNS logs, VPN logs, audit logs, Suricata logs, network traffic packet captures, horde logs, exim logs, syslog, and system monitoring logs. Separate ground truth files are used to label events that are related to the attacks. Compared to the AIT-LDSv1.1, a more complex network and more diverse user behavior are simulated, and logs are collected from all hosts in the network. If you are only interested in network traffic analysis, we also provide the AIT-NDS containing the labeled netflows of the testbed networks. We also provide the AIT-ADS, an alert data set derived by forensically applying open-source intrusion detection systems on the log data.

The datasets in this repository have the following structure:

The gather directory contains all logs collected from the testbed. Logs collected from each host are located in gather/.

The labels directory contains the ground truth of the dataset that indicates which events are related to attacks. The directory mirrors the structure of the gather directory so that each label file is located at the same path and has the same name as the corresponding log file. Each line in the label files references the log event corresponding to an attack by the line number counted from the beginning of the file ("line"), the labels assigned to the line that state the respective attack step ("labels"), and the labeling rules that assigned the labels ("rules"). An example is provided below.

The processing directory contains the source code that was used to generate the labels.

The rules directory contains the labeling rules.

The environment directory contains the source code that was used to deploy the testbed and run the simulation using the Kyoushi Testbed Environment.

The dataset.yml file specifies the start and end time of the simulation.

The following table summarizes relevant properties of the datasets:

Dataset | Simulation time | Attack time | Scan volume | Unpacked size | Notes
fox | 2022-01-15 00:00 - 2022-01-20 00:00 | 2022-01-18 11:59 - 2022-01-18 13:15 | High | 26 GB | -
harrison | 2022-02-04 00:00 - 2022-02-09 00:00 | 2022-02-08 07:07 - 2022-02-08 08:38 | High | 27 GB | -
russellmitchell | 2022-01-21 00:00 - 2022-01-25 00:00 | 2022-01-24 03:01 - 2022-01-24 04:39 | Low | 14 GB | -
santos | 2022-01-14 00:00 - 2022-01-18 00:00 | 2022-01-17 11:15 - 2022-01-17 11:59 | Low | 17 GB | -
shaw | 2022-01-25 00:00 - 2022-01-31 00:00 | 2022-01-29 14:37 - 2022-01-29 15:21 | Low | 27 GB | Data exfiltration is not visible in DNS logs
wardbeck | 2022-01-19 00:00 - 2022-01-24 00:00 | 2022-01-23 12:10 - 2022-01-23 12:56 | Low | 26 GB | -
wheeler | 2022-01-26 00:00 - 2022-01-31 00:00 | 2022-01-30 07:35 - 2022-01-30 17:53 | High | 30 GB | No password cracking in attack chain
wilson | 2022-02-03 00:00 - 2022-02-09 00:00 | 2022-02-07 10:57 - 2022-02-07 11:49 | High | 39 GB | -

The following attacks are launched in the network:

Scans (nmap, WPScan, dirb)

Webshell upload (CVE-2020-24186)

Password cracking (John the Ripper)

Privilege escalation

Remote command execution

Data exfiltration (DNSteal)

Note that attack parameters and their execution orders vary in each dataset. Labeled log files are trimmed to the simulation time to ensure that their labels (which reference the related event by the line number in the file) are not misleading. Other log files, however, also contain log events generated before or after the simulation time and may therefore be affected by testbed setup or data collection. It is therefore recommended to only consider logs with timestamps within the simulation time for analysis.
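
The recommendation to keep only events inside the simulation time can be sketched for audit logs, whose records carry an epoch timestamp of the form msg=audit(<epoch>:<id>). This is a minimal sketch, not part of the dataset's tooling: the function name is hypothetical, the window is taken from the russellmitchell row of the table, and interpreting the simulation times as UTC is an assumption.

```python
import re
from datetime import datetime, timezone

# Epoch timestamp inside an audit record, e.g. "msg=audit(1642999060.603:2226)".
AUDIT_TS = re.compile(r"msg=audit\((\d+\.\d+):\d+\)")


def within_simulation(line, start, end):
    """Return True if the audit record's timestamp lies in [start, end)."""
    match = AUDIT_TS.search(line)
    if match is None:
        return False  # no parsable timestamp: treat the line as out of scope
    ts = datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc)
    return start <= ts < end


# Simulation window of the russellmitchell dataset; UTC is an assumption.
START = datetime(2022, 1, 21, tzinfo=timezone.utc)
END = datetime(2022, 1, 25, tzinfo=timezone.utc)

sample = "type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33"
print(within_simulation(sample, START, END))
```

Other log types use different timestamp formats, so each would need its own parser in place of the regular expression above.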

The structure of labels is explained using the audit logs from the intranet server in the russellmitchell data set as an example in the following. The first four labels in the labels/intranet_server/logs/audit/audit.log file are as follows:

{"line": 1860, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

{"line": 1861, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

{"line": 1862, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

{"line": 1863, "labels": ["attacker_change_user", "escalate"], "rules": {"attacker_change_user": ["attacker.escalate.audit.su.login"], "escalate": ["attacker.escalate.audit.su.login"]}}

Each JSON object in this file assigns a label to one specific log line in the corresponding log file located at gather/intranet_server/logs/audit/audit.log. The field "line" in the JSON objects specifies the line number of the respective event in the original log file, while the field "labels" comprises the corresponding labels. For example, the lines in the sample above provide the information that lines 1860-1863 in the gather/intranet_server/logs/audit/audit.log file are labeled with "attacker_change_user" and "escalate" corresponding to the attack step where the attacker receives escalated privileges. Inspecting these lines shows that they indeed correspond to the user authenticating as root:

type=USER_AUTH msg=audit(1642999060.603:2226): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:authentication acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

type=USER_ACCT msg=audit(1642999060.603:2227): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:accounting acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

type=CRED_ACQ msg=audit(1642999060.615:2228): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

type=USER_START msg=audit(1642999060.627:2229): pid=27950 uid=33 auid=4294967295 ses=4294967295 msg='op=PAM:session_open acct="jhall" exe="/bin/su" hostname=? addr=? terminal=/dev/pts/1 res=success'

The same applies to all other labels for this log file and all other log files. There are no labels for logs generated by "normal" (i.e., non-attack) behavior; instead, all log events that have no corresponding JSON object in one of the files from the labels directory, such as the lines 1-1859 in the example above, can be considered to be labeled as "normal". This means that in order to figure out the labels for the log data it is necessary to store the line numbers when processing the original logs from the gather directory and see if these line numbers also appear in the corresponding file in the labels directory.
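
The line-number lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the label file is read as one JSON object per line, line numbers are taken to start at 1, and the in-memory sample stands in for the real files under gather/ and labels/.

```python
import json


def load_labels(label_lines):
    """Map line numbers to label lists from JSON-lines label records."""
    labels = {}
    for raw in label_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank separator lines
        record = json.loads(raw)
        labels[record["line"]] = record["labels"]
    return labels


def label_log(log_lines, labels):
    """Yield (line_number, labels, text); unlabeled lines get ["normal"]."""
    for number, text in enumerate(log_lines, start=1):
        yield number, labels.get(number, ["normal"]), text.rstrip("\n")


# Hypothetical stand-ins for a label file and its corresponding log file.
label_lines = [
    '{"line": 2, "labels": ["attacker_change_user", "escalate"], "rules": {}}',
]
log_lines = ["event a\n", "event b\n", "event c\n"]

labeled = list(label_log(log_lines, load_labels(label_lines)))
for number, line_labels, text in labeled:
    print(number, line_labels, text)
```

With real data, the two iterables would be the opened files at the same relative path in the labels and gather directories.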

Besides the attack labels, a general overview of the exact times when specific attack steps are launched is available in gather/attacker_0/logs/attacks.log. An enumeration of all hosts and their IP addresses is stated in processing/config/servers.yml. Moreover, configurations of each host are provided in gather/ and gather/.

Version history:

AIT-LDS-v1.x: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.

AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.

Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU projects GUARD (833456) and PANDORA (SI2.835928).

If you use the dataset, please cite the following publications:

[1] M. Landauer, F. Skopik, M. Frank, W. Hotwagner,
Computational Log files from Gaussian09
data.researchdatafinder.qut.edu.au
Updated Apr 29, 2024
Cite
(2024). Computational Log files from Gaussian09 [Dataset]. https://data.researchdatafinder.qut.edu.au/ar/dataset/4b78b73f-ed46-42f8-b7d9-1150d03e78d2/resources/6b5ab72b-fb48-4812-a9d0-f0472967ed4b/history/6b1d0f0c-f32a-4c52-8c1c-109a7128402d
Dataset updated
Apr 29, 2024
License
http://researchdatafinder.qut.edu.au/display/n7962
Description
Log files including optimised structures and transition states from Gaussian09 calculations. QUT Research Data Repository dataset resource, available for download.
Utah FORGE: Well 58-32 Schlumberger FMI Logs DLIS and XML files
gdr.openei.org
data.openei.org
+2more
archive, data +1
Updated Nov 17, 2017
Cite
Greg Nash; Joe Moore (2017). Utah FORGE: Well 58-32 Schlumberger FMI Logs DLIS and XML files [Dataset]. http://doi.org/10.15121/1464529
Available download formats: archive, website, data
Unique identifier
https://doi.org/10.15121/1464529
Dataset updated
Nov 17, 2017
Dataset provided by
United States Department of Energy (http://energy.gov/)
Geothermal Data Repository
Energy and Geoscience Institute at the University of Utah
Authors
Greg Nash; Joe Moore
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This zipped data set includes Schlumberger FMI logs DLIS and XML files from Utah FORGE deep well 58-32. These include runs 1 (2226-7550 ft) and 2 (7440-7550 ft). Run 3 (7390-7527ft) was acquired during phase 2C.
Event Logs CSV
figshare.com
rar
Updated Dec 9, 2019
Cite
Dina Bayomie (2019). Event Logs CSV [Dataset]. http://doi.org/10.6084/m9.figshare.11342063.v1
Available download formats: rar
Unique identifier
https://doi.org/10.6084/m9.figshare.11342063.v1
Dataset updated
Dec 9, 2019
Dataset provided by
figshare
Authors
Dina Bayomie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Event logs in CSV format. The dataset contains both correlated and uncorrelated logs.
BSEE Data Center - Scanned Units Query
catalog.data.gov
Updated Feb 26, 2025
Cite
Bureau of Safety and Environmental Enforcement (2025). BSEE Data Center - Scanned Units Query [Dataset]. https://catalog.data.gov/dataset/bsee-data-center-scanned-units-query-f51ac
Dataset updated
Feb 26, 2025
Dataset provided by
Bureau of Safety and Environmental Enforcement (http://www.bsee.gov/)
Description
Scanned Units Query - You can now request these same well files, well logs, and well data as a free download through the File Request System ( https://www.data.bsee.gov/Other/FileRequestSystem/Default.aspx ). The Disc Media Store will be removed at some point in the future.
Processed Datasets - Imputation in Well Log Data: A Benchmark
zenodo.org
application/gzip
Updated May 22, 2024
Cite
Pedro H. T. Gama; Jackson Faria; Jessica Sena; Francisco Neves; Vinícius R. Riffel; Lucas Perez; André Korenchendler; Matheus C. A. Sobreira; Alexei M. C. Machado (2024). Processed Datasets - Imputation in Well Log Data: A Benchmark [Dataset]. http://doi.org/10.5281/zenodo.10987946
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10987946
Dataset updated
May 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Pedro H. T. Gama; Jackson Faria; Jessica Sena; Francisco Neves; Vinícius R. Riffel; Lucas Perez; André Korenchendler; Matheus C. A. Sobreira; Alexei M. C. Machado
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 17, 2024
Description
Imputation of well log data is a common task in the field. However, a quick review of the literature reveals a lack of standardization when evaluating methods for the problem. The goal of the benchmark is to introduce a standard evaluation protocol for any imputation method for well log data.

In the proposed benchmark, three public datasets are used:

Geolink: The Geolink Dataset is another public dataset of wells in the Norwegian offshore. The data is provided by the company of the same name, GEOLINK and follows the NOLD 2.0 license.
This dataset contains a total of 223 wells. It also has lithology labels for the wells with a total of 36 lithology classes. [download original]

Taranaki Basin: The Taranaki Basin Dataset is a curated set of wells and a convenient option for experimentation, especially due to its ease of accessibility and use.
This collection, under the CDLA-Sharing-1.0 license, contains well logs extracted from the New Zealand Petroleum & Minerals Online Exploration Database and Petlab.
There are a total of 407 wells, of which 289 are onshore and 118 are offshore exploration and production wells. [download original]

Teapot Dome: The Teapot Dome dataset is provided by the Rocky Mountain Oilfield Testing Center (RMOTC) and the US Department of Energy.
It contains different types of data related to the Teapot Dome oil field, such as 2D and 3D seismic data, well logs, and GIS data. The data is licensed under the Creative Commons 4.0 license.
In total, the dataset has 1,179 wells with available logs. The number of available logs varies across wells. There are only 91 wells with the gamma ray, bulk density, and neutron porosity logs, while only three wells have the complete basic suite. [direct download]

Here you can download all three datasets already preprocessed to be used with our implementation, found here.

File Description:

There are six files for each fold partition for each dataset.

datasetname_fold_k_well_log_metadata_train.json : JSON file with general information of the slices of training partition of the fold k. Contains total number of slices and the number of slices per well.

datasetname_fold_k_well_log_metadata_val.json : JSON file with general information of the slices of validation partition of the fold k. Contains total number of slices and the number of slices per well.

datasetname_fold_k_well_log_slices_train.npy: .npy (numpy) file ready to be loaded with the slices for training of the fold k already processed. When loaded should have shape of (total_slices, 256, number_of_logs)

datasetname_fold_k_well_log_slices_val.npy : .npy (numpy) file ready to be loaded with the slices for validation of the fold k already processed.

datasetname_fold_k_well_log_slices_meta_train.json : JSON file with the slice info for all slices in the training partition of the fold k. Each slice has 7 data points; the first three are, in order, the origin well name, the starting position of the slice in that well, and the end position of the slice in that well. The last four are discarded (they would contain other information that was not used).

datasetname_fold_k_well_log_slices_meta_val.json : JSON file with the slices info for all slices in the validation partition of the fold k.
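
Reading the slice metadata described above can be sketched as follows. This is a sketch under assumptions rather than the benchmark's own loader: the JSON is assumed to be a list of 7-element records, the function names are hypothetical, and the sample values are made up for illustration.

```python
import json
from collections import Counter


def parse_slice_meta(meta_json):
    """Keep (well_name, start, end) from each 7-element slice record."""
    records = json.loads(meta_json)
    # Only the first three data points are used; the last four are discarded.
    return [(rec[0], int(rec[1]), int(rec[2])) for rec in records]


def slices_per_well(slices):
    """Count how many slices originate from each well."""
    return Counter(well for well, _, _ in slices)


# Hypothetical sample in the assumed layout (not taken from the dataset).
sample = json.dumps([
    ["well_A", 0, 256, None, None, None, None],
    ["well_A", 256, 512, None, None, None, None],
    ["well_B", 0, 256, None, None, None, None],
])

slices = parse_slice_meta(sample)
counts = slices_per_well(slices)
print(slices)
print(counts)
```

The matching .npy file could then be loaded with numpy.load, and its first dimension would be expected to equal the number of slice records.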
Event Log Sampling Datasets
figshare.com
zip
Updated Jul 22, 2022
Cite
CONG LIU (2022). Event Log Sampling Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.20354505.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.20354505.v1
Dataset updated
Jul 22, 2022
Dataset provided by
figshare
Authors
CONG LIU
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes 9 event logs, which can be used to experiment with log completeness-oriented event log sampling methods.

· exercise.xes: The dataset is a simulation log generated by the paper review process model, and each trace clearly describes the process of reviewing papers in detail.

· training_log_1/3/8.xes: These 3 datasets are human-trained simulation logs for the 2016 Process Discovery Competition (PDC 2016). Each trace consists of two values, the name of the process model activity referenced by the event and the identifier of the case to which the event belongs.

· Production.xes: This dataset includes process data from production processes, and each trace includes data for cases, activities, resources, timestamps, and more data fields.

· BPIC_2012_A/O/W.xes: These 3 datasets are derived from the personal loan application process of a financial institution in the Netherlands. The process represented in the event log is the application process of a personal loan or overdraft in a global financing organization. Each trace describes the process of applying for a personal loan for different customers.

· CrossHospital.xes: The dataset includes the treatment process data of emergency patients in the hospital, and each trace represents the treatment process of an emergency patient in the hospital.
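
Since the logs above are XES files, which are XML documents made of trace elements containing event elements, their traces can be read with the standard library. The sketch below is illustrative only: the function names and the tiny sample log are hypothetical, and it assumes activity names are stored under the standard "concept:name" key.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}trace'."""
    return tag.rsplit("}", 1)[-1]


def read_traces(xes_text):
    """Return a list of traces, each a list of activity names."""
    root = ET.fromstring(xes_text)
    traces = []
    for trace in root:
        if local(trace.tag) != "trace":
            continue  # skip extensions, classifiers, and log attributes
        activities = []
        for event in trace:
            if local(event.tag) != "event":
                continue  # skip trace-level attributes such as the case id
            for attr in event:
                if attr.get("key") == "concept:name":
                    activities.append(attr.get("value"))
        traces.append(activities)
    return traces


# Hypothetical two-trace log, not taken from the dataset.
SAMPLE = """<log>
  <trace>
    <string key="concept:name" value="case_1"/>
    <event><string key="concept:name" value="submit"/></event>
    <event><string key="concept:name" value="review"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case_2"/>
    <event><string key="concept:name" value="submit"/></event>
  </trace>
</log>"""

traces = read_traces(SAMPLE)
variants = {tuple(t) for t in traces}  # distinct activity sequences
print(len(traces), len(variants))
```

Counting distinct variants this way gives one simple baseline for checking how much behavior a sampled log retains compared with the full log.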
LoRaWAN Traffic Analysis Dataset
zenodo.org
zip
Updated Aug 28, 2023
Cite
Ales Povalac; Jan Kral (2023). LoRaWAN Traffic Analysis Dataset [Dataset]. http://doi.org/10.5281/zenodo.7919213
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7919213
Dataset updated
Aug 28, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ales Povalac; Jan Kral
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was created by a LoRaWAN sniffer and contains packets, which are thoroughly analyzed in the paper Exploring LoRaWAN Traffic: In-Depth Analysis of IoT Network Communications (not yet published). Data from the LoRaWAN sniffer was collected in four cities: Liege (Belgium), Graz (Austria), Vienna (Austria), and Brno (Czechia).

Gateway ID: b827ebafac000001

Uplink reception (end-device => gateway)

Only packets containing CRC, inverted IQ

RX0: 867.1 MHz, 867.3 MHz, 867.5 MHz, 867.7 MHz, 867.9 MHz - BW 125 kHz and all SF

RX1: 868.1 MHz, 868.3 MHz, 868.5 MHz - BW 125 kHz and all SF

Gateway ID: b827ebafac000002

Downlink reception (gateway => end-device)

Includes packets without CRC, non-inverted IQ

RX0: 867.1 MHz, 867.3 MHz, 867.5 MHz, 867.7 MHz, 867.9 MHz - BW 125 kHz and all SF

RX1: 868.1 MHz, 868.3 MHz, 868.5 MHz - BW 125 kHz and all SF

Gateway ID: b827ebafac000003

Downlink reception (gateway => end-device) and Class-B beacon on 869.525 MHz

Includes packets without CRC, non-inverted IQ

RX0: 869.525 MHz - BW 125 kHz and all SF, BW 125 kHz and SF9 with implicit header, CR 4/5 and length 17 B

To open the pcap files, you need Wireshark with current support for LoRaTap and LoRaWAN protocols. This support will be available in the official 4.1.0 release. A working version for Windows is accessible in the automated build system.

The source data is available in the log.zip file, which contains the complete dataset obtained by the sniffer. A set of conversion tools for log processing is available on Github. The converted logs, available in Wireshark format, are stored in pcap.zip. For the LoRaWAN decoder, you can use the attached root and session keys. The processed outputs are stored in csv.zip, and graphical statistics are available in png.zip.

This data represents a unique, geographically identifiable selection from the full log, cleaned of any errors. The records from Brno include communication between the gateway and a node with known keys.

Test file :: 00_Test

short test file for parser verification

comparison of LoRaTap version 0 and version 1 formats

Brno, Czech Republic :: 01_Brno

49.22685N, 16.57536E, ASL 306m

lines 150873 to 529796

time 1.8.2022 15:04:28 to 17.8.2022 13:05:32

preliminary experiment

experimental device

Device EUI: 70b3d5cee0000042

Application key: d494d49a7b4053302bdcf96f1defa65a

Device address: 00d85395

Network session key: c417540b8b2afad8930c82fcf7ea54bb

Application session key: 421fea9bedd2cc497f63303edf5adf8e

Liege, Belgium :: 02_Liege :: evaluated in the paper

50.66445N, 5.59276E, ASL 151m

lines 636205 to 886868

time 25.8.2022 10:12:24 to 12.9.2022 06:20:48

Brno, Czech Republic :: 03_Brno_join

49.22685N, 16.57536E, ASL 306m

lines 947787 to 979382

time 30.9.2022 15:21:27 to 4.10.2022 10:46:31

record contains OTAA activation (Join Request / Join Accept)

experimental device:

Device EUI: 70b3d5cee0000042

Application key: d494d49a7b4053302bdcf96f1defa65a

Device address: 01e65ddc

Network session key: e2898779a03de59e2317b149abf00238

Application session key: 59ca1ac91922887093bc7b236bd1b07f

Graz, Austria :: 04_Graz :: evaluated in the paper

47.07049N, 15.44506E, ASL 364m

lines 1015139 to 1178855

time 26.10.2022 06:21:07 to 29.11.2022 10:03:00

Vienna, Austria :: 05_Wien :: evaluated in the paper

48.19666N, 16.37101E, ASL 204m

lines 1179308 to 3657105

time 1.12.2022 10:42:19 to 4.1.2023 14:00:05

contains a total of 14 short restarts (under 90 seconds)

Brno, Czech Republic :: 07_Brno :: evaluated in the paper

49.22685N, 16.57536E, ASL 306m

lines 4969648 to 6919392

time 16.2.2023 8:53:43 to 30.3.2023 9:00:11
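
The line ranges and timestamps listed above can be used to attribute a line of the full log to its recording location. The following is a minimal sketch: the segment table is copied from the listing, while the function names and the assumption that the ranges are inclusive are mine.

```python
from datetime import datetime

# (segment name, first line, last line) as listed above; inclusive bounds assumed.
SEGMENTS = [
    ("01_Brno", 150873, 529796),
    ("02_Liege", 636205, 886868),
    ("03_Brno_join", 947787, 979382),
    ("04_Graz", 1015139, 1178855),
    ("05_Wien", 1179308, 3657105),
    ("07_Brno", 4969648, 6919392),
]


def segment_for_line(line_number):
    """Return the segment containing a line of the full log, or None."""
    for name, first, last in SEGMENTS:
        if first <= line_number <= last:
            return name
    return None


def parse_time(text):
    """Parse day.month.year timestamps such as '1.8.2022 15:04:28'."""
    return datetime.strptime(text, "%d.%m.%Y %H:%M:%S")


# Recording span of the Liege segment, from its listed start and end times.
liege_span = parse_time("12.9.2022 06:20:48") - parse_time("25.8.2022 10:12:24")
print(segment_for_line(2000000), liege_span)
```

Lines that fall between the listed ranges return None, which matches the description of the published data as a cleaned selection from the full log.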
Ministry of Public Administration and Security_Public Data Usage (File_API)
data.go.kr
csv
Updated Jun 17, 2025
Cite
(2025). Ministry of Public Administration and Security_Public Data Usage (File_API) [Dataset]. https://www.data.go.kr/en/data/15076332/fileData.do
Available download formats: csv
Dataset updated
Jun 17, 2025
License
https://data.go.kr/ugs/selectPortalPolicyView.do
Description
This dataset provides the number of file downloads and API utilization requests by year (2011-2023) for file data registered in the public data portal, which is useful for analyzing growth trends in public data utilization. The file is provided in CSV format, and the metadata fields are statistical year, registration agency, list name, data name, file downloads, and API utilization requests. File data can be downloaded from the public data portal without logging in; to use the open API, you must register as a public data portal member and log in to apply for utilization.
Scanned data logs, ship logs and reports from the ADBEX II voyage of the...
researchdata.edu.au
data.aad.gov.au
Updated Jan 16, 2024
Cite
CONNELL, DAVE J.; Commonwealth of Australia (2024). Scanned data logs, ship logs and reports from the ADBEX II voyage of the Nella Dan, January to February, 1984 [Dataset]. https://researchdata.edu.au/scanned-logs-ship-february-1984/2839164
Dataset updated
Jan 16, 2024
Dataset provided by
Australian Antarctic Division (https://www.antarctica.gov.au/)
Australian Ocean Data Network
Australian Antarctic Data Centre
Authors
CONNELL, DAVE J.; Commonwealth of Australia
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 14, 1984 - Feb 6, 1984
Description
The dataset contains scanned copies of a number of logs, reports and documents from the Antarctic Division Biomass Experiment II (ADBEX II) voyage of the Nella Dan in January and February of 1984. The documents cover CTD data and reports, meteorological logs, ship logs, and other files.

The dataset download contains the following files:

ADBEX 2 CTD DATA 1m 2m Averages EJW 12-86.pdf
ADBEX 2 CTD Report.txt Data Summary 2m Averages STNS 1 - 5 5-86.pdf
ADBEX 2 CTD Stations Graphs.pdf
ADBEX 2 RIF2.DAT.pdf
Met Log - ADBEX 2.pdf
ADBEX 2 - SIBEX 1 Ships Log - Nella Dan.pdf
ADBEX 2 SIBEX 1 Log Book 1984.pdf
ADBEX 2 Analysis Programs.pdf
PIPr: A Dataset of Public Infrastructure as Code Programs
data.niaid.nih.gov
zenodo.org
Updated Nov 28, 2023
Cite
Salvaneschi, Guido (2023). PIPr: A Dataset of Public Infrastructure as Code Programs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8262770
Dataset updated
Nov 28, 2023
Dataset provided by
Spielmann, David
Sokolowski, Daniel
Salvaneschi, Guido
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
Programming Languages Infrastructure as Code (PL-IaC) enables IaC programs written in general-purpose programming languages like Python and TypeScript. The currently available PL-IaC solutions are Pulumi and the Cloud Development Kits (CDKs) of Amazon Web Services (AWS) and Terraform. This dataset provides metadata and initial analyses of all public GitHub repositories in August 2022 with an IaC program, including their programming languages, applied testing techniques, and licenses. Further, we provide a shallow copy of the head state of those 7104 repositories whose licenses permit redistribution. The dataset is available under the Open Data Commons Attribution License (ODC-By) v1.0. Contents:

metadata.zip: The dataset metadata and analysis results as CSV files.
scripts-and-logs.zip: Scripts and logs of the dataset creation.
LICENSE: The Open Data Commons Attribution License (ODC-By) v1.0 text.
README.md: This document.
redistributable-repositiories.zip: Shallow copies of the head state of all redistributable repositories with an IaC program.

This artifact is part of the ProTI Infrastructure as Code testing project: https://proti-iac.github.io.

Metadata

The dataset's metadata comprises three tabular CSV files containing metadata about all analyzed repositories, IaC programs, and testing source code files.

repositories.csv:

ID (integer): GitHub repository ID
url (string): GitHub repository URL
downloaded (boolean): Whether cloning the repository succeeded
name (string): Repository name
description (string): Repository description
licenses (string, list of strings): Repository licenses
redistributable (boolean): Whether the repository's licenses permit redistribution
created (string, date & time): Time of the repository's creation
updated (string, date & time): Time of the last update to the repository
pushed (string, date & time): Time of the last push to the repository
fork (boolean): Whether the repository is a fork
forks (integer): Number of forks
archive (boolean): Whether the repository is archived
programs (string, list of strings): Project file path of each IaC program in the repository

programs.csv:

ID (string): Project file path of the IaC program
repository (integer): GitHub repository ID of the repository containing the IaC program
directory (string): Path of the directory containing the IaC program's project file
solution (string, enum): PL-IaC solution of the IaC program ("AWS CDK", "CDKTF", "Pulumi")
language (string, enum): Programming language of the IaC program (enum values: "csharp", "go", "haskell", "java", "javascript", "python", "typescript", "yaml")
name (string): IaC program name
description (string): IaC program description
runtime (string): Runtime string of the IaC program
testing (string, list of enum): Testing techniques of the IaC program (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
tests (string, list of strings): File paths of IaC program's tests

testing-files.csv:

file (string): Testing file path
language (string, enum): Programming language of the testing file (enum values: "csharp", "go", "java", "javascript", "python", "typescript")
techniques (string, list of enum): Testing techniques used in the testing file (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
keywords (string, list of enum): Keywords found in the testing file (enum values: "/go/auto", "/testing/integration", "@AfterAll", "@BeforeAll", "@Test", "@aws-cdk", "@aws-cdk/assert", "@pulumi.runtime.test", "@pulumi/", "@pulumi/policy", "@pulumi/pulumi/automation", "Amazon.CDK", "Amazon.CDK.Assertions", "Assertions_", "HashiCorp.Cdktf", "IMocks", "Moq", "NUnit", "PolicyPack(", "ProgramTest", "Pulumi", "Pulumi.Automation", "PulumiTest", "ResourceValidationArgs", "ResourceValidationPolicy", "SnapshotTest()", "StackValidationPolicy", "Testing", "Testing_ToBeValidTerraform(", "ToBeValidTerraform(", "Verifier.Verify(", "WithMocks(", "[Fact]", "[TestClass]", "[TestFixture]", "[TestMethod]", "[Test]", "afterAll(", "assertions", "automation", "aws-cdk-lib", "aws-cdk-lib/assert", "aws_cdk", "aws_cdk.assertions", "awscdk", "beforeAll(", "cdktf", "com.pulumi", "def test_", "describe(", "github.com/aws/aws-cdk-go/awscdk", "github.com/hashicorp/terraform-cdk-go/cdktf", "github.com/pulumi/pulumi", "integration", "junit", "pulumi", "pulumi.runtime.setMocks(", "pulumi.runtime.set_mocks(", "pulumi_policy", "pytest", "setMocks(", "set_mocks(", "snapshot", "software.amazon.awscdk.assertions", "stretchr", "test(", "testing", "toBeValidTerraform(", "toMatchInlineSnapshot(", "toMatchSnapshot(", "to_be_valid_terraform(", "unittest", "withMocks(")
program (string): Project file path of the testing file's IaC program

Dataset Creation

scripts-and-logs.zip contains all scripts and logs of the creation of this dataset. In it, executions/executions.log documents the commands that generated this dataset in detail. On a high level, the dataset was created as follows:

A list of all repositories with a PL-IaC program configuration file was created using search-repositories.py (documented below). The execution took two weeks due to the non-deterministic nature of GitHub's REST API, causing excessive retries.
A shallow copy of the head of all repositories was downloaded using download-repositories.py (documented below).
Using analysis.ipynb, the repositories were analyzed for the programs' metadata, including the used programming languages and licenses.
Based on the analysis, all repositories with at least one IaC program and a redistributable license were packaged into redistributable-repositiories.zip, excluding any node_modules and .git directories.

Searching Repositories

The repositories are searched through search-repositories.py and saved in a CSV file. The script takes these arguments in the following order:

1. GitHub access token.
2. Name of the CSV output file.
3. Filename to search for.
4. File extensions to search for, separated by commas.
5. Min file size for the search (for all files: 0).
6. Max file size for the search or * for unlimited (for all files: *).

Pulumi projects have a Pulumi.yaml or Pulumi.yml (case-sensitive file name) file in their root folder, i.e., (3) is Pulumi and (4) is yml,yaml. https://www.pulumi.com/docs/intro/concepts/project/

AWS CDK projects have a cdk.json (case-sensitive file name) file in their root folder, i.e., (3) is cdk and (4) is json. https://docs.aws.amazon.com/cdk/v2/guide/cli.html

CDK for Terraform (CDKTF) projects have a cdktf.json (case-sensitive file name) file in their root folder, i.e., (3) is cdktf and (4) is json. https://www.terraform.io/cdktf/create-and-deploy/project-setup

Limitations

The script uses the GitHub code search API and inherits its limitations:

Only forks with more stars than the parent repository are included.
Only the repositories' default branches are considered.
Only files smaller than 384 KB are searchable.
Only repositories with fewer than 500,000 files are considered.
Only repositories that have had activity or have been returned in search results in the last year are considered.

More details: https://docs.github.com/en/search-github/searching-on-github/searching-code

The results of the GitHub code search API are not stable. However, the generally more robust GraphQL API does not support searching for files in repositories: https://stackoverflow.com/questions/45382069/search-for-code-in-github-using-graphql-v4-api

Downloading Repositories

download-repositories.py downloads all repositories in CSV files generated through search-repositories.py and generates an overview CSV file of the downloads. The script takes these arguments in the following order:

1. Name of the repositories CSV files generated through search-repositories.py, separated by commas.
2. Output directory to download the repositories to.
3. Name of the CSV output file.

The script only downloads a shallow recursive copy of the HEAD of the repo, i.e., only the main branch's most recent state, including submodules, without the rest of the git history. Each repository is downloaded to a subfolder named by the repository's ID.
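The source of download-repositories.py is not reproduced here, so the following is only a minimal sketch of how such a shallow recursive download could be expressed. The function name, the example URL and the repository ID are hypothetical; the git options themselves (--depth, --recurse-submodules, --shallow-submodules) are standard git clone flags.

```python
from pathlib import Path
from typing import List


def shallow_clone_command(clone_url: str, repo_id: int, output_dir: str) -> List[str]:
    """Build a git command that fetches only the latest state of the default branch."""
    # Each repository goes into a subfolder named by its ID, as described above.
    target = Path(output_dir) / str(repo_id)
    return [
        "git", "clone",
        "--depth", "1",           # no history beyond the most recent commit
        "--recurse-submodules",   # also fetch submodules
        "--shallow-submodules",   # keep the submodules shallow as well
        clone_url,
        str(target),
    ]


if __name__ == "__main__":
    # Hypothetical repository URL and ID, for illustration only.
    cmd = shallow_clone_command("https://github.com/example/repo.git", 12345, "repos")
    print(" ".join(cmd))
```

The command is returned as a list so that it can be passed to subprocess.run without shell quoting issues.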
Respiration_chambers/raw_log_files and combined datasets of biomass and...
cmr.earthdata.nasa.gov
researchdata.edu.au
+1more
Updated Dec 18, 2018
Cite
(2018). Respiration_chambers/raw_log_files and combined datasets of biomass and chamber data, and physical parameters [Dataset]. http://doi.org/10.26179/5c1827d5d6711
Explore at:
Unique identifier
https://doi.org/10.26179/5c1827d5d6711
Dataset updated
Dec 18, 2018
Time period covered
Jan 27, 2015 - Feb 23, 2015
Area covered
Description
General overview The following datasets are described by this metadata record, and are available for download from the provided URL.

Raw log files, physical parameters raw log files

Raw excel files, respiration/PAM chamber raw excel spreadsheets

Processed and cleaned excel files, respiration chamber biomass data

Raw rapid light curve excel files (this is duplicated from Raw log files), combined dataset pH, temperature, oxygen, salinity, velocity for experiment

Associated R script file for pump cycles of respirations chambers

####

Physical parameters raw log files

Raw log files
1) DATE=
2) Time= UTC+11
3) PROG=Automated program to control sensors and collect data
4) BAT=Amount of battery remaining
5) STEP=check Aquation manual
6) SPIES=check Aquation manual
7) PAR=Photosynthetically active radiation
8) Levels=check Aquation manual
9) Pumps= program for pumps
10) WQM=check Aquation manual

####

Respiration/PAM chamber raw excel spreadsheets

Abbreviations in headers of datasets

Note: Two data sets are provided in different formats: raw and cleaned (adj). These are the same data, with the PAR column moved over to PAR.all for analysis. All headers are the same. The cleaned (adj) dataframe will work with the R syntax below; alternatively, add code to do the cleaning in R.

Date: ISO 1986 - Check
Time: UTC+11 unless otherwise stated
DATETIME: UTC+11 unless otherwise stated
ID (of instrument in respiration chambers)
ID43=Pulse amplitude fluorescence measurement of control
ID44=Pulse amplitude fluorescence measurement of acidified chamber
ID=1 Dissolved oxygen
ID=2 Dissolved oxygen
ID3= PAR
ID4= PAR
PAR=Photosynthetically active radiation umols
F0=Minimal fluorescence from PAM
Fm=Maximum fluorescence from PAM
Yield=(F0 – Fm)/Fm
rChl=an estimate of chlorophyll (Note this is uncalibrated and is an estimate only)
Temp=Temperature degrees C
PAR=Photosynthetically active radiation
PAR2= Photosynthetically active radiation 2
DO=Dissolved oxygen
%Sat= Saturation of dissolved oxygen
Notes=This is the program of the underwater submersible logger with the following abbreviations:
Notes-1) PAM=
Notes-2) PAM=Gain level set (see Aquation manual for more detail)
Notes-3) Acclimatisation= Program of slowly introducing treatment water into chamber
Notes-4) Shutter start up 2 sensors+sample…= Shutter PAMs automatic set up procedure (see Aquation manual)
Notes-5) Yield step 2=PAM yield measurement and calculation of control
Notes-6) Yield step 5= PAM yield measurement and calculation of acidified
Notes-7) Abatus respiration DO and PAR step 1= Program to measure dissolved oxygen and PAR (see Aquation manual). Steps 1-4 are different stages of this program including pump cycles, DO and PAR measurements.

8) Rapid light curve data
Pre LC: A yield measurement prior to the following measurement
After 10.0 sec at 0.5% to 8%: Level of each of the 8 steps of the rapid light curve
Odessey PAR (only in some deployments): An extra measure of PAR (umols) using an Odessey data logger
Dataflow PAR: An extra measure of PAR (umols) using a Dataflow sensor
PAM PAR: This is copied from the PAR or PAR2 column
PAR all: This is the complete PAR file and should be used
Deployment: Identifying which deployment the data came from

####

Respiration chamber biomass data

The data is chlorophyll a biomass from cores from the respiration chambers. The headers are:
Depth (mm)
Treat (Acidified or control)
Chl a (pigment and indicator of biomass)
Core (5 cores were collected from each chamber, three were analysed for chl a). These are pseudoreplicates/subsamples from the chambers and should not be treated as replicates.

####

Associated R script file for pump cycles of respirations chambers

Associated respiration chamber data to determine the times when respiration chamber pumps delivered treatment water to chambers. Determined from Aquation log files (see associated files). Use the chamber cut times to determine net production rates. Note: Users need to avoid the times when the respiration chambers are delivering water as this will give incorrect results. The headers that get used in the attached/associated R file are start regression and end regression. The remaining headers are not used unless called for in the associated R script. The last columns of these datasets (intercept, ElapsedTimeMincoef) are determined from the linear regressions described below.

To determine the rate of change of net production, coefficients of the regression of oxygen consumption in discrete 180 minute data blocks were determined. R squared values for fitted regressions of these coefficients were consistently high (greater than 0.9). We make two assumptions with calculation of net production rates: the first is that heterotrophic community members do not change their metabolism under OA; and the second is that the heterotrophic communities are similar between treatments.
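The associated R script is not shown here, so the following Python sketch only illustrates the idea of fitting one linear regression per 180-minute block and reading off its coefficients and R squared. The function names and the synthetic data are assumptions for illustration; the sketch does not exclude pump-cycle periods, which would have to be removed first using the start regression and end regression times.

```python
from typing import Dict, List, Tuple


def linear_fit(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A constant y is fitted perfectly by a horizontal line.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return intercept, slope, r_squared


def block_fits(minutes: List[float], oxygen: List[float],
               block_minutes: float = 180.0) -> List[Tuple[float, float, float]]:
    """Fit one regression per consecutive block of elapsed minutes."""
    blocks: Dict[int, Tuple[List[float], List[float]]] = {}
    for t, o in zip(minutes, oxygen):
        xs, ys = blocks.setdefault(int(t // block_minutes), ([], []))
        xs.append(t)
        ys.append(o)
    fits = []
    for _, (xs, ys) in sorted(blocks.items()):
        if len(set(xs)) >= 2:  # a slope needs at least two distinct times
            fits.append(linear_fit(xs, ys))
    return fits


if __name__ == "__main__":
    # Synthetic example: oxygen falling steadily over six hours.
    minutes = [float(m) for m in range(0, 360, 10)]
    oxygen = [100.0 - 0.05 * m for m in minutes]
    for intercept, slope, r2 in block_fits(minutes, oxygen):
        print(f"intercept={intercept:.2f} slope={slope:.4f} r2={r2:.3f}")
```

The slope of each block plays the role of the ElapsedTimeMincoef column, under the assumption that elapsed time in minutes is the regressor.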

####

Combined dataset pH, temperature, oxygen, salinity, velocity for experiment

This data is rapid light curve data generated from a Shutter PAM fluorimeter. There are eight steps in each rapid light curve. Note: The software component of the Shutter PAM fluorimeter for sensor 44 appeared to be damaged and would not cycle through the PAR cycles. Therefore the rapid light curves and recovery curves should only be used for the control chambers (sensor ID43).

The headers are:
PAR: Photosynthetically active radiation
relETR: F0/Fm x PAR
Notes: Stage/step of light curve
Treatment: Acidified or control

The associated light treatments in each stage. Each actinic light intensity is held for 10 seconds, then a saturating pulse is taken (see PAM methods).

After 10.0 sec at 0.5% = 1 umols PAR
After 10.0 sec at 0.7% = 1 umols PAR
After 10.0 sec at 1.1% = 0.96 umols PAR
After 10.0 sec at 1.6% = 4.32 umols PAR
After 10.0 sec at 2.4% = 4.32 umols PAR
After 10.0 sec at 3.6% = 8.31 umols PAR
After 10.0 sec at 5.3% = 15.78 umols PAR
After 10.0 sec at 8.0% = 25.75 umols PAR

This dataset appears to be missing data. Note that the D5 rows may not contain usable information.

See the word document in the download file for more information.
PEPS South Australia - Application - SARIG catalogue
pid.sarig.sa.gov.au
Updated Jan 15, 2025
Cite
pid.sarig.sa.gov.au (2025). PEPS South Australia - Application - SARIG catalogue [Dataset]. https://pid.sarig.sa.gov.au/dataset/mesac197
Explore at:
Dataset updated
Jan 15, 2025
Dataset provided by
Government of South Australiahttp://sa.gov.au/
Area covered
Australia, South Australia
Description
PEPS South Australia is now a web based system containing a wide range of technical data relevant to the petroleum and geothermal industries. Currently accessible modules and data include: Well Details, Log Prints, Logs Digital and Production Summary - available via the Wells Modules. Production Details, Monthly Data and Charts - available in the Production Modules. Production data can be viewed in metric or imperial units. Log Files are now available for download through the Well Files page.
Scanned data logs, ship logs and reports from the ADBEX III voyage of the...
researchdata.edu.au
data.aad.gov.au
Updated Jan 17, 2024
Cite
CONNELL, DAVE J.; Commonwealth of Australia (2024). Scanned data logs, ship logs and reports from the ADBEX III voyage of the Nella Dan, October to December, 1985 [Dataset]. https://researchdata.edu.au/scanned-logs-ship-december-1985/3651499
Explore at:
Dataset updated
Jan 17, 2024
Dataset provided by
Australian Antarctic Divisionhttps://www.antarctica.gov.au/
Australian Antarctic Data Centre
Authors
CONNELL, DAVE J.; Commonwealth of Australia
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Oct 8, 1985 - Dec 28, 1985
Area covered

Description
The dataset contains scanned copies of a number of logs, reports and documents from the Antarctic Division Biomass Experiment III (ADBEX III) voyage of the Nella Dan from October to December, 1985. The documents cover CTD data, cell count graphs, acoustic logs, ship logs, and other files.

The dataset download contains the following files:

ADBEX 3 CTD.pdf
CTD Log ADBEX 3 Electronics Lab CTD Unit.pdf
Master Station Log - Vol 1 ADBEX 3.pdf
ND0018586 ADBEX 3 Acoustic Log Book 1 of 2 Copy 3.pdf
ND0018586 ADBEX 3 Acoustic Log Book 2 of 2 Copy 2.pdf
ADBEX 3 CTD Cell Count Graphs.pdf
Sci-Hub download log 2011-2013
data.niaid.nih.gov
zenodo.org
Updated Jan 30, 2022
Cite
Elbakyan, Alexandra (2022). Sci-Hub download log 2011-2013 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5918541
Explore at:
Dataset updated
Jan 30, 2022
Dataset authored and provided by
Elbakyan, Alexandra
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The included file ancient.sci-hub.stats.tab contains the raw, unprocessed PDF download log from Sci-Hub, starting on 22 September 2011 and ending on 14 October 2013. Statistics from 14 October 2013 to 18 January 2014 were not recorded for technical reasons. Statistics before 22 September 2011 were not collected.

Columns in the data file:

Timestamp (yyyy-MM-dd HH:mm:ss)

DOI

URL

IP address

User identifier

Country
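As a rough illustration of how these columns could be read, the sketch below parses such a log with Python's csv module. It assumes the file is tab-delimited (suggested by the .tab extension), has no header row, and stores the columns in the order listed above; none of this is confirmed by the description. The dictionary keys and the sample row are invented for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical short names for the six columns listed above.
COLUMNS = ["timestamp", "doi", "url", "ip", "user", "country"]


def parse_log(lines):
    """Yield one dict per record, with the timestamp parsed into a datetime."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows rather than guess at their layout
        record = dict(zip(COLUMNS, row))
        # Timestamp format as stated in the column description: yyyy-MM-dd HH:mm:ss
        record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
        yield record


if __name__ == "__main__":
    # Made-up sample row, not taken from the dataset.
    sample = io.StringIO(
        "2011-09-22 10:00:00\t10.1000/example\thttp://example.org/paper\t192.0.2.1\tuser1\tExampleland\n"
    )
    for record in parse_log(sample):
        print(record["timestamp"].date(), record["doi"], record["country"])
```

Passing an open file object instead of the StringIO works the same way, since csv.reader accepts any iterable of lines.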

originally published at: https://web.archive.org/web/20200702074701/https://twitter.com/Sci_Hub/status/1221827163781058562
ORIGINAL-NETWORK-TRAFFIC-Friday-02-03-2018-PACP
kaggle.com
Updated Jun 5, 2023
Cite
Karen Pamela López (2023). ORIGINAL-NETWORK-TRAFFIC-Friday-02-03-2018-PACP [Dataset]. https://www.kaggle.com/datasets/karenp/original-network-traffic-friday-02-03-2018-pacp
Explore at:
Dataset updated
Jun 5, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Karen Pamela López
Description
This data set was originally downloaded from: https://www.unb.ca/cic/datasets/ids-2018.html

The data set is 466 GB in size.

When the download is done, the file contains 2 folders: Processed Traffic Data for ML Algorithms and Original network traffic and log data.

The "Processed Traffic Data for ML Algorithms" folder contains 10 csv files with the following names:

Friday-02-03-2018_TrafficForML_CICFlowMeter

Friday-16-02-2018_TrafficForML_CICFlowMeter

Friday-23-02-2018_TrafficForML_CICFlowMeter

Thuesday-20-02-2018_TrafficForML_CICFlowMeter

Thursday-01-03-2018_TrafficForML_CICFlowMeter

Thursday-15-02-2018_TrafficForML_CICFlowMeter

Thursday-22-02-2018_TrafficForML_CICFlowMeter

Wednesday-14-02-2018_TrafficForML_CICFlowMeter

Wednesday-21-02-2018_TrafficForML_CICFlowMeter

Wednesday-28-02-2018_TrafficForML_CICFlowMeter

The "Original Network Traffic and Log data" folder contains 10 folders, each named after one of the files above. Each folder in turn contains two folders, logs and pcap.

Here is the PCAP for Friday-02-03-2018
Data from: Who's downloading pirated papers? Everyone
datadryad.org
search.dataone.org
+1more
zip
Updated Apr 22, 2017
Cite
Alexandra Elbakyan; John Bohannon (2017). Who's downloading pirated papers? Everyone [Dataset]. http://doi.org/10.5061/dryad.q447c
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.q447c
Dataset updated
Apr 22, 2017
Dataset provided by
Dryad
Authors
Alexandra Elbakyan; John Bohannon
Time period covered
Apr 21, 2016
Area covered
global, Global
Description
Sci-Hub download data
These data include 28 million download request events from the server logs of Sci-Hub from 1 September 2015 through 29 February 2016. The uncompressed 2.7 gigabytes of data are separated into 6 data files, one for each month, in tab-delimited text format.
scihub_data.zip

IPython Notebook for Sci-Hub raw data
IPython Notebook used to process the raw server log data (processing the GIS files into CSV, scraping DOI metadata, etc.).
Sci-Hub.html
Sci-Hub.ipynb

Sci-Hub publisher DOI prefixes
Data scraped from the CrossRef website which can be used to replicate the analysis of downloads by publisher.
publisher_DOI_prefixes.csv
Network Traffic Dataset
kaggle.com
Updated Oct 31, 2023
Cite
Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
Explore at:
Dataset updated
Oct 31, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ravikumar Gattu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures for 1 hour during the evening of Oct 9th, 2023, using Wireshark. The dataset consists of 394137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

Content :

This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

No: Number of the instance
Timestamp: Timestamp of the instance of network traffic
Source IP: IP address of the source
Destination IP: IP address of the destination
Protocol: Protocol used by the instance
Length: Length of the instance
Info: Information about the traffic instance
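A quick way to get a feel for such a capture is to count packets per protocol. The sketch below is only an illustration under assumptions: it assumes the CSV uses the header names given in the Wireshark field list above (No., Time, Source, Destination, Protocol, Length, Info), and the sample rows are invented, not taken from the dataset.

```python
import csv
import io
from collections import Counter


def protocol_counts(csv_file) -> Counter:
    """Count how many rows (captured packets) each protocol contributes."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Protocol"] for row in reader)


# Made-up rows in the style of a Wireshark CSV export.
SAMPLE = (
    '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
    '"1","0.000000","192.0.2.1","192.0.2.2","TCP","66","example segment"\n'
    '"2","0.000120","192.0.2.2","192.0.2.1","TCP","54","example ack"\n'
    '"3","0.010500","192.0.2.1","198.51.100.7","DNS","74","example query"\n'
)

if __name__ == "__main__":
    for protocol, count in protocol_counts(io.StringIO(SAMPLE)).most_common():
        print(protocol, count)
```

For the real file, open it with open(path, newline="") and pass the file object to protocol_counts; if the actual header differs, the "Protocol" key must be adjusted.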

Acknowledgements :

I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic data set.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

Dataset License: CC0: Public Domain

Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

ML techniques that benefit from this dataset:

This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. It also contains important features that can be used for most applications of machine learning in cybersecurity. A few of the machine learning applications that could benefit from this dataset are:

1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.

3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models for detecting traffic issues, malicious traffic, network attacks, and DoS attacks.
Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter...
zenodo.org
explore.openaire.eu
bz2
Updated Mar 15, 2021
+ more versions
Cite
João Felipe; Leonardo; Vanessa; Juliana (2021). Dataset of A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks [Dataset]. http://doi.org/10.5281/zenodo.2592524
Explore at:
bz2Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.2592524
Dataset updated
Mar 15, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
João Felipe; Leonardo; Vanessa; Juliana
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior and encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.

Paper: https://2019.msrconf.org/event/msr-2019-papers-a-large-scale-study-about-quality-and-reproducibility-of-jupyter-notebooks

This repository contains two files:

dump.tar.bz2

jupyter_reproducibility.tar.bz2

The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.

The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:

analyses: this folder has all the notebooks we use to analyze the data in the PostgreSQL database.

archaeology: this folder has all the scripts we use to query, download, and extract data from GitHub notebooks.

paper: empty. The notebook analyses/N12.To.Paper.ipynb moves data to it.

In the remainder of this text, we give instructions for reproducing the analyses by using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.

Reproducing the Analysis

This section shows how to load the data into the database and run the analysis notebooks. In the analysis, we used the following environment:

Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.11
Python 3.7.2
PdfCrop 2012/11/02 v1.38

First, download dump.tar.bz2 and extract it:

tar -xjf dump.tar.bz2

It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:

psql jupyter < db2019-03-13.dump

It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:

export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
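Before starting the notebooks, it can help to check that the connection string has the expected shape. The sketch below uses only the Python standard library to split the URL into its parts; it does not connect to the database, and the function name describe_connection is hypothetical rather than part of the study's scripts.

```python
import os
from urllib.parse import urlsplit


def describe_connection(url: str) -> dict:
    """Split a database URL into the parts that are easy to get wrong."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,               # expected: postgresql
        "user": parts.username,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),   # expected: jupyter
    }


if __name__ == "__main__":
    # Falls back to the placeholder value from the text if the variable is unset.
    url = os.environ.get(
        "JUP_DB_CONNECTION", "postgresql://user:password@hostname/jupyter"
    )
    print(describe_connection(url))
```

The password is deliberately left out of the returned dictionary so that it does not end up in printed output or logs.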

Download and extract jupyter_reproducibility.tar.bz2:

tar -xjf jupyter_reproducibility.tar.bz2

Create a conda environment with Python 3.7:

conda create -n analyses python=3.7
conda activate analyses

Go to the analyses folder and install all the dependencies listed in requirements.txt:

cd jupyter_reproducibility/analyses
pip install -r requirements.txt

To reproduce the analyses, run jupyter in this folder:

jupyter notebook

Execute the notebooks in this order:

Index.ipynb

N0.Repository.ipynb

N1.Skip.Notebook.ipynb

N2.Notebook.ipynb

N3.Cell.ipynb

N4.Features.ipynb

N5.Modules.ipynb

N6.AST.ipynb

N7.Name.ipynb

N8.Execution.ipynb

N9.Cell.Execution.Order.ipynb

N10.Markdown.ipynb

N11.Repository.With.Notebook.Restriction.ipynb

N12.To.Paper.ipynb
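Note that a plain alphabetical sort of these file names would place N10, N11 and N12 before N2. The following sketch, which is not part of the study's scripts, shows one way to recover the listed order from a set of file names by sorting on the integer after the leading N; the function name execution_order is an assumption made for illustration.

```python
import re
from typing import List, Tuple


def execution_order(names: List[str]) -> List[str]:
    """Put Index.ipynb first, then N<k>.*.ipynb sorted by the integer k."""
    def key(name: str) -> Tuple[int, int]:
        match = re.match(r"N(\d+)\.", name)
        # Names without an N<k> prefix (such as Index.ipynb) sort first.
        return (0, 0) if match is None else (1, int(match.group(1)))
    return sorted(names, key=key)


if __name__ == "__main__":
    unordered = [
        "N10.Markdown.ipynb",
        "N2.Notebook.ipynb",
        "Index.ipynb",
        "N1.Skip.Notebook.ipynb",
    ]
    for name in execution_order(unordered):
        print(name)
```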

Reproducing or Expanding the Collection

The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.

Requirements

This time, we have extra requirements:

All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account

Environment

First, set the following environment variables:

export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction

# Frequency of log report
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
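With this many variables, a small check for unset names can catch mistakes before a long run starts. The sketch below is not part of the study's scripts: the choice of which variables to treat as required is an assumption made here for illustration. It tests for presence only, because some variables (such as JUP_NOTEBOOK_INTERVAL) are meant to be left blank.

```python
import os
from typing import Iterable, List, Mapping

# Assumed subset of the variables above; extend as needed.
REQUIRED = [
    "JUP_MACHINE",
    "JUP_BASE_DIR",
    "JUP_LOGS_DIR",
    "JUP_DB_CONNECTION",
    "JUP_NOTEBOOK_INTERVAL",
    "JUP_REPOSITORY_INTERVAL",
    "JUP_EXECUTION_DIR",
    "JUP_ANACONDA_PATH",
]


def missing_variables(env: Mapping[str, str],
                      required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required names that are not defined at all in `env`."""
    # Presence is checked rather than truthiness, so blank values are accepted.
    return [name for name in required if name not in env]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("All checked variables are set.")
```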

Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf

Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should unmount it. You can leave the scripts blank, but it is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.

Scripts

Download and extract jupyter_reproducibility.tar.bz2:

tar -xjf jupyter_reproducibility.tar.bz2

Install 5 conda environments and 5 anaconda environments, one of each per Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):

Conda 2.7

conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology

Anaconda 2.7

conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology

Conda 3.4

It requires a manual jupyter and pathlib2 installation due to some incompatibilities found on the default installation.

conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2

Anaconda 3.4

conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology

Conda 3.5

conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology

Anaconda 3.5

It requires the manual installation of other anaconda packages.

conda create -n py35 python=3.5 anaconda -y
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
conda activate py35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology

Conda 3.6

conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology

Anaconda 3.6

conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology

Conda 3.7
