Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Intelligent Hybrid model to Enhance Time Series Models for Predicting Network Traffic
https://academictorrents.com/nolicensespecified
This dataset consists of circles (or friends lists) from Facebook. Facebook data was collected from survey participants using this Facebook app. The dataset includes node features (profiles), circles, and ego networks.
The Intelligent Road Network dataset provided by the Transport Department includes traffic directions, turning restrictions at road junctions, stopping restrictions, on-street parking spaces and other road traffic data to support the public's development of intelligent transport systems, fleet management systems, car navigation and similar applications.
Esri China (HK) has prepared this File Geodatabase containing a Network Dataset for the Intelligent Road Network so that Esri GIS users can use the dataset in ArcGIS Pro without going through long configuration steps. Please refer to this guideline to use the Road Network Dataset in ArcGIS Pro for routing analysis. This network dataset has been configured and deployed with the following restrictions:
* Speed Limit
* Turn
* Intersection
* Traffic Features
* Pedestrian Zone
* Traffic Sign of Prohibition
* Vehicle Restriction
The coordinate system of this dataset is Hong Kong 1980 Grid.
The objective of uploading the network dataset to the ArcGIS Online platform is to let Hong Kong ArcGIS users work with the data in a spatial-ready format and save their data conversion effort.
For details about the schema and information about the content and relationship of the data, please refer to the data dictionary provided by the Transport Department at https://data.gov.hk/en-data/dataset/hk-td-tis_15-road-network-v2.
For details about the data, source format and terms and conditions of use, please refer to the website of DATA.GOV.HK at https://data.gov.hk.
Dataset last updated: July 2021
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents a comprehensive and quality collection of functional human brain network data for potential research in the intersection of neuroscience, machine learning, and graph analytics.
Anatomical and functional MRI images of the brain have been used to understand the functional connectivity of the human brain and are particularly important in identifying underlying neurodegenerative conditions such as Alzheimer's, Parkinson's, and Autism. Recently, the study of the brain in the form of brain networks using machine learning and graph analytics has become increasingly popular, especially to predict the early onset of these conditions. A brain network, represented as a graph, retains richer structural and positional information that traditional examination methods are unable to capture. However, the lack of brain network data transformed from functional MRI images prevents researchers from data-driven explorations. One of the main difficulties lies in the complicated domain-specific preprocessing steps and the exhaustive computation required to convert data from MRI images into brain networks. We bridge this gap by collecting a large amount of available MRI images from existing studies, working with domain experts to make sensible design choices, and preprocessing the MRI images to produce a collection of brain network datasets. The datasets originate from 6 different sources, cover 4 neurodegenerative conditions, and consist of a total of 2,688 subjects.
Due to the data protocol, we are unable to release the ADNI dataset here. The data will be released via the ADNI external data submissions within their data system.
We test our graph datasets on 5 machine learning models commonly used in neuroscience and on a recent graph-based analysis model to validate the data quality and to provide domain baselines.
To lower the barrier to entry and promote research in this interdisciplinary field, we release our complete preprocessing details, code, and brain network data: https://github.com/brainnetuoa/data_driven_network_neuroscience.
To stay informed about new updates to the datasets, kindly provide us with your email address: https://forms.gle/KGAajR6LEysXWKvKA
Updated on 10/09/2024:
Please note that we have identified 14 subjects in the PPMI (Parkinson's Progression Markers Initiative) dataset, prodromal group, where the time-series images include only 10 time slots. The invalid subjects are:
* sub-prodromal103857
* sub-prodromal120622
* sub-prodromal146573
* sub-prodromal40737
* sub-prodromal52874
* sub-prodromal55560
* sub-prodromal56680
* sub-prodromal58027
* sub-prodromal58680
* sub-prodromal59390
* sub-prodromal59483
* sub-prodromal59503
* sub-prodromal71658
* sub-prodromal75422
We have removed the invalid images and updated the dataset by including both the parcellated images (ppmi_v2.zip) and the preprocessed images (Ppmi_Preprocessed_v2.z*).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified in 5 different activities (Video, Bulk, Idle, Web, and Interactive) and the label is shown in the filename. There is also a file (mapping.csv) with the mapping of the host's IP address, the csv/pcap filename and the activity label.
Activities:
* Interactive: applications that perform real-time interactions in order to provide a suitable user experience, such as editing a file in Google Docs and remote CLI sessions over SSH.
* Bulk data transfer: applications that transfer large files over the network. Some examples are SCP/FTP applications and direct downloads of large files from web servers such as Mediafire, Dropbox or the university repository, among others.
* Web browsing: all the traffic generated while searching and consuming different web pages. Examples of those pages are several blogs and news sites and the university's Moodle.
* Video playback: traffic from applications that consume video in streaming or pseudo-streaming. The best-known servers used are Twitch and YouTube, but the university online classroom has also been used.
* Idle behaviour: the background traffic generated by the user's computer when the user is idle. This traffic has been captured with every application closed and with some pages open, such as Google Docs, YouTube and several web pages, but always without user interaction.
The capture is performed in a network probe, attached to the router that forwards the user network traffic, using a SPAN port. The traffic is stored in pcap format with all the packet payload. In the csv file, every non TCP/UDP packet is filtered out, as well as every packet with no payload. The fields in the csv files are the following (one line per packet): Timestamp, protocol, payload size, IP address source and destination, UDP/TCP port source and destination. The fields are also included as a header in every csv file.
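As a rough illustration of how the per-packet CSV rows described above might be consumed, the following sketch sums payload bytes per protocol and measures the time span of a trace. The header names (`timestamp`, `protocol`, `payload_size`, and so on), the timestamp format (seconds as a decimal number), and the sample rows are assumptions made for this example; the real files carry their own header line, which should be used instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical header names and made-up rows; real traces define their own header.
SAMPLE = """timestamp,protocol,payload_size,ip_src,ip_dst,port_src,port_dst
0.000,TCP,512,192.0.2.10,198.51.100.7,50514,443
0.120,TCP,1460,198.51.100.7,192.0.2.10,443,50514
0.450,UDP,48,192.0.2.10,198.51.100.53,53124,53
"""

def summarize(csv_file):
    """Return (payload bytes per protocol, time span in seconds) for one trace."""
    bytes_per_protocol = defaultdict(int)
    first_ts = last_ts = None
    for row in csv.DictReader(csv_file):
        ts = float(row["timestamp"])  # assumed: seconds as a decimal number
        if first_ts is None:
            first_ts = ts
        last_ts = ts
        bytes_per_protocol[row["protocol"]] += int(row["payload_size"])
    duration = 0.0 if first_ts is None else last_ts - first_ts
    return dict(bytes_per_protocol), duration

totals, duration = summarize(io.StringIO(SAMPLE))
print(totals, duration)
```

For a real trace, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, and the column names adjusted to match the file's header.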
The amount of data is stated as follows:
* Bulk: 19 traces, 3599 s of total duration, 8704 MBytes of pcap files
* Video: 23 traces, 4496 s, 1405 MBytes
* Web: 23 traces, 4203 s, 148 MBytes
* Interactive: 42 traces, 8934 s, 30.5 MBytes
* Idle: 52 traces, 6341 s, 0.69 MBytes
The code of our machine learning approach is also included. There is a README.txt file with the documentation of how to use the code.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset information
9 graphs of Autonomous Systems (AS) peering information inferred from Oregon
route-views between March 31 2001 and May 26 2001.
Dataset statistics are calculated for the graphs with the lowest (March 31 2001)
and highest (May 26 2001) number of nodes.
Dataset statistics for graph with lowest number of nodes - 3 31 2001
Nodes 10670
Edges 22002
Nodes in largest WCC 10670 (1.000)
Edges in largest WCC 22002 (1.000)
Nodes in largest SCC 10670 (1.000)
Edges in largest SCC 22002 (1.000)
Average clustering coefficient 0.4559
Number of triangles 17144
Fraction of closed triangles 0.009306
Diameter (longest shortest path) 9
90-percentile effective diameter 4.5
Dataset statistics for graph with highest number of nodes - 5 26 2001
Nodes 11174
Edges 23409
Nodes in largest WCC 11174 (1.000)
Edges in largest WCC 23409 (1.000)
Nodes in largest SCC 11174 (1.000)
Edges in largest SCC 23409 (1.000)
Average clustering coefficient 0.4532
Number of triangles 19894
Fraction of closed triangles 0.009636
Diameter (longest shortest path) 10
90-percentile effective diameter 4.4
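For readers unfamiliar with the statistics listed above, the sketch below shows one way the triangle count, average clustering coefficient, and diameter could be computed for an undirected edge list. It is a plain-Python illustration on a made-up five-node graph, not the code used to produce the published figures; conventions differ between tools (here, nodes of degree below 2 contribute a clustering coefficient of 0, and the graph is assumed to be connected).

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangle_count(adj):
    """Count triangles; each one is seen once from each of its 3 corners."""
    seen = 0
    for nbrs in adj.values():
        seen += sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return seen // 3

def average_clustering(adj):
    """Mean local clustering coefficient (degree < 2 counts as 0)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def diameter(adj):
    """Longest shortest path, via BFS from every node (connected graph assumed)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

# Toy graph: one triangle (1, 2, 3) with a tail 3-4-5.
toy = build_graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])
print(len(toy), triangle_count(toy), average_clustering(toy), diameter(toy))
```

The all-pairs BFS is quadratic in the worst case, so for graphs of around ten thousand nodes a dedicated graph library would likely be a more practical choice.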
Source (citation)
J. Leskovec, J. Kleinberg and C. Faloutsos. Graphs over Time: Densification
Laws, Shrinking Diameters and Possible Explanations. ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD), 2005.
Files
File Description
* AS peering information inferred from Oregon route-views ...
oregon1_010331.txt.gz from March 31 2001
oregon1_010407.txt.gz from April 7 2001
oregon1_010414.txt.gz from April 14 2001
oregon1_010421.txt.gz from April 21 2001
oregon1_010428.txt.gz from April 28 2001
oregon1_010505.txt.gz from May 05 2001
oregon1_010512.txt.gz from May 12 2001
oregon1_010519.txt.gz from May 19 2001
oregon1_010526.txt.gz from May 26 2001
NOTE: for the UF Sparse Matrix Collection, the primary matrix in this problem
set (Problem.A) is the last matrix in the sequence, oregon1_010526, from May 26
2001.
The nodes are uniform across all graphs in the sequence in the UF collection.
That is, nodes do...
This repository contains network graphs and network metadata from Moviegalaxies, a website providing network graph data from about 773 films (1915–2012). The data includes individual network graph data in Graph Exchange XML Format and descriptive statistics on measures such as clustering coefficient, degree, density, diameter, modularity, average path length, the total number of edges, and the total number of nodes.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset accompanies the paper "STREETS: A Novel Camera Network Dataset for Traffic Flow" at Neural Information Processing Systems (NeurIPS) 2019. Included are:
* Over four million still images from publicly accessible cameras in Lake County, IL. The images were collected across 2.5 months in 2018 and 2019.
* Directed graphs describing the camera network structure in two communities in Lake County.
* Documented non-recurring traffic incidents in Lake County coinciding with the 2018 data.
* Traffic counts for each day of images in the dataset. These counts track the volume of traffic in each community.
* Other annotations and files useful for computer vision systems.
Refer to the accompanying "readme.txt" or "readme.pdf" for further details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Global Urban Network (GUN) dataset provides pre-computed node and edge attribute features for various cities. Each layer is available in .geojson format and can easily be converted into NetworkX, igraph, PyG, and DGL graph formats.
For node attributes, we adopt a uniform Euclidean approach, as it provides a consistent, straightforward, and extensible basis for integrating heterogeneous data sources across different network locations. Accordingly, we construct 100-metre Euclidean buffers around each network node and compute the spatial intersection with spatial targets (e.g., street view imagery points, points of interest, and building footprints). To ensure spatial consistency and accurate distance computation, we project spatial entities into local coordinate reference systems (CRS). Users can employ the Urbanity package to generate Euclidean buffers of arbitrary distance.
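The buffer step can be pictured with a small sketch. It assumes coordinates have already been projected to a local metric CRS (so distances are in metres) and treats targets as points; the function name `count_within_buffer` is hypothetical and is not part of the Urbanity package.

```python
import math

def count_within_buffer(node_xy, points_xy, radius_m=100.0):
    """Count target points inside a circular (Euclidean) buffer around a node.

    Coordinates are assumed to be in a projected, metre-based CRS.
    """
    nx, ny = node_xy
    return sum(
        1 for px, py in points_xy
        if math.hypot(px - nx, py - ny) <= radius_m
    )

# Made-up coordinates: two points fall inside the 100 m buffer, one outside.
pois = [(30.0, 40.0), (100.0, 0.0), (80.0, 80.0)]
print(count_within_buffer((0.0, 0.0), pois))
```

Polygon targets such as building footprints would need a true geometric intersection test rather than this point-distance check.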
For edge attributes, we adopt a two-step approach: 1) compute the distance between each spatial point of interest and its proximate edges in the network, and 2) assign each entity to the edge with the lowest distance. To account for remote edges (e.g., peripheral routes that are not located close to any amenities), we specify a distance threshold of 50 metres. For buildings, we compute the distance between building centroids and their respective network edge. We then compute spatial indicators based on the set of elements assigned to each network edge.
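The two-step edge assignment can be sketched as follows. This is an illustrative reading rather than the dataset's actual pipeline: edges are simplified to straight segments in a projected metric CRS, the 50-metre threshold is interpreted as "leave a point unassigned if its nearest edge is farther than 50 m", and all names and coordinates are made up.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def assign_to_edges(points, edges, threshold_m=50.0):
    """Map each point id to its nearest edge id, if within the threshold."""
    assignment = {}
    for pid, p in points.items():
        best_edge, best_dist = None, math.inf
        for eid, (a, b) in edges.items():
            d = point_segment_distance(p, a, b)
            if d < best_dist:
                best_edge, best_dist = eid, d
        if best_edge is not None and best_dist <= threshold_m:
            assignment[pid] = best_edge
    return assignment

edges = {"e1": ((0.0, 0.0), (100.0, 0.0)), "e2": ((0.0, 200.0), (100.0, 200.0))}
points = {"cafe": (50.0, 10.0), "shop": (20.0, 180.0), "farm": (50.0, 100.0)}
result = assign_to_edges(points, edges)
print(result)
```

Here "farm" is 100 m from both edges and therefore stays unassigned. Real street edges are polylines, so a production version would measure distance to every segment of each edge and would likely use a spatial index instead of the brute-force loop.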
We also release aggregated subzone statistics for each city. Similarly, users can employ the Urbanity package to generate aggregate statistics for any arbitrary geographic boundary.
Urbanity Python package: https://github.com/winstonyym/urbanity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We recommend using the CESNET DataZoo Python library, which facilitates working with large network traffic datasets. More information about the DataZoo project can be found in the GitHub repository https://github.com/CESNET/cesnet-datazoo.
The modern approach for network traffic classification (TC), which is an important part of operating and securing networks, is to use machine learning (ML) models that are able to learn intricate relationships between traffic characteristics and communicating applications. A crucial prerequisite is having representative datasets. However, datasets collected from real production networks are not being published in sufficient numbers. Thus, this paper presents a novel dataset, CESNET-TLS-Year22, that captures the evolution of TLS traffic in an ISP network over a year. The dataset contains 180 web service labels and standard TC features, such as packet sequences. The unique year-long time span enables comprehensive evaluation of TC models and assessment of their robustness in the face of the ever-changing environment of production networks.
Data description
The dataset consists of network flows describing encrypted TLS communications. Flows are extended with packet sequences, histograms, and fields extracted from the TLS ClientHello message, which is transmitted in the first packet of the TLS connection handshake. The most important extracted handshake field is the SNI domain, which is used for ground-truth labeling.
Packet Sequences
Sequences of packet sizes, directions, and inter-packet times are standard data input for traffic analysis. For packet sizes, we consider the payload size after transport headers (TCP headers for the TLS case). We omit packets with no TCP payload, for example ACKs, because zero-payload packets are related to the transport layer internals rather than services’ behavior. Packet directions are encoded as ±1, where +1 means a packet sent from client to server, and -1 is a packet from server to client. Inter-packet times depend on the location of communicating hosts, their distance, and on the network conditions on the path. However, it is still possible to extract relevant information that correlates with user interactions and, for example, with the time required for an API/server/database to process the received data and generate a response. Packet sequences have a maximum length of 30, which is the default setting of the used flow exporter. We also derive three fields from each packet sequence: its length, time duration, and the number of roundtrips. The roundtrips are counted as the number of changes in the communication direction; in other words, each client request and server response pair counts as one roundtrip.
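The three derived fields can be illustrated with a short sketch. Several points are assumptions rather than confirmed exporter behavior: inter-packet times are taken to be in milliseconds with the first entry equal to 0, duration is taken as their sum, and because "changes in direction" and "request/response pairs" can give different counts, the sketch reports both the raw number of direction changes and the number of client-to-server then server-to-client transitions. The function name and sample values are hypothetical.

```python
MAX_SEQUENCE_LENGTH = 30  # maximum sequence length stated in the description

def sequence_features(sizes, directions, inter_packet_times_ms):
    """Derive length, duration and roundtrip-style counts from a packet sequence."""
    sizes = sizes[:MAX_SEQUENCE_LENGTH]
    directions = directions[:MAX_SEQUENCE_LENGTH]
    times = inter_packet_times_ms[:MAX_SEQUENCE_LENGTH]

    pairs = list(zip(directions, directions[1:]))
    return {
        "length": len(sizes),
        # Assumed: duration is the sum of inter-packet gaps.
        "duration_ms": sum(times),
        # Every switch of direction, in either sense.
        "direction_changes": sum(1 for prev, cur in pairs if prev != cur),
        # A client packet (+1) directly followed by a server packet (-1).
        "request_response_pairs": sum(
            1 for prev, cur in pairs if prev == 1 and cur == -1
        ),
    }

features = sequence_features(
    sizes=[517, 1400, 1400, 120, 300],
    directions=[1, -1, -1, 1, -1],
    inter_packet_times_ms=[0, 30, 1, 45, 28],
)
print(features)
```

In this example there are three direction changes but two request/response pairs, which is why the sketch keeps the two counts separate instead of picking one as the roundtrip field.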
Flow statistics
Each data record also includes standard flow statistics, representing aggregated information about the entire bidirectional connection. The fields are the number of transmitted bytes and packets in both directions, the duration of the flow, and packet histograms. The packet histograms include binned counts (not limited to the first 30 packets) of packet sizes and inter-packet times in both directions. There are eight bins with a logarithmic scale; the intervals are 0-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1024, >1024 [ms or B]. The units are milliseconds for inter-packet times and bytes for packet sizes (more information is available in the PHISTS plugin documentation). Moreover, each flow has an end reason: it either ended with the TCP connection termination (FIN packets), was idle, reached the active timeout, or ended due to other reasons. This corresponds with the official IANA IPFIX-specified values. The FLOW_ENDREASON_OTHER field represents the forced end and lack of resources reasons.
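The eight logarithmic bins listed above can be reproduced with a small lookup. This is a sketch of the binning rule as stated in the description, not the exporter's own code; the function names are hypothetical, and integer inputs (bytes or whole milliseconds) are assumed.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first seven bins; anything larger goes to bin 7.
BIN_UPPER_BOUNDS = [15, 31, 63, 127, 255, 511, 1024]
BIN_LABELS = ["0-15", "16-31", "32-63", "64-127",
              "128-255", "256-511", "512-1024", ">1024"]

def bin_index(value):
    """Return the index (0-7) of the bin that a size or time value falls into."""
    return bisect_left(BIN_UPPER_BOUNDS, value)

def histogram(values):
    """Count how many values fall into each of the eight bins."""
    counts = [0] * len(BIN_LABELS)
    for value in values:
        counts[bin_index(value)] += 1
    return counts

counts = histogram([0, 15, 16, 100, 1024, 1025, 1500])
print(dict(zip(BIN_LABELS, counts)))
```

Because the bounds are inclusive, 15 lands in the first bin and 1024 in the seventh, matching the intervals as written.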
Dataset structure
The dataset is organized by week and by individual day. The flows are delivered in compressed CSV files. CSV files contain one flow per row; data columns are summarized in the provided list below. For each flow data file, there is a JSON file with the total number of saved flows and the number of flows per service. There are also files aggregating flow counts for each week (stats-week.json) and for the entire dataset (stats-dataset.json). The following list describes flow data fields in CSV files:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains real epidemic networks of COVID-19.
Decentralized finance (DeFi) is known for its unique mechanism design, which applies smart contracts to facilitate peer-to-peer transactions. The decentralized bank is a typical DeFi application. Ideally, a decentralized bank should be decentralized in its transactions. However, many recent studies have found that decentralized banks have not achieved a significant degree of decentralization. This research conducts a comparative study among mainstream decentralized banks. We apply core-periphery network feature analysis using the transaction data from four decentralized banks: Liquity, Aave, MakerDao, and Compound. We extract six features and compare the banks' levels of decentralization cross-sectionally. According to the analysis results, we find that: 1) MakerDao and Compound are more decentralized in their transactions than Aave and Liquity. 2) Although decentralized banking transactions are supposed to be decentralized, the data show that the four banks have primary external transaction core addresses such as Huobi, Coinbase, and Binance. We also discuss four design features that might affect network decentralization. Our research contributes to the literature at the interface of decentralized finance, financial technology (Fintech), and social network analysis and inspires future protocol designs to live up to the promise of decentralized finance for a truly peer-to-peer transaction network. (2023-07-06)
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The FTC produces the Consumer Sentinel Network Data Book annually using a data set of fraud, identity theft, and other reports from consumers received by the Consumer Sentinel Network. These include reports made directly by consumers to the FTC, as well as reports received by federal, state, local, and international law enforcement agencies and other non-governmental organizations. This data set includes national statistics, as well as a state-by-state listing of top report categories in each state and a listing of metropolitan areas that generated the most complaints per capita, for calendar year 2015.
This report documents the acquisition of source data, and calculation of land cover summary statistics datasets for six National Park Service Klamath Network park units and seven custom areas of analysis: Crater Lake National Park, Lassen Volcanic National Park, Lava Beds National Monument, Oregon Caves National Monument, Redwood National and State Parks, Whiskeytown National Recreation Area, and the seven custom areas of analysis. The source data and land cover calculations are available for use within the National Park Service (NPS) Inventory and Monitoring Program. Land cover summary statistics datasets can be calculated for all geographic regions within the extent of the NPS; this report includes statistics calculated for the conterminous United States. The land cover summary statistics datasets are calculated from multiple sources, including Multi-Resolution Land Characteristics Consortium products in the National Land Cover Database (NLCD) and the United States Geological Survey’s (USGS) Earth Resources Observation and Science (EROS) Center products in the Land Change Monitoring, Assessment, and Projection (LCMAP) raster dataset. These summary statistics calculate land cover at up to three classification scales: Level 1, modified Anderson Level 2, and Natural versus Converted land cover. The output land cover summary statistics datasets produced here for the six Klamath Network park units and seven custom areas of analysis utilize the most recent versions of the source datasets (NLCD and LCMAP). These land cover summary statistics datasets are used in the NPS Inventory and Monitoring Program, including the NPS Environmental Settings Monitoring Protocol and may be used by networks and parks for additional efforts.
The number of social media users in the United States was forecast to increase continuously between 2024 and 2029 by a total of 26 million users (+8.55 percent). After the ninth consecutive year of growth, the social media user base is estimated to reach 330.07 million users, a new peak, in 2029. Notably, the number of social media users has increased continuously over the past years.
The figures shown for social media users have been derived from survey data that has been processed to estimate missing demographics.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Data is from the small-scale demonstration of the Intelligent Network Flow Optimization (INFLO) Prototype System and applications in Seattle, Washington. Connected vehicle systems were deployed in 21 vehicles in a scripted driving scenario circuiting this I-5 corridor northbound and southbound during morning rush hour. This data set contains queue warning messages that were recommended by the INFLO Q-WARN algorithm and sent by the traffic management center to vehicles to warn drivers upstream of the queue. The objective of queue warning is to provide a vehicle operator sufficient warning of impending queue backup in order to brake safely, change lanes, or modify route such that secondary collisions can be minimized or even eliminated.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The 2022 Road Network File depicts the digital road line coverage for Canada. It contains information such as street arc unique identifier (UID), name, type, direction and address range, as well as rank and class. It also includes province or territory (PR) and census subdivision (CSD) information for each side of a street arc (where applicable).
The Road Network File is portrayed in Lambert conformal conic projection (North American Datum of 1983 [NAD83]). The 2022 Road Network File is available as a national file.
https://www.caida.org/about/legal/aua/
The UCSD Network Telescope consists of a globally routed, but lightly utilized /9 and /10 network prefix, that is, about 3/1024 (roughly 1/341) of the whole IPv4 address space. It contains few legitimate hosts; inbound traffic to non-existent machines, so-called Internet Background Radiation (IBR), is unsolicited and results from a wide range of events, including misconfiguration (e.g. mistyping an IP address), scanning of address space by attackers or malware looking for vulnerable targets, backscatter from randomly spoofed denial-of-service attacks, and the automated spread of malware. CAIDA continuously captures this anomalous traffic, discarding the legitimate traffic packets destined to the few reachable IP addresses in this prefix. We archive and aggregate these data, and provide this valuable resource to network security researchers. This dataset represents raw traffic traces captured by the Telescope instrumentation and made available in near-real time as one-hour-long compressed pcap files. We collect more than 3 TB of uncompressed IBR traffic trace data per day. The most recent 14 days of data are stored locally at CAIDA. Once data slides out of this near-real-time window, the pcap files are off-loaded to tape storage. Historical Telescope data starting from 2008 are available by additional request.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
is used in a lot.