MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Cellular communications, especially with the advent of 5G mobile networks, demand stringent adherence to high-reliability standards, ultra-low latency, increased capacity, enhanced security, and high-speed user connectivity. To fulfill these requirements, mobile operators require a programmable solution capable of supporting multiple independent tenants on a single physical infrastructure. The advent of 5G networks facilitates end-to-end resource allocation through Network Slicing (NS), which allows for the division of the network into distinct virtual slices.
Network slicing in 5G stands as a pivotal feature for next-generation wireless networks, delivering substantial benefits to both mobile operators and businesses. Developing a Machine Learning (ML) model is crucial for accurately predicting the optimal network slice based on key device parameters. Such a model also plays a vital role in managing network load balancing and addressing network slice failures.
The dataset is structured to support the development of an ML model that can classify the optimal network slice based on device parameters. The target output comprises three distinct classes:
Enhanced Mobile Broadband (eMBB):
Ultra-Reliable Low Latency Communication (URLLC):
Massive Machine Type Communication (mMTC):
deepslice_data.csv: The dataset includes labeled instances categorized into the three target classes: eMBB, URLLC, and mMTC. Each instance corresponds to a specific device configuration and its optimal network slice.
Network slicing in 5G is instrumental in provisioning tailored network services for specific use cases, ensuring optimal performance, resource utilization, and user experiences based on the requirements of eMBB, URLLC, and mMTC applications. This dataset is invaluable for researchers and practitioners aiming to design and implement ML models for network slice prediction, thereby enhancing the operational efficiency and reliability of 5G networks.
This dataset is meticulously curated to facilitate the development of ML models for predicting the optimal 5G network slice. It encompasses a comprehensive set of attributes and target classes, ensuring that it meets the highest standards required for advanced research and practical applications in the field of cellular communications and network management.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset has been generated using NYUSIM 3.0 mm-Wave channel simulator software, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data was collected over the course of a year in South Asia. As a result, the dataset provides an accurate representation of the seasonal variations in mm-wave channel characteristics in these areas. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation AoA (degree), RMS Delay Spread (ns), Season, Frequency, and Path Loss (dB). Four main seasons have been considered in this dataset: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (i.e., low, medium, and high) to accurately include the atmospheric variations in a season. To simulate the path loss, realistic Tx and Rx height, NLoS environment, and mean human blockage attenuation effects have been taken into consideration. The data has been preprocessed and normalized to ensure consistency and ease of use. Researchers in the field of mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that while the data was collected specifically in the South Asia region, the high correlation between the weather patterns in this region and other areas means that the dataset may also be applicable to other regions with similar atmospheric conditions.
Acknowledgements
The paper in which the dataset was proposed is available at: https://ieeexplore.ieee.org/abstract/document/10307972
If you use this dataset, please cite the following paper:
Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972
BibTeX
```bibtex
@inproceedings{Ratul2023Atmospheric,
  author = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir},
  title = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks},
  booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)},
  year = {2023},
  pages = {1--6},
  doi = {10.1109/ICCCNT56998.2023.10307972},
  keywords = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML}
}
```
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chemical cues are arguably the most fundamental means of animal communication and play an important role in mate choice and kin recognition. Consequently, there is growing interest in the use of gas chromatography (GC) to investigate the chemical basis of eco-evolutionary interactions. Both GC-MS (mass spectrometry) and FID (flame ionization detection) are commonly used to characterise the chemical composition of biological samples such as skin swabs. The resulting chromatograms comprise peaks that are separated according to their retention times and which represent different substances. Across chromatograms of different samples, homologous substances are expected to elute at similar retention times. However, random and often unavoidable experimental variation introduces noise, making the alignment of homologous peaks challenging, particularly with GC-FID data where mass spectral data are lacking. Here we present GCalignR, a user-friendly R package for aligning GC-FID data based on retention times. The package was developed specifically for ecological and evolutionary studies that seek to investigate similarity patterns across multiple and often highly variable biological samples, for example representing different sexes, age classes or reproductive stages. The package also implements dynamic visualisations to facilitate inspection and fine-tuning of the resulting alignments and can be integrated within a broader workflow in R to facilitate downstream multivariate analyses. We demonstrate an example workflow using empirical data from Antarctic fur seals and explore the impact of user-defined parameter values by calculating alignment error rates for multiple datasets. The resulting alignments had low error rates for most of the explored parameter space and we could also show that GCalignR performed equally well or better than other available software. 
We hope that GCalignR will help to simplify the processing of chemical datasets and improve the standardization and reproducibility of chemical analyses in studies of animal chemical communication and related fields.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets with novel extended IP flow called NetTiSA flow
Datasets were created for the paper: NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification -- Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka -- which is published in The International Journal of Computer and Telecommunications Networking https://doi.org/10.1016/j.comnet.2023.110147
Please cite the usage of our datasets as:
Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286
@article{KOUMAR2024110147, title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification}, journal = {Computer Networks}, volume = {240}, pages = {110147}, year = {2024}, issn = {1389-1286}, doi = {https://doi.org/10.1016/j.comnet.2023.110147}, url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923}, author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka} }
This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.
NetTiSA flow feature vector
The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups according to where and how they are computed. The first group of features is based on classical bidirectional flow information: the number of transferred bytes and packets. The second group contains statistical and time-based features calculated using time-series analysis of the packet sequences. The third group of features can be computed from the previous two groups (i.e., on the flow collector) and improves the classification performance without any impact on the telemetry bandwidth.
Flow features
The flow features are:
Packets is the number of packets in the direction from the source to the destination IP address.
Packets in reverse order is the number of packets in the direction from the destination to the source IP address.
Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.
Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.
Statistical and Time-based features
These features are exported in the extended part of the flow. All of them can be computed (exactly or approximately) by stream-wise computation, which is necessary to keep memory requirements low. This second feature set contains the following features:
Mean is the mean of the payload lengths of packets
Min is the minimal value from payload lengths of all packets in a flow
Max is the maximum value from payload lengths of all packets in a flow
Standard deviation is a measure of the variation of payload lengths from the mean payload length
Root mean square is the measure of the magnitude of payload lengths of packets
Average dispersion is the average absolute difference between each payload length of the packet and the mean value
Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution
Mean of relative times is the mean of the relative times, a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\}\)
Mean of time differences is the mean of the time differences, a sequence defined as \(dt = \{ t_j - t_i \mid j = i + 1, i \in \{1, 2, \dots, n - 1\} \}\)
Min from time differences is the minimal value from all time differences, i.e., min space between packets.
Max from time differences is the maximum value from all time differences, i.e., max space between packets.
Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \(tdist = \frac{ \frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right| }{ \frac{1}{2} \left( \max\left(dt_{n-1}\right) - \min\left(dt_{n-1}\right) \right) }\)
Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation: \(sr = \frac{s_n}{\frac{1}{2} (n - 1)}\)
where \(s_n\) is the number of switches.
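The time-based definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ipfixprobe implementation: the function names are hypothetical, timestamps are assumed to be sorted floats, a "switch" is assumed to mean any change between two consecutive payload lengths (the text does not define it further), and a zero spread of time differences is mapped to 0.0 to avoid division by zero.

```python
from statistics import mean


def time_features(timestamps):
    """Time-based features for one flow; needs at least two packet timestamps."""
    t1 = timestamps[0]
    st = [t - t1 for t in timestamps]                         # relative times
    dt = [b - a for a, b in zip(timestamps, timestamps[1:])]  # n - 1 time differences
    mu_dt = mean(dt)
    half_range = 0.5 * (max(dt) - min(dt))
    mean_abs_dev = sum(abs(mu_dt - d) for d in dt) / len(dt)
    # Assumption: return 0.0 when all gaps are equal (the formula is undefined there).
    tdist = mean_abs_dev / half_range if half_range > 0 else 0.0
    return {
        "mean_relative_time": mean(st),
        "mean_time_diff": mu_dt,
        "min_time_diff": min(dt),
        "max_time_diff": max(dt),
        "time_distribution": tdist,
    }


def switching_ratio(payload_lengths):
    """sr = s_n / (0.5 * (n - 1)); needs at least two packets."""
    n = len(payload_lengths)
    # Assumption: a switch is a change between consecutive payload lengths.
    s_n = sum(1 for a, b in zip(payload_lengths, payload_lengths[1:]) if a != b)
    return s_n / (0.5 * (n - 1))
```

Under the assumed definition of a switch, the ratio ranges from 0 (all payloads equal) to 2 (every consecutive pair differs).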
Features computed at the collector
The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:
Max minus min is the difference between maximum and minimum payload lengths
Percent deviation is the dispersion of the average absolute difference to the mean value
Variance is the spread measure of the data from its mean
Burstiness is the degree of peakedness in the central part of the distribution
Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement
Directions describe a percentage ratio of packet direction computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is the number of packets in the direction from source to destination IP address and \(d_0\) is the number of packets in the opposite direction. Both \(d_1\) and \(d_0\) are part of the classical bidirectional flow.
Duration is the duration of the flow
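Because these features depend only on values already exported by the probe, a collector can derive them with simple arithmetic. The sketch below is an assumption-laden illustration: the function and argument names are hypothetical, percent deviation is taken to be average dispersion divided by the mean, and burstiness is omitted because the description above does not give a formula for it.

```python
def collector_features(mean_len, min_len, max_len, std_len, avg_dispersion,
                       packets, packets_rev, t_first, t_last):
    """Derive Enhanced NetTiSA-style features from exported flow statistics."""
    total_packets = packets + packets_rev
    return {
        "max_minus_min": max_len - min_len,
        # Assumption: ratio of average dispersion to the mean payload length.
        "percent_deviation": avg_dispersion / mean_len if mean_len else 0.0,
        "variance": std_len ** 2,
        "coefficient_of_variation": std_len / mean_len if mean_len else 0.0,
        # d_1 / (d_1 + d_0): share of packets sent from source to destination.
        "directions": packets / total_packets if total_packets else 0.0,
        "duration": t_last - t_first,
    }
```

Since nothing here needs per-packet data, the computation adds no load to the probe and no bytes to the exported flow record, which is the point made in the text.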
The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.
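The statistical features are described above as computable stream-wise, so that a probe never has to store the packets of a flow. One common way to do this is Welford's online algorithm; the sketch below is a generic illustration and is not taken from ipfixprobe. The class name is hypothetical, and the population (not sample) standard deviation is an assumption.

```python
import math


class RunningStats:
    """Constant-memory statistics over payload lengths, updated per packet."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.sum_sq = 0.0   # running sum of squares, used for the RMS
        self.min = math.inf
        self.max = -math.inf

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)   # Welford update step
        self.sum_sq += x * x
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def std(self):
        # Assumption: population standard deviation (divide by n).
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

    @property
    def rms(self):
        return math.sqrt(self.sum_sq / self.n) if self.n else 0.0
```

Each packet costs a handful of arithmetic operations and the state is six numbers per flow, regardless of how many packets the flow contains.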
Description of dataset files
The following table describes each dataset file:
| File name | Detection problem | Citation of the original raw dataset |
| --- | --- | --- |
| botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014. |
| cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
| cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022 |
| dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021. |
| doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020 |
| doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022 |
| dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019. |
| edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
| edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022. |
| https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020 |
| ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
| ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018. |
| unsw_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
| unsw_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015. |
| iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23 |
| ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021 |
| ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the emergence of Artificial Intelligence technology and the advancement of science and technology, the current mainstream path of social development is continuously updating and improving various industries using technology. Therefore, in order to promote the development of sneaker consumer culture, this study explores the use of technological means to improve the dissemination effect of symbolic culture in sneaker consumer culture. Firstly, the development concept and mainstream direction of sneaker consumer culture in the era of big data are discussed, and the application principle of big data technology is introduced. Then, a sneaker culture dissemination model based on big data technology is designed. Finally, the model is optimized using a Convolutional Neural Network (CNN), and its effectiveness is evaluated. The results show that the Convolutional Neural Network-Big Data (CNN-BD) model designed in this study has the highest fitting degree of 93% and a lowest fitting degree of 78% in the UT-Zap50K dataset. In the Ai2 dataset, the highest fitting degree of the big data classification model is 94%, and the lowest is 76%. In the Kaggle Women’s Shoe dataset, the highest fitting degree of the big data classification model is 92%, and the lowest is 77%. In the Kaggle Men’s Shoe dataset, the highest fitting degree of the big data classification model is 94%, and the lowest is 79%. The designed model has the highest accuracy rate of 93% in sneaker classification, while other models have the highest accuracy rate of around 82% in sneaker classification. Compared with traditional big data technology, the designed model has greatly improved and can adapt to more working environments. This study not only provides technical support for the application of big data technology but also contributes to improving the dissemination effect and promoting the comprehensive development of sneaker consumer culture.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
The purpose of this (imaginary) study is to detect faulty satellites in order to prevent communication interruptions. It is also meant to show how confirmation bias can lead to misleading results, using a synthetic data set with a developed story line.
You are leading a team to conduct a study that will allow the space agency to predict the health status of satellites based on telemetry data to enable proactive maintenance and ensure optimal performance in space missions.
Accurately predicting satellite health to minimize the risk of mission failures and optimize satellite usage.
Identifying the most critical factors that affect satellite health to focus on improving those aspects during the satellite design and maintenance process.
Reducing the rate of false positives and false negatives in predictions to avoid unnecessary maintenance efforts and ensure that actual issues are addressed promptly.
False Positive (predicting a malfunction when the component is healthy): Unnecessary maintenance check: $5,000 Unwarranted component replacement: $50,000
False Negative (predicting a component is healthy when it is malfunctioning): Data loss or degradation: $100,000 Partial mission failure: $500,000 Total mission failure or satellite loss: $300,000,000
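These cost figures make it possible to score a model by money lost rather than by accuracy alone. The sketch below is one hedged way to do that: the function name is hypothetical, and because the text lists several possible outcomes per error type without probabilities, the per-error costs are parameters whose defaults (the cheapest listed outcome of each type) are only an assumption. Labels follow the data dictionary's target encoding, where 1 means healthy and 0 means unhealthy, so a "positive" prediction here is a predicted malfunction (label 0).

```python
HEALTHY, UNHEALTHY = 1, 0


def misclassification_cost(y_true, y_pred, fp_cost=5_000, fn_cost=100_000):
    """Total cost of wrong predictions, in dollars.

    False positive: a healthy satellite predicted as malfunctioning.
    False negative: a malfunctioning satellite predicted as healthy.
    """
    pairs = list(zip(y_true, y_pred))
    false_pos = sum(1 for t, p in pairs if t == HEALTHY and p == UNHEALTHY)
    false_neg = sum(1 for t, p in pairs if t == UNHEALTHY and p == HEALTHY)
    return false_pos * fp_cost + false_neg * fn_cost
```

With the listed figures a single missed malfunction costs at least twenty times as much as an unnecessary maintenance check, so a model with slightly lower accuracy but fewer false negatives can still be the cheaper choice.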
Thoroughly exploring the data to understand the relationships between various telemetry variables and satellite health.
Ensuring the model is accurate and reliable by selecting appropriate algorithms, performing feature engineering, and validating the model's performance using relevant metrics.
Identifying and addressing any data quality issues, such as missing values and incorrect data.
Investigating the importance of each variable in the prediction task and communicating these insights to stakeholders for better decision-making
DATA DICTIONARY:
Range: 0 to 3650 Description: Time since the satellite was launched.
Range: 300 to 2000 Description: Altitude of the satellite's orbit.
Range: 20 to 30 Description: Satellite's battery voltage.
Range: -50 to 50 Description: Temperature of the satellite's solar panels.
Range: 0 to 5 Description: Error in the satellite's attitude control system.
Range: 10 to 100 Description: Rate of data transmission from the satellite to the ground station.
Range: 0 (not working) or 1 (working) Description: Binary flag indicating if the thermal control system is working or not.
Range: 0 (unhealthy) or 1 (healthy) Description: Target variable - binary flag indicating if the satellite is healthy or unhealthy
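The documented ranges can double as a first data quality check for missing and incorrect values. In the sketch below the column names are hypothetical, since the variable names did not survive in this listing; only the ranges and their order come from the data dictionary above.

```python
# Hypothetical column names; ranges are copied from the data dictionary.
RANGES = {
    "time_since_launch": (0, 3650),
    "orbital_altitude": (300, 2000),
    "battery_voltage": (20, 30),
    "solar_panel_temperature": (-50, 50),
    "attitude_control_error": (0, 5),
    "data_transmission_rate": (10, 100),
    "thermal_control_status": (0, 1),
    "satellite_health": (0, 1),
}
BINARY = {"thermal_control_status", "satellite_health"}


def find_issues(record):
    """Return (column, problem) pairs for one telemetry record (a dict)."""
    issues = []
    for name, (low, high) in RANGES.items():
        value = record.get(name)
        if value is None:
            issues.append((name, "missing"))
        elif name in BINARY and value not in (0, 1):
            issues.append((name, "not binary"))
        elif not low <= value <= high:
            issues.append((name, "out of range"))
    return issues
```

Running this over every record before modelling gives a quick list of rows to inspect, which is useful in a study designed around spotting misleading data.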
This survey was conducted in India between June 2013 and December 2014 as part of the Enterprise Survey project, an initiative of the World Bank. The objective of the survey is to obtain feedback from enterprises on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country's business environment. The remaining questions assess the survey respondents' opinions on what are the obstacles to firm growth and performance.
Data from 1,247 establishments was analyzed. Stratified random sampling was used to select the surveyed businesses. Data was collected using face-to-face interviews.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
Sample survey data [ssd]
The sample was selected using stratified random sampling. Three levels of stratification were used: industry, establishment size, and region.
For stratification by industry, the universe was stratified into seven manufacturing industries (food, textiles, garments, chemicals, non-metallic minerals, motor vehicles, other manufacturing) and two service sectors (retail and other services).
Size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees).
Regional stratification was defined in 5 regions: Punjab, Sindh, KPK, Balochistan, and Islamabad.
The sample frame for Manufacturing establishments was from the Pakistan Bureau of Statistics (PBS). For Retail and Other Services establishments, Nielsen Pakistan provided the sample frame through desk research. For confidentiality purposes, PBS randomly drew the sample of fresh manufacturing establishments to be interviewed based on the sample design provided by the World Bank.
The combination of the PBS-provided sample along with the services lists from Nielsen were then used as the sample frame for the Pakistan Enterprise Survey with the aim of obtaining interviews at 1,320 establishments.
Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 4.1% (116 out of 2,841 establishments).
Face-to-face [f2f]
The structure of the database reflects the fact that two different versions of the questionnaire were used for 3 categories of businesses (manufacturing, retail, and other services/non-retail). The Manufacturing Questionnaire includes all common questions asked to all establishments and some specific questions relevant to manufacturing firms. The Services Questionnaire, administered to retail and other services/non-retail establishments, includes all common questions asked to all establishments and some specific questions relevant to retail and other services firms. Each variation of the questionnaire is identified by the index variable, a0.
All variables are named using, first, the letter of each section and, second, the number of the variable within the section, i.e. a1 denotes section A, question 1. Variable names preceded by the prefix "SAR" or "IND" indicate questions specific to the South Asia region or to India only; they may therefore not be found in the implementation of the rollout in other countries. All other variables are global and are present in all country surveys. All variables are numeric with the exception of those variables with an "x" at the end of their names. The suffix "x" denotes that the variable is alpha-numeric.
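The naming convention can be applied mechanically when exploring the data file. The following sketch is an illustration only: the function name and the regular expression are assumptions built from the rules stated above, and real variable names may carry extra characters after the question number, which are captured here without interpretation.

```python
import re

# Optional "SAR"/"IND" prefix, one section letter, a question number,
# then any remaining characters (left uninterpreted in this sketch).
_NAME = re.compile(r"^(?P<prefix>SAR|IND)?(?P<section>[a-z])(?P<number>\d+)(?P<rest>\w*)$")


def parse_variable(name):
    """Split a survey variable name into its parts, or return None if it does not fit."""
    match = _NAME.match(name)
    if match is None:
        return None
    return {
        "scope": match.group("prefix") or "global",
        "section": match.group("section").upper(),
        "question": int(match.group("number")),
        "alphanumeric": name.endswith("x"),   # trailing "x" marks alpha-numeric variables
    }
```

For instance, `parse_variable("a1")` reports section A, question 1, as a global numeric variable.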
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as a different option from don’t know. b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.
The number of interviews per contacted establishment was 0.16. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 0.12.
A GPS survey by Andrew Ruddell (AAD Glaciology) on 9 March 1999 at Casey Station, Antarctica. The survey was conducted along the road from Casey Station to Old Casey. The aim of the survey was to investigate the cause of the 'disappearance' of road gravel applied to the compacted snow road in the depression between Casey Station and Old Casey. This dataset consists of point data with an elevation (above mean sea level) attribute. The data, in Excel and shapefile formats, and Andrew's report are available for download (see Related URL below).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Movement definitions and manifestations of the 7 behavior outcome classes in comparison with the Attuning Theory.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data sources used in this analysis, updated frequently to support COVID-19 research.