10 datasets found
  1. Network Slicing

    • kaggle.com
    zip
    Updated Aug 8, 2022
    Cite
    Puspak Meher (2022). Network Slicing [Dataset]. https://www.kaggle.com/datasets/puspakmeher/networkslicing
    Explore at:
    zip (307691 bytes)
    Dataset updated
    Aug 8, 2022
    Authors
    Puspak Meher
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Overview

    Cellular communications, especially with the advent of 5G mobile networks, demand stringent adherence to high-reliability standards, ultra-low latency, increased capacity, enhanced security, and high-speed user connectivity. To fulfill these requirements, mobile operators require a programmable solution capable of supporting multiple independent tenants on a single physical infrastructure. The advent of 5G networks facilitates end-to-end resource allocation through Network Slicing (NS), which allows for the division of the network into distinct virtual slices.

    Network slicing in 5G stands as a pivotal feature for next-generation wireless networks, delivering substantial benefits to both mobile operators and businesses. Developing a Machine Learning (ML) model is crucial for accurately predicting the optimal network slice based on key device parameters. Such a model also plays a vital role in managing network load balancing and addressing network slice failures.

    Dataset Characteristics and Target Classes

    The dataset is structured to support the development of an ML model that can classify the optimal network slice based on device parameters. The target output comprises three distinct classes:

    1. Enhanced Mobile Broadband (eMBB):

      • Focuses on high-bandwidth and high-speed data transmission.
      • Facilitates activities such as high-definition video streaming, online gaming, and immersive media experiences.
    2. Ultra-Reliable Low Latency Communication (URLLC):

      • Emphasizes extremely reliable and low-latency connections.
      • Supports critical applications like autonomous vehicles, industrial automation, and remote surgery.
    3. Massive Machine Type Communication (mMTC):

      • Aims to support a massive number of connected devices.
      • Enables efficient communication between Internet of Things (IoT) devices, smart cities, and sensor networks.

    File name: deepslice_data.csv

    Data Attributes (Columns Desc)

    • Device ID: Unique identifier for each device.
    • Connection Type: Specifies the type of connection (e.g., LTE, 5G).
    • Latency Requirements (ms): The maximum allowable latency for the device's operation.
    • Bandwidth Requirements (Mbps): The bandwidth needed for optimal device performance.
    • Reliability (%): The required reliability level for the device's connection.
    • Data Rate (Mbps): The data rate the device can handle.
    • Device Type: Categorizes the device (e.g., smartphone, IoT sensor).
    • Mobility (Low/Medium/High): Indicates the mobility level of the device.
    • Battery Life (hours): Expected battery life of the device.
    • Application Type: The primary application for the device's connection (e.g., video streaming, industrial control).

    Class Distribution

    The dataset includes labeled instances categorized into the three target classes: eMBB, URLLC, and mMTC. Each instance corresponds to a specific device configuration and its optimal network slice.
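A quick way to inspect the class distribution is to count the label column. The sketch below is only illustrative: the column name `slice_type` and the sample rows are assumptions, not taken from deepslice_data.csv, so check the real header before reusing it.

```python
import csv
import io
from collections import Counter

# Hypothetical label column name; the real header of deepslice_data.csv may differ.
LABEL_COLUMN = "slice_type"


def class_distribution(csv_file, label_column=LABEL_COLUMN):
    """Count labeled instances per target class in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column].strip() for row in reader)


# Tiny in-memory stand-in for the real file (all values are made up).
sample = io.StringIO(
    "device_type,latency_ms,slice_type\n"
    "smartphone,50,eMBB\n"
    "iot_sensor,300,mMTC\n"
    "industrial_robot,1,URLLC\n"
    "smartphone,60,eMBB\n"
)
counts = class_distribution(sample)
for label, count in counts.most_common():
    print(f"{label}: {count}")
```

To run it on the actual file, pass `open("deepslice_data.csv", newline="")` instead of the in-memory sample and adjust `LABEL_COLUMN`.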

    Application and Relevance

    Network slicing in 5G is instrumental in provisioning tailored network services for specific use cases, ensuring optimal performance, resource utilization, and user experiences based on the requirements of eMBB, URLLC, and mMTC applications. This dataset is invaluable for researchers and practitioners aiming to design and implement ML models for network slice prediction, thereby enhancing the operational efficiency and reliability of 5G networks.

    Conclusion

    This dataset is meticulously curated to facilitate the development of ML models for predicting the optimal 5G network slice. It encompasses a comprehensive set of attributes and target classes, ensuring that it meets the highest standards required for advanced research and practical applications in the field of cellular communications and network management.

  2. Path loss at 5G high frequency range in South Asia

    • kaggle.com
    Updated Apr 25, 2023
    Cite
    S M MEHEDI ZAMAN (2023). Path loss at 5G high frequency range in South Asia [Dataset]. https://www.kaggle.com/datasets/smmehedizaman/path-loss-at-5g-high-frequency-range-in-south-asia
    Dataset updated
    Apr 25, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    S M MEHEDI ZAMAN
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    South Asia, Asia
    Description

    This dataset was generated using the NYUSIM 3.0 mm-wave channel simulator, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data was collected over the course of a year in South Asia. As a result, the dataset provides an accurate representation of the seasonal variations in mm-wave channel characteristics in these areas. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation AoA (degree), RMS Delay Spread (ns), Season, Frequency, and Path Loss (dB). Four main seasons are considered: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (low, medium, and high) to capture the atmospheric variations within a season. To simulate the path loss, realistic Tx and Rx heights, an NLoS environment, and mean human blockage attenuation effects have been taken into consideration. The data has been preprocessed and normalized to ensure consistency and ease of use. Researchers in the field of mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that while the data was collected specifically in the South Asia region, the high correlation between the weather patterns in this region and other areas means that the dataset may also be applicable to other regions with similar atmospheric conditions.

    Acknowledgements

    The paper in which the dataset was proposed is available at: https://ieeexplore.ieee.org/abstract/document/10307972

    Citation

    If you use this dataset, please cite the following paper:

    Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972

    BibTeX

    ```bibtex
    @inproceedings{Ratul2023Atmospheric,
      author    = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir},
      title     = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks},
      booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)},
      year      = {2023},
      pages     = {1--6},
      doi       = {10.1109/ICCCNT56998.2023.10307972},
      keywords  = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML}
    }
    ```

  3. GCalignR: An R package for aligning gas-chromatography data for ecological...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated May 30, 2023
    Cite
    Meinolf Ottensmann; Martin A. Stoffel; Hazel J. Nichols; Joseph I. Hoffman (2023). GCalignR: An R package for aligning gas-chromatography data for ecological and evolutionary studies [Dataset]. http://doi.org/10.1371/journal.pone.0198311
    Explore at:
    docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Meinolf Ottensmann; Martin A. Stoffel; Hazel J. Nichols; Joseph I. Hoffman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Chemical cues are arguably the most fundamental means of animal communication and play an important role in mate choice and kin recognition. Consequently, there is growing interest in the use of gas chromatography (GC) to investigate the chemical basis of eco-evolutionary interactions. Both GC-MS (mass spectrometry) and FID (flame ionization detection) are commonly used to characterise the chemical composition of biological samples such as skin swabs. The resulting chromatograms comprise peaks that are separated according to their retention times and which represent different substances. Across chromatograms of different samples, homologous substances are expected to elute at similar retention times. However, random and often unavoidable experimental variation introduces noise, making the alignment of homologous peaks challenging, particularly with GC-FID data where mass spectral data are lacking. Here we present GCalignR, a user-friendly R package for aligning GC-FID data based on retention times. The package was developed specifically for ecological and evolutionary studies that seek to investigate similarity patterns across multiple and often highly variable biological samples, for example representing different sexes, age classes or reproductive stages. The package also implements dynamic visualisations to facilitate inspection and fine-tuning of the resulting alignments and can be integrated within a broader workflow in R to facilitate downstream multivariate analyses. We demonstrate an example workflow using empirical data from Antarctic fur seals and explore the impact of user-defined parameter values by calculating alignment error rates for multiple datasets. The resulting alignments had low error rates for most of the explored parameter space and we could also show that GCalignR performed equally well or better than other available software. 
    We hope that GCalignR will help to simplify the processing of chemical datasets and improve the standardization and reproducibility of chemical analyses in studies of animal chemical communication and related fields.

  4. Network traffic datasets with novel extended IP flow called NetTiSA flow

    • data.niaid.nih.gov
    Updated Apr 18, 2024
    Cite
    Josef Koumar; Karel Hynek; Jaroslav Pešek; Tomáš Čejka (2024). Network traffic datasets with novel extended IP flow called NetTiSA flow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8301042
    Explore at:
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    CESNET: http://www.cesnet.cz/
    Czech Technical University in Prague
    Authors
    Josef Koumar; Karel Hynek; Jaroslav Pešek; Tomáš Čejka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network traffic datasets with novel extended IP flow called NetTiSA flow

    Datasets were created for the paper: NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification, by Josef Koumar, Karel Hynek, Jaroslav Pešek, and Tomáš Čejka, which is published in The International Journal of Computer and Telecommunications Networking: https://doi.org/10.1016/j.comnet.2023.110147

    Please cite the usage of our datasets as:

    Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286

    @article{KOUMAR2024110147, title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification}, journal = {Computer Networks}, volume = {240}, pages = {110147}, year = {2024}, issn = {1389-1286}, doi = {https://doi.org/10.1016/j.comnet.2023.110147}, url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923}, author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka} }

    This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.

    NetTiSA flow feature vector

    The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups according to where and how they are computed. The first group is based on classical bidirectional flow information: the number of transferred bytes and packets. The second group contains statistical and time-based features calculated using time-series analysis of the packet sequences. The third group can be computed from the previous two groups (i.e., on the flow collector) and improves classification performance without any impact on the telemetry bandwidth.

    Flow features

    The flow features are:

    Packets is the number of packets in the direction from the source to the destination IP address.

    Packets in reverse order is the number of packets in the direction from the destination to the source IP address.

    Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.

    Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.
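As an illustration of the four classical flow features, the sketch below accumulates per-direction packet and payload-byte counters from a list of packets. It is not the ipfixprobe implementation; the `FlowCounters` class, the `aggregate_flow` helper, and the `(sender_ip, payload_len)` packet representation are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FlowCounters:
    """Classical bidirectional flow counters (names are illustrative)."""
    packets: int = 0        # source -> destination packets
    packets_rev: int = 0    # destination -> source packets
    payload_bytes: int = 0      # source -> destination payload bytes
    payload_bytes_rev: int = 0  # destination -> source payload bytes


def aggregate_flow(packets, src_ip):
    """Sum packet and payload-byte counts per direction.

    `packets` is an iterable of (sender_ip, payload_len) pairs that all
    belong to one flow; `src_ip` is the flow's source IP address.
    """
    flow = FlowCounters()
    for sender_ip, payload_len in packets:
        if sender_ip == src_ip:
            flow.packets += 1
            flow.payload_bytes += payload_len
        else:
            flow.packets_rev += 1
            flow.payload_bytes_rev += payload_len
    return flow


# Made-up packets of a single flow between two hosts.
example = [
    ("10.0.0.1", 100),
    ("10.0.0.2", 1400),
    ("10.0.0.1", 0),
    ("10.0.0.2", 600),
]
flow = aggregate_flow(example, src_ip="10.0.0.1")
print(flow)
```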

    Statistical and Time-based features

    These features are exported in the extended part of the flow. All of them can be computed (exactly or approximately) in a stream-wise fashion, which is necessary for keeping memory requirements low. This second feature set contains the following features:

    Mean represents mean of the payload lengths of packets

    Min is the minimal value from payload lengths of all packets in a flow

    Max is the maximum value from payload lengths of all packets in a flow

    Standard deviation is a measure of the variation of payload lengths from the mean payload length

    Root mean square is the measure of the magnitude of payload lengths of packets

    Average dispersion is the average absolute difference between each payload length of the packet and the mean value

    Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution

    Mean of relative times is the mean of the relative times, a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\}\)

    Mean of time differences is the mean of the time differences, a sequence defined as \(dt = \{ t_j - t_i \mid j = i + 1,\ i \in \{1, 2, \dots, n - 1\} \}\)

    Min from time differences is the minimal value from all time differences, i.e., min space between packets.

    Max from time differences is the maximum value from all time differences, i.e., max space between packets.

    Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \(tdist = \frac{ \frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right| }{ \frac{1}{2} \left( \max\left(dt_{n-1}\right) - \min\left(dt_{n-1}\right) \right) }\)

    Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation: \(sr = \frac{s_n}{\frac{1}{2} (n - 1)}\)

        where \(s_n\) is the number of switches.
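The definitions above can be sketched in a few lines of Python. This is a batch computation over complete lists rather than the stream-wise computation the exporter needs, and it is not the ipfixprobe code. Two points are assumptions: the standard deviation uses the population form (dividing by n), and a "switch" is counted whenever two consecutive payload lengths differ. Kurtosis is omitted.

```python
import math


def payload_stats(lengths):
    """Mean, min, max, std, RMS and average dispersion of payload lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    variance = sum((x - mean) ** 2 for x in lengths) / n  # population form (assumed)
    return {
        "mean": mean,
        "min": min(lengths),
        "max": max(lengths),
        "std": math.sqrt(variance),
        "rms": math.sqrt(sum(x * x for x in lengths) / n),
        "avg_dispersion": sum(abs(x - mean) for x in lengths) / n,
    }


def time_features(timestamps):
    """Relative-time and time-difference features for n >= 2 packets."""
    if len(timestamps) < 2:
        raise ValueError("need at least two packet timestamps")
    t1 = timestamps[0]
    st = [t - t1 for t in timestamps]                          # relative times
    dt = [b - a for a, b in zip(timestamps, timestamps[1:])]   # time differences
    mean_dt = sum(dt) / len(dt)
    half_range = 0.5 * (max(dt) - min(dt))
    mean_abs_dev = sum(abs(mean_dt - d) for d in dt) / len(dt)
    return {
        "mean_relative_time": sum(st) / len(st),
        "mean_time_diff": mean_dt,
        "min_time_diff": min(dt),
        "max_time_diff": max(dt),
        # tdist is undefined when all gaps are equal; 0.0 is a fallback chosen here.
        "time_distribution": mean_abs_dev / half_range if half_range else 0.0,
    }


def switching_ratio(lengths):
    """sr = s_n / (0.5 * (n - 1)), counting a switch when neighbours differ."""
    n = len(lengths)
    if n < 2:
        raise ValueError("need at least two packets")
    switches = sum(1 for a, b in zip(lengths, lengths[1:]) if a != b)
    return switches / (0.5 * (n - 1))


# Made-up flow: four packets with payload lengths and arrival times (seconds).
lengths = [100, 100, 300, 300]
timestamps = [0.0, 1.0, 3.0, 4.0]
stats = payload_stats(lengths)
times = time_features(timestamps)
sr = switching_ratio(lengths)
print(stats, times, sr)
```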
    

    Features computed at the collector

    The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size, and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:

    Max minus min is the difference between the maximum and minimum payload lengths

    Percent deviation is the dispersion of the average absolute difference to the mean value

    Variance is the spread measure of the data from its mean

    Burstiness is the degree of peakedness in the central part of the distribution

    Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement

    Directions describes the percentage ratio of packet directions, computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is the number of packets in the direction from the source to the destination IP address and \(d_0\) is the number of packets in the opposite direction. Both \(d_1\) and \(d_0\) are part of the classical bidirectional flow.

    Duration is the duration of the flow
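Because these features depend only on values already present in the exported flow, a collector can derive them with simple arithmetic. The sketch below is a minimal illustration under stated assumptions: the dictionary keys are hypothetical, percent deviation is read as average dispersion divided by the mean, the coefficient of variation as standard deviation divided by the mean, and duration as the difference between the last and first packet timestamps. Burstiness is left out because the description above does not give a formula.

```python
def enhanced_features(flow):
    """Derive collector-side features from an exported flow record.

    `flow` is a dict with the (hypothetical) keys used below.
    """
    mean = flow["mean"]
    d1 = flow["packets"]       # source -> destination packets
    d0 = flow["packets_rev"]   # destination -> source packets
    return {
        "max_minus_min": flow["max"] - flow["min"],
        # Ratios relative to the mean are undefined for an all-zero payload flow.
        "percent_deviation": flow["avg_dispersion"] / mean if mean else 0.0,
        "variance": flow["std"] ** 2,
        "coefficient_of_variation": flow["std"] / mean if mean else 0.0,
        "directions": d1 / (d1 + d0),
        "duration": flow["time_last"] - flow["time_first"],
    }


# Made-up flow record for illustration.
record = {
    "packets": 3,
    "packets_rev": 1,
    "min": 100,
    "max": 300,
    "mean": 200.0,
    "std": 100.0,
    "avg_dispersion": 100.0,
    "time_first": 0.0,
    "time_last": 4.0,
}
derived = enhanced_features(record)
print(derived)
```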

    The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.

    Description of dataset files

    In the following table is a description of each dataset file:

    File name | Detection problem | Citation of the original raw dataset
    --- | --- | ---
    botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
    cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
    dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
    doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
    doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
    dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
    edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
    https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
    ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
    unsw_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    unsw_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
    iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here: https://www.stratosphereips.org/datasets-iot23
    ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
    ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets.

  5. S1 Data -

    • figshare.com
    zip
    Updated Jul 19, 2023
    Cite
    Zhongyuan Lv (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0287757.s001
    Explore at:
    zip
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Zhongyuan Lv
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the emergence of Artificial Intelligence technology and the advancement of science and technology, the current mainstream path of social development is continuously updating and improving various industries using technology. Therefore, in order to promote the development of sneaker consumer culture, this study explores the use of technological means to improve the dissemination effect of symbolic culture in sneaker consumer culture. Firstly, the development concept and mainstream direction of sneaker consumer culture in the era of big data are discussed, and the application principle of big data technology is introduced. Then, a sneaker culture dissemination model based on big data technology is designed. Finally, the model is optimized using a Convolutional Neural Network (CNN), and its effectiveness is evaluated. The results show that the Convolutional Neural Network-Big Data (CNN-BD) model designed in this study has the highest fitting degree of 93% and a lowest fitting degree of 78% in the UT-Zap50K dataset. In the Ai2 dataset, the highest fitting degree of the big data classification model is 94%, and the lowest is 76%. In the Kaggle Women’s Shoe dataset, the highest fitting degree of the big data classification model is 92%, and the lowest is 77%. In the Kaggle Men’s Shoe dataset, the highest fitting degree of the big data classification model is 94%, and the lowest is 79%. The designed model has the highest accuracy rate of 93% in sneaker classification, while other models have the highest accuracy rate of around 82% in sneaker classification. Compared with traditional big data technology, the designed model has greatly improved and can adapt to more working environments. This study not only provides technical support for the application of big data technology but also contributes to improving the dissemination effect and promoting the comprehensive development of sneaker consumer culture.

  6. Synthetic_Data_Satellite_Health

    • kaggle.com
    zip
    Updated Mar 2, 2024
    Cite
    JeffDJeffD (2024). Synthetic_Data_Satellite_Health [Dataset]. https://www.kaggle.com/datasets/jeffdjeffd/synthetic-data-satellite-health
    Explore at:
    zip (458476 bytes)
    Dataset updated
    Mar 2, 2024
    Authors
    JeffDJeffD
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The purpose of this (imaginary) study is to detect faulty satellites in order to prevent communication interruptions. It is also meant to show how confirmation bias can lead to misleading results, using a synthetic data set with a developed story line.

    Problem Statement:

    You are leading a team conducting a study that will allow the space agency to predict the health status of satellites from telemetry data, enabling proactive maintenance and ensuring optimal performance in space missions.

    Stakeholders' Concerns:

    1. Accurately predicting satellite health to minimize the risk of mission failures and optimize satellite usage.

    2. Identifying the most critical factors that affect satellite health to focus on improving those aspects during the satellite design and maintenance process.

    3. Reducing the rate of false positives and false negatives in predictions to avoid unnecessary maintenance efforts and ensure that actual issues are addressed promptly.

    Misclassification Costs (estimation):

    False Positive (predicting a malfunction when the component is healthy):

      • Unnecessary maintenance check: $5,000
      • Unwarranted component replacement: $50,000

    False Negative (predicting a component is healthy when it is malfunctioning):

      • Data loss or degradation: $100,000
      • Partial mission failure: $500,000
      • Total mission failure or satellite loss: $300,000,000
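These asymmetric costs mean that a model with fewer total errors is not necessarily cheaper. The sketch below compares two hypothetical models under the lowest-cost scenario for each error type; the error counts and the `total_cost` helper are made up for illustration and are not part of the dataset.

```python
# Cost estimates quoted in the description above (US dollars).
COST_FALSE_POSITIVE = {
    "maintenance_check": 5_000,
    "component_replacement": 50_000,
}
COST_FALSE_NEGATIVE = {
    "data_loss": 100_000,
    "partial_mission_failure": 500_000,
    "total_mission_failure": 300_000_000,
}


def total_cost(false_positives, false_negatives, fp_cost, fn_cost):
    """Total misclassification cost for given error counts and unit costs."""
    return false_positives * fp_cost + false_negatives * fn_cost


# Hypothetical error counts for two candidate models.
fp_cost = COST_FALSE_POSITIVE["maintenance_check"]
fn_cost = COST_FALSE_NEGATIVE["data_loss"]
cost_a = total_cost(40, 2, fp_cost, fn_cost)  # 42 errors, mostly false positives
cost_b = total_cost(10, 5, fp_cost, fn_cost)  # 15 errors, more false negatives
print(f"model A: ${cost_a:,}  model B: ${cost_b:,}")
```

Even with the mildest false-negative cost, the model with fewer errors overall comes out more expensive here, which is why the description asks for metrics beyond plain accuracy.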

    Team Focus:

    1. Thoroughly exploring the data to understand the relationships between various telemetry variables and satellite health.

    2. Ensuring the model is accurate and reliable by selecting appropriate algorithms, performing feature engineering, and validating the model's performance using relevant metrics.

    3. Identifying and addressing any data quality issues, such as missing values and incorrect data.

    4. Investigating the importance of each variable in the prediction task and communicating these insights to stakeholders for better decision-making

    DATA DICTIONARY:

    1. time_since_launch (days)
       Range: 0 to 3650
       Description: Time since the satellite was launched.

    2. orbital_altitude (km)
       Range: 300 to 2000
       Description: Altitude of the satellite's orbit.

    3. battery_voltage (V)
       Range: 20 to 30
       Description: Satellite's battery voltage.

    4. solar_panel_temperature (°C)
       Range: -50 to 50
       Description: Temperature of the satellite's solar panels.

    5. attitude_control_error (degrees)
       Range: 0 to 5
       Description: Error in the satellite's attitude control system.

    6. data_transmission_rate (Mbps)
       Range: 10 to 100
       Description: Rate of data transmission from the satellite to the ground station.

    7. thermal_control_status (0 or 1)
       Range: 0 (not working) or 1 (working)
       Description: Binary flag indicating whether the thermal control system is working.

    8. satellite_health (0 or 1)
       Range: 0 (unhealthy) or 1 (healthy)
       Description: Target variable; binary flag indicating whether the satellite is healthy.
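The documented ranges lend themselves to a simple data-quality check for the missing and incorrect values mentioned under Team Focus. The following is a minimal sketch: the `find_violations` helper and the sample record are assumptions for illustration, and the column names are taken from the data dictionary as written.

```python
# Valid ranges copied from the data dictionary above.
RANGES = {
    "time_since_launch": (0, 3650),
    "orbital_altitude": (300, 2000),
    "battery_voltage": (20, 30),
    "solar_panel_temperature": (-50, 50),
    "attitude_control_error": (0, 5),
    "data_transmission_rate": (10, 100),
    "thermal_control_status": (0, 1),
    "satellite_health": (0, 1),
}
BINARY_COLUMNS = {"thermal_control_status", "satellite_health"}


def find_violations(record):
    """Return a list of messages for missing or out-of-range values."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = record.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif name in BINARY_COLUMNS and value not in (0, 1):
            problems.append(f"{name}: {value} is not 0 or 1")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems


# Made-up telemetry row with one out-of-range value and one missing value.
row = {
    "time_since_launch": 1200,
    "orbital_altitude": 550,
    "battery_voltage": 35.0,          # above the documented maximum of 30
    "solar_panel_temperature": None,  # missing
    "attitude_control_error": 0.4,
    "data_transmission_rate": 80,
    "thermal_control_status": 1,
    "satellite_health": 1,
}
problems = find_violations(row)
for message in problems:
    print(message)
```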

  7. Enterprise Survey 2013 - Pakistan

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 8, 2015
    Cite
    World Bank (2015). Enterprise Survey 2013 - Pakistan [Dataset]. https://microdata.worldbank.org/index.php/catalog/2363
    Explore at:
    Dataset updated
    Oct 8, 2015
    Dataset provided by
    World Bank Group: http://www.worldbank.org/
    Authors
    World Bank
    Time period covered
    2013 - 2014
    Area covered
    Pakistan
    Description

    Abstract

    This survey was conducted in Pakistan between June 2013 and December 2014 as part of the Enterprise Survey project, an initiative of the World Bank. The objective of the survey is to obtain feedback from enterprises on the state of the private sector and to help build a panel of enterprise data that makes it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.

    The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country's business environment. The remaining questions assess the survey respondents' opinions on what are the obstacles to firm growth and performance.

    Data from 1,247 establishments was analyzed. Stratified random sampling was used to select the surveyed businesses. Data was collected using face-to-face interviews.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample was selected using stratified random sampling. Three levels of stratification were used: industry, establishment size, and region.

    For stratification by industry, the universe was stratified into seven manufacturing industries (food, textiles, garments, chemicals, non-metallic minerals, motor vehicles, other manufacturing) and two service sectors (retail and other services).

    Size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees).
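
    The size bands above can be expressed as a small lookup. This is a minimal sketch, not part of the survey's tooling: the function name `size_stratum` is hypothetical, and returning `None` for establishments with fewer than 5 employees is an assumption based on the smallest band starting at 5.

```python
from typing import Optional


def size_stratum(employees: int) -> Optional[str]:
    """Map an employee count to the size stratum described in the text.

    small: 5 to 19, medium: 20 to 99, large: more than 99.
    Counts below 5 fall outside the stated bands, so None is returned
    (an assumption; the text does not say how such cases are handled).
    """
    if employees < 5:
        return None
    if employees <= 19:
        return "small"
    if employees <= 99:
        return "medium"
    return "large"
```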

    Regional stratification was defined in 5 regions: Punjab, Sindh, KPK, Balochistan, and Islamabad.

    The sample frame for manufacturing establishments came from the Pakistan Bureau of Statistics (PBS). For retail and other services establishments, Nielsen Pakistan provided the sample frame through desk research. For confidentiality purposes, PBS randomly drew the sample of fresh manufacturing establishments to be interviewed based on the sample design provided by the World Bank.

    The PBS-provided sample, combined with the services lists from Nielsen, was then used as the sample frame for the Pakistan Enterprise Survey, with the aim of obtaining interviews at 1,320 establishments.

    Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 4.1% (116 out of 2,841 establishments).
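
    The non-eligible share quoted above can be recomputed from the two counts given. A minimal sketch; the helper name `share` is illustrative only.

```python
def share(part: int, total: int) -> float:
    """Return part / total as a fraction, rejecting a non-positive total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


# Counts taken from the text: 116 confirmed non-eligible units
# out of 2,841 sampled establishments contacted.
non_eligible_rate = share(116, 2841)
print(f"{non_eligible_rate:.1%}")  # prints "4.1%"
```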

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The structure of the database reflects the fact that two different versions of the questionnaire were used for 3 categories of businesses (manufacturing, retail, and other services/non-retail). The Manufacturing Questionnaire includes all common questions asked to all establishments and some specific questions relevant to manufacturing firms. The Services Questionnaire, administered to retail and other services/non-retail establishments, includes all common questions asked to all establishments and some specific questions relevant to retail and other services firms. Each variation of the questionnaire is identified by the index variable, a0.

    All variables are named using, first, the letter of each section and, second, the number of the variable within the section, e.g. a1 denotes section A, question 1. Variable names preceded by a prefix "SAR" or "IND" indicate questions specific to the South Asia region or India only; therefore, they may not be found in the implementation of the rollout in other countries. All other suffixed variables are global and are present in all country surveys worldwide. All variables are numeric with the exception of those variables with an "x" at the end of their names. The suffix "x" denotes that the variable is alpha-numeric.
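
    The naming convention described above can be sketched as a small parser. This is an illustration under assumptions, not an official schema: the regular expression, the `VariableName` type, and the function name are hypothetical; allowing extra trailing letters after the question number is an assumption; and the example name `SARd3x` is made up to show a prefix and suffix together.

```python
import re
from typing import NamedTuple, Optional

# Optional "SAR"/"IND" prefix, one section letter, a question number,
# then optional trailing letters (assumed), where a final "x" marks
# an alpha-numeric variable.
_NAME = re.compile(
    r"^(?P<prefix>SAR|IND)?(?P<section>[a-z])(?P<number>\d+)(?P<tail>[a-z]*)$"
)


class VariableName(NamedTuple):
    prefix: Optional[str]   # "SAR", "IND", or None for global variables
    section: str            # section letter, upper-cased
    number: int             # question number within the section
    is_alphanumeric: bool   # True when the name ends in "x"


def parse_variable_name(name: str) -> Optional[VariableName]:
    """Split a variable name into its parts, or return None if it does not fit."""
    match = _NAME.match(name)
    if match is None:
        return None
    return VariableName(
        prefix=match.group("prefix"),
        section=match.group("section").upper(),
        number=int(match.group("number")),
        is_alphanumeric=match.group("tail").endswith("x"),
    )
```

    For example, `parse_variable_name("a1")` yields section "A", question 1, with no prefix and a numeric type.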

    Cleaning operations

    Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether, whereas the latter refers to refusals to answer specific questions. Enterprise Surveys suffer from both problems, and different strategies were used to address them.

    Item non-response was addressed by two strategies: (a) for sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to record a refusal to respond as a separate option from "don't know"; (b) establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.

    Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.

    The number of interviews per contacted establishment was 0.16. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 0.12.

  8. Survey of the road between Casey Station and Old Casey Station, 9 March 1999...

    • access.earthdata.nasa.gov
    • researchdata.edu.au
    • +1more
    Updated Jun 4, 2018
    (2018). Survey of the road between Casey Station and Old Casey Station, 9 March 1999 [Dataset]. https://access.earthdata.nasa.gov/collections/C1214313486-AU_AADC
    Explore at:
    Dataset updated
    Jun 4, 2018
    Time period covered
    Mar 9, 1999
    Area covered
    Description

    A GPS survey by Andrew Ruddell (AAD Glaciology) on 9 March 1999 at Casey Station, Antarctica. The survey was conducted along the road from Casey Station to Old Casey. The aim of the survey was to investigate the cause of the 'disappearance' of road gravel applied to the compacted snow road in the depression between Casey Station and Old Casey. This dataset consists of point data with an elevation (above mean sea level) attribute. The data, in Excel and shapefile formats, and Andrew's report are available for download (see Related URL below).

  9. Movement definitions and manifestations of the 7 behavior outcome classes in...

    • figshare.com
    xls
    Updated Jun 17, 2023
    Von Ralph Dane Marquez Herbuela; Tomonori Karita; Yoshiya Furukawa; Yoshinori Wada; Akihiro Toya; Shuichiro Senba; Eiko Onishi; Tatsuo Saeki (2023). Movement definitions and manifestations of the 7 behavior outcome classes in comparison with the Attuning Theory. [Dataset]. http://doi.org/10.1371/journal.pone.0269472.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Von Ralph Dane Marquez Herbuela; Tomonori Karita; Yoshiya Furukawa; Yoshinori Wada; Akihiro Toya; Shuichiro Senba; Eiko Onishi; Tatsuo Saeki
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Movement definitions and manifestations of the 7 behavior outcome classes in comparison with the Attuning Theory.

  10. Data sources used in this analysis, updated frequently to support COVID-19...

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    M. DelPozo-Banos; S. C. Lee; Y. Friedmann; A. Akbari; F. Torabi; K. Lloyd; R. A. Lyons; A. John (2023). Data sources used in this analysis, updated frequently to support COVID-19 research. [Dataset]. http://doi.org/10.1371/journal.pone.0266967.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    M. DelPozo-Banos; S. C. Lee; Y. Friedmann; A. Akbari; F. Torabi; K. Lloyd; R. A. Lyons; A. John
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data sources used in this analysis, updated frequently to support COVID-19 research.


Puspak Meher (2022). Network Slicing [Dataset]. https://www.kaggle.com/datasets/puspakmeher/networkslicing

Network Slicing

5G network slicing dataset according to 7 parameters

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (307691 bytes)
Dataset updated
Aug 8, 2022
Authors
Puspak Meher
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Dataset Overview

Cellular communications, especially with the advent of 5G mobile networks, demand stringent adherence to high-reliability standards, ultra-low latency, increased capacity, enhanced security, and high-speed user connectivity. To fulfill these requirements, mobile operators require a programmable solution capable of supporting multiple independent tenants on a single physical infrastructure. The advent of 5G networks facilitates end-to-end resource allocation through Network Slicing (NS), which allows for the division of the network into distinct virtual slices.

Network slicing in 5G stands as a pivotal feature for next-generation wireless networks, delivering substantial benefits to both mobile operators and businesses. Developing a Machine Learning (ML) model is crucial for accurately predicting the optimal network slice based on key device parameters. Such a model also plays a vital role in managing network load balancing and addressing network slice failures.

Dataset Characteristics and Target Classes

The dataset is structured to support the development of an ML model that can classify the optimal network slice based on device parameters. The target output comprises three distinct classes:

  1. Enhanced Mobile Broadband (eMBB):

    • Focuses on high-bandwidth and high-speed data transmission.
    • Facilitates activities such as high-definition video streaming, online gaming, and immersive media experiences.
  2. Ultra-Reliable Low Latency Communication (URLLC):

    • Emphasizes extremely reliable and low-latency connections.
    • Supports critical applications like autonomous vehicles, industrial automation, and remote surgery.
  3. Massive Machine Type Communication (mMTC):

    • Aims to support a massive number of connected devices.
    • Enables efficient communication between Internet of Things (IoT) devices, smart cities, and sensor networks.

File name: deepslice_data.csv

Data Attributes (Column Descriptions)

  • Device ID: Unique identifier for each device.
  • Connection Type: Specifies the type of connection (e.g., LTE, 5G).
  • Latency Requirements (ms): The maximum allowable latency for the device's operation.
  • Bandwidth Requirements (Mbps): The bandwidth needed for optimal device performance.
  • Reliability (%): The required reliability level for the device's connection.
  • Data Rate (Mbps): The data rate the device can handle.
  • Device Type: Categorizes the device (e.g., smartphone, IoT sensor).
  • Mobility (Low/Medium/High): Indicates the mobility level of the device.
  • Battery Life (hours): Expected battery life of the device.
  • Application Type: The primary application for the device's connection (e.g., video streaming, industrial control).
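
The attributes listed above could be modelled as a typed record with a few basic range checks before any model training. This is only a sketch under assumptions: the field names are hypothetical Python renderings of the described columns (the actual headers in deepslice_data.csv may differ), and the checks reflect only what the descriptions state, such as reliability being a percentage and mobility being Low, Medium, or High.

```python
from dataclasses import dataclass
from typing import List

# Label and category sets taken from the description.
SLICES = {"eMBB", "URLLC", "mMTC"}
MOBILITY_LEVELS = {"Low", "Medium", "High"}


@dataclass(frozen=True)
class DeviceRecord:
    """One row of device parameters; field names are assumed, not the CSV's."""
    device_id: str
    connection_type: str
    latency_ms: float
    bandwidth_mbps: float
    reliability_pct: float
    data_rate_mbps: float
    device_type: str
    mobility: str
    battery_life_hours: float
    application_type: str


def validation_errors(record: DeviceRecord) -> List[str]:
    """Return human-readable problems found in a record (empty if none)."""
    errors = []
    if record.mobility not in MOBILITY_LEVELS:
        errors.append("mobility must be Low, Medium, or High")
    if not 0.0 <= record.reliability_pct <= 100.0:
        errors.append("reliability_pct must be between 0 and 100")
    for name in ("latency_ms", "bandwidth_mbps", "data_rate_mbps", "battery_life_hours"):
        if getattr(record, name) < 0:
            errors.append(f"{name} must be non-negative")
    return errors


def is_known_slice(label: str) -> bool:
    """Check that a target label is one of the three described classes."""
    return label in SLICES
```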

Class Distribution

The dataset includes labeled instances categorized into the three target classes: eMBB, URLLC, and mMTC. Each instance corresponds to a specific device configuration and its optimal network slice.

Application and Relevance

Network slicing in 5G is instrumental in provisioning tailored network services for specific use cases, ensuring optimal performance, resource utilization, and user experiences based on the requirements of eMBB, URLLC, and mMTC applications. This dataset is invaluable for researchers and practitioners aiming to design and implement ML models for network slice prediction, thereby enhancing the operational efficiency and reliability of 5G networks.

Conclusion

This dataset is meticulously curated to facilitate the development of ML models for predicting the optimal 5G network slice. It encompasses a comprehensive set of attributes and target classes, ensuring that it meets the highest standards required for advanced research and practical applications in the field of cellular communications and network management.
