23 datasets found
  1. Monte Carlo Simulation dataset

    • kaggle.com
    Updated: Sep 10, 2025
    Cite: MD TANVIR SARDARR (2025). Monte Carlo Simulation dataset [Dataset]. https://www.kaggle.com/datasets/mdtanvirsardarr/monte-carlo-simulation-dataset
    Download formats: zip (303090 bytes)
    Authors: MD TANVIR SARDARR
    Description

    This dataset was created by MD TANVIR SARDARR.

    Released under Other (specified in description)

  2. WBAN-DS using Monte Carlo Simulation

    • kaggle.com
    Updated: Dec 10, 2024
    Cite: md Shamshuzzoha (2024). WBAN-DS using Monte Carlo Simulation [Dataset]. https://www.kaggle.com/datasets/mdshamshuzzoha/wban-ds-using-monte-carlo-simulation
    Download formats: zip (11204260 bytes)
    Authors: md Shamshuzzoha
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    These datasets were generated using Monte Carlo simulations to address critical challenges in Wireless Body Area Networks (WBANs), including anomaly detection and intrusion prevention. The datasets are designed for machine learning research in network security, particularly for healthcare environments where data protection is vital.

    Anomaly Detection Dataset: focuses on identifying abnormal patterns in WBAN sensor data, including metrics such as ECG, SpO2, BP, and temperature.

    Intrusion Detection Dataset: captures network-level attacks such as unauthorized access and spoofing, with features like IP, port, protocol, and packet size.

    These datasets aim to enable researchers and developers to create and test lightweight machine learning models suitable for resource-constrained environments like WBANs.
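
    A minimal sketch of training a lightweight anomaly detector on the vital-sign data follows; the CSV file name and column headers are assumptions for illustration, not the dataset's actual names.

    ```python
    # Sketch: lightweight anomaly detection on WBAN vital signs.
    # The file name and column names below are hypothetical.
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    df = pd.read_csv("wban_anomaly.csv")                   # hypothetical file name
    features = df[["ECG", "SpO2", "BP", "temperature"]]    # assumed columns

    # IsolationForest is cheap at inference time, which suits the
    # resource-constrained setting the description emphasizes.
    model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
    df["anomaly"] = model.fit_predict(features)            # -1 = anomaly, 1 = normal
    print(df["anomaly"].value_counts())
    ```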

  3. 2023CWC Safrica batting

    • kaggle.com
    Updated: Nov 3, 2023
    Cite: JDau (2023). 2023CWC Safrica batting [Dataset]. https://www.kaggle.com/datasets/jdaustralia/2023cwc-safrica-batting
    Download formats: zip (1644 bytes)
    Authors: JDau
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    Batting stats based on statsguru data from https://stats.espncricinfo.com/. Additional columns translating this data into wicket ball probability and runs per ball (rpb) have also been included. Data for batsmen who didn't face a ball has been removed.
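
    For clarity, the two derived columns can be reconstructed from the raw statsguru totals roughly as follows; the file and column names here are hypothetical, not the dataset's actual headers.

    ```python
    # Sketch: recomputing the derived per-ball columns from raw totals.
    import pandas as pd

    df = pd.read_csv("2023cwc_safrica_batting.csv")          # hypothetical file
    df["wicket_ball_probability"] = df["dismissals"] / df["balls_faced"]
    df["rpb"] = df["runs"] / df["balls_faced"]               # runs per ball
    ```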

  4. Images of 9 types of control chart patterns

    • kaggle.com
    Updated: Sep 27, 2024
    Cite: Sanchay Mahato (2024). Images of 9 types of control chart patterns [Dataset]. https://www.kaggle.com/datasets/sanchaymahato/mtp-vit-dataset/data
    Download formats: zip (18900994 bytes)
    Authors: Sanchay Mahato
    Description

    This dataset was created by Sanchay Mahato.

  5. CWC2023 India bowling

    • kaggle.com
    Updated: Nov 3, 2023
    Cite: JDau (2023). CWC2023 India bowling [Dataset]. https://www.kaggle.com/datasets/jdaustralia/cwc2023-india-bowling
    Download formats: zip (1080 bytes)
    Authors: JDau
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    Bowling stats based on statsguru data from https://stats.espncricinfo.com/. Additional columns translating this data into wicket ball probability and runs per ball (rpb) have also been included. Data for non-bowlers has been removed.

  6. 2023CWC India Batting

    • kaggle.com
    Updated: Nov 3, 2023
    Cite: JDau (2023). 2023CWC India Batting [Dataset]. https://www.kaggle.com/datasets/jdaustralia/2023cwc-india-batting
    Download formats: zip (1341 bytes)
    Authors: JDau
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Area covered: India
    Description

    Batting stats based on statsguru data from https://stats.espncricinfo.com/. Additional columns translating this data into wicket ball probability and runs per ball (rpb) have also been included. Data for batsmen who didn't face a ball has been removed.

  7. Texas Holdem Monte Carlo Data

    • kaggle.com
    Updated: Oct 6, 2025
    Cite: Benjamin Niesmertelny (2025). Texas Holdem Monte Carlo Data [Dataset]. https://www.kaggle.com/datasets/benjaminniesmertelny/texas-holdem-monte-carlo-data
    Download formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Benjamin Niesmertelny
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically
    Area covered: Texas
    Description

    Texas Holdem Monte Carlo

    This dataset contains various tables of statistical data from simulated poker games, along with equity estimates for a range of situations. These could ultimately be useful for gaining insight into player behavior, game dynamics, and strategies that improve performance in poker. Equity estimates are based on hand features for winning hands as they appear in the Monte Carlo simulation data.
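
    To make the notion of equity concrete, here is a minimal Monte Carlo equity sketch of the kind such data is built from; it assumes the third-party `treys` hand evaluator (pip install treys), and the hole cards chosen are arbitrary.

    ```python
    # Sketch: Monte Carlo equity of Ah Kh against one random hand.
    from treys import Card, Deck, Evaluator

    evaluator = Evaluator()
    hero = [Card.new("Ah"), Card.new("Kh")]

    wins = ties = 0
    trials = 10_000
    for _ in range(trials):
        deck = Deck()                        # fresh shuffled deck each trial
        for c in hero:
            deck.cards.remove(c)             # remove hero's known cards
        villain = deck.draw(2)
        board = deck.draw(5)
        h = evaluator.evaluate(board, hero)      # lower rank = stronger hand
        v = evaluator.evaluate(board, villain)
        wins += h < v
        ties += h == v

    print(f"Estimated equity: {(wins + ties / 2) / trials:.3f}")
    ```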

    The entire project that generated this dataset is linked here.

  8. Reverberation Times (RT60s) of Rectangular Rooms

    • kaggle.com
    Updated: Jul 28, 2025
    Cite: Andrew Z Chen (2025). Reverberation Times (RT60s) of Rectangular Rooms [Dataset]. https://www.kaggle.com/datasets/andrewzc1/reverberation-times-rt60s-of-rectangular-rooms/code
    Download formats: zip (214044 bytes)
    Authors: Andrew Z Chen
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    The columns of the dataset are:

    • x_dim, y_dim, z_dim: x, y, and z dimensions of the room
    • reflection_eff: reflection efficiency of each wall of the room
    • receiver1, receiver2, receiver3, receiver4: the x, y, and z positions of the receivers
    • volume: volume of the room
    • surface_area: surface area of the room
    • sabine_rt60: RT60 as estimated from room geometry by the Sabine equation (https://en.wikipedia.org/wiki/Reverberation#Sabine_equation)
    • eyring_rt60: RT60 as estimated from room geometry by the Eyring equation (https://en.wikipedia.org/wiki/Reverberation#Eyring_equation)
    • rt60_threshold: RT60 as computed by ray tracing simulation, by a linear regression on the log-power of ray hits over time, taking the time at which the fitted model decays by 60 dB
    • rt60_schroeder: RT60 as computed by ray tracing simulation, by Schroeder integration of the discrete ray hits, using a cumulative sum of hit powers over time
    • rt60_gaussian_schroeder: RT60 as computed by ray tracing simulation, estimated via Schroeder integration on a smoothed energy decay curve obtained by convolving the discrete ray-hit signal with a normalized Gaussian
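
    For reference, the two closed-form columns can be computed directly from the geometry columns; a minimal sketch, assuming reflection_eff is the fraction of energy reflected (so the absorption coefficient is 1 - reflection_eff):

    ```python
    # Sketch: Sabine and Eyring RT60 estimates from room geometry.
    import math

    def sabine_rt60(x, y, z, reflection_eff):
        volume = x * y * z
        surface_area = 2 * (x * y + y * z + x * z)
        alpha = 1.0 - reflection_eff                  # absorption coefficient
        return 0.161 * volume / (surface_area * alpha)

    def eyring_rt60(x, y, z, reflection_eff):
        volume = x * y * z
        surface_area = 2 * (x * y + y * z + x * z)
        alpha = 1.0 - reflection_eff
        return 0.161 * volume / (-surface_area * math.log(1.0 - alpha))

    print(sabine_rt60(5.0, 4.0, 3.0, 0.9), eyring_rt60(5.0, 4.0, 3.0, 0.9))
    ```
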
  9. Firm Competition Analysis

    • kaggle.com
    Updated: Feb 11, 2024
    Cite: Monte Carlo (2024). Firm Competition Analysis [Dataset]. https://www.kaggle.com/datasets/montecarlo31/firm-competition-analysis
    Download formats: zip (3597443 bytes)
    Authors: Monte Carlo
    Description

    This is a dataset for the analysis of firm competition using heterogeneous information of employees. The dataset includes the occupational and educational data of employees, as well as ground-truth labels of the competitive relationship between firms. Specifically, three data files are included, whose data dictionaries are provided as follows.

    Employment.csv
    - Employee: employee ID (hashed)
    - Year: start year of the employment record
    - Company: company ID
    - Title: job title

    Education.csv
    - Employee: employee ID (hashed)
    - Year: start year of the education record
    - School: school or university

    Competitors.csv
    - Company: company ID
    - Company2: company ID
    - Is_Competitor: ground-truth label for the competitive relationship
    - Dataset: train/validation/test
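
    As one example of how the files combine for the prediction task, the sketch below counts employees shared between each company pair; file and column names follow the data dictionary above, but the feature itself is our illustration.

    ```python
    # Sketch: a simple pair feature for competition prediction.
    import pandas as pd

    employment = pd.read_csv("Employment.csv")
    competitors = pd.read_csv("Competitors.csv")

    # Employees observed at each company at any point in their history.
    emp_sets = employment.groupby("Company")["Employee"].apply(set)

    def shared_employees(row):
        a = emp_sets.get(row["Company"], set())
        b = emp_sets.get(row["Company2"], set())
        return len(a & b)

    competitors["shared_employees"] = competitors.apply(shared_employees, axis=1)
    ```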

  10. Operational Risk Events Dataset

    • kaggle.com
    Updated: Jul 1, 2025
    Cite: Ziya (2025). Operational Risk Events Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/operational-risk-events-dataset
    Download formats: zip (41242 bytes)
    Authors: Ziya
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains 2147 records of operational risk events for retail banking, combining real-world observed loss events (OLE) and synthetic available loss events (ALE) generated via Monte Carlo simulation.

    Each record captures structured details about banking operational risk incidents, including:

    • Event Type (e.g., Cyber-Fraud, System Failure, Phishing)
    • Process Area (Retail Banking, ATM Network, E-Banking, etc.)
    • Loss Amount (kUSD) and Frequency (Nk)
    • Severity (Xi) estimated via loss distribution modeling
    • OpVar (%) at the 99.9% confidence level (Basel II/III standard)
    • ML (Magnitude of Loss ratio) comparing synthetic stress to observed baseline
    • Macro-Financial Indicators (GDP growth %, VIX volatility index)
    • Target column for classification modeling (risk level: Low, Medium, High)

    The dataset is designed to support operational risk estimation, scenario analysis, stress testing, and machine learning model development (e.g. for predicting loss severity or classifying risk levels).

    It reflects realistic banking risk conditions across multiple business lines—such as e-banking, ATM networks, and payment platforms—and includes both historical data and simulated future scenarios to evaluate model generalization under diverse risk environments.
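
    For intuition, the loss-distribution approach behind OpVar can be sketched as a compound Poisson-lognormal simulation; the frequency and severity parameters below are illustrative assumptions, not values fitted to this dataset.

    ```python
    # Sketch: simulate annual losses and read off OpVar at 99.9%.
    import numpy as np

    rng = np.random.default_rng(42)
    n_years = 100_000
    lam = 12.0                      # assumed mean event count per year
    mu, sigma = 3.0, 1.2            # assumed lognormal severity parameters (kUSD)

    counts = rng.poisson(lam, n_years)
    annual_loss = np.array([rng.lognormal(mu, sigma, n).sum() for n in counts])

    opvar_999 = np.quantile(annual_loss, 0.999)   # 99.9% confidence level
    print(f"OpVar(99.9%) ~ {opvar_999:,.0f} kUSD")
    ```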

  11. LT-FS-ID: Intrusion detection in WSNs

    • kaggle.com
    Updated: Jul 14, 2023
    Cite: Diaa Eldiyn Essam (2023). LT-FS-ID: Intrusion detection in WSNs [Dataset]. https://www.kaggle.com/datasets/diaaessam/lt-fs-id-intrusion-detection-in-wsns/discussion
    Download formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Diaa Eldiyn Essam
    Description

    There are five columns in this dataset. The first four are features (area, sensing range, transmission range, and number of sensor nodes), and the last column is the target variable (number of barriers). The dataset was synthetically created through Monte Carlo simulations.

    License: This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation: Singh, Abhilash (2022). LT-FS-ID: Intrusion detection in WSNs. UCI Machine Learning Repository. https://doi.org/10.3390/s22031070
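
    A quick regression baseline on the four features is sketched below; the CSV file name and exact column headers are assumptions.

    ```python
    # Sketch: predicting the number of barriers from network parameters.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("lt_fs_id.csv")    # hypothetical file name
    X = df[["area", "sensing_range", "transmission_range", "num_sensor_nodes"]]
    y = df["num_barriers"]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print(f"R^2 on held-out data: {model.score(X_te, y_te):.3f}")
    ```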

  12. Environmental and Energy Policy Impacts: OECD

    • kaggle.com
    Updated: Jun 2, 2024
    Cite: Ioana Birlan (2024). Environmental and Energy Policy Impacts: OECD [Dataset]. https://www.kaggle.com/datasets/ioanabirlan/green-growth
    Download formats: zip (1903 bytes)
    Authors: Ioana Birlan
    License: Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    This dataset provides comprehensive panel data (2000-2019) on various environmental and energy metrics across two regions, focusing on indicators that influence CO2 emissions. It includes data from OECD statistics and encompasses variables that serve as indicators of smart urban development and governmental policies. Researchers and policymakers can use this dataset to analyze the impact of various factors on CO2 emissions and to compare the effectiveness of environmental policies between OECD countries in Europe and America. The dataset is valuable for exploring significant differences and similarities in environmental and energy policies, municipal waste management, renewable energy adoption, and technology development.

    Variables included:
    • Production-based CO2 emissions
    • Total primary energy supply
    • Renewable energy supply, % total energy supply
    • Municipal waste recycled or composted, % treated waste
    • Mortality from exposure to ambient PM2.5
    • Welfare costs of premature mortalities from exposure to ambient PM2.5
    • Development of environment-related technologies, % all technologies
    • Relative advantage in environment-related technology
    • Environmentally related taxes, % GDP
    • Terrestrial protected area, % land area
    • Population density, inhabitants per km²

    Usage: This dataset is ideal for analyzing the determinants of CO2 emissions and understanding the effectiveness of different environmental policies. Users can explore:

    • The relationship between renewable energy supply and CO2 emissions.
    • Comparative analysis of CO2 emissions across OECD countries in Europe and America.
    • Evaluation of the effectiveness of environmental policies and their economic implications.
    • Assessment of the role of renewable energy and technology development in reducing emissions.
    • Exploration of smart urban development indicators and their impact.
    • The impact of municipal waste management practices on environmental outcomes.
    • Differences in environmental technology development between European and American OECD countries.
    • The influence of population density and urban development on CO2 emissions.
    • Statistical tests for significant differences between the two regions.
    • Forecasting emissions for both regions.
    • Scenario tests such as Monte Carlo simulations (see the sketch below).
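
    A minimal Monte Carlo scenario sketch of the kind the last bullet suggests: resample historical year-over-year changes in emissions to project a fan of future paths. The file, region, and column names here are assumptions.

    ```python
    # Sketch: bootstrap historical log-changes to simulate emission paths.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("green_growth.csv")              # hypothetical file name
    emissions = df.loc[df["region"] == "Europe",
                       "co2_production_based"].to_numpy()   # assumed columns

    rng = np.random.default_rng(0)
    growth = np.diff(np.log(emissions))               # historical log-changes
    sims = emissions[-1] * np.exp(
        np.cumsum(rng.choice(growth, size=(1_000, 10), replace=True), axis=1)
    )
    print(np.percentile(sims[:, -1], [5, 50, 95]))    # 10-year-ahead fan
    ```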

    Source: OECD Statistics.

  13. MIMIC-III - Deep Reinforcement Learning

    • kaggle.com
    Updated: Apr 7, 2022
    Cite: Asjad K (2022). MIMIC-III - Deep Reinforcement Learning [Dataset]. https://www.kaggle.com/datasets/asjad99/mimiciii
    Download formats: zip (11100065 bytes)
    Authors: Asjad K
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Digitization of healthcare data, along with algorithmic breakthroughs in AI, will have a major impact on healthcare delivery in the coming years. It is interesting to see AI applied to assist clinicians during patient treatment in a privacy-preserving way. While scientific knowledge can help guide interventions, there remains a key need to quickly cut through the space of decision policies to find effective strategies to support patients during the care process.

    Offline reinforcement learning (also referred to as safe or batch reinforcement learning) is a promising sub-field of RL that provides a mechanism for solving real-world sequential decision-making problems where access to a simulator is not available. Here we learn a policy from a fixed dataset of trajectories, without further interaction with the environment (the agent doesn't receive reward or punishment signals from the environment). It has been shown that such an approach can leverage the vast amount of existing logged data (in the form of previous interactions with the environment) and can outperform supervised learning approaches or heuristic-based policies on real-world decision-making problems. Offline RL algorithms, when trained on sufficiently large and diverse offline datasets, can produce close-to-optimal policies (with the ability to generalize beyond the training data).

    As part of my PhD research, I investigated the problem of developing a clinical decision support system for sepsis management using offline deep reinforcement learning.

    MIMIC-III ('Medical Information Mart for Intensive Care') is a large, open-access, anonymized, single-center database consisting of comprehensive clinical data from 61,532 critical care admissions collected at a Boston teaching hospital between 2001 and 2012. The dataset consists of 47 features (including demographics, vitals, and lab test results) on a cohort of sepsis patients who meet the Sepsis-3 definition criteria.

    We try to answer the following question:

    Given a particular patient's characteristics and physiological information at each time step as input, can our deep RL approach learn an optimal treatment policy that prescribes the right intervention (e.g., use of a ventilator) at each stage of the treatment process, in order to improve the final outcome (e.g., patient mortality)?

    We can use popular state-of-the-art algorithms such as Deep Q-Learning (DQN), Double Deep Q-Learning (DDQN), DDQN combined with BNC, Mixed Monte Carlo (MMC), and Persistent Advantage Learning (PAL). Using these methods we can train an RL policy to recommend an optimal treatment path for a given patient.
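
    For concreteness, here is a minimal sketch of a single DQN-style update on a batch of logged (offline) transitions; the network size, dimensions, and hyperparameters are illustrative, not those of the actual MIMIC-III pipeline.

    ```python
    # Sketch: one DQN temporal-difference update on logged transitions.
    import torch
    import torch.nn as nn

    state_dim, n_actions, gamma = 47, 5, 0.99   # 47 features per the description

    q_net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                          nn.Linear(128, n_actions))
    target_net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                               nn.Linear(128, n_actions))
    target_net.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)

    def dqn_update(s, a, r, s_next, done):
        """TD update on a batch: s, r, done are float tensors; a is long."""
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s, a)
        with torch.no_grad():                                   # bootstrap target
            target = r + gamma * (1.0 - done) * target_net(s_next).max(1).values
        loss = nn.functional.mse_loss(q, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()
    ```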

    Data acquisition, standard pre-processing, and modelling details can be found in the GitHub repo: https://github.com/asjad99/MIMIC_RL_COACH

  14. ZparticleSearch

    • kaggle.com
    Updated: Apr 14, 2024
    Cite: Lars Erik Risholm (2024). ZparticleSearch [Dataset]. https://www.kaggle.com/datasets/larserikrisholm/zparticlesearch
    Download formats: zip (6234879740 bytes)
    Authors: Lars Erik Risholm
    Description

    Disclaimer: all of these datasets were collected from the CERN Open Data portal and converted from ROOT files to HDF5. All necessary citations are included for each dataset below.

    This is a collection of datasets used for the task "Search for new physics at the Large Hadron Collider".

    These datasets are simulations of various particle physics processes that could occur in a collider like the Large Hadron Collider (LHC). They are used for training and testing machine learning algorithms to distinguish between different types of events. All dataset simulations were generated using Monte Carlo event generators.

    Here's what each dataset represents:

    mc_105986.ZZ.hdf5: Events where a pair of Z bosons are produced. This is a type of diboson production, a standard model process. Source: ATLAS Collaboration (2016). MC:Diboson process ZZ, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.MW2E.VY3R

    mc_105987.WZ.hdf5: Events where a W boson and a Z boson are produced together. Another diboson process. Source: ATLAS Collaboration (2016). MC:Diboson process WZ, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.7X9L.ZZ8H

    mc_110090.stop_tchan_top.hdf5: Single top quark production through the t-channel. Source: ATLAS Collaboration (2016). MC:Single top t-channel top, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.ZKB4.7X2E

    mc_110091.stop_tchan_antitop.hdf5: Single antitop quark production through the t-channel. Source: ATLAS Collaboration (2016). MC:Single top t-channel antitop, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.LD9U.V7P9

    mc_110119.stop_schan.hdf5: Single top (or antitop) quark production via the s-channel. Source: ATLAS Collaboration (2016). MC:Single top s-channel, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.9ZQW.R5SC

    mc_110140.stop_wtchan.hdf5: Single top quark production associated with a W boson, known as the tW-channel or associated production. Source: ATLAS Collaboration (2016). MC:Single top Wt-channel, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.D5RX.N4F8

    The datasets ZPrime500 to ZPrime2500 contain events simulating the production of a hypothetical Z' boson, a heavy gauge boson predicted by some extensions of the Standard Model. The numbers (500, 750, 1000, etc.) refer to the mass of the Z' boson in GeV/c².

    mc_110901.ZPrime500.hdf5: Source: ATLAS Collaboration (2016). MC:Z' → tt with MZ' = 500 GeV, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.UJC5.DF2N

    mc_110902.ZPrime750.hdf5: Source: ATLAS Collaboration (2016). MC:Z' → tt with MZ' = 750 GeV, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.W8VJ.NRVQ

    mc_110903.ZPrime1000.hdf5: Source: ATLAS Collaboration (2016). MC:Z' → tt with MZ' = 1000 GeV, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.ERXL.ZU41

    mc_110905.ZPrime1500.hdf5: Source: ATLAS Collaboration (2016). MC:Z' → tt with MZ' = 1500 GeV, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.G38J.HVDL

    mc_110906.ZPrime1750.hdf5: Source: ATLAS Collaboration (2016). MC:Z' → tt with MZ' = 1750 GeV, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.YSUF.NRUE

    mc_110907.ZPrime2000.hdf5: Source: ATLAS Collaboration (2016). MC:Z' → tt with MZ' = 2000 GeV, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.TMVL.VTP7

    mc_110909.ZPrime2500.hdf5: Source: ATLAS Collaboration (2016). MC:Z' → tt with MZ' = 2500 GeV, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.9XY3.CWJK

    mc_117049.ttbar_had.hdf5: Top-antitop quark pair production where both top quarks decay hadronically. Source: ATLAS Collaboration (2016). MC:tt → Jets, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.UZA2.CJH2

    mc_117050.ttbar_lep.hdf5: Top-antitop quark pair production where at least one of the top quarks decays leptonically. Source: ATLAS Collaboration (2016). MC:tt → l + X, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.FSPW.5CA5

    mc_147770.Zee.hdf5: Z boson production with a decay into an electron-positron pair. Source: ATLAS Collaboration (2016). MC:Zee + Jets, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.TC5G.AC24

    mc_147771.Zmumu.hdf5: Z boson production with a decay into a muon-antimuon pair. Source: ATLAS Collaboration (2016). MC:Zee + Jets, for 2016 ATLAS open data release. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.TC5G.AC24

    mc_147772.Ztautau.hdf5: Z boson production with a decay into a tau-antitau pair. Source: ATLAS Collab...

  15. Android Systems Comparison: Custom ROM & Stock ROM

    • kaggle.com
    Updated: Jun 24, 2024
    Cite: Gabriel One (2024). Android Systems Comparison: Custom ROM & Stock ROM [Dataset]. https://www.kaggle.com/datasets/gabrielluizone/stock-rom-vs-custom-rom/data
    Download formats: zip (525664 bytes)
    Authors: Gabriel One
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset was created for educational purposes to simulate the application of the scientific method in evaluating customized operating systems for smartphones. The data compare the performance and usability of Havoc-OS GSI, based on Android 14, with factory ROMs (Stock ROM) of different smartphones. The data were generated using Monte Carlo simulation methods, adjusted by domain knowledge.

    Dataset Columns:
    - ROM_Type: Type of ROM (StockROM or CustomROM)
    - Memory_Usage_MB: Memory usage (MB)
    - CPU_Temperature_C: CPU temperature (°C)
    - Battery_Duration_Min: Battery duration (minutes)
    - CPU_Usage_Percent: CPU usage (%)
    - Crashes_Per_Day: Number of crashes per day
    - Bugs_Per_Day: Number of bugs per day
    - camera_Working: Camera working (1: Yes, 0: No)
    - wifi_Working: Wi-Fi working (1: Yes, 0: No)
    - bluetooth_Working: Bluetooth working (1: Yes, 0: No)
    - audio_Working: Audio working (1: Yes, 0: No)
    - flashlight_Working: Flashlight working (1: Yes, 0: No)
    - sensors_Working: Sensors working (1: Yes, 0: No)
    - mobile_network_Working: Mobile network working (1: Yes, 0: No)
    - All_Components_Working: All components working (True/False)

    Context:

    This study aims to investigate whether Havoc-OS GSI offers performance and usability advantages compared to Stock ROMs, using a representative sample of 400 smartphones in the Brazilian market.

    Usage Instructions:

    The data can be used to explore comparisons between Custom ROMs and Stock ROMs, test performance and usability hypotheses, and apply data analysis and visualization techniques. One such hypothesis test is sketched below.
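
    A minimal sketch of a Welch two-sample t-test on battery duration between ROM types; the CSV file name is hypothetical, while the column names follow the data dictionary above.

    ```python
    # Sketch: does battery duration differ between Custom and Stock ROMs?
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("rom_comparison.csv")   # hypothetical file name
    custom = df.loc[df["ROM_Type"] == "CustomROM", "Battery_Duration_Min"]
    stock = df.loc[df["ROM_Type"] == "StockROM", "Battery_Duration_Min"]

    t, p = stats.ttest_ind(custom, stock, equal_var=False)   # Welch's t-test
    print(f"t = {t:.2f}, p = {p:.4f}")
    ```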

  16. Turbofan HPC Efficiency

    • kaggle.com
    Updated: Apr 14, 2021
    Cite: Nicolas Caparroz (2021). Turbofan HPC Efficiency [Dataset]. https://www.kaggle.com/datasets/nicolascaparroz/turbofan-hpc-efficiency/code
    Download formats: zip (1587576 bytes)
    Authors: Nicolas Caparroz
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    I come from a Mechanical Engineering background and, although I wish to slowly redirect my career into Data Science, my vision is that soon many aspects of the Engineering world will be data-driven, especially predictive maintenance. I want to be part of that change. That is my main motivation for pursuing this amazing path, thus most of my projects and efforts will be around "cool engineering stuff".

    Content

    This is a very clean and neat dataset created through Monte Carlo simulations. Each column represents a specific variable from a turbofan engine (one of these variables is the High-Pressure Compressor's isentropic efficiency). Each row contains the value of each variable corresponding to a specific efficiency number.
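
    For context, isentropic efficiency relates the ideal (isentropic) temperature rise across the compressor to the actual one; a minimal sketch, assuming an ideal gas with constant gamma and illustrative inputs:

    ```python
    # Sketch: HPC isentropic efficiency from temperatures and pressure ratio.
    def hpc_isentropic_efficiency(T_in, T_out, pressure_ratio, gamma=1.4):
        """Ideal temperature rise divided by actual temperature rise."""
        T_out_ideal = T_in * pressure_ratio ** ((gamma - 1.0) / gamma)
        return (T_out_ideal - T_in) / (T_out - T_in)

    # Illustrative values (kelvin); roughly 0.93 for these inputs.
    print(hpc_isentropic_efficiency(T_in=400.0, T_out=800.0, pressure_ratio=10.0))
    ```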

    Inspiration

    This is a great beginner-friendly dataset, perfect for engineering students or anyone who is interested in getting some insights into turbofan predictive maintenance.

  17. Proton-Proton Collision process at LHC simulations

    • kaggle.com
    Updated: May 21, 2023
    Cite: Shirsh Mall (2023). Proton-Proton Collision process at LHC simulations [Dataset]. https://www.kaggle.com/shirshmall/lhc-events-ppee-ppmumu
    Download formats: zip (2410757321 bytes)
    Authors: Shirsh Mall
    Description

    Proton-proton collision event generation is a key step in simulating high-energy physics experiments at particle colliders, such as the Large Hadron Collider (LHC) at CERN. Here's a brief overview of the process and the software tools involved:

    MadGraph: This software package simulates high-energy particle collisions and generates event samples. It uses Feynman diagrams to calculate the cross sections for various collision processes and can generate events with user-specified cuts and kinematic constraints. MadGraph also has an interface to Pythia for showering and hadronization of the generated events.

    Pythia: This is a particle physics event generator designed for the simulation of high-energy collisions. It simulates the parton showering and hadronization processes that occur after the hard scatter, using various models and parameters. Pythia also includes a detailed simulation of the underlying event, which accounts for the multiple parton interactions that occur in a proton-proton collision.

    HEPMC2: This is a file format for storing Monte Carlo event samples in high-energy physics. It is used as an input/output format by many event generators and analysis tools, including Pythia and Delphes. HEPMC2 files contain information on the particles produced in a collision event, their kinematics, and the interactions that led to their production.

    Delphes: This is a software framework for simulating the response of a particle detector to high-energy collisions. It takes event samples generated by Pythia or other event generators and simulates the particle interactions in a detector, including effects such as energy deposition, tracking, and calorimetry. Delphes produces output files in a ROOT format that can be analyzed using ROOT.

    ROOT: This is a data analysis framework widely used in high-energy physics. It includes tools for manipulating and analyzing large datasets, including the simulation and reconstruction output files produced by event generators and detector simulation tools. ROOT also includes a powerful visualization tool for generating 2D and 3D plots of particle collisions and detector interactions.

    The proton-proton collision event generation process involves using MadGraph to generate hard scattering events, which are then passed to Pythia for parton showering and hadronization. The resulting events are stored in HEPMC2 format and then simulated in a detector using Delphes. The final output is a ROOT file that can be analyzed using various analysis tools within the ROOT framework.

    High-level overview of the steps involved in generating event data for high-energy physics experiments using Monte Carlo simulation:

    1. Specify the collision process: The first step is to specify the collision process that you want to simulate. This typically involves specifying the particles that will collide (e.g., protons, electrons, or other particles), the energy of the collision, and any relevant initial or final states.

    2. Generate hard scattering events: Once the collision process is specified, you can use a software package like MadGraph to generate hard scattering events. MadGraph uses Feynman diagrams to calculate the cross sections for various collision processes and generates events based on user-specified cuts and kinematic constraints.

    3. Apply parton showering and hadronization: After generating the hard scattering events, you can use a software package like Pythia to simulate the parton showering and hadronization processes that occur after the hard scatter. This involves simulating the fragmentation of the partons produced in the hard scattering event into hadrons and the subsequent showering of additional partons produced in the hadronization process.

    4. Simulate the detector response: Once you have generated a set of simulated events, you can use a software package like Delphes to simulate the response of the particle detector to the collisions. This involves simulating the interactions of particles with the detector material and the detector's response to the energy deposited by the particles.

    5. Analyze the data: Once you have generated and simulated the events, you can analyze the resulting data to extract information about the properties of the particles produced in the collision. This typically involves applying various cuts and selection criteria to the data and using statistical techniques to estimate the background and systematic uncertainties in the analysis.

    6. Compare with experimental data: Finally, you can compare the results of your Monte Carlo simulation with experimental data to test the accuracy of the simulation and to gain insights into the underlying physics of the collision process.
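
    As a small illustration of step 5, the sketch below reads a Delphes output file in Python using the third-party uproot package; the file name is a placeholder, while "Delphes" is the tree name Delphes conventionally writes and the branch names follow its conventions.

    ```python
    # Sketch: read electron kinematics from a Delphes ROOT file with uproot.
    import uproot

    with uproot.open("events.root") as f:          # placeholder file name
        tree = f["Delphes"]                        # Delphes's conventional tree
        arrays = tree.arrays(["Electron.PT", "Electron.Eta"], library="np")
        print(arrays["Electron.PT"][:5])           # per-event electron pT arrays
    ```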

  18. WTA Match Statistics

    • kaggle.com
    Updated: Jan 12, 2023
    Cite: The Devastator (2023). WTA Match Statistics [Dataset]. https://www.kaggle.com/datasets/thedevastator/wta-match-statistics-from-2016-to-2018
    Download formats: zip (1065575 bytes)
    Authors: The Devastator
    Description

    WTA Match Statistics

    Player Performance and Tournament Details

    By Jon Loyens [source]

    About this dataset

    The Women's Tennis Association (WTA) is an international tennis organization that governs the Women’s professional tennis tours, leagues and events around the world. For over a decade, Jeff Sackmann has carefully collected WTA match results data that include valuable information about individual players, scores, and statistics. With this dataset of nearly 25000 matches you can analyze up-and-coming players in women's tennis, compare and contrast their performances across tournaments of different levels around the world, draw correlations between performance metrics such as service points won or break points faced to victory outcomes, or gain insight into game trends within all WTA nations. At Tennis Abstract we believe in open and fair access to sports statistics for everyone so this data set is licensed for non-commercial use only under a Creative Commons Attribution license. Now go ahead and explore!

    How to use the dataset

    Welcome to our guide on the WTA Match Statistics dataset! This guide will help you understand the different columns and what kind of information is available in this dataset.

    To start, let's review some information about the table that contains the WTA's match results. This table includes various match stats such as tournament name, surface type, draw size, tournament level, date played, and winner/loser entries. It also includes important player information such as a player's name, hand preference, height, country code, and age. Additionally, you'll find more detailed data on individual matches, including the score of each match as well as details such as the number of double faults committed or aces made by each player. You can even filter results by round or by best-of format (best of 5 or best of 3 sets), depending on your research requirements.

    The first column, "tourney_name", stands for Tournament Name and indicates which tournament a particular row pertains to (for example, ATP 1000 Monte Carlo). Next, "surface" denotes the court surface used in that particular match, e.g., Clay or Hard Court. The "draw_size" column shows how many players were involved in that event, typically ranging from 128 (Grand Slams) down to 16. Similarly, with "tourney_level" you can see the level of the event: G (Grand Slams), M (Masters Series), A (ATP 1000 Series), or I (ITF tour events). Relevant dates are listed under "tourney_date", running through to any tournament finals, and winner and loser ranks appear in the corresponding columns. The "winner_seed" column records seeding, which reflects ranking points gained by a player at tournaments throughout their career: a higher seed marks a respected competitor in the field, while a lower seed indicates less accomplished results up to that point. Relying solely on seeding won't give true insight, however, because lower-seeded players sometimes go deep and nullify expectations.

    Research Ideas

    • Analyzing strategies employed by the top players in women’s tennis to better understand their approach to winning games and tournaments.
    • Examining competition levels in different regional and international tennis circuits, or among players of different ages, genders, and skill levels.
    • Using machine learning techniques to make predictions about upcoming WTA matches based on the historical trends present in this dataset.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: wta_matches_2016.csv

    | Column name   | Description                                              |
    |:--------------|:---------------------------------------------------------|
    | tourney_name  | Name of the tournament. (String)                         |
    | surface       | Type of court surface the match was played on. (String)  |
    | draw_size     | Number of players in the tournament. (Integer)           |
    | tourney_level | Level o...                                               |
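
    A small example of slicing the match table described above; the column names follow the data dictionary, while the aggregation itself is our illustration.

    ```python
    # Sketch: count clay-court matches per tournament in the 2016 file.
    import pandas as pd

    matches = pd.read_csv("wta_matches_2016.csv")
    clay = matches[matches["surface"] == "Clay"]
    print(clay.groupby("tourney_name").size().sort_values(ascending=False).head())
    ```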

  19. HIGGS UCI DATASET

    • kaggle.com
    Updated: Aug 30, 2021
    Cite: Erik Biswas (2021). HIGGS UCI DATASET [Dataset]. https://www.kaggle.com/datasets/erikbiswas/higgs-uci-dataset/code
    Download formats: zip (2902133066 bytes)
    Authors: Erik Biswas
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Abstract:

    This is a classification problem to distinguish between a signal process that produces Higgs bosons and a background process that does not.

    Source:

    Daniel Whiteson daniel '@' uci.edu, Assistant Professor, Physics & Astronomy, Univ. of California Irvine

    Data Set Information:

    The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The last seven features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes. There is an interest in using deep learning methods to obviate the need for physicists to manually develop such features. Benchmark results using Bayesian Decision Trees from a standard physics package and 5-layer neural networks are presented in the original paper. The last 500,000 examples are used as a test set.

    Attribute Information:

    The first column is the class label (1 for signal, 0 for background), followed by the 28 features (21 low-level features then 7 high-level features): lepton pT, lepton eta, lepton phi, missing energy magnitude, missing energy phi, jet 1 pt, jet 1 eta, jet 1 phi, jet 1 b-tag, jet 2 pt, jet 2 eta, jet 2 phi, jet 2 b-tag, jet 3 pt, jet 3 eta, jet 3 phi, jet 3 b-tag, jet 4 pt, jet 4 eta, jet 4 phi, jet 4 b-tag, m_jj, m_jjj, m_lv, m_jlv, m_bb, m_wbb, m_wwbb.

    For more detailed information about each feature see the original paper.
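
    A sketch of loading the file with the column order given above and reproducing the paper's split (last 500,000 rows as the test set); the file name follows the usual UCI distribution, and the exact column labels are our own shorthand.

    ```python
    # Sketch: load HIGGS.csv and split off the paper's 500k-row test set.
    import pandas as pd

    cols = ["label",
            "lepton_pT", "lepton_eta", "lepton_phi",
            "missing_energy_magnitude", "missing_energy_phi",
            "jet1_pt", "jet1_eta", "jet1_phi", "jet1_btag",
            "jet2_pt", "jet2_eta", "jet2_phi", "jet2_btag",
            "jet3_pt", "jet3_eta", "jet3_phi", "jet3_btag",
            "jet4_pt", "jet4_eta", "jet4_phi", "jet4_btag",
            "m_jj", "m_jjj", "m_lv", "m_jlv", "m_bb", "m_wbb", "m_wwbb"]

    df = pd.read_csv("HIGGS.csv", header=None, names=cols)
    train, test = df.iloc[:-500_000], df.iloc[-500_000:]
    ```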

    Relevant Papers:

    Baldi, P., P. Sadowski, and D. Whiteson. “Searching for Exotic Particles in High-energy Physics with Deep Learning.” Nature Communications 5 (July 2, 2014).

  20. The Economics of Happiness (TEH)

    • kaggle.com
    Updated: Nov 22, 2022
    Cite: NikBearBrown (2022). The Economics of Happiness (TEH) [Dataset]. https://www.kaggle.com/datasets/nikbearbrown/the-economics-of-happiness-teh
    Download formats: zip (187739 bytes)
    Authors: NikBearBrown
    Description

    This data is just the World Happiness Report data (https://www.kaggle.com/datasets/unsdsn/world-happiness) in which missing values, and zero values that were being used as missing values, were cleaned up using MICE-based imputation (Azur et al., 2011).

    We plan on making a much larger dataset with hundreds of independent variables, but the main purpose of The Economics of Happiness (TEH) project is to explore using SHAP (SHapley Additive exPlanations) for economics and finance.

    Credit allocation is a fundamental problem in economics. If a country invests in better roads, clean water, education, or tax breaks, it wants to estimate how those investments affect desirable outcomes like driving time, life expectancy, reading levels, or GDP. The problem is further complicated by the fact that the effect of one factor is often affected by other factors, and one wants to disentangle how a change in one factor will affect a result given the state of the other factors. Fortunately, the mathematician Lloyd Shapley, in his Ph.D. thesis and post-doctoral work, introduced the Shapley value and the core solution in game theory. Shapley defined game theory as "a mathematical study of conflict and cooperation." Lloyd Shapley won the Nobel Memorial Prize in Economic Sciences for his work on Shapley values in 2012.

    Recently the use of Shapley values has had a major resurgence in machine learning, particularly in the field of model interpretability. Recent work has created the SHAP Python and R libraries. SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model (Lundberg and Lee, 2017). SHAP is based on the game-theoretically optimal Shapley values (Shapley, 1953). The approach connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (Strumbelj et al., 2014) and (Lipovetsky et al., 2001).

    What is the connection between Shapley values and machine learning predictions and interpretability? The "game" is the prediction task for a single instance of the dataset. The "gain" is the actual prediction for this instance minus the average prediction for all instances. The "players" are the feature values of the instance that collaborate to receive the gain (i.e., predict a certain value). In our case, we build an accurate machine learning model to predict a "happiness score" and perturb that model for a given instance to estimate how a change in a value affects the prediction. The Shapley value is the average of a feature's marginal contributions over all possible combinations of features, computed across all rows and predictors, so one disadvantage is that computation time increases exponentially with the number of features. To keep computation time reasonable, one uses Monte Carlo sampling, computing contributions for only a few samples of the possible combinations to generate estimates of the Shapley values.
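
    A minimal Monte Carlo Shapley sketch for one feature of one instance, following the sampling idea just described; `model` (anything with a .predict method), the background matrix X, and the instance x are assumed to exist.

    ```python
    # Sketch: Monte Carlo estimate of one feature's Shapley value.
    import numpy as np

    def mc_shapley(model, X, x, feature, n_samples=200, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        contribs = np.empty(n_samples)
        for i in range(n_samples):
            perm = rng.permutation(d)                 # random feature ordering
            z = X[rng.integers(n)]                    # random background instance
            pos = int(np.where(perm == feature)[0][0])
            with_f, without_f = z.copy(), z.copy()
            with_f[perm[:pos + 1]] = x[perm[:pos + 1]]   # x's values incl. feature
            without_f[perm[:pos]] = x[perm[:pos]]        # x's values excl. feature
            contribs[i] = (model.predict(with_f.reshape(1, -1))[0]
                           - model.predict(without_f.reshape(1, -1))[0])
        return contribs.mean()                        # estimated Shapley value
    ```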

    References

    Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res. 2011 Mar;20(1):40-9. doi: 10.1002/mpr.329. PMID: 21499542; PMCID: PMC3074241

    Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.

    Geurts, Ernst., and Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

    Lipovetsky, Stan, and Michael Conklin. "Analysis of regression in game theory approach." Applied Stochastic Models in Business and Industry 17.4 (2001): 319-330.

    Lundberg, Scott M., and Su-In Lee. “A unified approach to interpreting model predictions.” Advances in Neural Information Processing Systems (2017).

    Shapley, Lloyd S. "A value for n-person games". In: Contributions to the Theory of Games 2.28 (1953), pp. 307-317.

    Strumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and information systems 41.3 (2014): 647-665.

    World Happiness Report https://www.kaggle.com/datasets/unsdsn/world-happiness
