59 datasets found
  1. Appendix for "Don't DIY: Automatically transform legacy Python code to support structural pattern matching"

    • data.niaid.nih.gov
    Updated Aug 2, 2022
    Cite
    Balázs Rózsa; Gábor Antal; Rudolf Ferenc (2022). Appendix for "Don't DIY: Automatically transform legacy Python code to support structural pattern matching" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6812499
    Explore at:
    Dataset updated
    Aug 2, 2022
    Dataset provided by
    University of Szeged
    Authors
    Balázs Rózsa; Gábor Antal; Rudolf Ferenc
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the appendix for the paper "Don't DIY: Automatically transform legacy Python code to support structural pattern matching", presented at SCAM 2022.

    Abstract

    As data becomes more and more complex as technology evolves, the need to support more complex data types in programming languages has grown. However, without proper storage and manipulation capabilities, handling such data can result in hard-to-read, difficult-to-maintain code. Therefore, programming languages continuously evolve to provide more and more ways to handle complex data. Python 3.10 introduced structural pattern matching, which serves this exact purpose: we can split complex data into relevant parts by examining its structure, and store them for later processing. Previously, we could only use the traditional conditional branching, which could have led to long chains of nested conditionals. Maintaining such code fragments can be cumbersome. In this paper, we present a complete framework to solve the aforementioned problem. Our software is capable of examining Python source code and transforming relevant conditionals into structural pattern matching. Moreover, it is able to handle nested conditionals and it is also easily extensible, thus the set of possible transformations can be easily increased.
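
    To make the transformation concrete, here is a hand-written illustration of the kind of rewrite the framework performs (this example is not taken from the appendix):

    ```python
    # Legacy style: nested conditionals inspecting the structure of `point`.
    def describe_legacy(point):
        if isinstance(point, tuple) and len(point) == 2:
            x, y = point
            if x == 0 and y == 0:
                return "origin"
            elif x == 0:
                return f"on the y-axis at {y}"
            else:
                return f"point at ({x}, {y})"
        return "not a point"

    # Python 3.10+ structural pattern matching expressing the same logic.
    def describe_matched(point):
        match point:
            case (0, 0):
                return "origin"
            case (0, y):
                return f"on the y-axis at {y}"
            case (x, y):
                return f"point at ({x}, {y})"
            case _:
                return "not a point"
    ```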

  2. PRLF match pair overlap with Link Plus match pairs.

    • plos.figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). PRLF match pair overlap with Link Plus match pairs. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PRLF match pair overlap with Link Plus match pairs.

  3. League of Legends Match Dataset (2024)

    • kaggle.com
    Updated Jan 21, 2025
    Cite
    Jakub Krasuski (2025). League of Legends Match Dataset (2024) [Dataset]. https://www.kaggle.com/datasets/jakubkrasuski/league-of-legends-match-dataset-2025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jakub Krasuski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    License

    CC BY 4.0 (Creative Commons Attribution 4.0 International)
    This license allows sharing, modification, and use of the dataset as long as proper attribution is given.

    Description

    This dataset provides detailed information about League of Legends matches, collected in 2025. It covers various aspects of the game, including player statistics, team performance, and match metadata. This dataset is ideal for statistical analysis, machine learning projects, and esports research.

    Data Collection & Replication

    The data was obtained using a custom Python script that queries the official Riot Games API. This script automates data retrieval by starting from a single player’s PUUID and expanding through match participants. If you’d like to replicate or extend this dataset, the script is freely available here:
    https://github.com/Blizzeq/league-of-legends-data-collector

    Key Features

    • Match Details: Game duration, mode (e.g., Classic), game version, and map identifiers.
    • Player Statistics: Includes kills, deaths, assists, damage dealt, gold earned, items purchased, and final game stats (e.g., magic resist, movement speed, mana regeneration).
    • Team Data: Insights into team performance and match outcomes.

    Dataset Overview

    • 94 attributes capturing comprehensive match and player data.
    • Key columns: game_id, game_start_utc, game_duration, queue_id, participant_id, kills, deaths, assists, final_damageDealt, final_goldEarned, and more.
    • Platform: EUN1 (Europe Nordic & East server).
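
    As a quick-start illustration (hedged: the file name below is a placeholder, and the column names are taken from the key columns listed above):

    ```python
    import pandas as pd

    # "lol_matches_2024.csv" is a placeholder name for the dataset's CSV file.
    df = pd.read_csv("lol_matches_2024.csv")
    cols = ["game_id", "game_duration", "queue_id", "kills", "deaths", "assists",
            "final_damageDealt", "final_goldEarned"]
    print(df[cols].describe())

    # Simple per-player KDA, guarding against zero deaths.
    df["kda"] = (df["kills"] + df["assists"]) / df["deaths"].clip(lower=1)
    ```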

    Use Cases

    • Performance analysis for individual players and teams.
    • Predictive modeling and machine learning applications in gaming.
    • Understanding player behavior and strategy in competitive gaming.

    Notes

    Ensure that the usage of this dataset complies with Riot Games' terms of service and privacy policies. If you plan to collect your own data using the script, remember to manage API rate limits and adhere to Riot’s policies.

  4. Data and code for the paper Detecting Road Network Errors from Trajectory...

    • figshare.com
    zip
    Updated Jan 9, 2024
    Cite
    Can Yang (2024). Data and code for the paper Detecting Road Network Errors from Trajectory Data with Partial Map Matching and Bidirectional Recurrent Neural Network Model [Dataset]. http://doi.org/10.6084/m9.figshare.24056658.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Can Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains data and code for the manuscript "Detecting Road Network Errors from Trajectory Data with Partial Map Matching and Bidirectional Recurrent Neural Network Model" submitted to IJGIS.

    Requirements (tested with Python 3.8.8):
    - Numpy >= 1.22.4
    - PyTorch >= 2.0.0
    - Sklearn >= 0.24.1
    - Shapely >= 1.8.2
    - rtree >= 1.0.0
    - Networkx >= 2.8.4
    - Geopandas >= 0.10.2

    Code structure:
    - lib: code for partial map matching, context feature extraction, and the BiRNN model
    - data: a sample of the training and test dataset in npz format, which contains "xy" and "label" attributes storing the trajectory geometry and the manual labels. It also contains a shapefile of the road network.

    Run the program in two steps:
    1. Generate context features: python feature_extract.py
    2. Train a BiRNN model for classification: python train_birnn.py

    For demonstration, the data folder contains a small trajectory dataset with labels and a road network downloaded from OSM. Access to the complete dataset can be requested at https://outreach.didichuxing.com/.
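
    A hedged loading sketch for the sample data (the file name below is a placeholder; only the npz format and the "xy" and "label" attributes are taken from the description above):

    ```python
    import numpy as np

    # "sample.npz" is a placeholder name for the npz file shipped in the data folder.
    sample = np.load("data/sample.npz", allow_pickle=True)
    xy = sample["xy"]         # trajectory geometry
    labels = sample["label"]  # manual labels
    print(len(xy), len(labels))
    ```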

  5. Research Data Supporting "Horsetail Matching: A Flexible Approach to Optimization Under Uncertainty"

    • repository.cam.ac.uk
    txt, zip
    Updated May 17, 2017
    Cite
    Cook, LW; Jarrett, JP (2017). Research Data Supporting "Horsetail Matching: A Flexible Approach to Optimization Under Uncertainty" [Dataset]. http://doi.org/10.17863/CAM.9695
    Explore at:
    Available download formats: zip (221643 bytes), txt (3081 bytes)
    Dataset updated
    May 17, 2017
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Cook, LW; Jarrett, JP
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This data comprises Python source code, along with scripts that illustrate how to use it to recreate the results in the publication. Further details are given in the README.txt file.

  6. LoL Match History & Summoner Data – 78k Matches

    • kaggle.com
    zip
    Updated Oct 20, 2025
    Cite
    nsmall (2025). LoL Match History & Summoner Data – 78k Matches [Dataset]. https://www.kaggle.com/datasets/nathansmallcalder/lol-match-history-and-summoner-data-80k-matches
    Explore at:
    Available download formats: zip (3770315 bytes)
    Dataset updated
    Oct 20, 2025
    Authors
    nsmall
    Description

    League of Legends Relational Database for Match Prediction

    Context

    This dataset contains detailed match and player data from League of Legends, one of the most popular multiplayer online battle arena (MOBA) games in the world. It includes 35,000 matches and contains 78,000 summoner statistics, capturing a wide range of in-game statistics, such as champion selection, player performance metrics, match outcomes, and more.

    The dataset is structured to support a variety of analyses, including:

    • Predicting match outcomes based on team compositions and player stats
    • Evaluating player performance and progression over time
    • Exploring trends in champion popularity and win rates
    • Building machine learning models for esports analytics

    Whether you are interested in competitive gaming, data science, or predictive modeling, this dataset provides a rich source of structured data to explore the dynamics of League of Legends at scale.

    Data Schema and Dictionary

    Data was collected from the Riot Games API using a Python script (link) during Patch 25.19.

    The dataset consists of the following CSV files (a hedged loading sketch follows this list):

    • MatchStatsTbl - Match stats for a given SummonerID and MatchID. Contains K/D/A, items, runes, ward score, summoner spells, Baron kills, Dragon kills, lane, damage taken/dealt, total gold, CS, mastery points, and win/loss
    • TeamMatchStatsTbl - Contains red/blue champions, red/blue Baron kills, blue/red turret kills, red/blue kills, Rift Herald kills, and win/loss
    • MatchTbl - Contains MatchID, Rank, match duration, and MatchType
    • RankTbl - Contains RankID and RankName
    • ChampionTbl - Contains ChampionID and ChampionName
    • ItemTbl - Contains ItemID and ItemName
    • SummonerTbl - Contains SummonerID and SummonerName
    • SummonerMatchTbl - Links MatchID, SummonerID, and ChampionID
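
    A minimal pandas sketch for joining these tables (hedged: the CSV file names mirror the table names, and the MatchID/RankID join keys are assumed to be present in each file):

    ```python
    import pandas as pd

    # File names are assumptions; use the actual CSV names in the release.
    matches = pd.read_csv("MatchTbl.csv")              # MatchID, Rank, match duration, MatchType
    team_stats = pd.read_csv("TeamMatchStatsTbl.csv")  # per-team objectives and win/loss
    ranks = pd.read_csv("RankTbl.csv")                 # RankID, RankName

    # Attach team-level stats and rank names to each match for outcome modelling.
    df = matches.merge(team_stats, on="MatchID", how="inner")
    df = df.merge(ranks, left_on="Rank", right_on="RankID", how="left")
    print(df.head())
    ```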

    Database Features

    • This dataset contains 35,422 League of Legends matches and 78,863 summoner statistics from those games.
    • Uses data from over 2,381 summoners.
    • Consists of data only from Europe West (EUW).
    • Data is sampled from Unranked to Challenger tiers.

    Database Setup

    • MySQL database running on Linux
    • The database schema script can be found here (works with the GitHub project to collect your own data)

    Limitations

    The Riot API only provides the "BOTTOM" lane for bot-lane players. During data collection, roles were inferred by combining champions that are often played as support with CS metrics to distinguish ADC vs. support, especially for ambiguous picks like Senna or off-meta choices.

    Acknowledgements/Privacy

    Data is collected using the official Riot Games API. We thank Riot Games for providing the data and tools that make this project possible. This dataset is not endorsed or certified by Riot Games. No personal or identifiable player data (e.g., Summoner Names, Summoner IDs, or PUUIDs) are included. The SummonerTbl has been intentionally excluded from this public release.

    Github

    The Python scripts used for data collection, as well as various scripts I developed for API calls, database management, and initial data analytics, can be found on GitHub

  7. Groundwater flow and SNTEMP stream temperature model build and history matching workflows

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 27, 2025
    Cite
    U.S. Geological Survey (2025). Groundwater flow and SNTEMP stream temperature model build and history matching workflows [Dataset]. https://catalog.data.gov/dataset/groundwater-flow-and-sntemp-stream-temperature-model-build-and-history-matching-workflows
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    MODFLOW6 and SNTEMP models were developed to simulate groundwater flows and instream temperatures in Beaver Creek, Alaska from 2019-2023, using Python scripts to create a reproducible workflow that processes input datasets into model files. This data release contains the scripts used to build the SNTEMP and MODFLOW models, process model output for comparison with field observations, and develop and run the PEST++ workflow for history matching. These workflows are described in the readme.md files in this archive and are used to implement the modeling decisions described in the associated report, "Simulating present and future Groundwater/Surface-water interactions and stream temperatures in Beaver Creek, Kenai Peninsula, Alaska".

  8. Estimating separable matching models (replication data)

    • resodate.org
    Updated Oct 14, 2025
    Cite
    Alfred Galichon; Bernard Salanié (2025). Estimating separable matching models (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9lc3RpbWF0aW5nLXNlcGFyYWJsZS1tYXRjaGluZy1tb2RlbHMtcmVwbGljYXRpb24tZGF0YQ==
    Explore at:
    Dataset updated
    Oct 14, 2025
    Dataset provided by
    ZBW Journal Data Archive
    ZBW
    Journal of Applied Econometrics
    Authors
    Alfred Galichon; Bernard Salanié
    Description

    Code for Galichon-Salanie's "Estimating Separable Matching Models"

    Usage

    Create a virtual environment, e.g. with python3 -m venv env. Activate it with source env/bin/activate.

    Install the requirements with pip install -r requirements.txt. Among the packages it downloads are two created by Bernard Salanié: bs_python_utils and cupid_matching. The former is just a set of utility programs. The latter contains code to solve for the stable matching and estimate the parameters of separable matching models with MDE and Poisson-GLM. The code in this folder relies heavily on these two packages, which are documented on Salanie's website: bs_python_utils and cupid_matching.

    Choose the parameters in config.py and run the code with python main.py. The program will create a folder Results and save a plot and a pickled file with the estimates for the sample of size sample_size defined in config.py.

    Each simulation sample (that is, n_sim=1) takes a few seconds (4 seconds on a Mac M2 Max 2023) to estimate the Choo and Siow model by the two methods in the paper: minimum distance and Poisson GLM. The code is parallelized over samples, unless you choose use_multiprocessing=False. By default, it uses all but two of your CPUs.

    Structure of the code

    The master program main.py reads the parameters in config.py.

    1. If do_create_samples is True, it uses create_samples.py to read the Choo and Siow datasets in the data_dir directory and to create two samples in samples_dir (both directories are specified in config.py). The two samples correspond to the small and large samples described in Section 6 of the paper. The files created have the marriage patterns by age (*muxy.txt), the margins (*nx.txt and *my.txt), and the variance-covariance matrix of these estimates (*varmus.pkl).
    2. It calls read_data.py, which reads the sample defined by sample_size in config.py and prepares it for the simulation. read_data.py also has code to add a small positive number (see zero_guard in config.py) for zero cells; this is used in the MDE simulation.
    3. specification.py creates the basis functions according to the specification given by degrees in config.py.
    4. Then main.py runs the simulation via simulate.py as defined by config.py.

    Configuration

    All parameters of the simulation are in config.py:

    • do_create_samples, do_simuls, plot_simuls, do_simuls_mde, do_simuls_poisson define what the program does;
    • n_sim is the number of simulated samples;
    • use_multiprocessing and nb_cpus define the parallelization;
    • zero_guard is the small positive number added to zero cells in the sample for MDE estimation;
    • degrees is a list of tuples that define the degrees of the polynomials for the basis functions; e.g. an (a,b) tuple means that the basis function is $L_a(x)L_b(y)$, where $x$ and $y$ are the ages of the partners and $L_a$ is the Legendre polynomial of degree $a$. In addition to these terms, the basis functions also include the constant term; $\mathbf{1}(x>y)$, and a term proportional to $\max(x-y,0)$. The function generate_bases in specification.py creates the basis functions.
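
    A hypothetical config.py sketch consistent with the parameters listed above (the parameter names follow the list; every value shown is an illustrative placeholder, not the paper's setting):

    ```python
    # Hypothetical config.py sketch; values are placeholders for illustration only.

    # What the program does
    do_create_samples = True    # rebuild the small/large samples with create_samples.py
    do_simuls = True            # run the simulations
    plot_simuls = True          # plot the results
    do_simuls_mde = True        # minimum distance estimation
    do_simuls_poisson = True    # Poisson-GLM estimation

    # Directories and sample choice (placeholder values)
    data_dir = "data"           # where the Choo and Siow datasets live
    samples_dir = "samples"     # where create_samples.py writes the two samples
    sample_size = 100_000       # which sample read_data.py loads

    # Simulation setup
    n_sim = 100                 # number of simulated samples
    use_multiprocessing = True  # parallelize over samples
    nb_cpus = 6                 # number of parallel workers
    zero_guard = 1e-6           # small positive number added to zero cells (MDE only)

    # Specification: an (a, b) tuple means the basis function L_a(x) * L_b(y)
    degrees = [(1, 1), (1, 2), (2, 1), (2, 2)]
    ```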

    Questions

    Please direct all questions to Bernard Salanie.

  9. Research Data Supporting "Extending Horsetail Matching for Optimization Under Probabilistic, Interval and Mixed Uncertainties"

    • repository.cam.ac.uk
    Updated Aug 14, 2017
    Cite
    Cook, LW (2017). Research Data Supporting "Extending Horsetail Matching for Optimization Under Probabilistic, Interval and Mixed Uncertainties" [Dataset]. http://doi.org/10.17863/CAM.12577
    Explore at:
    Dataset updated
    Aug 14, 2017
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Cook, LW
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data comprises a Python module that implements the horsetail matching method presented in this publication, and it can thus be used to recreate the results in the paper. This Python module is also available at http://www-edc.eng.cam.ac.uk/aerotools/horsetailmatching/, but is archived here.

  10. Features of probabilistic linkage solutions available for record linkage...

    • plos.figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). Features of probabilistic linkage solutions available for record linkage applications. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Features of probabilistic linkage solutions available for record linkage applications.

  11. Data from: EyeFi: Fast Human Identification Through Vision and WiFi-based...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 5, 2022
    Cite
    Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon; Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon (2022). EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching [Dataset]. http://doi.org/10.5281/zenodo.7396485
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon; Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EyeFi Dataset

    This dataset was collected as part of the EyeFi project at the Bosch Research and Technology Center, Pittsburgh, PA, USA. The dataset contains WiFi CSI values of human motion trajectories along with ground truth location information captured through a camera. This dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop describing the details of data collection. Please check it out for more information on the dataset.

    Data Collection Setup

    In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC and the Linux CSI tools [1] to extract the WiFi CSI packets. The (x,y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 camera mounted on the ceiling, and Angles of Arrival (AoAs) are derived from these (x,y) coordinates. Both the WiFi card and the camera are located at the same origin coordinates but at different heights: the camera is located around 2.85m above the ground and the WiFi antennas are around 1.12m above the ground.

    The data collection environment consists of two areas: the first is a rectangular space measuring 11.8m x 8.74m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74m and 14.24m between two walls. The kitchen also has numerous obstacles and different materials that exhibit different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.

    To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone are used in both the lab and the kitchen areas.

    List of Files
    Here is a list of files included in the dataset:

    |- 1_person
      |- 1_person_1.h5
      |- 1_person_2.h5
    |- 2_people
      |- 2_people_1.h5
      |- 2_people_2.h5
      |- 2_people_3.h5
    |- 3_people
      |- 3_people_1.h5
      |- 3_people_2.h5
      |- 3_people_3.h5
    |- 5_people
      |- 5_people_1.h5
      |- 5_people_2.h5
      |- 5_people_3.h5
      |- 5_people_4.h5
    |- 10_people
      |- 10_people_1.h5
      |- 10_people_2.h5
      |- 10_people_3.h5
    |- Kitchen
      |- 1_person
        |- kitchen_1_person_1.h5
        |- kitchen_1_person_2.h5
        |- kitchen_1_person_3.h5
      |- 3_people
        |- kitchen_3_people_1.h5
    |- training
      |- shuffuled_train.h5
      |- shuffuled_valid.h5
      |- shuffuled_test.h5
    View-Dataset-Example.ipynb
    README.md
    
    

    In this dataset, the folders `1_person/`, `2_people/`, `3_people/`, `5_people/`, and `10_people/` contain data collected from the lab area, whereas the `Kitchen/` folder contains data collected from the kitchen area. To see how each file is structured, please see the section Access the data below.

    The training folder contains the training dataset we used to train the neural network discussed in our paper. Its files are generated by shuffling all the data from the `1_person/` folder collected in the lab area (`1_person_1.h5` and `1_person_2.h5`).

    Why multiple files in one folder?

    Each folder contains multiple files. For example, the `1_person` folder has two files: `1_person_1.h5` and `1_person_2.h5`. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person holding the phone can differ. Also, the data could be collected on different days, and/or the data collection system had to be rebooted due to stability issues. As a result, we provide different files (like `1_person_1.h5` and `1_person_2.h5`) to distinguish a different person holding the phone and possible system reboots, which introduce different phase offsets (see below) in the system.

    Special note:

    `1_person_1.h5` is generated with the same person holding the phone throughout, whereas `1_person_2.h5` contains different people holding the phone, although only one person is present in the area at a time. The two files were also collected on different days.


    Access the data
    To access the data, the hdf5 library is needed to open the dataset. A free HDF5 viewer is available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide an example Python notebook, View-Dataset-Example.ipynb, to demonstrate how to access the data.

    Each file is structured as (except the files under *"training/"* folder):

    |- csi_imag
    |- csi_real
    |- nPaths_1
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_2
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_3
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_4
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- num_obj
    |- obj_0
      |- cam_aoa
      |- coordinates
    |- obj_1
      |- cam_aoa
      |- coordinates
    ...
    |- timestamp
    

    The `csi_real` and `csi_imag` are the real and imaginary parts of the CSI measurements. The order of antennas and subcarriers is as follows for the 90 `csi_real` and `csi_imag` values: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, … subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The `nPaths_x` groups are the SpotFi [2] calculated WiFi Angles of Arrival (AoA) with `x` the number of multiple paths specified during the calculation. Under each `nPaths_x` group are `offset_xx` subgroups, where `xx` stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:

    | Antennas | Offset 1 (rad) | Offset 2 (rad) |
    |:--------:|:--------------:|:--------------:|
    |  1 & 2   |     1.1899     |    -2.0071     |
    |  1 & 3   |     1.3883     |    -1.8129     |
    
    

    The measurement is based on the work in [3], where the authors state that there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combinations of these offsets are used for the `offset_xx` naming. For example, `offset_12` means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 were used in the SpotFi calculation.

    The `num_obj` field stores the number of human subjects present in the scene. `obj_0` is always the subject holding the phone. In each file there are `num_obj` groups `obj_x`. For each `obj_x`, we have the `coordinates` reported by the camera and `cam_aoa`, the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoA listed here are chronologically ordered (except for the files in the `training` folder). They reflect the way the person carrying the phone moved in the space (for `obj_0`) and how everyone else walked (for the other `obj_y`, where `y` > 0).

    The `timestamp` field provides a time reference for each WiFi packet.

    To access the data (Python):

    import h5py

    # Open one data file (here: three people present in the lab area).
    data = h5py.File('3_people_3.h5', 'r')

    # Raw CSI measurements: 90 values per packet (30 subcarriers x 3 antennas).
    csi_real = data['csi_real'][()]
    csi_imag = data['csi_imag'][()]

    # Camera-derived ground truth for the phone holder (obj_0).
    cam_aoa = data['obj_0/cam_aoa'][()]
    cam_loc = data['obj_0/coordinates'][()]
    

    Files inside the `training/` folder have a different data structure:

    
    |- nPath-1
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-2
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-3
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-4
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    


    The group `nPath-x` is the number of multipath components specified during the SpotFi calculation. `aoa` is the camera-generated angle of arrival (AoA) (it can be considered ground truth), `csi_imag` and `csi_real` are the imaginary and real components of the CSI values, and `spotfi` holds the SpotFi-calculated AoA values. The SpotFi values are chosen based on the lowest median and mean error across `1_person_1.h5` and `1_person_2.h5`. All the rows under the same `nPath-x` group are aligned (i.e., the first row of `aoa` corresponds to the first row of `csi_imag`, `csi_real`, and `spotfi`). There is no timestamp recorded, and the sequence of the data is not chronological, as the rows are randomly shuffled from the `1_person_1.h5` and `1_person_2.h5` files.
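
    A minimal access sketch for the training files, following the structure above (only the group and dataset names listed there are assumed):

    ```python
    import h5py

    # Open the shuffled training split and read the nPath-1 group described above.
    with h5py.File('training/shuffuled_train.h5', 'r') as f:
        aoa = f['nPath-1/aoa'][()]            # camera-derived AoA (ground truth)
        csi_real = f['nPath-1/csi_real'][()]  # real part of the CSI values
        csi_imag = f['nPath-1/csi_imag'][()]  # imaginary part of the CSI values
        spotfi = f['nPath-1/spotfi'][()]      # SpotFi-estimated AoA

    # Rows are aligned across the four datasets within the same nPath-x group.
    print(aoa.shape, csi_real.shape, csi_imag.shape, spotfi.shape)
    ```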

    Citation
    If you use the dataset, please cite our paper:

    @inproceedings{eyefi2020,
     title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
     author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
     booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
     year={2020},
    }

  12. Crosscut matching

    • entrepot.recherche.data.gouv.fr
    Updated Nov 23, 2021
    Cite
    Hugues FRANCOIS; Hugues FRANCOIS (2021). Crosscut matching [Dataset]. http://doi.org/10.15454/COS00O
    Explore at:
    Dataset updated
    Nov 23, 2021
    Dataset provided by
    Recherche Data Gouv
    Authors
    Hugues FRANCOIS; Hugues FRANCOIS
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    Crosscut is a set of Python and R scripts for matching the ADAMONT snow simulations adapted to snowpack management practices in ski resorts. It was developed as part of joint work between INRAE and Météo France, was consolidated following P. Spandre's PhD work, and is used in the ClimSnow (TM) service.

  13. League of Legends Summoners and Match Data

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rebeca Chinicz (2023). League of Legends Summoners and Match Data [Dataset]. https://www.kaggle.com/datasets/chiniczr/league-of-legends-summoners-and-match-data/suggestions
    Explore at:
    Available download formats: zip (2892577848 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    Rebeca Chinicz
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    I wanted to create something similar to LoL Esports Win Probability, but for regular, SoloQ games, and see what could be learned from it. I uploaded here all the data I collected along the way.

    I went about collecting data from as many matches as I could, in an organized manner: for 3 major regions (EUW, NA and KR), get the first 50 players in each division (7x4+3 = 31 divisions), and then get each of their last 100 matches. Due to issues like no-longer-existing summoner names, arena games also being classified as ranked, etc., the total number of games is lower than I originally expected, but still a generous 260,367.

    Since most of this data comes directly from the API, it's in JSON, but there is also a CSV file with the tabular data of the features I ended up extracting to train my simple model. I've posted a work-in-progress article explaining the project on Medium. Head on over there for more details: https://medium.com/@chiniczrebeca/practical-machine-learning-with-lol-a-simple-predictive-use-case-with-data-collection-learning-c2b6e621df66.

    And if you want to see the code I used, it's all on https://github.com/Intigram/DataCollection.

    Observation: on the CSV file, the value of winning_team is 0 when blue side won, and 1 when red side won.

    See below a summary of the files and what they contain:

    | File | Description |
    | --- | --- |
    | summoner_names.json | The first 50 summoner names (display names) from each division in EUW, NA and KR |
    | {Short Region Name}_puuids.json | Player Universally Unique IDentifiers, the necessary argument to get match history |
    | {Long Region Name}_match_ids.json | List of match IDs from each region (what you get from querying by player; you then use these to get the match details) |
    | {Long Region Name}_matches_{#}.json | List of final match data from each region (not timeline) |
    | data_all.csv | Tabular data of the (normalized-*ish*) features I extracted to train my prediction model |
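
    A hedged loading sketch for the tabular file (only data_all.csv and the winning_team encoding are taken from the notes above):

    ```python
    import pandas as pd

    df = pd.read_csv("data_all.csv")

    # Per the observation above: winning_team == 0 means blue side won, 1 means red side won.
    df["winner"] = df["winning_team"].map({0: "blue", 1: "red"})
    print(df["winner"].value_counts())
    ```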

  14. Fit statistics for scored XGBoost models with 50,000 rows per dataset.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). Fit statistics for scored XGBoost models with 50,000 rows per dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS ONE
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fit statistics for scored XGBoost models with 50,000 rows per dataset.

  15. The data and code of the article ''SNSAlib: a python library for analyzing signed network''

    • scidb.cn
    Updated Jan 24, 2025
    Cite
    aiwenli; Jun-Lin Lu; Ying Fan; Xiao-Ke Xu (2025). The data and code of the article ''SNSAlib: a python library for analyzing signed network'' [Dataset]. http://doi.org/10.57760/sciencedb.j00113.00178
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Science Data Bank
    Authors
    aiwenli; Jun-Lin Lu; Ying Fan; Xiao-Ke Xu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The data and code related to the article ''SNSAlib: a python library for analyzing signed network'', published in the journal Chinese Physics B. This project contains null model construction for signed networks and their statistic features. The whole project is divided into three parts, as follows:

    Part 1: Signed network datasets. This part involves ten empirical signed network datasets: SPP, GGS, Wiring, Sampson, Teams, Alpha, OTC, Wiki, Slashdot, and Epinions. The first five datasets are sourced from offline real-world social networks, and the latter five are obtained from online internet platforms. The processed data is stored as triplets in text files (.txt).

    Part 2: Null model construction of signed networks. This part covers null model construction for undirected signed networks. It has seven different null model construction methods: positive-edge randomized null model, negative-edge randomized null model, positive-edge and negative-edge randomized null model, full-edge randomized null model, signed randomized null model, diminished community structure null model, and enhanced community structure null model.

    Part 3: Statistic features of signed networks. This part covers statistic features of signed networks, which can describe the differences between the null models and the real networks and reveal the extraordinary characteristics of real networks. These statistic features are common neighbors, matching coefficient, excess average degree, clustering coefficient, embeddedness, FMF, FECS and DECDS.
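
    To give a flavour of the null model idea described above, here is a generic sign-permutation sketch (illustrative only; it is not SNSAlib code, the function name is hypothetical, and it does not claim to reproduce any of the seven listed constructions exactly):

    ```python
    import random

    def sign_permutation_null_model(edges, seed=None):
        """Keep the topology fixed and randomly permute the edge signs.

        `edges` is a list of (u, v, sign) triples, matching the triplet text
        files described above (sign is +1 or -1). Illustrative sketch only.
        """
        rng = random.Random(seed)
        signs = [s for _, _, s in edges]
        rng.shuffle(signs)  # permute signs across the fixed edge set
        return [(u, v, s) for (u, v, _), s in zip(edges, signs)]

    # Tiny usage example with a toy triplet list.
    toy = [(1, 2, 1), (2, 3, -1), (1, 3, 1), (3, 4, -1)]
    print(sign_permutation_null_model(toy, seed=0))
    ```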

  16. IPL 2024 Dataset

    • kaggle.com
    zip
    Updated Jun 27, 2024
    Cite
    Lokesh Gopal (2024). IPL 2024 Dataset [Dataset]. https://www.kaggle.com/datasets/lokeshmadiga/ipl-2024-dataset-ball-by-ball-match-wise/data
    Explore at:
    Available download formats: zip (122241 bytes)
    Dataset updated
    Jun 27, 2024
    Authors
    Lokesh Gopal
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    IPL 2024 dataset scraped from Cricbuzz using Python with Pandas and BeautifulSoup. The dataset contains schedule, match-by-match, and ball-by-ball data. For the code, you can check out my GitHub.

  17. Data sets and coding scripts for research on sensory processing in ADHD and ASD

    • orda.shef.ac.uk
    Updated Nov 26, 2025
    Cite
    Vesko Varbanov; Paul Overton (2025). Data sets and coding scripts for research on sensory processing in ADHD and ASD [Dataset]. http://doi.org/10.15131/shef.data.30704810.v1
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Vesko Varbanov; Paul Overton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains all anonymised data and analysis files for a study examining whether clinical diagnosis, beyond self-reported trait severity, differentiates sensory processing profiles in adults with ADHD and ASD. The research tested visual orientation discrimination using a psychophysical two-alternative forced-choice (2AFC) task with vertical and oblique stimuli, comparing four propensity-matched groups (n = 38 per group): clinical ADHD, non-clinical ADHD, clinical ASD, and non-clinical ASD.

    Methodology and Techniques
    Participants completed validated self-report measures: the Adult ADHD Self-Report Scale (ASRS) and the Broad Autism Phenotype Questionnaire (BAPQ). Sensory processing was assessed via a method-of-constant-stimuli orientation discrimination task implemented in PsychoPy, using interleaved adaptive staircases following a one-up/three-down rule. Propensity score matching (1:1 nearest neighbour, no replacement, 0.20 SD caliper on logit-transformed probabilities) was used to match clinical and non-clinical groups on trait severity. All inferential analyses were performed on the matched samples using ANCOVAs controlling for age and gender. The repository includes raw and matched datasets, analysis outputs, and the full Python code used for the matching pipeline.

    Ethics and Approval
    All procedures were approved by the University of Sheffield Department of Psychology Ethics Committee (Ref: 046476). Informed consent was obtained from all participants, and all data have been anonymised following institutional and GDPR requirements.

    Contents
    The repository includes:
    • Questionnaire data (ASRS, BAPQ)
    • Visual orientation discrimination thresholds (vertical and oblique)
    • Demographic variables (age, gender)
    • Clinical vs. non-clinical group labels
    • Propensity score matching files and reproducible Python code
    • JASP analysis files and outputs
    • Study documentation and methodological details

    These data support the study’s finding that ADHD and ASD show distinct sensory signatures: clinical ADHD was associated with reduced oblique sensitivity, while clinical ASD showed enhanced vertical discrimination relative to matched non-clinical controls. The dataset enables full reproducibility of all analyses and supports further research on sensory processing in neurodevelopmental conditions.
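
    A minimal sketch of the matching step described above (1:1 nearest-neighbour matching without replacement and a 0.20 SD caliper on logit-transformed propensities); the function and variable names are hypothetical and this is not the repository's pipeline code:

    ```python
    import numpy as np

    def match_one_to_one(logit_clinical, logit_control, caliper_sd=0.20, seed=0):
        """Greedy 1:1 nearest-neighbour matching on logit-transformed propensity
        scores, without replacement, using a caliper of 0.20 SD of the pooled
        logits. Illustrative sketch only."""
        rng = np.random.default_rng(seed)
        logit_clinical = np.asarray(logit_clinical, dtype=float)
        logit_control = np.asarray(logit_control, dtype=float)
        caliper = caliper_sd * np.std(np.concatenate([logit_clinical, logit_control]))
        available = list(range(len(logit_control)))
        pairs = []
        for i in rng.permutation(len(logit_clinical)):
            if not available:
                break
            dists = np.abs(logit_control[available] - logit_clinical[i])
            j = int(np.argmin(dists))
            if dists[j] <= caliper:
                pairs.append((int(i), available[j]))
                available.pop(j)  # no replacement
        return pairs

    # Toy usage with random logits.
    rng = np.random.default_rng(1)
    print(len(match_one_to_one(rng.normal(0.2, 1, 50), rng.normal(0.0, 1, 200))), "matched pairs")
    ```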

  18. Generalized linear modeling parameter estimates of birth characteristics predicting CPS referral by age 3.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). Generalized linear modeling parameter estimates of birth characteristics predicting CPS referral by age 3. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS ONE
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generalized linear modeling parameter estimates of birth characteristics predicting CPS referral by age 3.

  19. Industry-Education Skills Matching Dataset

    • kaggle.com
    zip
    Updated Sep 8, 2025
    Cite
    Python Developer (2025). Industry-Education Skills Matching Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/industry-education-skills-matching-dataset/data
    Explore at:
    Available download formats: zip (925395 bytes)
    Dataset updated
    Sep 8, 2025
    Authors
    Python Developer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset has been curated to explore the dynamic alignment of industry skill demands with educational course offerings. It integrates 5000 rows of data representing both technical and non-technical skills, mapped against job categories and course types.

    Key Features:

    Skill Attributes (20+): Includes technical skills such as programming, data analysis, machine learning, cloud computing, web development, cybersecurity, DevOps, mobile development, IoT, and AI programming, as well as essential soft skills like communication, teamwork, problem-solving, project management, business analysis, research, and ethics.

    Job Category: Numerical representation of job domains to which skill demands are mapped.

    Course Type: Categorized labels (e.g., Tech, Design) indicating the course domain recommended for bridging identified skill gaps.

  20. Data from: Evaluation of the Applicability of AEOLUS Satellite Wind Products...

    • zenodo.org
    zip
    Updated Sep 12, 2025
    Cite
    Chanfang Shu; zongyu chen; Zhaoliang Zeng; Zhaoliang Zeng; wenhao li; wenhao li; C K Shum; C K Shum; Shengkai Zhang; Shengkai Zhang; li fei; Chanfang Shu; zongyu chen; li fei (2025). Evaluation of the Applicability of AEOLUS Satellite Wind Products in Antarctica [Dataset]. http://doi.org/10.5281/zenodo.15743887
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 12, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chanfang Shu; zongyu chen; Zhaoliang Zeng; Zhaoliang Zeng; wenhao li; wenhao li; C K Shum; C K Shum; Shengkai Zhang; Shengkai Zhang; li fei; Chanfang Shu; zongyu chen; li fei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 2025
    Area covered
    Antarctica
    Description
    # README
    ## Data and Code Description
    This package supports the paper **"Evaluation of the Applicability of AEOLUS Satellite Wind Products in Antarctica"**. It contains all data and code needed to reproduce every analysis and figure presented in the manuscript.
    ### Folder Structure
    - **data/**: Processed data files for analysis and plotting.
    - **code/**: Python scripts for data processing, analysis, and visualization.
    - **README.md**: This documentation file.
    ---
    ## System Requirements
    - **Operating System:** Windows 10 (or equivalent)
    - **Language & Dependencies:**
    - Python 3.9 or higher
    - See `requirements.txt` for a full list (e.g., `numpy`, `pandas`, `scipy`, `matplotlib`).
    ---
    ## Usage Instructions
    1. **Unpack** the archive to your working directory.
    2. **(Optional, recommended)** Create a Python virtual environment and install dependencies:
    ```bash
    python -m venv venv
    source venv/bin/activate # Linux/macOS
    venv\Scripts\activate # Windows
    pip install -r requirements.txt
    ```
    3. **Run the analysis scripts** in the `/code` directory **in the following order**:
    - **01_rs_binavg_to_aeolus.py**
    Perform vertical bin-averaging of radiosonde data to match Aeolus vertical resolution.
    - **02_era5_binavg_to_aeolus_strict.py**
    Perform strict vertical bin-averaging of ERA5 reanalysis data to match Aeolus profiles.
    - **03_merge_rs_era5_binavg.py**
    Merge radiosonde and ERA5 bin-averaged datasets into unified collocation files.
    - **04_ee_threshold_sensitivity.py**
    Sensitivity analysis for EE (estimated error) thresholds applied to Aeolus observations.
    - **05_outlier_removal.py**
    Apply Modified Z-score and other criteria to remove outliers from Aeolus–RS/ERA5 collocations.
    - **06_station_statistics.py**
    Compute collocation statistics grouped by station (Rothera, Mawson, Davis).
    - **07_plot_scatter.py**
    Generate scatterplots comparing Aeolus vs. RS/ERA5 winds.
    - **08_plot_meteorology.py**
    Generate meteorological context figures (e.g., ERA5 composites, cyclone cases).
    ---
    ## Data Source
    All raw Aeolus Level-2B data and ERA5 reference data were obtained from the Copernicus Climate Data Store:
    ---
    *Last updated: Sep 2025*
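
    For reference, a minimal sketch of the Modified Z-score filter named in step 05 (assumed standard form; the release's actual criteria are implemented in `05_outlier_removal.py`):

    ```python
    import numpy as np

    def modified_zscore_mask(x, threshold=3.5):
        """Boolean mask of values kept after a Modified Z-score filter.

        Modified Z = 0.6745 * (x - median) / MAD; values with |Modified Z|
        above the threshold (3.5 is a common default) are flagged as outliers.
        Illustrative sketch only.
        """
        x = np.asarray(x, dtype=float)
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        if mad == 0:
            return np.ones_like(x, dtype=bool)  # no spread: keep everything
        return np.abs(0.6745 * (x - med) / mad) <= threshold

    # Example: filter wind-speed differences (toy numbers).
    diffs = np.array([0.2, -0.4, 0.1, 8.0, -0.3, 0.0])
    print(diffs[modified_zscore_mask(diffs)])
    ```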
