59 datasets found
  1. Appendix for "Don't DIY: Automatically transform legacy Python code to support structural pattern matching"

    • data.niaid.nih.gov
    Updated Aug 2, 2022
    Cite
    Balázs Rózsa; Gábor Antal; Rudolf Ferenc (2022). Appendix for "Don't DIY: Automatically transform legacy Python code to support structural pattern matching" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6812499
    Explore at:
    Dataset updated
    Aug 2, 2022
    Dataset provided by
    University of Szeged
    Authors
    Balázs Rózsa; Gábor Antal; Rudolf Ferenc
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the appendix for the paper "Don't DIY: Automatically transform legacy Python code to support structural pattern matching", presented at SCAM 2022.

    Abstract

    As data becomes more and more complex as technology evolves, the need to support more complex data types in programming languages has grown. However, without proper storage and manipulation capabilities, handling such data can result in hard-to-read, difficult-to-maintain code. Therefore, programming languages continuously evolve to provide more and more ways to handle complex data. Python 3.10 introduced structural pattern matching, which serves this exact purpose: we can split complex data into relevant parts by examining its structure, and store them for later processing. Previously, we could only use the traditional conditional branching, which could have led to long chains of nested conditionals. Maintaining such code fragments can be cumbersome. In this paper, we present a complete framework to solve the aforementioned problem. Our software is capable of examining Python source code and transforming relevant conditionals into structural pattern matching. Moreover, it is able to handle nested conditionals and it is also easily extensible, thus the set of possible transformations can be easily increased.
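
    To make the transformation concrete, here is a hand-written illustration of the kind of rewrite the framework performs (this example is not taken from the appendix):

    ```python
    # Legacy style: nested conditionals inspecting the structure of `point`.
    def describe_legacy(point):
        if isinstance(point, tuple) and len(point) == 2:
            x, y = point
            if x == 0 and y == 0:
                return "origin"
            elif x == 0:
                return f"on the y-axis at {y}"
            else:
                return f"point at ({x}, {y})"
        return "not a point"

    # Python 3.10+ structural pattern matching expressing the same logic.
    def describe_matched(point):
        match point:
            case (0, 0):
                return "origin"
            case (0, y):
                return f"on the y-axis at {y}"
            case (x, y):
                return f"point at ({x}, {y})"
            case _:
                return "not a point"
    ```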

  2. PRLF match pair overlap with Link Plus match pairs.

    • plos.figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). PRLF match pair overlap with Link Plus match pairs. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PRLF match pair overlap with Link Plus match pairs.

  3. League of Legends Match Dataset (2024)

    • kaggle.com
    Updated Jan 21, 2025
    Cite
    Jakub Krasuski (2025). League of Legends Match Dataset (2024) [Dataset]. https://www.kaggle.com/datasets/jakubkrasuski/league-of-legends-match-dataset-2025
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jakub Krasuski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    License

    CC BY 4.0 (Creative Commons Attribution 4.0 International)
    This license allows sharing, modification, and use of the dataset as long as proper attribution is given.

    Description

    This dataset provides detailed information about League of Legends matches, collected in 2025. It covers various aspects of the game, including player statistics, team performance, and match metadata. This dataset is ideal for statistical analysis, machine learning projects, and esports research.

    Data Collection & Replication

    The data was obtained using a custom Python script that queries the official Riot Games API. This script automates data retrieval by starting from a single player’s PUUID and expanding through match participants. If you’d like to replicate or extend this dataset, the script is freely available here:
    https://github.com/Blizzeq/league-of-legends-data-collector

    Key Features

    • Match Details: Game duration, mode (e.g., Classic), game version, and map identifiers.
    • Player Statistics: Includes kills, deaths, assists, damage dealt, gold earned, items purchased, and final game stats (e.g., magic resist, movement speed, mana regeneration).
    • Team Data: Insights into team performance and match outcomes.

    Dataset Overview

    • 94 attributes capturing comprehensive match and player data.
    • Key columns: game_id, game_start_utc, game_duration, queue_id, participant_id, kills, deaths, assists, final_damageDealt, final_goldEarned, and more.
    • Platform: EUN1 (Europe Nordic & East server).
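
    As a quick-start illustration (hedged: the file name below is a placeholder, and the column names are taken from the key columns listed above):

    ```python
    import pandas as pd

    # "lol_matches_2024.csv" is a placeholder name for the dataset's CSV file.
    df = pd.read_csv("lol_matches_2024.csv")
    cols = ["game_id", "game_duration", "queue_id", "kills", "deaths", "assists",
            "final_damageDealt", "final_goldEarned"]
    print(df[cols].describe())

    # Simple per-player KDA, guarding against zero deaths.
    df["kda"] = (df["kills"] + df["assists"]) / df["deaths"].clip(lower=1)
    ```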

    Use Cases

    • Performance analysis for individual players and teams.
    • Predictive modeling and machine learning applications in gaming.
    • Understanding player behavior and strategy in competitive gaming.

    Notes

    Ensure that the usage of this dataset complies with Riot Games' terms of service and privacy policies. If you plan to collect your own data using the script, remember to manage API rate limits and adhere to Riot’s policies.

  4. Data and code for the paper Detecting Road Network Errors from Trajectory...

    • figshare.com
    zip
    Updated Jan 9, 2024
    Cite
    Can Yang (2024). Data and code for the paper Detecting Road Network Errors from Trajectory Data with Partial Map Matching and Bidirectional Recurrent Neural Network Model [Dataset]. http://doi.org/10.6084/m9.figshare.24056658.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Can Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains data and code for the manuscript "Detecting Road Network Errors from Trajectory Data with Partial Map Matching and Bidirectional Recurrent Neural Network Model" submitted to IJGIS.

    Requirements (tested with Python 3.8.8):
    - Numpy >= 1.22.4
    - PyTorch >= 2.0.0
    - Sklearn >= 0.24.1
    - Shapely >= 1.8.2
    - rtree >= 1.0.0
    - Networkx >= 2.8.4
    - Geopandas >= 0.10.2

    Code structure:
    - lib: code for partial map matching, context feature extraction, and the BiRNN model
    - data: a sample of the training and test dataset in npz format, which contains "xy" and "label" attributes storing the trajectory geometry and the manual labels. It also contains a shapefile of the road network.

    Run the program in two steps:
    1. Generate context features: python feature_extract.py
    2. Train a BiRNN model for classification: python train_birnn.py

    For demonstration, the data folder contains a small trajectory dataset with labels and a road network downloaded from OSM. Access to the complete dataset can be requested at https://outreach.didichuxing.com/.
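
    A hedged loading sketch for the sample data (the file name below is a placeholder; only the npz format and the "xy" and "label" attributes are taken from the description above):

    ```python
    import numpy as np

    # "sample.npz" is a placeholder name for the npz file shipped in the data folder.
    sample = np.load("data/sample.npz", allow_pickle=True)
    xy = sample["xy"]         # trajectory geometry
    labels = sample["label"]  # manual labels
    print(len(xy), len(labels))
    ```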

  5. Research Data Supporting "Horsetail Matching: A Flexible Approach to Optimization Under Uncertainty"

    • repository.cam.ac.uk
    txt, zip
    Updated May 17, 2017
    Cite
    Cook, LW; Jarrett, JP (2017). Research Data Supporting "Horsetail Matching: A Flexible Approach to Optimization Under Uncertainty" [Dataset]. http://doi.org/10.17863/CAM.9695
    Explore at:
    Available download formats: zip (221643 bytes), txt (3081 bytes)
    Dataset updated
    May 17, 2017
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Cook, LW; Jarrett, JP
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This data comprises Python source code, along with scripts that illustrate how to use it to recreate the results in the publication. Further details are given in the README.txt file.

  6. LoL Match History & Summoner Data – 78k Matches

    • kaggle.com
    zip
    Updated Oct 20, 2025
    Cite
    nsmall (2025). LoL Match History & Summoner Data – 78k Matches [Dataset]. https://www.kaggle.com/datasets/nathansmallcalder/lol-match-history-and-summoner-data-80k-matches
    Explore at:
    Available download formats: zip (3770315 bytes)
    Dataset updated
    Oct 20, 2025
    Authors
    nsmall
    Description

    League of Legends Relational Database for Match Prediction

    Context

    This dataset contains detailed match and player data from League of Legends, one of the most popular multiplayer online battle arena (MOBA) games in the world. It includes 35,000 matches and contains 78,000 summoner statistics, capturing a wide range of in-game statistics, such as champion selection, player performance metrics, match outcomes, and more.

    The dataset is structured to support a variety of analyses, including:

    • Predicting match outcomes based on team compositions and player stats
    • Evaluating player performance and progression over time
    • Exploring trends in champion popularity and win rates
    • Building machine learning models for esports analytics

    Whether you are interested in competitive gaming, data science, or predictive modeling, this dataset provides a rich source of structured data to explore the dynamics of League of Legends at scale.

    Data Schema and Dictionary

    Data was collected from the Riot Games API using a Python script (link) during Patch 25.19.

    The dataset consists of the following CSV files (a hedged loading sketch follows this list):

    • MatchStatsTbl - Match stats for a given SummonerID and MatchID. Contains K/D/A, items, runes, ward score, summoner spells, Baron kills, Dragon kills, lane, damage taken/dealt, total gold, CS, mastery points, and win/loss
    • TeamMatchStatsTbl - Contains red/blue champions, red/blue Baron kills, blue/red turret kills, red/blue kills, Rift Herald kills, and win/loss
    • MatchTbl - Contains MatchID, Rank, match duration, and MatchType
    • RankTbl - Contains RankID and RankName
    • ChampionTbl - Contains ChampionID and ChampionName
    • ItemTbl - Contains ItemID and ItemName
    • SummonerTbl - Contains SummonerID and SummonerName
    • SummonerMatchTbl - Links MatchID, SummonerID, and ChampionID
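
    A minimal pandas sketch for joining these tables (hedged: the CSV file names mirror the table names, and the MatchID/RankID join keys are assumed to be present in each file):

    ```python
    import pandas as pd

    # File names are assumptions; use the actual CSV names in the release.
    matches = pd.read_csv("MatchTbl.csv")              # MatchID, Rank, match duration, MatchType
    team_stats = pd.read_csv("TeamMatchStatsTbl.csv")  # per-team objectives and win/loss
    ranks = pd.read_csv("RankTbl.csv")                 # RankID, RankName

    # Attach team-level stats and rank names to each match for outcome modelling.
    df = matches.merge(team_stats, on="MatchID", how="inner")
    df = df.merge(ranks, left_on="Rank", right_on="RankID", how="left")
    print(df.head())
    ```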

    Database Features

    • This dataset contains 35,422 League of Legends matches and 78,863 summoner statistics from those games.
    • Uses data from over 2,381 summoners.
    • Consists of data only from Europe West (EUW).
    • Data is sampled from Unranked to Challenger tiers.

    Database Setup

    • MySQL database running on Linux
    • The database schema script can be found here (works with the GitHub project to collect your own data)

    Limitations

    The Riot API only provides the "BOTTOM" lane for bot-lane players. During data collection, roles were inferred by combining champions that are often played as support with CS metrics to distinguish ADC vs. support, especially for ambiguous picks like Senna or off-meta choices.

    Acknowledgements/Privacy

    Data is collected using the official Riot Games API. We thank Riot Games for providing the data and tools that make this project possible. This dataset is not endorsed or certified by Riot Games. No personal or identifiable player data (e.g., Summoner Names, Summoner IDs, or PUUIDs) are included. The SummonerTbl has been intentionally excluded from this public release.

    Github

    The Python scripts used for data collection, as well as various scripts I developed for API calls, database management, and initial data analytics, can be found on GitHub

  7. Groundwater flow and SNTEMP stream temperature model build and history matching workflows

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 27, 2025
    Cite
    U.S. Geological Survey (2025). Groundwater flow and SNTEMP stream temperature model build and history matching workflows [Dataset]. https://catalog.data.gov/dataset/groundwater-flow-and-sntemp-stream-temperature-model-build-and-history-matching-workflows
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    MODFLOW6 and SNTEMP models were developed to simulate groundwater flows and instream temperatures in Beaver Creek, Alaska from 2019-2023, using Python scripts to create a reproducible workflow that processes input datasets into model files. This data release contains the scripts used to build the SNTEMP and MODFLOW models, process model output for comparison with field observations, and develop and run the PEST++ workflow for history matching. These workflows are described in the readme.md files in this archive and are used to implement the modeling decisions described in the associated report, "Simulating present and future Groundwater/Surface-water interactions and stream temperatures in Beaver Creek, Kenai Peninsula, Alaska".

  8. Estimating separable matching models (replication data)

    • resodate.org
    Updated Oct 14, 2025
    Cite
    Alfred Galichon; Bernard Salanié (2025). Estimating separable matching models (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9lc3RpbWF0aW5nLXNlcGFyYWJsZS1tYXRjaGluZy1tb2RlbHMtcmVwbGljYXRpb24tZGF0YQ==
    Explore at:
    Dataset updated
    Oct 14, 2025
    Dataset provided by
    ZBW Journal Data Archive
    ZBW
    Journal of Applied Econometrics
    Authors
    Alfred Galichon; Bernard Salanié
    Description

    Code for Galichon-Salanie's "Estimating Separable Matching Models"

    Usage

    Create a virtual environment, e.g. with python3 -m venv env. Activate it with source env/bin/activate.

    Install the requirements with pip install -r requirements.txt. Among the packages it downloads are two created by Bernard Salanié: bs_python_utils and cupid_matching. The former is just a set of utility programs. The latter contains code to solve for the stable matching and estimate the parameters of separable matching models with MDE and Poisson-GLM. The code in this folder relies heavily on these two packages, which are documented on Salanie's website: bs_python_utils and cupid_matching.

    Choose the parameters in config.py and run the code with python main.py. The program will create a folder Results and save a plot and a pickled file with the estimates for the sample of size sample_size defined in config.py.

    Each simulation sample (that is, n_sim=1) takes a few seconds (4 seconds on a Mac M2 Max 2023) to estimate the Choo and Siow model by the two methods in the paper: minimum distance and Poisson GLM. The code is parallelized over samples, unless you choose use_multiprocessing=False. By default, it uses all but two of your CPUs.

    Structure of the code

    The master program main.py reads the parameters in config.py.

    1. If do_create_samples is True, it uses create_samples.py to read the Choo and Siow datasets in the data_dir directory and to create two samples in samples_dir (both directories are specified in config.py). The two samples correspond to the small and large samples described in Section 6 of the paper. The files created have the marriage patterns by age (*muxy.txt), the margins (*nx.txt and *my.txt), and the variance-covariance matrix of these estimates (*varmus.pkl).
    2. It calls read_data.py, which reads the sample defined by sample_size in config.py and prepares it for the simulation. read_data.py also has code to add a small positive number (see zero_guard in config.py) for zero cells; this is used in the MDE simulation.
    3. specification.py creates the basis functions according to the specification given by degrees in config.py.
    4. Then main.py runs the simulation via simulate.py as defined by config.py.

    Configuration

    All parameters of the simulation are in config.py:

    • do_create_samples, do_simuls, plot_simuls, do_simuls_mde, do_simuls_poisson define what the program does;
    • n_sim is the number of simulated samples;
    • use_multiprocessing and nb_cpus define the parallelization;
    • zero_guard is the small positive number added to zero cells in the sample for MDE estimation;
    • degrees is a list of tuples that define the degrees of the polynomials for the basis functions; e.g. an (a,b) tuple means that the basis function is $L_a(x)L_b(y)$, where $x$ and $y$ are the ages of the partners and $L_a$ is the Legendre polynomial of degree $a$. In addition to these terms, the basis functions also include the constant term; $\mathbf{1}(x>y)$, and a term proportional to $\max(x-y,0)$. The function generate_bases in specification.py creates the basis functions.
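
    A hypothetical config.py sketch consistent with the parameters listed above (the parameter names follow the list; every value shown is an illustrative placeholder, not the paper's setting):

    ```python
    # Hypothetical config.py sketch; values are placeholders for illustration only.

    # What the program does
    do_create_samples = True    # rebuild the small/large samples with create_samples.py
    do_simuls = True            # run the simulations
    plot_simuls = True          # plot the results
    do_simuls_mde = True        # minimum distance estimation
    do_simuls_poisson = True    # Poisson-GLM estimation

    # Directories and sample choice (placeholder values)
    data_dir = "data"           # where the Choo and Siow datasets live
    samples_dir = "samples"     # where create_samples.py writes the two samples
    sample_size = 100_000       # which sample read_data.py loads

    # Simulation setup
    n_sim = 100                 # number of simulated samples
    use_multiprocessing = True  # parallelize over samples
    nb_cpus = 6                 # number of parallel workers
    zero_guard = 1e-6           # small positive number added to zero cells (MDE only)

    # Specification: an (a, b) tuple means the basis function L_a(x) * L_b(y)
    degrees = [(1, 1), (1, 2), (2, 1), (2, 2)]
    ```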

    Questions

    Please direct all questions to Bernard Salanie.

  9. Research Data Supporting "Extending Horsetail Matching for Optimization Under Probabilistic, Interval and Mixed Uncertainties"

    • repository.cam.ac.uk
    Updated Aug 14, 2017
    Cite
    Cook, LW (2017). Research Data Supporting "Extending Horsetail Matching for Optimization Under Probabilistic, Interval and Mixed Uncertainties" [Dataset]. http://doi.org/10.17863/CAM.12577
    Explore at:
    Dataset updated
    Aug 14, 2017
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Cook, LW
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data comprises a Python module that implements the horsetail matching method presented in this publication, and it can thus be used to recreate the results in the paper. This Python module is also available at http://www-edc.eng.cam.ac.uk/aerotools/horsetailmatching/, but is archived here.

  10. Features of probabilistic linkage solutions available for record linkage...

    • plos.figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). Features of probabilistic linkage solutions available for record linkage applications. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Features of probabilistic linkage solutions available for record linkage applications.

  11. Data from: EyeFi: Fast Human Identification Through Vision and WiFi-based...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 5, 2022
    Cite
    Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon; Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon (2022). EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching [Dataset]. http://doi.org/10.5281/zenodo.7396485
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon; Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EyeFi Dataset

    This dataset was collected as part of the EyeFi project at the Bosch Research and Technology Center, Pittsburgh, PA, USA. The dataset contains WiFi CSI values of human motion trajectories along with ground truth location information captured through a camera. This dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop describing the details of data collection. Please check it out for more information on the dataset.

    Data Collection Setup

    In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC and the Linux CSI tools [1] to extract the WiFi CSI packets. The (x,y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 camera mounted on the ceiling, and Angles of Arrival (AoAs) are derived from these (x,y) coordinates. Both the WiFi card and the camera are located at the same origin coordinates but at different heights: the camera is located around 2.85m above the ground and the WiFi antennas are around 1.12m above the ground.

    The data collection environment consists of two areas: the first is a rectangular space measuring 11.8m x 8.74m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74m and 14.24m between two walls. The kitchen also has numerous obstacles and different materials that exhibit different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.

    To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone are used in both the lab and the kitchen areas.

    List of Files
    Here is a list of files included in the dataset:

    |- 1_person
      |- 1_person_1.h5
      |- 1_person_2.h5
    |- 2_people
      |- 2_people_1.h5
      |- 2_people_2.h5
      |- 2_people_3.h5
    |- 3_people
      |- 3_people_1.h5
      |- 3_people_2.h5
      |- 3_people_3.h5
    |- 5_people
      |- 5_people_1.h5
      |- 5_people_2.h5
      |- 5_people_3.h5
      |- 5_people_4.h5
    |- 10_people
      |- 10_people_1.h5
      |- 10_people_2.h5
      |- 10_people_3.h5
    |- Kitchen
      |- 1_person
        |- kitchen_1_person_1.h5
        |- kitchen_1_person_2.h5
        |- kitchen_1_person_3.h5
      |- 3_people
        |- kitchen_3_people_1.h5
    |- training
      |- shuffuled_train.h5
      |- shuffuled_valid.h5
      |- shuffuled_test.h5
    View-Dataset-Example.ipynb
    README.md
    
    

    In this dataset, the folders `1_person/`, `2_people/`, `3_people/`, `5_people/`, and `10_people/` contain data collected from the lab area, whereas the `Kitchen/` folder contains data collected from the kitchen area. To see how each file is structured, please see the section Access the data below.

    The training folder contains the training dataset we used to train the neural network discussed in our paper. Its files are generated by shuffling all the data from the `1_person/` folder collected in the lab area (`1_person_1.h5` and `1_person_2.h5`).

    Why multiple files in one folder?

    Each folder contains multiple files. For example, the `1_person` folder has two files: `1_person_1.h5` and `1_person_2.h5`. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person holding the phone can differ. Also, the data could be collected on different days, and/or the data collection system had to be rebooted due to stability issues. As a result, we provide different files (like `1_person_1.h5` and `1_person_2.h5`) to distinguish a different person holding the phone and possible system reboots, which introduce different phase offsets (see below) in the system.

    Special note:

    `1_person_1.h5` is generated with the same person holding the phone throughout, whereas `1_person_2.h5` contains different people holding the phone, although only one person is present in the area at a time. The two files were also collected on different days.


    Access the data
    To access the data, the hdf5 library is needed to open the dataset. A free HDF5 viewer is available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide an example Python notebook, View-Dataset-Example.ipynb, to demonstrate how to access the data.

    Each file is structured as (except the files under *"training/"* folder):

    |- csi_imag
    |- csi_real
    |- nPaths_1
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_2
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_3
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_4
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- num_obj
    |- obj_0
      |- cam_aoa
      |- coordinates
    |- obj_1
      |- cam_aoa
      |- coordinates
    ...
    |- timestamp
    

    The `csi_real` and `csi_imag` are the real and imaginary parts of the CSI measurements. The order of antennas and subcarriers is as follows for the 90 `csi_real` and `csi_imag` values: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, … subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The `nPaths_x` groups are the SpotFi [2] calculated WiFi Angles of Arrival (AoA) with `x` the number of multiple paths specified during the calculation. Under each `nPaths_x` group are `offset_xx` subgroups, where `xx` stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:

    | Antennas | Offset 1 (rad) | Offset 2 (rad) |
    |:--------:|:--------------:|:--------------:|
    |  1 & 2   |     1.1899     |    -2.0071     |
    |  1 & 3   |     1.3883     |    -1.8129     |
    
    

    The measurement is based on the work in [3], where the authors state that there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combinations of these offsets are used for the `offset_xx` naming. For example, `offset_12` means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 were used in the SpotFi calculation.

    The `num_obj` field stores the number of human subjects present in the scene. `obj_0` is always the subject holding the phone. In each file there are `num_obj` groups `obj_x`. For each `obj_x`, we have the `coordinates` reported by the camera and `cam_aoa`, the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoA listed here are chronologically ordered (except for the files in the `training` folder). They reflect the way the person carrying the phone moved in the space (for `obj_0`) and how everyone else walked (for the other `obj_y`, where `y` > 0).

    The `timestamp` field provides a time reference for each WiFi packet.

    To access the data (Python):

    import h5py

    # Open one data file (here: three people present in the lab area).
    data = h5py.File('3_people_3.h5', 'r')

    # Raw CSI measurements: 90 values per packet (30 subcarriers x 3 antennas).
    csi_real = data['csi_real'][()]
    csi_imag = data['csi_imag'][()]

    # Camera-derived ground truth for the phone holder (obj_0).
    cam_aoa = data['obj_0/cam_aoa'][()]
    cam_loc = data['obj_0/coordinates'][()]
    

    Files inside the `training/` folder have a different data structure:

    
    |- nPath-1
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-2
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-3
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-4
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    


    The group `nPath-x` is the number of multipath components specified during the SpotFi calculation. `aoa` is the camera-generated angle of arrival (AoA) (it can be considered ground truth), `csi_imag` and `csi_real` are the imaginary and real components of the CSI values, and `spotfi` holds the SpotFi-calculated AoA values. The SpotFi values are chosen based on the lowest median and mean error across `1_person_1.h5` and `1_person_2.h5`. All the rows under the same `nPath-x` group are aligned (i.e., the first row of `aoa` corresponds to the first row of `csi_imag`, `csi_real`, and `spotfi`). There is no timestamp recorded, and the sequence of the data is not chronological, as the rows are randomly shuffled from the `1_person_1.h5` and `1_person_2.h5` files.
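
    A minimal access sketch for the training files, following the structure above (only the group and dataset names listed there are assumed):

    ```python
    import h5py

    # Open the shuffled training split and read the nPath-1 group described above.
    with h5py.File('training/shuffuled_train.h5', 'r') as f:
        aoa = f['nPath-1/aoa'][()]            # camera-derived AoA (ground truth)
        csi_real = f['nPath-1/csi_real'][()]  # real part of the CSI values
        csi_imag = f['nPath-1/csi_imag'][()]  # imaginary part of the CSI values
        spotfi = f['nPath-1/spotfi'][()]      # SpotFi-estimated AoA

    # Rows are aligned across the four datasets within the same nPath-x group.
    print(aoa.shape, csi_real.shape, csi_imag.shape, spotfi.shape)
    ```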

    Citation
    If you use the dataset, please cite our paper:

    @inproceedings{eyefi2020,
     title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
     author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
     booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
     year={2020},
    }

  12. Crosscut matching

    • entrepot.recherche.data.gouv.fr
    Updated Nov 23, 2021
    Cite
    Hugues FRANCOIS; Hugues FRANCOIS (2021). Crosscut matching [Dataset]. http://doi.org/10.15454/COS00O
    Explore at:
    Dataset updated
    Nov 23, 2021
    Dataset provided by
    Recherche Data Gouv
    Authors
    Hugues FRANCOIS; Hugues FRANCOIS
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    Crosscut is a set of Python and R scripts for matching the ADAMONT snow simulations adapted to snowpack management practices in ski resorts. It was developed as part of joint work between INRAE and Météo France, was consolidated following P. Spandre's PhD work, and is used in the ClimSnow (TM) service.

  13. League of Legends Summoners and Match Data

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rebeca Chinicz (2023). League of Legends Summoners and Match Data [Dataset]. https://www.kaggle.com/datasets/chiniczr/league-of-legends-summoners-and-match-data/suggestions
    Explore at:
    Available download formats: zip (2892577848 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    Rebeca Chinicz
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    I wanted to create something similar to LoL Esports Win Probability, but for regular, SoloQ games, and see what could be learned from it. I uploaded here all the data I collected along the way.

    I went about collecting data from as many matches as I could, in an organized manner: for 3 major regions (EUW, NA and KR), get the first 50 players in each division (7x4+3 = 31 divisions), and then get each of their last 100 matches. Due to issues like no-longer-existing summoner names, arena games also being classified as ranked, etc., the total number of games is lower than I originally expected, but still a generous 260,367.

    Since most of this data comes directly from the API, it's in JSON, but there is also a CSV file with the tabular data of the features I ended up extracting to train my simple model. I've posted a work-in-progress article explaining the project on Medium. Head on over there for more details: https://medium.com/@chiniczrebeca/practical-machine-learning-with-lol-a-simple-predictive-use-case-with-data-collection-learning-c2b6e621df66.

    And if you want to see the code I used, it's all on https://github.com/Intigram/DataCollection.

    Observation: on the CSV file, the value of winning_team is 0 when blue side won, and 1 when red side won.

    See below a summary of the files and what they contain:

    | File | Description |
    | --- | --- |
    | summoner_names.json | The first 50 summoner names (display names) from each division in EUW, NA and KR |
    | {Short Region Name}_puuids.json | Player Universally Unique IDentifiers, the necessary argument to get match history |
    | {Long Region Name}_match_ids.json | List of match IDs from each region (what you get from querying by player; you then use these to get the match details) |
    | {Long Region Name}_matches_{#}.json | List of final match data from each region (not timeline) |
    | data_all.csv | Tabular data of the (normalized-*ish*) features I extracted to train my prediction model |
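
    A hedged loading sketch for the tabular file (only data_all.csv and the winning_team encoding are taken from the notes above):

    ```python
    import pandas as pd

    df = pd.read_csv("data_all.csv")

    # Per the observation above: winning_team == 0 means blue side won, 1 means red side won.
    df["winner"] = df["winning_team"].map({0: "blue", 1: "red"})
    print(df["winner"].value_counts())
    ```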

  14. Fit statistics for scored XGBoost models with 50,000 rows per dataset.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). Fit statistics for scored XGBoost models with 50,000 rows per dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS ONE
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fit statistics for scored XGBoost models with 50,000 rows per dataset.

  15. The data and code of the article ''SNSAlib: a python library for analyzing signed network''

    • scidb.cn
    Updated Jan 24, 2025
    Cite
    aiwenli; Jun-Lin Lu; Ying Fan; Xiao-Ke Xu (2025). The data and code of the article ''SNSAlib: a python library for analyzing signed network'' [Dataset]. http://doi.org/10.57760/sciencedb.j00113.00178
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Science Data Bank
    Authors
    aiwenli; Jun-Lin Lu; Ying Fan; Xiao-Ke Xu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The data and code related to the article ''SNSAlib: a python library for analyzing signed network'', published in the journal Chinese Physics B. This project contains null model construction for signed networks and their statistic features. The whole project is divided into three parts, as follows:

    Part 1: Signed network datasets. This part involves ten empirical signed network datasets: SPP, GGS, Wiring, Sampson, Teams, Alpha, OTC, Wiki, Slashdot, and Epinions. The first five datasets are sourced from offline real-world social networks, and the latter five are obtained from online internet platforms. The processed data is stored as triplets in text files (.txt).

    Part 2: Null model construction of signed networks. This part covers null model construction for undirected signed networks. It has seven different null model construction methods: positive-edge randomized null model, negative-edge randomized null model, positive-edge and negative-edge randomized null model, full-edge randomized null model, signed randomized null model, diminished community structure null model, and enhanced community structure null model.

    Part 3: Statistic features of signed networks. This part covers statistic features of signed networks, which can describe the differences between the null models and the real networks and reveal the extraordinary characteristics of real networks. These statistic features are common neighbors, matching coefficient, excess average degree, clustering coefficient, embeddedness, FMF, FECS and DECDS.
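
    To give a flavour of the null model idea described above, here is a generic sign-permutation sketch (illustrative only; it is not SNSAlib code, the function name is hypothetical, and it does not claim to reproduce any of the seven listed constructions exactly):

    ```python
    import random

    def sign_permutation_null_model(edges, seed=None):
        """Keep the topology fixed and randomly permute the edge signs.

        `edges` is a list of (u, v, sign) triples, matching the triplet text
        files described above (sign is +1 or -1). Illustrative sketch only.
        """
        rng = random.Random(seed)
        signs = [s for _, _, s in edges]
        rng.shuffle(signs)  # permute signs across the fixed edge set
        return [(u, v, s) for (u, v, _), s in zip(edges, signs)]

    # Tiny usage example with a toy triplet list.
    toy = [(1, 2, 1), (2, 3, -1), (1, 3, 1), (3, 4, -1)]
    print(sign_permutation_null_model(toy, seed=0))
    ```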

  16. IPL 2024 Dataset

    • kaggle.com
    zip
    Updated Jun 27, 2024
    Cite
    Lokesh Gopal (2024). IPL 2024 Dataset [Dataset]. https://www.kaggle.com/datasets/lokeshmadiga/ipl-2024-dataset-ball-by-ball-match-wise/data
    Explore at:
    Available download formats: zip (122241 bytes)
    Dataset updated
    Jun 27, 2024
    Authors
    Lokesh Gopal
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    IPL 2024 dataset scraped from Cricbuzz using Python with Pandas and BeautifulSoup. The dataset contains schedule, match-by-match, and ball-by-ball data. For the code, you can check out my GitHub.

  17. Data sets and coding scripts for research on sensory processing in ADHD and ASD

    • orda.shef.ac.uk
    Updated Nov 26, 2025
    Cite
    Vesko Varbanov; Paul Overton (2025). Data sets and coding scripts for research on sensory processing in ADHD and ASD [Dataset]. http://doi.org/10.15131/shef.data.30704810.v1
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Vesko Varbanov; Paul Overton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains all anonymised data and analysis files for a study examining whether clinical diagnosis, beyond self-reported trait severity, differentiates sensory processing profiles in adults with ADHD and ASD. The research tested visual orientation discrimination using a psychophysical two-alternative forced-choice (2AFC) task with vertical and oblique stimuli, comparing four propensity-matched groups (n = 38 per group): clinical ADHD, non-clinical ADHD, clinical ASD, and non-clinical ASD.

    Methodology and Techniques
    Participants completed validated self-report measures: the Adult ADHD Self-Report Scale (ASRS) and the Broad Autism Phenotype Questionnaire (BAPQ). Sensory processing was assessed via a method-of-constant-stimuli orientation discrimination task implemented in PsychoPy, using interleaved adaptive staircases following a one-up/three-down rule. Propensity score matching (1:1 nearest neighbour, no replacement, 0.20 SD caliper on logit-transformed probabilities) was used to match clinical and non-clinical groups on trait severity. All inferential analyses were performed on the matched samples using ANCOVAs controlling for age and gender. The repository includes raw and matched datasets, analysis outputs, and the full Python code used for the matching pipeline.

    Ethics and Approval
    All procedures were approved by the University of Sheffield Department of Psychology Ethics Committee (Ref: 046476). Informed consent was obtained from all participants, and all data have been anonymised following institutional and GDPR requirements.

    Contents
    The repository includes:
    • Questionnaire data (ASRS, BAPQ)
    • Visual orientation discrimination thresholds (vertical and oblique)
    • Demographic variables (age, gender)
    • Clinical vs. non-clinical group labels
    • Propensity score matching files and reproducible Python code
    • JASP analysis files and outputs
    • Study documentation and methodological details

    These data support the study’s finding that ADHD and ASD show distinct sensory signatures: clinical ADHD was associated with reduced oblique sensitivity, while clinical ASD showed enhanced vertical discrimination relative to matched non-clinical controls. The dataset enables full reproducibility of all analyses and supports further research on sensory processing in neurodevelopmental conditions.
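
    A minimal sketch of the matching step described above (1:1 nearest-neighbour matching without replacement and a 0.20 SD caliper on logit-transformed propensities); the function and variable names are hypothetical and this is not the repository's pipeline code:

    ```python
    import numpy as np

    def match_one_to_one(logit_clinical, logit_control, caliper_sd=0.20, seed=0):
        """Greedy 1:1 nearest-neighbour matching on logit-transformed propensity
        scores, without replacement, using a caliper of 0.20 SD of the pooled
        logits. Illustrative sketch only."""
        rng = np.random.default_rng(seed)
        logit_clinical = np.asarray(logit_clinical, dtype=float)
        logit_control = np.asarray(logit_control, dtype=float)
        caliper = caliper_sd * np.std(np.concatenate([logit_clinical, logit_control]))
        available = list(range(len(logit_control)))
        pairs = []
        for i in rng.permutation(len(logit_clinical)):
            if not available:
                break
            dists = np.abs(logit_control[available] - logit_clinical[i])
            j = int(np.argmin(dists))
            if dists[j] <= caliper:
                pairs.append((int(i), available[j]))
                available.pop(j)  # no replacement
        return pairs

    # Toy usage with random logits.
    rng = np.random.default_rng(1)
    print(len(match_one_to_one(rng.normal(0.2, 1, 50), rng.normal(0.0, 1, 200))), "matched pairs")
    ```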

  18. Generalized linear modeling parameter estimates of birth characteristics predicting CPS referral by age 3.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Oct 20, 2023
    Cite
    John Prindle; Himal Suthar; Emily Putnam-Hornstein (2023). Generalized linear modeling parameter estimates of birth characteristics predicting CPS referral by age 3. [Dataset]. http://doi.org/10.1371/journal.pone.0291581.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    PLOS ONE
    Authors
    John Prindle; Himal Suthar; Emily Putnam-Hornstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generalized linear modeling parameter estimates of birth characteristics predicting CPS referral by age 3.

  19. Industry-Education Skills Matching Dataset

    • kaggle.com
    zip
    Updated Sep 8, 2025
    Cite
    Python Developer (2025). Industry-Education Skills Matching Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/industry-education-skills-matching-dataset/data
    Explore at:
    Available download formats: zip (925395 bytes)
    Dataset updated
    Sep 8, 2025
    Authors
    Python Developer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset has been curated to explore the dynamic alignment of industry skill demands with educational course offerings. It integrates 5000 rows of data representing both technical and non-technical skills, mapped against job categories and course types.

    Key Features:

    Skill Attributes (20+): Includes technical skills such as programming, data analysis, machine learning, cloud computing, web development, cybersecurity, DevOps, mobile development, IoT, and AI programming, as well as essential soft skills like communication, teamwork, problem-solving, project management, business analysis, research, and ethics.

    Job Category: Numerical representation of job domains to which skill demands are mapped.

    Course Type: Categorized labels (e.g., Tech, Design) indicating the course domain recommended for bridging identified skill gaps.

  20. Data from: Evaluation of the Applicability of AEOLUS Satellite Wind Products...

    • zenodo.org
    zip
    Updated Sep 12, 2025
    Cite
    Chanfang Shu; zongyu chen; Zhaoliang Zeng; Zhaoliang Zeng; wenhao li; wenhao li; C K Shum; C K Shum; Shengkai Zhang; Shengkai Zhang; li fei; Chanfang Shu; zongyu chen; li fei (2025). Evaluation of the Applicability of AEOLUS Satellite Wind Products in Antarctica [Dataset]. http://doi.org/10.5281/zenodo.15743887
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 12, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chanfang Shu; zongyu chen; Zhaoliang Zeng; Zhaoliang Zeng; wenhao li; wenhao li; C K Shum; C K Shum; Shengkai Zhang; Shengkai Zhang; li fei; Chanfang Shu; zongyu chen; li fei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 2025
    Area covered
    Antarctica
    Description
    # README
    ## Data and Code Description
    This package supports the paper **"Evaluation of the Applicability of AEOLUS Satellite Wind Products in Antarctica"**. It contains all data and code needed to reproduce every analysis and figure presented in the manuscript.
    ### Folder Structure
    - **data/**: Processed data files for analysis and plotting.
    - **code/**: Python scripts for data processing, analysis, and visualization.
    - **README.md**: This documentation file.
    ---
    ## System Requirements
    - **Operating System:** Windows 10 (or equivalent)
    - **Language & Dependencies:**
    - Python 3.9 or higher
    - See `requirements.txt` for a full list (e.g., `numpy`, `pandas`, `scipy`, `matplotlib`).
    ---
    ## Usage Instructions
    1. **Unpack** the archive to your working directory.
    2. **(Optional, recommended)** Create a Python virtual environment and install dependencies:
    ```bash
    python -m venv venv
    source venv/bin/activate # Linux/macOS
    venv\Scripts\activate # Windows
    pip install -r requirements.txt
    ```
    3. **Run the analysis scripts** in the `/code` directory **in the following order**:
    - **01_rs_binavg_to_aeolus.py**
    Perform vertical bin-averaging of radiosonde data to match Aeolus vertical resolution.
    - **02_era5_binavg_to_aeolus_strict.py**
    Perform strict vertical bin-averaging of ERA5 reanalysis data to match Aeolus profiles.
    - **03_merge_rs_era5_binavg.py**
    Merge radiosonde and ERA5 bin-averaged datasets into unified collocation files.
    - **04_ee_threshold_sensitivity.py**
    Sensitivity analysis for EE (estimated error) thresholds applied to Aeolus observations.
    - **05_outlier_removal.py**
    Apply Modified Z-score and other criteria to remove outliers from Aeolus–RS/ERA5 collocations.
    - **06_station_statistics.py**
    Compute collocation statistics grouped by station (Rothera, Mawson, Davis).
    - **07_plot_scatter.py**
    Generate scatterplots comparing Aeolus vs. RS/ERA5 winds.
    - **08_plot_meteorology.py**
    Generate meteorological context figures (e.g., ERA5 composites, cyclone cases).
    ---
    ## Data Source
    All raw Aeolus Level-2B data and ERA5 reference data were obtained from the Copernicus Climate Data Store:
    ---
    *Last updated: Sep 2025*
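
    For reference, a minimal sketch of the Modified Z-score filter named in step 05 (assumed standard form; the release's actual criteria are implemented in `05_outlier_removal.py`):

    ```python
    import numpy as np

    def modified_zscore_mask(x, threshold=3.5):
        """Boolean mask of values kept after a Modified Z-score filter.

        Modified Z = 0.6745 * (x - median) / MAD; values with |Modified Z|
        above the threshold (3.5 is a common default) are flagged as outliers.
        Illustrative sketch only.
        """
        x = np.asarray(x, dtype=float)
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        if mad == 0:
            return np.ones_like(x, dtype=bool)  # no spread: keep everything
        return np.abs(0.6745 * (x - med) / mad) <= threshold

    # Example: filter wind-speed differences (toy numbers).
    diffs = np.array([0.2, -0.4, 0.1, 8.0, -0.3, 0.0])
    print(diffs[modified_zscore_mask(diffs)])
    ```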
