Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full and dummy snapshots (2022-06-04) of data for mp-time-split, grabbed via the new Materials Project API and encoded via matminer convenience functions. The dataset is restricted to experimentally verified compounds with no more than 52 sites. No other filtering criteria were applied. The snapshots were developed for sparks-baird/mp-time-split as a benchmark dataset for materials generative modeling. Compressed versions of the files (.gz) are also available.
dtypes
```python
from pprint import pprint
from matminer.utils.io import load_dataframe_from_json

filepath = "insert/path/to/file/here.json"
expt_df = load_dataframe_from_json(filepath)
pprint(expt_df.iloc[0].apply(type).to_dict())
# {'discovery': <class '...'>, 'energy_above_hull': <class '...'>,
#  'formation_energy_per_atom': <class '...'>, 'material_id': <class '...'>,
#  'references': <class '...'>, 'structure': <class '...'>,
#  'theoretical': <class '...'>, 'year': <class '...'>}
```
index/mpids
The index is the material_id number (just the number for the index). Note that material_ids that begin with "mvc-" have the "mvc" dropped and the hyphen (minus sign) kept, to distinguish between "mp-" and "mvc-" types while still allowing for sorting, e.g. mvc-001 -> -1.
```python
{146: MPID(mp-146), 925: MPID(mp-925), 1282: MPID(mp-1282), 1335: MPID(mp-1335), 12778: MPID(mp-12778), 2540: MPID(mp-2540), 316: MPID(mp-316), 1395: MPID(mp-1395), 2678: MPID(mp-2678), 1281: MPID(mp-1281), 1251: MPID(mp-1251)}
```
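For convenience, a small helper (an assumption consistent with the convention above, not part of the package) to map integer indices back to ID strings:

```python
def index_to_mpid(index: int) -> str:
    """Map an integer index back to its ID string; negative values encode
    "mvc-" IDs per the convention above (leading zeros are not recoverable)."""
    return f"mvc-{-index}" if index < 0 else f"mp-{index}"

assert index_to_mpid(146) == "mp-146"
assert index_to_mpid(-1) == "mvc-1"
```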
https://creativecommons.org/publicdomain/zero/1.0/
Netflix, Inc. is an American media company engaged in paid streaming and the production of films and series.
Market capitalization of Netflix (NFLX)
Market cap: $517.08 Billion USD
As of June 2025, Netflix has a market cap of $517.08 Billion USD, making it the world's 19th most valuable company by market cap according to our data. The market capitalization, commonly called market cap, is the total market value of a publicly traded company's outstanding shares and is commonly used to measure how much a company is worth.
Revenue for Netflix (NFLX)
Revenue in 2025: $40.17 Billion USD
According to Netflix's latest financial reports, the company's current revenue (TTM) is $40.17 Billion USD. In 2024 the company generated revenue of $39.00 Billion USD, an increase over its 2023 revenue of $33.72 Billion USD. The revenue is the total amount of income that a company generates by the sale of goods or services. Unlike earnings, no expenses are subtracted.
Earnings for Netflix (NFLX)
Earnings in 2025 (TTM): $11.31 Billion USD
According to Netflix's latest financial reports, the company's current earnings (TTM) are $11.31 Billion USD. In 2024 the company reported earnings of $10.70 Billion USD, an increase over its 2023 earnings of $7.02 Billion USD. The earnings displayed on this page are the company's pretax income.
On Jun 12th, 2025 the market cap of Netflix was reported to be:
$517.08 Billion USD by Yahoo Finance
$517.08 Billion USD by CompaniesMarketCap
$517.21 Billion USD by Nasdaq
Geography: USA
Time period: May 2002 - June 2025
Unit of analysis: Netflix Stock Data 2025
Variable | Description |
---|---|
date | The trading date. |
open | The price at market open. |
high | The highest price for that day. |
low | The lowest price for that day. |
close | The price at market close, adjusted for splits. |
adj_close | The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards. |
volume | The number of shares traded on that day. |
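A minimal loading sketch with pandas (the filename is hypothetical; the columns follow the table above):

```python
import pandas as pd

# Hypothetical export name; columns per the variable table above.
df = pd.read_csv("netflix_stock_2025.csv", parse_dates=["date"]).sort_values("date")
# adj_close already reflects splits and dividends, so it is the
# appropriate column for return calculations.
daily_return = df["adj_close"].pct_change()
print(daily_return.describe())
```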
This dataset belongs to me. I’m sharing it here for free. You may do with it as you wish.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Alinaghi, N., Giannopoulos, I., Kattenbeck, M., & Raubal, M. (2025). Decoding wayfinding: analyzing wayfinding processes in the outdoor environment. International Journal of Geographical Information Science, 1–31. https://doi.org/10.1080/13658816.2025.2473599
Link to the paper: https://www.tandfonline.com/doi/full/10.1080/13658816.2025.2473599
The folder named “submission” contains the following:
- `ijgis.yml`: This file lists all the Python libraries and dependencies required to run the code. Use the `ijgis.yml` file to create a Python project and environment, and ensure you activate the environment before running the code.
- The `pythonProject` folder contains several `.py` files and subfolders, each with specific functionality as described below.
  - … a `.png` file for each column of the raw gaze and IMU recordings, color-coded with logged events.
  - … `.csv` files (`overlapping_sliding_window_loop.py`). The function `plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5))` in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the line is commented out; if you want to see visually what has changed compared to the original data, you can uncomment it.
  - … `.csv` files in the results folder. This part contains three main code blocks:
    iii. One for the XGBoost code with correct hyperparameter tuning.
Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2. Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380.
- … a `.csv` file containing inferred labels.

The data is licensed under CC BY; the code is licensed under MIT.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Three dev/test sets for MT quality estimation created from subcorpora of ParIce. The dev/test sets contain English-Icelandic segment pairs. One of the three sets is made up of subtitle segments from OpenSubtitles, one of segments from drug descriptions distributed by the European Medical Agency (EMA) and one from EEA documents. The sets are manually annotated so all pairs are correct.
The goal was to create dev/test sets with a total of at least 3,000 correct translation segments from each subcorpus. All segments contain four or more words in the English segments. The OpenSubtitles set contains 1,531/1,532 segments in dev/test. Furthermore, it contains 2,277 segment pairs that have fewer than four words on the English side and 777 segment pairs that have incorrect alignments or translations. The training set contains 1,298,489 segments, which have not been manually checked for errors. The OpenSubtitles sets are compiled using a Python script that downloads the segments and creates the splits. The EMA set contains 2,254/2,255 segment pairs in dev/test. Furthermore, it contains 491 segment pairs that have fewer than four words on the English side and 240 segments that have incorrect alignments or translations. The training set contains 399,093 segments, which have not been manually checked for errors. The EEA set contains 22 whole documents. Documents with between 100 and 200 sentences were selected at random until we reached more than 3,000 sentence pairs. Alignments and translations were manually corrected for these documents. Longer sentences were split into smaller parts, where possible. The split consists of 2,292/2,396 dev/test segments and 1,697,927 training segments that have not been manually checked.
Three sets of sentence pairs for the development/testing of machine translation engines. The sets are built from ParIce subcorpora and contain English-Icelandic pairs. One of the sets is built from subtitles from OpenSubtitles, another from package leaflet texts from EMA, and the third from EEA translations. The pairs have been manually reviewed to ensure that the development/test data are correct.
The goal was to create development/test sets with at least 3,000 correct translations in total for each subcorpus. All pairs have at least four words in the English part. The OpenSubtitles set contains 1,531/1,532 pairs for development/testing. In addition, 2,277 pairs with fewer than four words in the English part and 777 pairs with incorrect translations or alignments are included. The training set contains 1,298,489 pairs, which have not been manually reviewed. The OpenSubtitles sets are generated with a Python program that fetches the pairs and splits them into the sets. The EMA sets contain 2,254/2,255 pairs for development/testing. In addition, 491 pairs with fewer than four words in the English part and 240 pairs with incorrect translations or alignments are included. The training set contains 399,093 pairs, which have not been manually reviewed. The EEA sets contain 22 whole documents. The documents were selected at random from those in the corpus containing between 100 and 200 sentences, until more than 3,000 sentences were reached. Alignments were manually reviewed and corrected, as were incorrect translations. Longer sentences were split into smaller parts where possible. The sets contain 2,292/2,396 pairs for development/testing and 1,697,927 pairs for training. The training pairs have not been manually reviewed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and Python code used for AOD prediction with the DustNet model, a machine learning based forecasting system.
Model input data and code
Processed MODIS AOD data (from Aqua and Terra) and selected ERA5 variables* ready to reproduce the DustNet model results or for similar forecasting with Machine Learning. These long-term daily timeseries (2003-2022) are provided as n-dimensional NumPy arrays. The Python code to handle the data and run the DustNet model** is included as the Jupyter Notebook 'DustNet_model_code.ipynb'. A subfolder with the data normalised and split into training/validation/testing sets is also provided, with Python code for two additional ML based models** used for comparison (U-NET and Conv2D). Pre-trained models are also archived here as TensorFlow files.
Model output data and code
This dataset was constructed by running 'DustNet_model_code.ipynb' (see above). It consists of 1095 days of forecast AOD data (2020-2022) by CAMS, the DustNet model, naïve prediction (persistence) and gridded climatology. The ground truth raw AOD data from MODIS is provided for comparison and statistical analysis of predictions. It is intended for a quick reproduction of the figures and statistical analysis presented in the paper introducing DustNet.
*datasets are NumPy arrays (v1.23) created in Python v3.8.18.
**all ML models were created with Keras in Python v3.10.10.
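A minimal sketch of loading one of the provided arrays (the filename is a hypothetical placeholder; see the archive for the actual names):

```python
import numpy as np

# Hypothetical filename: the archive ships long-term daily arrays
# (2003-2022) of MODIS AOD and selected ERA5 variables.
aod = np.load("modis_aod_daily_2003_2022.npy")
print(aod.shape, aod.dtype)  # n-dimensional array, e.g. (days, lat, lon)
```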
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pol Febrer (pol.febrer@icn2.cat, ORCID 0000-0003-0904-2234) Peter Bjorn Jorgensen (peterbjorgensen@gmail.com, ORCID 0000-0003-4404-7276) Arghya Bhowmik (arbh@dtu.dk, ORCID 0000-0003-3198-5116)
The dataset is published as part of the paper: "GRAPH2MAT: UNIVERSAL GRAPH TO MATRIX CONVERSION FOR ELECTRON DENSITY PREDICTION" (https://doi.org/10.26434/chemrxiv-2024-j4g21) https://github.com/BIG-MAP/graph2mat
This dataset contains the Hamiltonian, Overlap, Density and Energy Density matrices from SIESTA calculations of a subset of the MD17 aspirin dataset. The subset is taken from the third split in (https://doi.org/10.6084/m9.figshare.12672038.v3).
SIESTA 5.0.0 was used to compute the dataset.
The dataset has two directories:
And then, three directories containing the calculations with different basis sets:
- matrix_dataset_defsplit: Uses the default split-valence DZP basis in SIESTA.
- matrix_dataset_optimsplit: Uses a split-valence DZP basis optimized for aspirin.
- matrix_dataset_defnodes: Uses the default nodes DZP basis in SIESTA.
Each of the basis directories has two subdirectories:
- basis: Contains the files specifying the basis used for each atom.
- runs: The results of running the SIESTA simulations. Contents are discussed next.
The "runs" directory contains one directory for each run, named with the index of the run. Each directory contains: - RUN.fdf, geom.fdf: The input files used for the SIESTA calculation. - RUN.out: The log of the SIESTA run, which apar - siesta.TSDE: Contains the Density and Energy Density matrices. - siesta.TSHS: Contains the Hamiltonian and Overlap matrices.
Each matrix can be read using the sisl Python package (https://github.com/zerothi/sisl) like:

```python
import sisl

matrix = sisl.get_sile("RUN.fdf").read_X()
```

where `X` is `hamiltonian`, `overlap`, `density_matrix` or `energy_density_matrix`.
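For example, a short sketch (the run path shown is illustrative) reading all four matrices from one run:

```python
import sisl

fdf = sisl.get_sile("runs/0/RUN.fdf")   # illustrative path to one run
H = fdf.read_hamiltonian()              # from siesta.TSHS
S = fdf.read_overlap()                  # from siesta.TSHS
DM = fdf.read_density_matrix()          # from siesta.TSDE
EDM = fdf.read_energy_density_matrix()  # from siesta.TSDE
```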
To reproduce the results presented in the paper, follow the documentation of the graph2mat package (https://github.com/BIG-MAP/graph2mat).
https://doi.org/10.11583/DTU.c.7310005 © 2024 Technical University of Denmark
This dataset is published under the CC BY 4.0 license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was obtained as part of the AIEP project by Digital Green and Karya from extension workers, lead farmers and farmers. Process of data collection: selected users were given the option of doing a task and getting paid for it. The users were to record the sentence as it appeared on the screen. The audio files thus obtained were validated by matching them against the sentences and used to fine-tune the model. Also available are the Python scripts that help in processing and splitting the data into… See the full description on the dataset page: https://huggingface.co/datasets/CGIAR/KikuyuASR_trainingdataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from an NIH HTS of 17K compounds against five isozymes of cytochrome P450, screening for inhibition. The activity score is taken from the NIH assay and merged with all the 2-D descriptors from the program Molecular Operating Environment (MOE). The datasets are separated by isozyme and then balanced between actives and inactives. Finally, the balanced datasets are subjected to an 80/20 training/test split. Link to Python script of data manipulation...
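A minimal sketch of the balancing and 80/20 split described above (the filename and label column are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("p450_isozyme.csv")  # hypothetical per-isozyme file
# Balance actives and inactives by downsampling the majority class
# ('active' as the label column name is an assumption).
actives = df[df["active"] == 1]
inactives = df[df["active"] == 0].sample(n=len(actives), random_state=0)
balanced = pd.concat([actives, inactives])
train, test = train_test_split(balanced, test_size=0.2, random_state=0,
                               stratify=balanced["active"])
```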
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description
This dataset is a large-scale set of measurements for RSS-based localization. The data consists of received signal strength (RSS) measurements taken using the POWDER Testbed at the University of Utah. Samples include either 0, 1, or 2 active transmitters.
The dataset consists of 5,214 unique samples, with transmitters in 5,514 unique locations. The majority of the samples contain only 1 transmitter, but there are small sets of samples with 0 or 2 active transmitters, as shown below. Each sample has RSS values from between 10 and 25 receivers. The majority of the receivers are stationary endpoints fixed on the side of buildings, on rooftop towers, or on free-standing poles. A small set of receivers are located on shuttles which travel specific routes throughout campus.
Dataset Description | Sample Count | Receiver Count |
---|---|---|
No-Tx Samples | 46 | 10 to 25 |
1-Tx Samples | 4822 | 10 to 25 |
2-Tx Samples | 346 | 11 to 12 |
The transmitters for this dataset are handheld walkie-talkies (Baofeng BF-F8HP) transmitting in the FRS/GMRS band at 462.7 MHz. These devices have a rated transmission power of 1 W. The raw IQ samples were processed through a 6 kHz bandpass filter to remove neighboring transmissions, and the RSS value was calculated as follows:
\(RSS = 10 \log_{10}\left(\frac{1}{N}\sum_i^N x_i^2 \right) \)
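A direct NumPy transcription of this formula (a sketch; `x` is the vector of N band-pass-filtered samples):

```python
import numpy as np

def rss_db(x: np.ndarray) -> float:
    # Average received power in dB over the N filtered samples,
    # mirroring the formula above.
    return 10.0 * np.log10(np.mean(x**2))
```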
Measurement Parameters | Description |
---|---|
Frequency | 462.7 MHz |
Radio Gain | 35 dB |
Receiver Sample Rate | 2 MHz |
Sample Length | N=10,000 |
Band-pass Filter | 6 kHz |
Transmitters | 0 to 2 |
Transmission Power | 1 W |
Receivers consist of Ettus USRP X310 and B210 radios, and a mix of wide- and narrow-band antennas, as shown in the table below. Each receiver took measurements with a receiver gain of 35 dB. However, devices have different maximum gain settings, and no calibration data was available, so all RSS values in the dataset are uncalibrated and are only relative to the device.
Usage Instructions
Data is provided in `.json` format, both as one file and as split files.

```python
import json

data_file = 'powder_462.7_rss_data.json'
with open(data_file) as f:
    data = json.load(f)
```
The JSON data is a dictionary with the sample timestamp as a key. Within each sample are the following keys:

- `rx_data`: A list of data from each receiver. Each entry contains RSS value, latitude, longitude, and device name.
- `tx_coords`: A list of coordinates for each transmitter. Each entry contains latitude and longitude.
- `metadata`: A list of dictionaries containing metadata for each transmitter, in the same order as the rows in `tx_coords`.
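Continuing the snippet above, a quick sanity check (a sketch using only the documented keys) tallies samples by transmitter count:

```python
from collections import Counter

# Number of active transmitters per sample, keyed off tx_coords.
tx_counts = Counter(len(sample["tx_coords"]) for sample in data.values())
print(tx_counts)  # expect counts matching the sample table above (0, 1, or 2 Tx)
```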
File Separations and Train/Test Splits
In the `separated_data.zip` folder there are several train/test separations of the data.

- `all_data` contains all the data in the main JSON file, separated by the number of transmitters.
- `stationary` consists of 3 cases where a stationary receiver remained in one location for several minutes. This may be useful for evaluating localization using mobile shuttles, or measuring the variation in the channel characteristics for stationary receivers.
- `train_test_splits` contains unique data splits used for training and evaluating ML models. These splits only used data from the single-tx case. In other words, the union of the splits, along with `unused.json`, is equivalent to the file `all_data/single_tx.json`.
- The `random` split is a random 80/20 split of the data.
- `special_test_cases` contains the stationary transmitter data, indoor transmitter data (with high noise in GPS location), and transmitters off campus.
- The `grid` split divides the campus region into a 10 by 10 grid. Each grid square is assigned to the training or test set, with 80 squares in the training set and the remainder in the test set. If a square is assigned to the test set, none of its four neighbors are included in the test set. Transmitters occurring in each grid square are assigned to train or test. One such random assignment of grid squares makes up the `grid` split.
- The `seasonal` split contains data separated by the month of collection, in April or July.
- The `transportation` split contains data separated by the method of movement for the transmitter: walking, cycling, or driving. The `non-driving.json` file contains the union of the walking and cycling data.
- `campus.json` contains the on-campus data, so it is equivalent to the union of each split, not including `unused.json`.

Digital Surface Model
The dataset includes a digital surface model (DSM) from a State of Utah 2013-2014 LiDAR survey. This map includes the University of Utah campus and surrounding area. The DSM includes buildings and trees, unlike some digital elevation models.
To read the data in Python:

```python
import rasterio as rio
import numpy as np
import utm

dsm_object = rio.open('dsm.tif')
dsm_map = dsm_object.read(1)          # a np.array containing elevation values
dsm_resolution = dsm_object.res       # a tuple containing x,y resolution (0.5 meters)
dsm_transform = dsm_object.transform  # an Affine transform for conversion to UTM-12 coordinates

utm_transform = np.array(dsm_transform).reshape((3, 3))[:2]
utm_top_left = utm_transform @ np.array([0, 0, 1])
# The Affine transform maps (col, row); shape is (rows, cols), hence the reversed order.
utm_bottom_right = utm_transform @ np.array([dsm_object.shape[1], dsm_object.shape[0], 1])
latlon_top_left = utm.to_latlon(utm_top_left[0], utm_top_left[1], 12, 'T')
latlon_bottom_right = utm.to_latlon(utm_bottom_right[0], utm_bottom_right[1], 12, 'T')
```
Dataset Acknowledgement: This DSM file is acquired by the State of Utah and its partners, and is in the public domain and can be freely distributed with proper credit to the State of Utah and its partners. The State of Utah and its partners makes no warranty, expressed or implied, regarding its suitability for a particular use and shall not be liable under any circumstances for any direct, indirect, special, incidental, or consequential damages with respect to users of this product.
DSM DOI: https://doi.org/10.5069/G9TH8JNQ
Runs from two papers exploring the use of mass-conserving LSTMs. Model results used in the papers are 1) model_outputs_for_analysis_extreme_events_paper.tar.gz, and 2) model_outputs_for_analysis_mass_balance_paper.tar.gz.
The models here are trained/calibrated on three different time periods. Standard Time Split (time split 1): the test period (1989-1999) is the same period used by previous studies, which allows us to confirm that the deep learning models (LSTM and MC-LSTM) trained for this project perform as expected relative to prior work. NWM Time Split (time split 2): the second test period (1995-2014) allows us to benchmark against the NWM-Rv2, which does not provide data prior to 1995. Return period split: the third test period (based on return periods) allows us to benchmark only on water years that contain streamflow events that are larger (per basin) than anything seen in the training data (<= 5-year return periods in training and > 5-year return periods in testing).
Also included are an ensemble of model runs for LSTM, MC-LSTM for the "standard" training period and two forcing products. These files are provided in the format "
IMPORTANT NOTE: This Python environment should be used to extract and load the data: https://github.com/jmframe/mclstm_2021_extrapolate/blob/main/python_environment.yml, as the pickle files were serialized with specific versions of Python libraries. Specifically, the pickle serialization was done with xarray=0.16.1.
Code to interpret these runs can be found here: https://github.com/jmframe/mclstm_2021_extrapolate https://github.com/jmframe/mclstm_2021_mass_balance
Papers are available here: https://hess.copernicus.org/preprints/hess-2021-423/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.

Annotation

The annotation files are provided as Parquet files. They can be read using Python and the pandas and pyarrow libraries. The split into train, validation and test set follows the split of the original datasets.

Installation

```
pip install pandas pyarrow
```

Example

```python
import pandas as pd

df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])
```

```
dataset              AudioSet
filename             train/---2_BBVHAA.mp3
captions_visual      [a man in a black hat and glasses.]
captions_auditory    [a man speaks and dishes clank.]
tags                 [Speech]
```

Description

The annotation file consists of the following fields:

- filename: Name of the corresponding file (video or audio file)
- dataset: Source dataset associated with the data point
- captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content
- captions_auditory: A list of captions related to the auditory content of the video
- tags: A list of tags, classifying the sound of a file. It can be NaN if no tags are provided

Data files

The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, due to missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present the tables of integration coefficients for the 2- and 3-stage adaptive splitting integrators derived for Hamiltonian Monte Carlo (HMC) using the Adaptive Integration Approach s-AIA introduced in
The tables provide the maps that assign the optimal (in terms of the best conservation of energy for harmonic forces) integration coefficient for a k-stage palindromic splitting integrator to a nondimensional simulation step size in the stability interval (0, 2k).
The repository includes the two tables for 2- and 3-stage s-AIA, a Python script that provides the optimal integration coefficient for a user-chosen dimensional step size, two .txt files containing the values of the optimal integration coefficients for 2- and 3-stage s-AIA used by the Python script, and a readme.pdf file describing the s-AIA methodology and the usage guidelines for the tables.
These data were originally collected in the 1970s and early 1980s, and archived at NODC in a text format whose column-based structure varies depending on the data record type represented by a given line of text. These text files were parsed using Python code which splits the data into separate files according to record type, and stores the data in comma-separated values format. Inputs to the Python code include the original data file, CSV files with information on how to parse each record type within the data file, and any lookups required to interpret the data, such as transforming an equipment code of "8" into "EKMAN GRAB". The CSV files with information on how to parse each record type were created by referencing parsing instructions provided by NCEI. If a given record type is not included in the actual data, then no output files for that record type are created.

This project includes a readme file, original data files from prior investigators, code lookups, CSV files of parsing instructions, optional files created by splitting original data files into separate files by record type, output CSV files created by parsing original data files into separate files by record type, and Python scripts to perform the parsing. The output CSV files represent the dataset produced from this work. Parsing instructions for original data files as well as data codes can be found at https://www.nodc.noaa.gov/access/dataformats.html. Taxon identifiers from the Integrated Taxonomic Information System can be included in the output by the parsing code; full taxonomic information for these identifiers can be retrieved from the ITIS website, https://itis.gov/.
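A minimal sketch of this record-type routing (the filename, record-type position, and column slices are illustrative assumptions; the real layouts come from the per-record-type parsing instructions):

```python
import csv
from collections import defaultdict

equipment_lookup = {"8": "EKMAN GRAB"}  # code lookup from the description above

rows_by_type = defaultdict(list)
with open("original_data.txt") as src:   # hypothetical filename
    for line in src:
        rtype = line[:2]                 # record-type columns: an assumption
        code = line[2:3]                 # equipment-code column: an assumption
        rows_by_type[rtype].append([rtype, equipment_lookup.get(code, code)])

# One output CSV per record type, as described above.
for rtype, rows in rows_by_type.items():
    with open(f"record_type_{rtype}.csv", "w", newline="") as out:
        csv.writer(out).writerows(rows)
```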
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication Data for: Are Redistricting No-Split Rules Neutral? Post-2020 Ohio as a Case Study. Python codes, input and output files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides a collection of behaviour biometrics data (commonly known as Keyboard, Mouse and Touchscreen (KMT) dynamics). The data was collected for use in a FinTech research project undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. The project, called CyberSIgnature, uses KMT dynamics data to distinguish between legitimate card owners and fraudsters. An application was developed with a graphical user interface (GUI) similar to a standard online card payment form, including fields for card type, name, card number, card verification code (cvc) and expiry date. User KMT dynamics were then captured while they entered fictitious card information on the GUI application.
The dataset consists of 1,760 KMT dynamic instances collected over 88 user sessions on the GUI application. Each user session involves 20 iterations of data entry in which the user is assigned a fictitious card information (drawn at random from a pool) to enter 10 times and subsequently presented with 10 additional card information, each to be entered once. The 10 additional card information is drawn from a pool that has been assigned or to be assigned to other users. A KMT data instance is collected during each data entry iteration. Thus, a total of 20 KMT data instances (i.e., 10 legitimate and 10 illegitimate) was collected during each user entry session on the GUI application.
The raw dataset is stored in .json format within 88 separate files. The root folder, named `behaviour_biometrics_dataset`, consists of two sub-folders, `raw_kmt_dataset` and `feature_kmt_dataset`, and a Jupyter notebook file (`kmt_feature_classification.ipynb`). Their contents are described below:

- `raw_kmt_dataset`: this folder contains 88 files, each named `raw_kmt_user_n.json`, where n is a number from 0001 to 0088. Each file contains 20 instances of KMT dynamics data corresponding to a given fictitious card; the data instances are equally split between legitimate (n = 10) and illegitimate (n = 10) classes. The legitimate class corresponds to KMT dynamics captured from the user assigned to the card detail, while the illegitimate class corresponds to KMT dynamics data collected from other users entering the same card detail.

- `feature_kmt_dataset`: this folder contains two sub-folders, namely `feature_kmt_json` and `feature_kmt_xlsx`. Each contains 88 files (in the relevant format: .json or .xlsx), each named `feature_kmt_user_n`, where n is a number from 0001 to 0088. Each file contains 20 instances of features extracted from the corresponding `raw_kmt_user_n` file, including the class labels (legitimate = 1 or illegitimate = 0).

- `kmt_feature_classification.ipynb`: this file contains Python code necessary to generate features from the raw KMT files and apply a simple machine learning classification task to generate results. The code is designed to run with minimal effort from the user.
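A minimal sketch of loading one raw file (the path follows the naming scheme above; the instances-as-list layout is an assumption):

```python
import json

with open("behaviour_biometrics_dataset/raw_kmt_dataset/raw_kmt_user_0001.json") as f:
    instances = json.load(f)
print(len(instances))  # expect 20 instances: 10 legitimate, 10 illegitimate
```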
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A strawberry dataset for the paper "Qi Yang, Licheng Liu, Junxiong Zhou, Mary Rogers, Zhenong Jin, 2024. Predicting the growth trajectory and yield of greenhouse strawberries based on knowledge-guided computer vision, Computers and Electronics in Agriculture, 220, 108911. https://doi.org/10.1016/j.compag.2024.108911"
The folder "measurement.zip" includes treatment-level and fruit-level ground truth data.
data_dryMatter_2022.csv
data_dryMatter_2023.csv
data_freshMatter_2022.csv
data_freshMatter_2023.csv
data_fruitNumber_2022.csv
data_fruitNumber_2023.csv
data_plantBiomass_2022.csv
data_plantBiomass_2023.csv
Fruit conditions with five classes; 1-5 represent Normal, Wizened, Malformed, Wizened & Malformed, and Overripe, respectively.
data_size_freshWeight_condition_2022_0N.csv
data_size_freshWeight_condition_2022_50N.csv
data_size_freshWeight_condition_2022_100N.csv
data_size_freshWeight_condition_2022_150N.csv
Fruit size for tagged fruits
data_taggedFruit_diameter_2022.csv
data_taggedFruit_diameter_2023.csv
data_taggedFruit_length_2022.csv
data_taggedFruit_length_2023.csv
Fresh yield and lifespan for tagged fruits (only available in experiment 2023)
data_taggedFruit_freshMatter_2023.csv
data_taggedFruit_lifespan_2023.csv
weather_daily_2022.csv
weather_daily_2023.csv
The folder "strawberry_img_random.zip" contains images and the corresponding JSON labels for object and phenological stages detection.
The folder "strawberry_img_tagged.zip" contains images and the corresponding JSON labels for fruit size and decimal phenological stages detection.
For example:

"label": "small g, 8.84, 7.62, 0.4"

This label means the fruit has an 8.84 mm diameter and 7.62 mm length, with the main stage being small green and the decimal stage being DS-4.
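A hedged parser for that label format (stage name, diameter in mm, length in mm, decimal-stage fraction):

```python
def parse_label(label: str) -> tuple[str, float, float, float]:
    stage, diameter, length, decimal = (s.strip() for s in label.split(","))
    return stage, float(diameter), float(length), float(decimal)

print(parse_label("small g, 8.84, 7.62, 0.4"))
# ('small g', 8.84, 7.62, 0.4)
```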
A Python script, "datasetProcessing.py", can be used to merge and split the image data into training and testing sets.
models.zip
Data collector: Dr. Qi Yang, University of Minnesota, USA. Email: qiyang577@gmail.com
All the files belong to Prof. Zhenong Jin, University of Minnesota, USA. Email: jinzn@umn.edu
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SDC-Scissor tool for Cost-effective Simulation-based Test Selection in Self-driving Cars Software
This dataset provides test cases for self-driving cars with the BeamNG simulator. Check out the repository and demo video to get started.
GitHub: github.com/ChristianBirchler/sdc-scissor
This project extends the tool competition platform from the Cyber-Physical Systems Testing Competition, which was part of the SBST Workshop in 2021.
Usage
Demo
Installation
The tool can either be run with Docker or locally using Poetry.
When running the simulations a working installation of BeamNG.research is required. Additionally, this simulation cannot be run in a Docker container but must run locally.
To install the application use one of the following approaches:

- Docker: `docker build --tag sdc-scissor .`
- Poetry: `poetry install`
Using the Tool
The tool can be used with the following two commands:
- `docker run --volume "$(pwd)/results:/out" --rm sdc-scissor [COMMAND] [OPTIONS]` (this will write all files written to `/out` to the local folder `results`)
- `poetry run python sdc-scissor.py [COMMAND] [OPTIONS]`
There are multiple commands to use. To simplify the documentation, only the commands and their options are described.
- `generate-tests --out-path /path/to/store/tests`
- `label-tests --road-scenarios /path/to/tests --result-folder /path/to/store/labeled/tests`
- `evaluate-models --dataset /path/to/train/set --save`
- `split-train-test-data --scenarios /path/to/scenarios --train-dir /path/for/train/data --test-dir /path/for/test/data --train-ratio 0.8`
- `predict-tests --scenarios /path/to/scenarios --classifier /path/to/model.joblib`
- `evaluate --scenarios /path/to/test/scenarios --classifier /path/to/model.joblib`
The possible parameters are always documented with `--help`.
Linting
The tool is verified with the linters flake8 and pylint. These are automatically enabled in Visual Studio Code and can be run manually with the following commands:

```
poetry run flake8 .
poetry run pylint **/*.py
```
License
The software we developed is distributed under the GNU GPL license. See the LICENSE.md file.
Contacts
Christian Birchler - Zurich University of Applied Science (ZHAW), Switzerland - birc@zhaw.ch
Nicolas Ganz - Zurich University of Applied Science (ZHAW), Switzerland - gann@zhaw.ch
Sajad Khatiri - Zurich University of Applied Science (ZHAW), Switzerland - mazr@zhaw.ch
Dr. Alessio Gambi - Passau University, Germany - alessio.gambi@uni-passau.de
Dr. Sebastiano Panichella - Zurich University of Applied Science (ZHAW), Switzerland - panc@zhaw.ch
References
If you use this tool in your research, please cite the following papers:
```
@INPROCEEDINGS{Birchler2022,
  author={Birchler, Christian and Ganz, Nicolas and Khatiri, Sajad and Gambi, Alessio and Panichella, Sebastiano},
  booktitle={2022 IEEE 29th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  title={Cost-effective Simulation-based Test Selection in Self-driving Cars Software with SDC-Scissor},
  year={2022},
}
```
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data and code scripts used for the analysis in the paper entitled "The Impact of Traffic Lights on Modal Split and Route Choice: A use-case in Vienna", submitted to the AGILE (Association of Geographic Information Laboratories in Europe) 2024 Conference.
It comprises three folders within the zip file:
Programming Language: Python
For reproducibility read the README.txt file included in the zip folder.
All data files are licensed under CC BY 4.0, all software is licensed under MIT License.