80 datasets found
  1. Graphite//LFP synthetic training prognosis dataset

    • data.mendeley.com
    Updated May 6, 2020
    + more versions
    Cite
    Matthieu Dubarry (2020). Graphite//LFP synthetic training prognosis dataset [Dataset]. http://doi.org/10.17632/6s6ph9n8zg.1
    Dataset updated
    May 6, 2020
    Authors
    Matthieu Dubarry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This training dataset was calculated using the mechanistic modeling approach. See the “Benchmark Synthetic Training Data for Artificial Intelligence-based Li-ion Diagnosis and Prognosis“ publication for more details. More details will be added when published. The prognosis dataset was harder to define as there are no limits on how the three degradation modes can evolve. For this proof-of-concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1).

    %degradation = a × cycle + (exp(b × cycle) − 1)   (1)

    Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added: a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1 and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles, to guarantee at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive and, after 3,000 cycles, the lowest degradation is 3%. The reversibility factor p8 was calculated with equation (2) when LAM_NE > PT.

    %LLI = %LLI + p8 × (LAM_PE − PT)   (2)

    Where PT was calculated with equation (3) from [60].

    PT = 100 − ((100 − LAM_PE) / (100 × LR_ini − LAM_PE)) × (100 − OFS_ini − LLI)   (3)

    Varying all those parameters accounted for more than 130,000 individual duty cycles, with one voltage curve for every 100 cycles. Six MATLAB© .mat files are included. The GIC-LFP_duty_other.mat file contains 12 variables:

    Qnorm: normalized capacity scale for all voltage curves

    p1 to p8: values used to generate the duty cycles

    Key: index of which values were used for each degradation path (1 = p1, …, 8 = p8)

    QL: capacity loss, one line per path, one column per 100 cycles.

    The file GIC-LFP_duty_LLI-LAMsvalues.mat contains the values of LLI, LAM_PE and LAM_NE for all cycles (one line per 100 cycles) and duty cycles (columns).

    The files GIC-LFP_duty_1 to _4 contain the voltage data split into 1 GB chunks (40,000 simulations). Each cell corresponds to one line in the Key variable; inside each cell, there is one column per 100 cycles. A short loading sketch is given below.
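
    A minimal Python sketch of how one might inspect these files with SciPy; the variable names follow the descriptions above, but their exact spelling/capitalization inside the .mat files, as well as the file paths, are assumptions to be checked against the actual download.

      # Sketch only: variable names and paths are assumptions based on the description above.
      # Files saved in MATLAB v7.3 (HDF5) format would require h5py instead of scipy.io.
      from scipy.io import loadmat

      meta = loadmat("GIC-LFP_duty_other.mat")        # 12 variables: Qnorm, p1..p8, Key, QL, ...
      qnorm = meta["Qnorm"].squeeze()                 # normalized capacity scale for the voltage curves
      ql = meta["QL"]                                 # capacity loss: one row per path, one column per 100 cycles
      key = meta["Key"]                               # which p1..p8 values were used for each degradation path

      lli_lams = loadmat("GIC-LFP_duty_LLI-LAMsvalues.mat")  # LLI / LAM_PE / LAM_NE per 100 cycles (rows) and duty cycle (columns)
      print(qnorm.shape, ql.shape, key.shape)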

  2. A geometric shape regularity effect in the human brain: fMRI dataset

    • openneuro.org
    Updated Mar 14, 2025
    + more versions
    Cite
    Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene (2025). A geometric shape regularity effect in the human brain: fMRI dataset [Dataset]. http://doi.org/10.18112/openneuro.ds006010.v1.0.1
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A geometric shape regularity effect in the human brain: fMRI dataset

    Authors:

    • Mathias Sablé-Meyer*
    • Lucas Benjamin
    • Cassandra Potier Watkins
    • Chenxi He
    • Maxence Pajot
    • Théo Morfoisse
    • Fosca Al Roumi
    • Stanislas Dehaene

    *Corresponding author: mathias.sable-meyer@ucl.ac.uk

    Abstract

    The perception and production of regular geometric shapes is a characteristic trait of human cultures since prehistory, whose neural mechanisms are unknown. Behavioral studies suggest that humans are attuned to discrete regularities such as symmetries and parallelism, and rely on their combinations to encode regular geometric shapes in a compressed form. To identify the relevant brain systems and their dynamics, we collected functional MRI and magnetoencephalography data in both adults and six-year-olds during the perception of simple shapes such as hexagons, triangles and quadrilaterals. The results revealed that geometric shapes, relative to other visual categories, induce a hypoactivation of ventral visual areas and an overactivation of the intraparietal and inferior temporal regions also involved in mathematical processing, whose activation is modulated by geometric regularity. While convolutional neural networks captured the early visual activity evoked by geometric shapes, they failed to account for subsequent dorsal parietal and prefrontal signals, which could only be captured by discrete geometric features or by more advanced transformer models of vision. We propose that the perception of abstract geometric regularities engages an additional symbolic mode of visual perception.

    Notes about this dataset

    We separately share the MEG dataset at https://openneuro.org/datasets/ds006012. Below are some notes about the fMRI dataset of N=20 adult participants (sub-2xx, numbers between 204 and 223), and N=22 children (sub-3xx, numbers between 301 and 325).

    • The code for the analyses is provided at https://github.com/mathias-sm/AGeometricShapeRegularityEffectHumanBrain
      However, the analyses work from already preprocessed data. Since there is no custom code per se for the preprocessing, I have not included it in the repository. To preprocess the data as was done in the published article, here is the command and software information:
      • fMRIPrep version: 20.0.5
      • fMRIPrep command: /usr/local/miniconda/bin/fmriprep /data /out participant --participant-label <label> --output-spaces MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2
    • Defacing was performed with bidsonym, running the pydeface masking and the nobrainer brain registration pipeline.
      The published analyses have been performed on the non-defaced data. I have checked for data quality on all participants after defacing. In specific cases, I may be able to request the permission to share the original, non-defaced dataset.
    • sub-325 was acquired by a different experimenter and defaced before being shared with the rest of the research team, hence the slightly different defacing mask. That participant was also preprocessed separately, using a more recent fMRIPrep version: 20.2.6.
    • The data associated with the children has a few missing files. Notably:
      1. sub-313 and sub-316 are missing one run of the localizer each
      2. sub-316 has no data at all for the geometry
      3. sub-308 has no usable data for the intruder task

      Since all of these still have some data to contribute to either task, all available files were kept in this dataset. The analysis code reflects these inconsistencies where required with specific exceptions.
  3. Zero Modes and Classification of Combinatorial Metamaterials

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 8, 2022
    + more versions
    Cite
    Ryan van Mastrigt; Marjolein Dijkstra; Martin van Hecke; Corentin Coulais (2022). Zero Modes and Classification of Combinatorial Metamaterials [Dataset]. http://doi.org/10.5281/zenodo.7070963
    Available download formats: zip
    Dataset updated
    Nov 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ryan van Mastrigt; Marjolein Dijkstra; Martin van Hecke; Corentin Coulais
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.

    In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.

    Modescaling_raw_data.zip

    This file contains uniformly sampled unit cell designs for metamaterial M2 and \(M_k(n)\) for \(1\leq n\leq 4\), which was used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or linearly increasing (class C). This file contains the simulation data of size \(3 \leq k \leq 8\) unit cells. The data is organized as follows.

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+5: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4)\}\).

    Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
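
    A minimal Python sketch of reading one of these arrays and unpacking the columns described above (the concrete file name is illustrative; substitute the actual index i and unit cell size k of the file at hand):

      import numpy as np

      k = 3
      # Illustrative name following the pattern "data_new_rrQR_i_n_M_kxk_fixn4.npy".
      data = np.load("data_new_rrQR_i_n_M_3x3_fixn4.npy")    # shape [Nsim, 1 + k*k + 4]

      labels = data[:, 0].astype(int)                   # col 0: label number
      designs = data[:, 1:1 + k*k].reshape(-1, k, k)    # cols 1..k*k: flattened k x k unit cell designs
      modes = data[:, 1 + k*k:1 + k*k + 4]              # M_k(1), M_k(2), M_k(3), M_k(4)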

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+7: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4), M_k(5), M_k(6)\}\).

    Simulation data for \(6 \leq k \leq 8\) unit cells are stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+9: number of zero modes \(M_k(n_x, n_y)\) in order: \(\{M_k(1, 1), M_k(2, 2), M_k(3, 2), M_k(4, 2), M_k(1, 1), M_k(2, 2), M_k(2, 3), M_k(2, 4)\}\).

    Simulation data of metamaterial M1 for \(k_x \times k_y\) metamaterials are stored in compressed numpy array format (.npz) and can be loaded in Python with the Numpy package using the numpy.load command. These files are named "smiley_cube_x_y_\(k_x\)x\(k_y\).npz", which contain all possible metamaterial designs, and "smiley_cube_uniform_sample_x_y_\(k_x\)x\(k_y\).npz", which contain uniformly sampled metamaterial designs. The configurations are accessed with the keyword argument 'configs'. The classification is accessed with the keyword argument 'compatible'. The configurations array is of shape [Nsim, \(k_x\), \(k_y\)], the classification array is of shape [Nsim]. The building blocks in the configuration are denoted by 0 or 1, which correspond to the red/green and white/dashed building blocks respectively. Classification is 0 or 1, which corresponds to I and C respectively.
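
    Similarly, a short sketch for the compressed M1 arrays (file name illustrative, following the pattern above):

      import numpy as np

      with np.load("smiley_cube_x_y_2x2.npz") as f:
          configs = f["configs"]         # shape [Nsim, k_x, k_y]; building blocks encoded as 0 or 1
          compatible = f["compatible"]   # shape [Nsim]; classification: 0 = class I, 1 = class C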

    Modescaling_classification_results.zip

    This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells of metamaterial M2 in Modescaling_raw_data.zip. The data is organized as follows.

    The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 4\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)
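
    These results files are plain comma-separated text; a short sketch of loading one (file name illustrative):

      import numpy as np

      # Columns: label, class (0 = I, 1 = C, 2 = X), slope, offset, M_k(1)
      results = np.loadtxt("results_analysis_new_rrQR_i_Scen_slope_offset_M1k_3x3_fixn4.txt", delimiter=",")
      labels, classes = results[:, 0].astype(int), results[:, 1].astype(int)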

    The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n \leq 6\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)

    The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_x \leq 4\))

    col 2: the class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I nor C for \(1 \leq n_y \leq 4\))

    col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)

    col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)

    col 5: the offset_x is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)

    col 6: the offset_y is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)

    col 7: \(M_k(1, 1)\)

  4. Data from: FISBe: A real-world benchmark dataset for instance segmentation...

    • zenodo.org
    • data.niaid.nih.gov
    bin, json +3
    Updated Apr 2, 2024
    Cite
    Lisa Mais; Peter Hirsch; Claire Managan; Ramya Kandarpa; Josef Lorenz Rumberger; Annika Reinke; Lena Maier-Hein; Gudrun Ihrke; Dagmar Kainmueller (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. http://doi.org/10.5281/zenodo.10875063
    Available download formats: zip, text/x-python, bin, json, txt
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lisa Mais; Peter Hirsch; Claire Managan; Ramya Kandarpa; Josef Lorenz Rumberger; Annika Reinke; Lena Maier-Hein; Gudrun Ihrke; Dagmar Kainmueller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 26, 2024
    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    • A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains
      • 30 completely labeled (segmented) images
      • 71 partly labeled images
      • altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30 and 60 min on average, yet a difficult one can take up to 4 hours)
    • To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
    • A set of metrics and a novel ranking score for respective meaningful method benchmarking
    • An evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    >> FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    • fisbe_v1.0_{completely,partly}.zip
      • contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.
    • fisbe_v1.0_mips.zip
      • maximum intensity projections of all samples, for convenience.
    • sample_list_per_split.txt
      • a simple list of all samples and the subset they are in, for convenience.
    • view_data.py
      • a simple python script to visualize samples, see below for more information on how to use it.
    • dim_neurons_val_and_test_sets.json
      • a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.
    • Readme.md
      • general information

    How to work with the image files

    Each sample consists of a single 3d MCFO image of neurons of the fruit fly.
    For each image, we provide a pixel-wise instance segmentation for all separable neurons.
    Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification).
    The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.
    The segmentation mask for each neuron is stored in a separate channel.
    The order of dimensions is CZYX.

    We recommend working in a virtual environment, e.g., by using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9
    conda activate flylight-env

    How to open zarr files

    1. Install the python zarr package:
      pip install zarr
    2. Open a zarr file with:

      import zarr
      # replace <sample>.zarr with the path to one of the zarr files from the dataset
      raw = zarr.open("<sample>.zarr", mode='r', path="volumes/raw")
      seg = zarr.open("<sample>.zarr", mode='r', path="volumes/gt_instances")

      # optional: load into memory as a numpy array
      import numpy as np
      raw_np = np.array(raw)

    Zarr arrays are read lazily on-demand.
    Many functions that expect numpy arrays also work with zarr arrays.
    Optionally, the arrays can also explicitly be converted to numpy arrays.

    How to view zarr image files

    We recommend using napari to view the image data.

    1. Install napari:
      pip install "napari[all]"
    2. Save the following Python script:

      import zarr, sys, napari

      # zarr.load reads the arrays into memory as numpy arrays
      raw = zarr.load(sys.argv[1], path="volumes/raw")
      gts = zarr.load(sys.argv[1], path="volumes/gt_instances")

      viewer = napari.Viewer(ndisplay=3)
      for idx, gt in enumerate(gts):
          viewer.add_labels(
              gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
      viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
      viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
      viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
      napari.run()

    3. Execute:
      python view_data.py <path/to/sample.zarr>

    Metrics

    • S: Average of avF1 and C
    • avF1: Average F1 Score
    • C: Average ground truth coverage
    • clDice_TP: Average true positives clDice
    • FS: Number of false splits
    • FM: Number of false merges
    • tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al.
    For detailed information on the methods and the quantitative results please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe,
     title =    {FISBe: A real-world benchmark dataset for instance
             segmentation of long-range thin filamentous structures},
     author =    {Lisa Mais and Peter Hirsch and Claire Managan and Ramya
             Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena
             Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
     year =     2024,
     eprint =    {2404.00130},
     archivePrefix ={arXiv},
     primaryClass = {cs.CV}
    }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable
    discussions.
    P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.
    This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far.
    All future changes will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

    All contributions are welcome!

  5. Explainable AI (XAI) Drilling Dataset

    • kaggle.com
    Updated Aug 24, 2023
    Cite
    Raphael Wallsberger (2023). Explainable AI (XAI) Drilling Dataset [Dataset]. https://www.kaggle.com/datasets/raphaelwallsberger/xai-drilling-dataset
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Raphael Wallsberger
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032

    This is the original XAI Drilling dataset optimized for XAI purposes; it can be used to evaluate explanations of such algorithms. The dataset comprises 20,000 data points (drilling operations) stored as rows, with 10 features, one binary main failure label, and 4 binary subgroup failure modes stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:

    • ID: Every data point in the dataset is uniquely identifiable, thanks to the ID feature. This ensures traceability and easy referencing, especially when analyzing specific drilling scenarios or anomalies.
    • Cutting speed vc (m/min): The cutting speed is a pivotal parameter in drilling, influencing the efficiency and quality of the drilling process. It represents the speed at which the drill bit's cutting edge moves through the material.
    • Spindle speed n (1/min): This feature captures the rotational speed of the spindle or drill bit, respectively.
    • Feed f (mm/rev): Feed denotes the depth the drill bit penetrates into the material with each revolution. There is a balance between speed and precision, with higher feeds leading to faster drilling but potentially compromising hole quality.
    • Feed rate vf (mm/min): The feed rate is a measure of how quickly the material is fed to the drill bit. It is a determinant of the overall drilling time and influences the heat generated during the process.
    • Power Pc (kW): The power consumption during drilling can be indicative of the efficiency of the process and the wear state of the drill bit.
    • Cooling (%): Effective cooling is paramount in drilling, preventing overheating and reducing wear. This ordinal feature captures the cooling level applied, with four distinct states representing no cooling (0%), partial cooling (25% and 50%), and high to full cooling (75% and 100%).
    • Material: The type of material being drilled can significantly influence the drilling parameters and outcomes. This dataset encompasses three primary materials: C45K hot-rolled heat-treatable steel (EN 1.0503), cast iron GJL (EN GJL-250), and aluminum-silicon (AlSi) alloy (EN AC-42000), each presenting its unique challenges and considerations. The three materials are represented as “P (Steel)” for C45K, “K (Cast Iron)” for cast iron GJL and “N (Non-ferrous metal)” for AlSi alloy.
    • Drill Bit Type: Different materials often require specialized drill bits. This feature categorizes the type of drill bit used, ensuring compatibility with the material and optimizing the drilling process. It consists of three categories, which are based on the DIN 1836: “N” for C45K, “H” for cast iron and “W” for AlSi alloy [5].
    • Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.

    • Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.

    Subgroup failures:

    • Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence.
    • Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from the factors high feed rate, inadequate cooling and using an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips.
    • Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation.
    • Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.

    A small consistency-check sketch for these labels is given below.
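
    The following sketch illustrates the label structure described above; the CSV file name and column names are assumptions and should be adapted to the actual Kaggle download.

      import pandas as pd

      # Assumed file and column names; adjust to the actual download.
      df = pd.read_csv("xai_drilling_dataset.csv")

      subgroup_cols = [
          "Build-up edge failure",
          "Compression chips failure",
          "Flank wear failure",
          "Wrong drill bit failure",
      ]

      # Main failure rate should be roughly 5 % over the 20,000 drilling operations.
      print("Main failure rate:", df["Main failure"].mean())

      # Per the description, the main failure flag is 1 whenever any subgroup failure mode is 1.
      assert (df["Main failure"] == df[subgroup_cols].any(axis=1).astype(int)).all()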

  6. Opal Tap On and Tap Off

    • data.nsw.gov.au
    • researchdata.edu.au
    csv, pdf
    Updated Feb 4, 2025
    + more versions
    Cite
    Transport for NSW (2025). Opal Tap On and Tap Off [Dataset]. https://www.data.nsw.gov.au/data/dataset/2-opal-tap-on-and-tap-off
    Available download formats: csv, pdf
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Transport for NSW (http://www.transport.nsw.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides counts of tap ons and tap offs made on the Opal ticketing system during two non-consecutive weeks in 2016. The Opal tap on and tap off dataset contains six CSV files covering two weeks (14 days) of Opal data across the four public transport modes.

    Privacy is the utmost priority for all Transport for NSW Open Data and there is no information that can identify any individual in the Open Opal Tap On and Tap Off data. This means that any data that is, or can be, linked to an individual’s Opal card has been removed.

    This dataset is subject to specific terms and conditions

    There are three CSV files per week, and these provide a privacy-protected count of taps against:

    1. Time – binned to 15 minutes by tap (tap on or tap off), by date and by mode

    2. Location – by tap (tap on or tap off), by date and by mode

    3. Time with location – binned to 15 minutes, by tap (tap on or tap off), by date and by mode

    The tap on and tap off counts are not linked and individual trips cannot be derived using the data.

    The two weeks of Opal data are:

    1. 25 July to 31 July 2016 (before paper ticket retirement – paper ticket data is not included in the dataset)

    2. 8 August to 14 August 2016 (after paper ticket retirement).

  7. Dataset for: Let’s stay in touch: Frequency (but not mode) of interaction...

    • b2find.eudat.eu
    Updated Dec 6, 2022
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Dec 6, 2022
    Description

    Successful leadership requires leaders to make their followers aware of expectations regarding the goals to achieve, norms to follow, and task responsibilities to take over. This awareness is often achieved through leader-follower communication. In times of economic globalization and digitalization, however, leader-follower communication has become both more digitalized (virtual, rather than face-to-face) and less frequent, making successful leader-follower-communication more challenging. The current research tested in four studies (three preregistered) whether digitalization and frequency of interaction predict task-related leadership success. In one cross-sectional (Study 1, N=200), one longitudinal (Study 2, N=305), and one quasi-experimental study (Study 3, N=178), as predicted, a higher frequency (but not a lower level of digitalization) of leader-follower interactions predicted better task-related leadership outcomes (i.e., stronger goal clarity, norm clarity, and task responsibility among followers). Via mediation and a causal chain approach, Study 3 and Study 4 (N=261) further targeted the mechanism; results showed that the relationship between (higher) interaction frequency and these outcomes is due to followers perceiving more opportunities to share work-related information with the leaders. These results improve our understanding of contextual factors contributing to leadership success in collaborations across hierarchies. They highlight that it is not the digitalization but rather the frequency of interacting with their leader that predicts whether followers gain clarity about the relevant goals and norms to follow and the task responsibilities to assume.

  8. Commuter Mode Share

    • data.ccrpc.org
    csv
    Updated Oct 2, 2024
    Cite
    Champaign County Regional Planning Commission (2024). Commuter Mode Share [Dataset]. https://data.ccrpc.org/bg/dataset/commuter-mode-share
    Available download formats: csv
    Dataset updated
    Oct 2, 2024
    Dataset authored and provided by
    Champaign County Regional Planning Commission
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This commuter mode share data shows the estimated percentages of commuters in Champaign County who traveled to work using each of the following modes: drove alone in an automobile; carpooled; took public transportation; walked; biked; went by motorcycle, taxi, or other means; and worked at home. Commuter mode share data can illustrate the use of and demand for transit services and active transportation facilities, as well as for automobile-focused transportation projects.

    Driving alone in an automobile is by far the most prevalent means of getting to work in Champaign County, accounting for over 69 percent of all work trips in 2023. This is the same rate as in 2019 and the first increase since 2017, both years before the COVID-19 pandemic began.

    The percentage of workers who commuted by all other means to a workplace outside the home also decreased from 2019 to 2021, with most of these modes reaching a record low since this data first started being tracked in 2005. The percentage of people carpooling to work in 2023 was lower than in every year except 2016 since 2005. The percentage of people walking to work increased from 2022 to 2023, but this increase is not statistically significant.

    Meanwhile, the percentage of people in Champaign County who worked at home more than quadrupled from 2019 to 2021, reaching a record high of over 18 percent. It is a safe assumption that this can be attributed to the increase in employers allowing employees to work at home when the COVID-19 pandemic began in 2020.

    The work-from-home figure decreased to 11.2 percent in 2023, which is the first statistically significant decrease since the pandemic began. However, this figure is still about 2.5 times higher than in 2019, even with the COVID-19 emergency ending in 2023.

    Commuter mode share data was sourced from the U.S. Census Bureau’s American Community Survey (ACS) 1-Year Estimates, which are released annually.

    As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.

    Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

    For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Means of Transportation to Work.

    Sources: U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (18 September 2024).; U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (10 October 2023).; U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (14 October 2022).; U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).; U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using data.census.gov; (26 March 2021).; U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (13 September 2018).; U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (14 September 2017).; U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (19 September 2016).; U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).; U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S0801; generated by CCRPC staff; using American FactFinder; (16 March 2016).

  9. Estimated stand-off distance between ADS-B equipped aircraft and obstacles

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Weinert, Andrew (2024). Estimated stand-off distance between ADS-B equipped aircraft and obstacles [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7741272
    Dataset updated
    Jul 12, 2024
    Dataset authored and provided by
    Weinert, Andrew
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Summary:

    Estimated stand-off distance between ADS-B equipped aircraft and obstacles. Obstacle information was sourced from the FAA Digital Obstacle File and the FHWA National Bridge Inventory. Aircraft tracks were sourced from processed data curated from the OpenSky Network. Results are presented as histograms organized by aircraft type and distance away from runways.

    Description:

    For many aviation safety studies, aircraft behavior is represented using encounter models, which are statistical models of how aircraft behave during close encounters. They are used to provide a realistic representation of the range of encounter flight dynamics where an aircraft collision avoidance system would be likely to alert. These models currently are, and historically have been, limited to interactions between aircraft; they have not represented the specific interactions between obstacles and transponder-equipped aircraft. In response, we calculated the standoff distance between obstacles and ADS-B equipped manned aircraft.

    For robustness, MIT LL calculated the standoff distance using two different datasets of manned aircraft tracks and two datasets of obstacles. This approach aligned with the foundational research used to support the ASTM F3442/F3442M-20 well clear criteria of 2000 feet laterally and 250 feet AGL vertically.

    The two datasets of processed tracks of ADS-B equipped aircraft were curated from the OpenSky Network. It is likely that rotorcraft were underrepresented in these datasets. There were also no considerations for aircraft equipped only with Mode C or not equipped with any transponders. The first dataset was used to train the v1.3 uncorrelated encounter models and is referred to as the “Monday” dataset. The second dataset is referred to as the “aerodrome” dataset and was used to train the v2.0 and v3.x terminal encounter model. The Monday dataset consisted of 104 Mondays across North America. The other dataset was based on observations at least 8 nautical miles within Class B, C, D aerodromes in the United States for the first 14 days of each month from January 2019 through February 2020. Prior to any processing, the datasets required 714 and 847 Gigabytes of storage. For more details on these datasets, please refer to "Correlated Bayesian Model of Aircraft Encounters in the Terminal Area Given a Straight Takeoff or Landing" and “Benchmarking the Processing of Aircraft Tracks with Triples Mode and Self-Scheduling.”

    Two different datasets of obstacles were also considered. The first was point obstacles defined by the FAA digital obstacle file (DOF) and consisted of point obstacle structures of antenna, lighthouse, meteorological tower (met), monument, sign, silo, spire (steeple), stack (chimney; industrial smokestack), transmission line tower (t-l tower), tank (water; fuel), tramway, utility pole (telephone pole, or pole of similar height, supporting wires), windmill (wind turbine), and windsock. Each obstacle was represented by a cylinder with the height reported by the DOF and a radius based on the reported horizontal accuracy. We did not consider the actual width and height of the structure itself. Additionally, we only considered obstacles at least 50 feet tall and marked as verified in the DOF.

    The other obstacle dataset, termed “bridges,” was based on the identified bridges in the FAA DOF and additional information provided by the National Bridge Inventory. Due to the potential size and extent of bridges, it would not be appropriate to model them as point obstacles; however, the FAA DOF only provides a point location and no information about the size of the bridge. In response, we correlated the FAA DOF with the National Bridge Inventory, which provides information about the length of many bridges. Instead of sizing the simulated bridge based on horizontal accuracy, as with the point obstacles, the bridges were represented as circles with a radius of the longest, nearest bridge from the NBI. A circle representation was required because neither the FAA DOF nor the NBI provided sufficient information about orientation to represent bridges as rectangular cuboids. Similar to the point obstacles, the height of the obstacle was based on the height reported by the FAA DOF. Accordingly, the analysis using the bridge dataset should be viewed as risk averse and conservative. It is possible that a manned aircraft was hundreds of feet away from an obstacle in actuality but the estimated standoff distance could be significantly less. Additionally, all obstacles are represented with a fixed height; the potentially flat and low-level entrances of a bridge are assumed to have the same height as the tall bridge towers. The attached figure illustrates an example simulated bridge.

    It would have been extremely computationally inefficient to calculate the standoff distance for all possible track points. Instead, we define an encounter between an aircraft and an obstacle as when an aircraft flying at 3069 feet AGL or less comes within 3000 feet laterally of any obstacle in a 60 second time interval. If the criteria were satisfied, then for that 60 second track segment we calculated the standoff distance to all nearby obstacles. Vertical separation was based on the MSL altitude of the track and the maximum MSL height of an obstacle. A simplified sketch of this screening step is given below.
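
    A simplified, illustrative sketch of that screening step (not the authors' code; distances in feet, with lateral distance and altitudes assumed to be precomputed per track point):

      import numpy as np

      def is_encounter(lateral_ft: np.ndarray, agl_ft: np.ndarray) -> bool:
          """Screen one 60-second track segment against one obstacle.

          lateral_ft: lateral distance from each track point to the obstacle (feet).
          agl_ft: aircraft altitude above ground level at each track point (feet).
          Returns True if any point is at or below 3069 ft AGL and within 3000 ft laterally.
          """
          return bool(np.any((agl_ft <= 3069.0) & (lateral_ft <= 3000.0)))

      def vertical_separation_ft(track_msl_ft: np.ndarray, obstacle_max_msl_ft: float) -> np.ndarray:
          # Per the description, vertical separation uses the track MSL altitude
          # and the maximum MSL height of the obstacle.
          return track_msl_ft - obstacle_max_msl_ft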

    For each combination of aircraft track and obstacle datasets, the results were organized seven different ways. Filtering criteria were based on aircraft type and distance away from runways. Runway data was sourced from the FAA runways of the United States, Puerto Rico, and Virgin Islands open dataset. Aircraft type was identified as part of the em-processing-opensky workflow.

    All: No filter, all observations that satisfied encounter conditions

    nearRunway: Aircraft within or at 2 nautical miles of a runway

    awayRunway: Observations more than 2 nautical miles from a runway

    glider: Observations when aircraft type is a glider

    fwme: Observations when aircraft type is a fixed-wing multi-engine

    fwse: Observations when aircraft type is a fixed-wing single engine

    rotorcraft: Observations when aircraft type is a rotorcraft

    License

    This dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).

    This license requires that reusers give credit to the creator. It allows reusers to copy and distribute the material in any medium or format in unadapted form and for noncommercial purposes only. Only noncommercial use of your work is permitted. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Exceptions are given for the not for profit standards organizations of ASTM International and RTCA.

    MIT is releasing this dataset in good faith to promote open and transparent research of the low altitude airspace. Given the limitations of the dataset and a need for more research, a more restrictive license was warranted. Namely, it is based only on observations of ADS-B equipped aircraft, which not all aircraft in the airspace are required to employ, and the observations were sourced from a crowdsourced network whose surveillance coverage has not been robustly characterized.

    As more research is conducted and the low altitude airspace is further characterized or regulated, it is expected that a future version of this dataset may have a more permissive license.

    Distribution Statement

    DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

    © 2021 Massachusetts Institute of Technology.

    Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

    This material is based upon work supported by the Federal Aviation Administration under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration.

    This document is derived from work done for the FAA (and possibly others); it is not the direct product of work done for the FAA. The information provided herein may include content supplied by third parties. Although the data and information contained herein has been produced or processed from sources believed to be reliable, the Federal Aviation Administration makes no warranty, expressed or implied, regarding the accuracy, adequacy, completeness, legality, reliability or usefulness of any information, conclusions or recommendations provided herein. Distribution of the information contained herein does not constitute an endorsement or warranty of the data or information provided herein by the Federal Aviation Administration or the U.S. Department of Transportation. Neither the Federal Aviation Administration nor the U.S. Department of Transportation shall be held liable for any improper or incorrect use of the information contained herein and assumes no responsibility for anyone’s use of the information. The Federal Aviation Administration and U.S. Department of Transportation shall not be liable for any claim for any loss, harm, or other damages arising from access to or use of data or information, including without limitation any direct, indirect, incidental, exemplary, special or consequential damages, even if advised of the possibility of such damages. The Federal Aviation Administration shall not be liable to anyone for any decision made or action taken, or not taken, in reliance on the information contained herein.

  10. RapidScat Level 2B Climate Ocean Wind Vectors in 12.5km Footprints

    • podaac.jpl.nasa.gov
    • cloud.csiss.gmu.edu
    • +5more
    html
    Updated May 6, 2016
    + more versions
    Cite
    PO.DAAC (2016). RapidScat Level 2B Climate Ocean Wind Vectors in 12.5km Footprints [Dataset]. http://doi.org/10.5067/RSX12-L2C11
    Available download formats: html
    Dataset updated
    May 6, 2016
    Dataset provided by
    PO.DAAC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    SURFACE WINDS
    Description

    This dataset contains the RapidScat Level 2B 12.5km Version 1.0 Climate quality ocean surface wind vectors. The Level 2B wind vectors are binned on a 12.5 km Wind Vector Cell (WVC) grid and processed using the "full aperture" normalized radar cross-section (NRCS, a.k.a. Sigma-0) from the L1B dataset. RapidScat is a Ku-band dual beam circular rotating scatterometer retaining much of the same hardware and functionality as QuikSCAT, with the exception of the antenna sub-system and digital interface to the International Space Station (ISS) Columbus module, which is where RapidScat is mounted. The NASA mission is officially referred to as ISS-RapidScat. Unlike QuikSCAT, ISS-RapidScat is not in sun-synchronous orbit, and flies at roughly half the altitude with a low inclination angle that restricts data coverage to the tropics and mid-latitude regions; the extent of latitudinal coverage stretches from approximately 61 degrees North to 61 degrees South. Furthermore, there is no consistent local time of day retrieval. This dataset is provided in a netCDF-3 file format that follows the netCDF-4 classic model (i.e., generated by the netCDF-4 API) and made available via Direct Download and OPeNDAP. For data access, please click on the "Data Access" tab above. This climate quality data set differs from the nominal "slice" L2B dataset as follows: 1) it uses full antenna footprint measurements (~20-km) without subdividing by range (~7-km) and 2) the absolute calibration has been modified for the two different low signal-to-noise ratio (SNR) mode data sets: LowSNR1 14 August 2015 to 18 September 2015; LowSNR2 6 October 2015 to 7 February 2016. The above enhancements allow this dataset to provide consistent calibration across all SNR states. Low SNR periods and other key quality control (QC) issues are tracked and kept up-to-date in PO.DAAC Drive at https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/rapidscat/open/L1B/docs/revtime.csv. If you have any questions, please visit our user forums: https://podaac.jpl.nasa.gov/forum/. A minimal example of opening a granule in Python is shown below.
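
    A minimal sketch of opening one of these granules with the netCDF4 Python library; the file name is a placeholder and the variable names are assumptions to be checked against the dataset documentation:

      from netCDF4 import Dataset

      # Placeholder path to a downloaded RapidScat L2B 12.5 km climate-quality granule.
      with Dataset("rapidscat_l2b_climate_example.nc") as nc:
          print(list(nc.variables))                        # inspect the available variables
          # The names below are assumptions, not guaranteed by this description:
          wind_speed = nc.variables["retrieved_wind_speed"][:]
          wind_dir = nc.variables["retrieved_wind_direction"][:]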

  11. Brazil Visitor Arrivals: Marine: North America: Canada

    • ceicdata.com
    Updated Jul 20, 2018
    Cite
    CEICdata.com (2018). Brazil Visitor Arrivals: Marine: North America: Canada [Dataset]. https://www.ceicdata.com/en/brazil/no-of-visitors-arrivals-by-mode-of-transport
    Dataset updated
    Jul 20, 2018
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2018 - Dec 1, 2018
    Area covered
    Brazil
    Variables measured
    Tourism Statistics
    Description

    Visitor Arrivals: Marine: North America: Canada data was reported at 0.503 Person th in Dec 2018. This records an increase from the previous number of 0.275 Person th for Nov 2018. Visitor Arrivals: Marine: North America: Canada data is updated monthly, averaging 0.038 Person th from Jan 1989 (Median) to Dec 2018, with 360 observations. The data reached an all-time high of 1.024 Person th in Feb 2009 and a record low of 0.000 Person th in Oct 2018. Visitor Arrivals: Marine: North America: Canada data remains in active status in CEIC and is reported by the Ministry of Tourism. The data is categorized under Brazil Premium Database’s Tourism Sector – Table BR.QB003: No of Visitors Arrivals: by Mode of Transport. According to the Ministry of Tourism, the monthly Visitor Arrivals series is released on an annual basis because the Ministry receives some of its input data from the Federal Police only once a year; based on these data, it estimates the figures for all months of the year.

  12. Wind WAVES TDSF Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 10, 2024
    Cite
    Wilson III, Lynn B (2024). Wind WAVES TDSF Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3911204
    Explore at:
    Dataset updated
    Jul 10, 2024
    Dataset authored and provided by
    Wilson III, Lynn B
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wind Spacecraft:

    The Wind spacecraft (https://wind.nasa.gov) was launched on November 1, 1994 and currently orbits the first Lagrange point between the Earth and sun. A comprehensive review can be found in Wilson et al. [2021]. It holds a suite of instruments from gamma ray detectors to quasi-static magnetic field instruments, Bo. The instruments used for this data product are the fluxgate magnetometer (MFI) [Lepping et al., 1995] and the radio receivers (WAVES) [Bougeret et al., 1995]. The MFI measures 3-vector Bo at ~11 samples per second (sps); WAVES observes electromagnetic radiation from ~4 kHz to >12 MHz which provides an observation of the upper hybrid line (also called the plasma line) used to define the total electron density and also takes time series snapshot/waveform captures of electric and magnetic field fluctuations, called TDS bursts herein.

    WAVES Instrument:

    The WAVES experiment [Bougeret et al., 1995] on the Wind spacecraft is composed of three orthogonal electric field antennas and three orthogonal search coil magnetometers. The electric fields are measured through five different receivers: Low Frequency FFT receiver called FFT (0.3 Hz to 11 kHz), Thermal Noise Receiver called TNR (4-256 kHz), Radio receiver band 1 called RAD1 (20-1040 kHz), Radio receiver band 2 called RAD2 (1.075-13.825 MHz), and the Time Domain Sampler (TDS). The electric field antennas are dipole antennas with two orthogonal antennas in the spin plane and one spin-axis stacer antenna.

    The TDS receiver allows one to examine the electromagnetic waves observed by Wind as time series waveform captures. There are two modes of operation, TDS Fast (TDSF) and TDS Slow (TDSS). TDSF returns 2048 data points for two channels of the electric field, typically Ex and Ey (i.e. spin plane components), with little to no gain below ~120 Hz (the data herein has been high pass filtered above ~150 Hz for this reason). TDSS returns four channels with three electric(magnetic) field components and one magnetic(electric) component. The search coils show a gain roll off ~3.3 Hz [e.g., see Wilson et al., 2010; Wilson et al., 2012; Wilson et al., 2013 and references therein for more details].

    The original calibration of the electric field antenna found that the effective antenna lengths are roughly 41.1 m, 3.79 m, and 2.17 m for the X, Y, and Z antenna, respectively. The +Ex antenna was broken twice during the mission as of June 26, 2020. The first break occurred on August 3, 2000 around ~21:00 UTC and the second on September 24, 2002 around ~23:00 UTC. These breaks reduced the effective antenna length of Ex from ~41 m to 27 m after the first break and ~25 m after the second break [e.g., see Malaspina et al., 2014; Malaspina & Wilson, 2016].

    TDS Bursts:

    TDS bursts are waveform captures/snapshots of electric and magnetic field data. The data is triggered by the largest amplitude waves which exceed a specific threshold and are then stored in a memory buffer. The bursts are ranked according to a quality filter which mostly depends upon amplitude. Due to the age of the spacecraft and ubiquity of large amplitude electromagnetic and electrostatic waves, the memory buffer often fills up before dumping onto the magnetic tape drive. If the memory buffer is full, then the bottom ranked TDS burst is erased every time a new TDS burst is sampled. That is, the newest TDS burst sampled by the instrument is always stored and if it ranks higher than any other in the list, it will be kept. This results in the bottom ranked burst always being erased. Earlier in the mission, there were also so called honesty bursts, which were taken periodically to test whether the triggers were working properly. It was found that the TDSF triggered properly, but not the TDSS. So the TDSS was set to trigger off of the Ex signals.

    A TDS burst from the Wind/WAVES instrument is always 2048 time steps for each channel. The sample rate for TDSF bursts ranges from 1875 samples/second (sps) to 120,000 sps. Every TDS burst is marked with a unique set of numbers (unique on any given date) to help distinguish it from others and to ensure any set of channels is appropriately connected. For instance, during one spacecraft downlink interval there may be 95% of the TDS bursts with a complete set of channels (i.e., TDSF has two channels, TDSS has four) while the remaining 5% can be missing channels (just example numbers, not quantitatively accurate). During another downlink interval, those missing channels may be returned if they are not overwritten. During every downlink, the flight operations team at NASA Goddard Space Flight Center (GSFC) generates level zero binary files from the raw telemetry data. Those files are filled with data received on that date and the file name is labeled with that date. There is no attempt to sort the data chronologically within a file, so any given level zero file can contain data from multiple dates. Thus, it is often necessary to load upwards of five days of level zero files to find as many full channel sets as possible. The remaining unmatched channel sets comprise a much smaller fraction of the total.

    All data provided here are from TDSF, so only two channels. Most of the time channel 1 will be associated with the Ex antenna and channel 2 with the Ey antenna. The data are provided in the spinning instrument coordinate basis with associated angles necessary to rotate into a physically meaningful basis (e.g., GSE).

    TDS Time Stamps:

    Each TDS burst is tagged with a time stamp called a spacecraft event time or SCET. The TDS datation time is sampled after the burst is acquired which requires a delay buffer. The datation time requires two corrections. The first correction arises from tagging the TDS datation with an associated spacecraft major frame in house keeping (HK) data. The second correction removes the delay buffer duration. Both inaccuracies are essentially artifacts of on ground derived values in the archives created by the WINDlib software (K. Goetz, Personal Communication, 2008) found at https://github.com/lynnbwilsoniii/Wind_Decom_Code.

    The WAVES instrument's HK mode sends relevant low rate science back to ground once every spacecraft major frame. If multiple TDS bursts occur in the same major frame, it is possible for the WINDlib software to assign them the same SCETs. The reason is that this top-level SCET is only accurate to within +300 ms (in 120,000 sps mode) due to the issues described above (at lower sample rates, the error can be slightly larger). The time stamp uncertainty is a positive definite value because it results from digitization rounding errors. One can correct these issues to within +10 ms if using the proper HK data.

    *** The data stored here have not corrected the SCETs! ***

    The 300 ms uncertainty, due to the HK corrections mentioned above, results from WINDlib trying to recreate the time stamp after it has been telemetered back to ground. If a burst stays in the TDS buffer for extended periods of time (i.e., >2 days), the interpolation done by WINDlib can make mistakes in the 11th significant digit. The positive definite nature of this uncertainty is due to rounding errors associated with the onboard DPU (digital processing unit) clock rollover. The DPU clock is a 24-bit integer clock sampling at ∼50,018.8 Hz. The clock rolls over at ∼5366.691244092221 seconds, i.e., (16×2^24)/50,018.8. The sample rate is a temperature sensitive issue and thus subject to change over time. From a sample of 384 different points on 14 different days, a statistical estimate of the rollover time is 5366.691124061162 ± 0.000478370049 seconds (calculated by Lynn B. Wilson III, 2008). Note that the WAVES instrument team used UR8 times, which are the number of 86,400 second days from 1982-01-01/00:00:00.000 UTC.
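
    As a quick sanity check on the numbers quoted above, the rollover period follows directly from the counter size and the nominal clock rate; a short Python calculation (values taken verbatim from the text, leap seconds ignored in the UR8 conversion):

        # DPU clock rollover and UR8 conversion check (numbers from the text above).
        counts_per_rollover = 16 * 2**24              # factor quoted in the text for the 24-bit clock
        nominal_rate_hz = 50_018.8                    # nominal DPU clock rate
        print(counts_per_rollover / nominal_rate_hz)  # ~5366.69 s per rollover

        # UR8 = number of 86,400 s days since 1982-01-01/00:00:00.000 UTC (leap seconds ignored here)
        from datetime import datetime, timedelta, timezone

        def ur8_to_utc(ur8_days: float) -> datetime:
            return datetime(1982, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=ur8_days * 86_400.0)

        print(ur8_to_utc(10_000.5))                   # example conversion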

    The method to correct the SCETs to within +10 ms, were one to do so, is as follows (a schematic code sketch is given after the steps):

    Retrieve the DPU clock times, SCETs, UR8 times, and DPU Major Frame Numbers from the WINDlib libraries on the VAX/ALPHA systems for the TDSS(F) data of interest.

    Retrieve the same quantities from the HK data.

    Match the HK event number with the same DPU Major Frame Number as the TDSS(F) burst of interest.

    Find the difference in DPU clock times between the TDSS(F) burst of interest and the HK event with matching major frame number (Note: The TDSS(F) DPU clock time will always be greater than the HK DPU clock if they are the same DPU Major Frame Number and the DPU clock has not rolled over).

    Convert the difference to a UR8 time and add this to the HK UR8 time. The new UR8 time is the corrected UR8 time to within +10 ms.

    Find the difference between the new UR8 time and the UR8 time WINDlib associates with the TDSS(F) burst. Add the difference to the DPU clock time assigned by WINDlib to get the corrected DPU clock time (Note: watch for the DPU clock rollover).

    Convert the new UR8 time to a SCET using either the IDL WINDlib libraries or TMLib (STEREO S/WAVES software) libraries of available functions. This new SCET is accurate to within +10 ms.
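
    The arithmetic behind steps 4-6 is straightforward once the matching HK record has been found; the following schematic Python sketch assumes the DPU clock counts, UR8 times, and major frame numbers have already been pulled from WINDlib and the HK data (all names here are hypothetical, and the rollover handling mirrors the note in step 4):

        # Schematic correction of a TDS burst time stamp (steps 4-6 above).
        # Inputs are assumed to come from WINDlib and the HK data; names are hypothetical.
        DPU_RATE_HZ = 50_018.8            # nominal DPU clock rate quoted in the text
        ROLLOVER_COUNTS = 16 * 2**24      # DPU clock counts per rollover, per the text
        SECONDS_PER_UR8_DAY = 86_400.0

        def corrected_ur8(tds_dpu_clock, hk_dpu_clock, hk_ur8):
            """Steps 4-5: difference of DPU clocks, converted to days and added to the HK UR8."""
            dclock = tds_dpu_clock - hk_dpu_clock
            if dclock < 0:                             # clock rolled over between HK event and burst
                dclock += ROLLOVER_COUNTS
            return hk_ur8 + (dclock / DPU_RATE_HZ) / SECONDS_PER_UR8_DAY

        def corrected_dpu_clock(windlib_dpu_clock, windlib_ur8, new_ur8):
            """Step 6: shift the WINDlib DPU clock by the UR8 correction (watch the rollover)."""
            dcounts = (new_ur8 - windlib_ur8) * SECONDS_PER_UR8_DAY * DPU_RATE_HZ
            return (windlib_dpu_clock + dcounts) % ROLLOVER_COUNTS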

    One can find a UR8 to UTC conversion routine at https://github.com/lynnbwilsoniii/wind_3dp_pros in the ~/LYNN_PRO/Wind_WAVES_routines/ folder.

    Examples of good waveforms can be found in the notes PDF at https://wind.nasa.gov/docs/wind_waves.pdf.

    Data Set Description

    Each Zip file contains 300+ IDL save files, one for each day of the year with available data. This data set is not complete, as the software used to retrieve and calibrate these TDS bursts did not have sufficient error handling to handle some of the more nuanced bit errors or major frame errors in some of the level zero files. There is currently (as of June 27, 2020) an effort (by Keith Goetz et al.) to generate the entire TDSF and TDSS data set in one repository to be put on SPDF/CDAWeb as CDF files. Once that data set is available, it will supersede the data provided here.
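
    The daily IDL save files can also be read outside of IDL; a minimal sketch with scipy.io.readsav is shown below. The file name and the variable names inside the save file are assumptions, so inspect the keys of the returned dictionary first.

        # Minimal sketch: read one daily TDSF IDL save file with SciPy.
        # File name is hypothetical; the contents should be inspected via data.keys().
        from scipy.io import readsav

        data = readsav("wind_waves_tdsf_example_day.sav")   # hypothetical file from a Zip archive
        print(sorted(data.keys()))                           # list what the save file actually holds
        # Expect, per the description above, the two TDSF channels (2048 samples each),
        # sample rates, and (uncorrected) SCET/UR8 time tags -- check against the keys printed.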

  13. Brazil Visitor Arrivals: Air: Central America & Caribbean: Costa Rica

    • ceicdata.com
    Updated Jul 20, 2018
    Cite
    CEICdata.com (2018). Brazil Visitor Arrivals: Air: Central America & Caribbean: Costa Rica [Dataset]. https://www.ceicdata.com/en/brazil/no-of-visitors-arrivals-by-mode-of-transport
    Explore at:
    Dataset updated
    Jul 20, 2018
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2018 - Dec 1, 2018
    Area covered
    Brazil
    Variables measured
    Tourism Statistics
    Description

    Visitor Arrivals: Air: Central America & Caribbean: Costa Rica data was reported at 0.920 Person th in Dec 2018. This records a decrease from the previous number of 0.925 Person th for Nov 2018. Visitor Arrivals: Air: Central America & Caribbean: Costa Rica data is updated monthly, averaging 0.433 Person th from Jan 1989 (Median) to Dec 2018, with 360 observations. The data reached an all-time high of 5.177 Person th in Jun 2014 and a record low of 0.033 Person th in Apr 1993. Visitor Arrivals: Air: Central America & Caribbean: Costa Rica data remains in active status in CEIC and is reported by the Ministry of Tourism. The data is categorized under Brazil Premium Database’s Tourism Sector – Table BR.QB003: No of Visitors Arrivals: by Mode of Transport. According to the Ministry of Tourism, the monthly Visitor Arrivals series is released on an annual basis because the Ministry receives some of its input data from the Federal Police only once a year; based on these data, it estimates the figures for all months of the year.

  14. How to Login DuckDuckGo Account? | A Step-By-Step Guide Dataset

    • paperswithcode.com
    Updated Jun 17, 2025
    Cite
    (2025). How to Login DuckDuckGo Account? | A Step-By-Step Guide Dataset [Dataset]. https://paperswithcode.com/dataset/how-to-login-duckduckgo-account-a-step-by
    Explore at:
    Dataset updated
    Jun 17, 2025
    Description

    To log in to DuckDuckGo, please visit: 👉 DuckDuckGo Login Account

    In today’s digital age, privacy has become one of the most valued aspects of online activity. With increasing concerns over data tracking, surveillance, and targeted advertising, users are turning to privacy-first alternatives for everyday browsing. One of the most recognized names in private search is DuckDuckGo. Unlike mainstream search engines, DuckDuckGo emphasizes anonymity and transparency. However, many people wonder: is there such a thing as a DuckDuckGo login account (https://duckduckgo-account.blogspot.com/)?

    In this comprehensive guide, we’ll explore everything you need to know about the DuckDuckGo login account, what it offers (or doesn’t), and how to get the most out of DuckDuckGo’s privacy features.

    Does DuckDuckGo Offer a Login Account? To clarify up front: DuckDuckGo does not require or offer a traditional login account like Google or Yahoo. The concept of a DuckDuckGo login account is somewhat misleading if interpreted through the lens of typical internet services.

    DuckDuckGo's entire business model is built around privacy. The company does not track users, store personal information, or create user profiles. As a result, there’s no need—or intention—to implement a system that asks users to log in. This stands in stark contrast to other search engines that rely on login-based ecosystems to collect and use personal data for targeted ads.

    That said, some users still search for the term DuckDuckGo login account, usually because they’re trying to save settings, sync devices, or use features that may suggest a form of account system. Let’s break down what’s possible and what alternatives exist within DuckDuckGo’s platform.

    Saving Settings Without a DuckDuckGo Login Account: Even without a traditional DuckDuckGo login account, users can still save their preferences. DuckDuckGo provides two primary ways to retain search settings:

    Local Storage (Cookies): When you customize your settings on the DuckDuckGo homepage, such as theme, region, or safe search options, those preferences are stored in your browser’s local storage. As long as you don’t clear cookies or use incognito mode, these settings will persist.

    Cloud Save Feature: To cater to users who want to retain settings across multiple devices without a DuckDuckGo login account, DuckDuckGo offers a feature called "Cloud Save." Instead of creating an account with a username or password, you generate a passphrase or unique key. This key can be used to retrieve your saved settings on another device or browser.

    While it’s not a conventional login system, it’s the closest DuckDuckGo comes to offering account-like functionality—without compromising privacy.

    Why DuckDuckGo Avoids Login Accounts: Understanding why there is no DuckDuckGo login account comes down to the company’s core mission: to offer a private, non-tracking search experience. Introducing login accounts would:

    Require collecting some user data (e.g., email, password)

    Introduce potential tracking mechanisms

    Undermine their commitment to full anonymity

    By avoiding a login system, DuckDuckGo keeps user trust intact and continues to deliver on its promise of complete privacy. For users who value anonymity, the absence of a DuckDuckGo login account is actually a feature, not a flaw.

    DuckDuckGo and Device Syncing: One of the most commonly searched reasons behind the term DuckDuckGo login account is the desire to sync settings or preferences across multiple devices. Although DuckDuckGo doesn’t use accounts, the Cloud Save feature mentioned earlier serves this purpose without compromising security or anonymity.

    You simply export your settings using a unique passphrase on one device, then import them using the same phrase on another. This offers similar benefits to a synced account—without the need for usernames, passwords, or emails.

    DuckDuckGo Privacy Tools Without a Login: DuckDuckGo is more than just a search engine. It also offers a range of privacy tools, all without needing a DuckDuckGo login account:

    DuckDuckGo Privacy Browser (Mobile): Available for iOS and Android, this browser includes tracking protection, forced HTTPS, and built-in private search.

    DuckDuckGo Privacy Essentials (Desktop Extension): For Chrome, Firefox, and Edge, this extension blocks trackers, grades websites on privacy, and enhances encryption.

    Email Protection: DuckDuckGo recently launched a service that allows users to create "@duck.com" email addresses that forward to their real email—removing trackers in the process. Users sign up for this using a token or limited identifier, but it still doesn’t constitute a full DuckDuckGo login account.

    Is a DuckDuckGo Login Account Needed? For most users, the absence of a DuckDuckGo login account is not only acceptable—it’s ideal. You can:

    Use the search engine privately

    Customize and save settings

    Sync preferences across devices

    Block trackers and protect email

    —all without an account.

    While some people may find the lack of a traditional login unfamiliar at first, it quickly becomes a refreshing break from constant credential requests, data tracking, and login fatigue.

    The Future of DuckDuckGo Accounts: As of now, DuckDuckGo maintains its position against traditional account systems. However, it’s clear the company is exploring privacy-preserving ways to offer more user features, like Email Protection and Cloud Save. These features may continue to evolve, but the core commitment remains: no tracking, no personal data storage, and no typical DuckDuckGo login account.

    Final Thoughts: While the term DuckDuckGo login account is frequently searched, it represents a misunderstanding of how the platform operates. Unlike other tech companies that monetize personal data, DuckDuckGo has stayed true to its promise of privacy.

  15. Brazil Visitor Arrivals: Marine: Central America & Caribbean: Cuba

    • ceicdata.com
    Updated Jul 20, 2018
    + more versions
    Cite
    CEICdata.com (2018). Brazil Visitor Arrivals: Marine: Central America & Caribbean: Cuba [Dataset]. https://www.ceicdata.com/en/brazil/no-of-visitors-arrivals-by-mode-of-transport
    Explore at:
    Dataset updated
    Jul 20, 2018
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2018 - Dec 1, 2018
    Area covered
    Brazil
    Variables measured
    Tourism Statistics
    Description

    Visitor Arrivals: Marine: Central America & Caribbean: Cuba data was reported at 0.002 Person th in Dec 2018. This records an increase from the previous number of 0.000 Person th for Nov 2018. Visitor Arrivals: Marine: Central America & Caribbean: Cuba data is updated monthly, averaging 0.000 Person th from Jan 1989 (Median) to Dec 2018, with 360 observations. The data reached an all-time high of 0.035 Person th in Jan 2008 and a record low of 0.000 Person th in Nov 2018. Visitor Arrivals: Marine: Central America & Caribbean: Cuba data remains in active status in CEIC and is reported by the Ministry of Tourism. The data is categorized under Brazil Premium Database’s Tourism Sector – Table BR.QB003: No of Visitors Arrivals: by Mode of Transport. According to the Ministry of Tourism, the monthly Visitor Arrivals series is released on an annual basis because the Ministry receives some of its input data from the Federal Police only once a year; based on these data, it estimates the figures for all months of the year.

  16. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    Updated Jul 7, 2023
    + more versions
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World, World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
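
    The R script distributed with the data implements this design; for readers who prefer Python, the sketch below reproduces the same two-stage logic with pandas under stated assumptions: a household-level frame with columns geo_1, urban_rural, and ea_id (these column names are assumptions, not the actual variable names in the files).

        # Sketch of the two-stage design described above: proportional allocation of
        # enumeration areas (EAs) to strata, then 25 households drawn per selected EA.
        # Column names are assumptions for illustration only.
        import pandas as pd

        def draw_sample(frame: pd.DataFrame, n_households=8000, hh_per_ea=25, seed=1):
            n_eas_total = n_households // hh_per_ea                            # 320 EAs overall
            sizes = frame.groupby(["geo_1", "urban_rural"]).size()
            alloc = (sizes / sizes.sum() * n_eas_total).round().astype(int)    # stage 1: EAs per stratum
            parts = []
            for (geo, urb), n_eas in alloc.items():
                stratum = frame[(frame["geo_1"] == geo) & (frame["urban_rural"] == urb)]
                eas = stratum["ea_id"].drop_duplicates()
                for ea in eas.sample(n=min(n_eas, len(eas)), random_state=seed):
                    hh = stratum[stratum["ea_id"] == ea]                       # stage 2: 25 households per EA
                    parts.append(hh.sample(n=min(hh_per_ea, len(hh)), random_state=seed))
            return pd.concat(parts, ignore_index=True)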

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  17. CH1ORB-L-SARA-2-NPO-EDR-CENA

    • esdcdoi.esac.esa.int
    Updated Mar 31, 2010
    Cite
    European Space Agency (2010). CH1ORB-L-SARA-2-NPO-EDR-CENA [Dataset]. http://doi.org/10.5270/esa-1i7js7s
    Explore at:
    https://www.iana.org/assignments/media-types/application/fits
    Dataset updated
    Mar 31, 2010
    Dataset provided by
    European Space Agency (http://www.esa.int/)
    Time period covered
    Dec 8, 2008 - Aug 13, 2009
    Description

    Contents: 1 Data set description (1.1 Data set overview, 1.2 Parameters, 1.3 Processing, 1.4 Data, 1.5 Ancillary data, 1.6 Software, 1.7 Media/Format); 2 Confidence level note (2.1 Confidence level overview, 2.2 Review, 2.3 Data quality). 1.1 Data set overview: The output data of the CENA sensor is basically neutral particle counts. The CENA sensor operates in three instrument modes (Coincidence Mode, Counter Mode and Engineering Mode); the content of the data coming from the CENA sensor depends on the instrument mode, while the format of the data depends on the telemetry mode. The telemetry modes are Mass Accumulation Mode, TOF Accumulation Mode and Count Accumulation Mode. In these telemetry modes, data coming from the sensor is sorted by lookup tables and summed into two types of accumulation matrices (the accumulation matrix and the accumulation scaling matrix) over a time period. The accumulation matrix size changes depending on the binning parameters (energy, channel, phase and mass bins). For details on the CENA sensor of the SARA experiment and the data products, see the EAICD in the DOCUMENT directory. 1.2 Parameters: The measured parameter is basically raw neutral particle counts. 1.3 Processing: No processing beyond unpacking has been applied to the telemetry data. 1.4 Data: Each data product contains all data from one orbit. The data product contains housekeeping data as well as science data as a scaling matrix (total counts) and an accumulation matrix, except for the Counter Mode operation of CENA, where there is no scaling matrix. The instrument mode and telemetry mode are reflected in the file name (refer to the EAICD in the DOCUMENT directory). CENA data is archived using the storage format of PDS ARRAY of COLLECTION objects. Each CENA PDS data product file contains an ARRAY of records of CENA measurements in one orbit. Each record is [truncated!, Please see actual data for full text]

  18. Data from: Data-driven analysis of oscillations in Hall thruster simulations...

    • zenodo.org
    • portaldelainvestigacion.uma.es
    bin
    Updated Oct 9, 2024
    + more versions
    Cite
    Davide Maddaloni; Davide Maddaloni; Adrián Domínguez Vázquez; Adrián Domínguez Vázquez; Filippo Terragni; Filippo Terragni; Mario Merino; Mario Merino (2024). Data from: Data-driven analysis of oscillations in Hall thruster simulations & Data-driven sparse modeling of oscillations in plasma space propulsion [Dataset]. http://doi.org/10.5281/zenodo.13908820
    Explore at:
    bin
    Dataset updated
    Oct 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Davide Maddaloni; Davide Maddaloni; Adrián Domínguez Vázquez; Adrián Domínguez Vázquez; Filippo Terragni; Filippo Terragni; Mario Merino; Mario Merino
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Data from: Data-driven analysis of oscillations in Hall thruster simulations

    - Authors: Davide Maddaloni, Adrián Domínguez Vázquez, Filippo Terragni, Mario Merino

    - Contact email: dmaddalo@ing.uc3m.es

    - Date: 2022-03-24

    - Keywords: higher order dynamic mode decomposition, hall effect thruster, breathing mode, ion transit time, data-driven analysis

    - Version: 1.0.4

    - Digital Object Identifier (DOI): 10.5281/zenodo.6359505

    - License: This dataset is made available under the Open Data Commons Attribution License

    Abstract

    This dataset contains the outputs of the HODMD algorithm and the original simulations used in the journal publication:

    Davide Maddaloni, Adrián Domínguez Vázquez, Filippo Terragni, Mario Merino, "Data-driven analysis of oscillations in Hall thruster simulations", 2022 Plasma Sources Sci. Technol. 31:045026. Doi: 10.1088/1361-6595/ac6444.

    Additionally, the raw simulation data is also employed in the following journal publication:

    Borja Bayón-Buján and Mario Merino, "Data-driven sparse modeling of oscillations in plasma space propulsion", 2024 Mach. Learn.: Sci. Technol. 5:035057. Doi: 10.1088/2632-2153/ad6d29

    Dataset description

    The simulations from which the data stem were produced using the full 2D hybrid PIC/fluid code HYPHEN, while the HODMD results were produced using an adaptation of the original HODMD algorithm with an improved amplitude calculation routine.

    Please refer to the associated articles for further details on any of the parameters and/or configurations.

    Data files

    The data files are in standard Matlab .mat format. A recent version of Matlab is recommended.

    The HODMD outputs are collected in 18 different files, subdivided into three groups, each referring to a different case. In the file names, "case1" refers to the nominal case, "case2" to the low-voltage case and "case3" to the high-mass-flow-rate case. The variables are named as follows:

    • "n" for plasma density
    • "Te" for electron temperature
    • "phi" for plasma potential
    • "ji" for ion current density (both single and double charged ones)
    • "nn" for neutral density
    • "Ez" for axial electric field
    • "Si" for ionization production term
    • "vi1" for single charged ions axial velocity

    In particular, axial electric field, ionization production term and single charged ions axial velocity are available only for the first case. Such files have a cell structure: the first row contains the frequencies (in Hz), the second row contains the normalized modes (alongside their complex conjugates), the third row collects the growth rates (in 1/s) while the amplitudes (dimensionalized) are collected within the last row. Additionally, the time vector is simply given as "t", common to all cases and all variables.

    The raw simulation data are collected within additional 15 variables, following the same nomenclature as above, with the addition of the suffix "_raw" to differentiate them from the HODMD outputs.
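
    Given the cell layout described above, the files can be read from Python as well as Matlab; the sketch below uses scipy.io.loadmat. The file name and the top-level variable name inside the .mat file are assumptions and should be checked against the archive contents.

        # Minimal sketch: load one HODMD output file and unpack its four-row cell structure
        # (frequencies, normalized modes, growth rates, amplitudes), as described above.
        from scipy.io import loadmat

        mat = loadmat("case1_n.mat", squeeze_me=True)   # hypothetical file: plasma density, nominal case
        print([k for k in mat.keys() if not k.startswith("__")])   # confirm actual variable names

        cell = mat["n"]                 # assumed variable name holding the cell array
        freqs_hz     = cell[0]          # row 1: frequencies (Hz)
        modes        = cell[1]          # row 2: normalized modes (plus complex conjugates)
        growth_rates = cell[2]          # row 3: growth rates (1/s)
        amplitudes   = cell[3]          # row 4: dimensional amplitudes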

    Citation

    Works using this dataset or any part of it in any form shall cite it as follows.

    The preferred means of citation is to reference the publication associated to this dataset, as soon as it is available.

    Optionally, the dataset may be cited directly by referencing the DOI: 10.5281/zenodo.6359505.

    Acknowledgments

    This work has been supported by the Madrid Government (Comunidad de Madrid) under the Multiannual Agreement with UC3M in the line of ‘Fostering Young Doctors Research’ (MARETERRA-CM-UC3M), and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation). F. Terragni was also supported by the Fondo Europeo de Desarrollo Regional, Ministerio de Ciencia, Innovación y Universidades - Agencia Estatal de Investigación, under grants MTM2017-84446-C2-2-R and PID2020-112796RB-C22.

  19. CMAPSS Jet Engine Simulated Data - Dataset - NASA Open Data Portal

    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • data.nasa.gov
    Updated Oct 15, 2008
    Cite
    nasa.gov (2008). CMAPSS Jet Engine Simulated Data - Dataset - NASA Open Data Portal [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/cmapss-jet-engine-simulated-data
    Explore at:
    Dataset updated
    Oct 15, 2008
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The data set consists of multiple multivariate time series. Each data set is further divided into training and test subsets. Each time series is from a different engine, i.e., the data can be considered to be from a fleet of engines of the same type. Each engine starts with different degrees of initial wear and manufacturing variation which is unknown to the user. This wear and variation is considered normal, i.e., it is not considered a fault condition. There are three operational settings that have a substantial effect on engine performance. These settings are also included in the data. The data is contaminated with sensor noise. The engine is operating normally at the start of each time series and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective of the competition is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate. A vector of true Remaining Useful Life (RUL) values for the test data is also provided. The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, each column is a different variable. The columns correspond to: 1) unit number, 2) time, in cycles, 3) operational setting 1, 4) operational setting 2, 5) operational setting 3, 6) sensor measurement 1, 7) sensor measurement 2, ..., 26) sensor measurement 21.
    Data Set FD001: Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: ONE (HPC Degradation).
    Data Set FD002: Train trajectories: 260; Test trajectories: 259; Conditions: SIX; Fault Modes: ONE (HPC Degradation).
    Data Set FD003: Train trajectories: 100; Test trajectories: 100; Conditions: ONE (Sea Level); Fault Modes: TWO (HPC Degradation, Fan Degradation).
    Data Set FD004: Train trajectories: 248; Test trajectories: 249; Conditions: SIX; Fault Modes: TWO (HPC Degradation, Fan Degradation).
    Reference: A. Saxena, K. Goebel, D. Simon, and N. Eklund, ‘Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation’, in the Proceedings of the 1st International Conference on Prognostics and Health Management (PHM08), Denver CO, Oct 2008.
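
    A common first step is to load the space-separated training file and derive a Remaining Useful Life label per row; a short pandas sketch is given below, assuming the archive has been unzipped and contains a file named train_FD001.txt (the standard distribution naming, but still an assumption here).

        # Load the FD001 training subset and derive an RUL label for each row.
        # Column layout follows the 26-column description above (21 sensor channels).
        import pandas as pd

        cols = ["unit", "cycle", "op_setting_1", "op_setting_2", "op_setting_3"] + \
               [f"sensor_{i}" for i in range(1, 22)]
        train = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None)
        train = train.dropna(axis=1, how="all").iloc[:, :26]   # guard against trailing-space columns
        train.columns = cols

        # Training RUL: cycles remaining until each unit's last recorded cycle (failure point).
        train["RUL"] = train.groupby("unit")["cycle"].transform("max") - train["cycle"]
        print(train[["unit", "cycle", "RUL"]].head())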

  20. Spotify Tracks Attributes and Popularity

    • kaggle.com
    Updated Jul 9, 2025
    Cite
    Melissa Monfared (2025). Spotify Tracks Attributes and Popularity [Dataset]. https://www.kaggle.com/datasets/melissamonfared/spotify-tracks-attributes-and-popularity
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Kaggle
    Authors
    Melissa Monfared
    Description

    About Dataset

    Overview:

    This dataset provides detailed metadata and audio analysis for a wide collection of Spotify music tracks across various genres. It includes track-level information such as popularity, tempo, energy, danceability, and other musical features that can be used for music recommendation systems, genre classification, or trend analysis. The dataset is a rich source for exploring music consumption patterns and user preferences based on song characteristics.

    Dataset Details:

    This dataset contains rows of individual music tracks, each described by both metadata (such as track name, artist, album, and genre) and quantitative audio features. These features reflect different musical attributes such as energy, acousticness, instrumentalness, valence, and more, making it ideal for audio machine learning projects and exploratory data analysis.

    Schema and Column Descriptions:

    • index: Unique index for each track (can be ignored for analysis)
    • track_id: Spotify's unique identifier for the track
    • artists: Name of the performing artist(s)
    • album_name: Title of the album the track belongs to
    • track_name: Title of the track
    • popularity: Popularity score on Spotify (0–100 scale)
    • duration_ms: Duration of the track in milliseconds
    • explicit: Indicates whether the track contains explicit content
    • danceability: How suitable the track is for dancing (0.0 to 1.0)
    • energy: Intensity and activity level of the track (0.0 to 1.0)
    • key: Musical key (0 = C, 1 = C♯/D♭, …, 11 = B)
    • loudness: Overall loudness of the track in decibels (dB)
    • mode: Modality (major = 1, minor = 0)
    • speechiness: Presence of spoken words in the track (0.0 to 1.0)
    • acousticness: Confidence measure of whether the track is acoustic (0.0 to 1.0)
    • instrumentalness: Predicts whether the track contains no vocals (0.0 to 1.0)
    • liveness: Presence of an audience in the recording (0.0 to 1.0)
    • valence: Musical positivity conveyed (0.0 = sad, 1.0 = happy)
    • tempo: Estimated tempo in beats per minute (BPM)
    • time_signature: Time signature of the track (e.g., 4 = 4/4)
    • track_genre: Assigned genre label for the track
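
    As noted under Additional Notes below, the encoded key and mode columns are easier to read once mapped to labels; a small pandas sketch follows, assuming the dataset ships as a single CSV file (the file name spotify_tracks.csv is an assumption).

        # Map the encoded key/mode columns to readable labels and summarize popularity by genre.
        # The CSV file name is an assumption; column names follow the schema above.
        import pandas as pd

        df = pd.read_csv("spotify_tracks.csv")

        pitch_classes = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                         "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]
        df["key_name"] = df["key"].apply(lambda k: pitch_classes[int(k)] if 0 <= k <= 11 else "unknown")
        df["mode_name"] = df["mode"].map({1: "major", 0: "minor"})

        print(df.groupby("track_genre")["popularity"].mean().sort_values(ascending=False).head(10))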

    Key Features:

    • Comprehensive Track Data: Metadata combined with detailed audio analysis.
    • Genre Diversity: Includes tracks from various music genres.
    • Audio Feature Rich: Suitable for audio classification, recommendation engines, or clustering.
    • Machine Learning Friendly: Clean and numerical format ideal for ML models.

    Usage:

    This dataset is valuable for:

    • 🎵 Music Recommendation Systems: Building collaborative or content-based recommenders.
    • 📊 Data Visualization & Dashboards: Analyzing genre or mood trends over time.
    • 🤖 Machine Learning Projects: Predicting song popularity or clustering similar tracks.
    • 🧠 Music Psychology & Behavioral Studies: Exploring how music features relate to emotions or behavior.

    Data Maintenance:

    Additional Notes:

    • This dataset can be enhanced by merging it with user listening behavior data, lyrics datasets, or chart positions for more advanced analysis.
    • Some columns like key, mode, and explicit may need to be mapped for better readability in visualization.