39 datasets found

e
Swift XRT Combined Intensity Images - Dataset - B2FIND
b2find.eudat.eu
Updated Aug 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Swift XRT Combined Intensity Images - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/5318a3b8-6f78-5681-b5c9-0c5d1c79d8f6
Explore at:
Dataset updated
Aug 13, 2024
Description
The Swift XRT (Burrows et al 2005, SSRv, 120, 165) is a sensitive, broad-band (0.2 - 10 keV) X-ray imager with an effective area of about 125 cm**2 at 1.5 keV. The 600 x 600 pixel CCD at the focus provides a 23.6' x 23.6' field of view with a pixel scale of 2.36". The point spread function is 18" (HPD) at 1.5 keV. These XRT surveys represent the data from the first 12.5 years of Swift X-ray observations. They include all data taken in photon counting mode. A total of just over 8% of the sky has some non-zero exposure. The fraction of sky exposed as a function of the exposure is given in the following table: Exposure>0 10 30 100 300 1000 3000 1000 30000 100000300000 Coverage 8.42 8.37 8.29 7.67 7.29 5.68 3.40 1.26 0.35 0.044 0.00118 The individual exposure and counts maps have been combined into a Hierarchical Progressive Survey (HiPS) where the data are stored in tiles in the HEALPix projection at a number of different resulutions. The highest resolution pixels (HEALPix order 17) have a size of roughly 1.6". Data are also stored at lower resolutions at factors of 1/2, 1/4, 1/8, 1/16, and 1/32, and in an all sky image with a resolution 1/256 of the higest resolution. An intensity map has been created as the ratio of the counts and exposure maps. These surveys combine the basic count and exposure maps provided as standard products in the Swift XRT archive in obsid/xrt/products/*xpc_(sk|ex).img.gz. The surveys were created as follows: All of the exposure maps available in the archive in mid-May 2017 were combined using the CDS-developed Hipsgen tool. This includes 129,063 observations for which both count and exposure files were found in PC mode. Three exposures where there was a counts map but no exposure map were ignored. A few exposure files had more than one exposure extension. 1,082 files had two extensions and 1 file had 3 extensions. The 1084 HDUs in extensions were extracted as separate files and included in the total exposure. The value of 0 was given to the Hipsgen software as the null value for the FITS files. This caused the CDS software to treat such pixels as missing rather than 0 exposure. The counts data was extracted from the counts maps for each observation using SkyView developed software. For any pixel in which a count was recorded, the corresponding exposure file was checked and if there was any exposure (in any of the associated extensions), then the count was retained. If there was no exposure in any of the extensions of the corresponding exposure file, the counts in the pixel were omitted. Once a count was accepted, the overlap between the counts map pixel and the pixels of the corresponding HiPS tile (or tiles) was computed. Each count was then assigned entirely to a single pixel in the HiPS tile randomly but with the destination pixel probabilities weighted by area of the overlap. Thus if several pixels were found in a given counts map pixel they might be assigned to different pixels in the output image. The HiPS pixels (~1.6") used were of substantially higher resolution than the XRT resolution of 18" and somewhat higher than the counts map resolution of 2.36". A total of 183,750,428 photons were extracted from the counts maps while 15,226 were rejected as being from pixels with 0 exposure. There were 501 pixels which required special treatment as straddling the boundaries of the HEALPix projection. The resulting counts tiles were then clipped using the exposure tiles that had been previously generated. Basically this transferred the coverage of the exposure tiles to the counts tiles. Any counts pixel where the corresponding exposure pixel was a NaN was changed to a NaN to indicate that there was no coverage in this region. During the clipping process 137,730 HiPS level 8 were clipped (of 786,432 over the entire sky). There were 12,236 tiles for which there was some exposure but no counts found. During the clipping process 2 photons were found on pixels where there was no corresponding exposure in the exposure tiles. This can happen when the pixel assignment process noted above shifts a photon just outside the exposed region but should be -- as it was -- rare. These photons were deleted. After creating the clipped level 8 counts maps, level 7 to 3 tiles and an all sky map where generated by averaging pixels 2x2 to decrease each level. When adding the four pixels in the level N map together only pixels whose value was not NaN were considered. Finally an intensity map was created by dividing the counts tiles by the exposure tiles. To eliminate gross fluctuations due to rare counts in regions with very low exposure, only regions with exposure > 1 second were retained. A total of 30 photons were deleted due to this criterion. Note that while any sampler may in principle be used with these data, the Spline sampler may give unexpected results. The spline computation propogates NaNs thought the image and means that even occasional NaNs can corrupt the output image completely. NaNs are very common in this dataset. Also, if the region straddles a boundary in the HEALPix projection, the size of the requested input region is likely to exceed memory limits since the HiPS data are considered a single very large image. Provenance: Data generated from public images at HEASARC archive. This is a service of NASA HEASARC.
m
Graphite//LFP synthetic training prognosis dataset
data.mendeley.com
Updated May 6, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthieu Dubarry (2020). Graphite//LFP synthetic training prognosis dataset [Dataset]. http://doi.org/10.17632/6s6ph9n8zg.1
Explore at:
Unique identifier
https://doi.org/10.17632/6s6ph9n8zg.1
Dataset updated
May 6, 2020
Authors
Matthieu Dubarry
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This training dataset was calculated using the mechanistic modeling approach. See the “Benchmark Synthetic Training Data for Artificial Intelligence-based Li-ion Diagnosis and Prognosis“ publication for mode details. More details will be added when published. The prognosis dataset was harder to define as there are no limits on how the three degradation modes can evolve. For this proof of concept work, we considered eight parameters to scan. For each degradation mode, degradation was chosen to follow equation (1).

%degradation=a × cycle+ (exp^(b×cycle)-1) (1)

Considering the three degradation modes, this accounts for six parameters to scan. In addition, two other parameters were added, a delay for the exponential factor for LLI, and a parameter for the reversibility of lithium plating. The delay was introduced to reflect degradation paths where plating cannot be explained by an increase of LAMs or resistance [55]. The chosen parameters and their values are summarized in Table S1 and their evolution is represented in Figure S1. Figure S1(a,b) presents the evolution of parameters p1 to p7. At the worst, the cells endured 100% of one of the degradation modes in around 1,500 cycles. Minimal LLI was chosen to be 20% after 3,000 cycles. This is to guarantee at least 20% capacity loss for all the simulations. For the LAMs, conditions were less restrictive, and, after 3,000 cycles, the lowest degradation is of 3%. The reversibility factor p8 was calculated with equation (2) when LAMNE > PT.

%LLI=%LLI+p8 (LAM_PE-PT) (2)

Where PT was calculated with equation (3) from [60].

PT=100-((100-LAMPE)/(100×LRini-LAMPE ))×(100-OFSini-LLI) (3)

Varying all those parameters accounted for more than 130,000 individual duty cycles. With one voltage curve for every 100 cycles. 6 MATLAB© .mat files are included: The GIC-LFP_duty_other.mat file contains 12 variables Qnorm: normalize capacity scale for all voltage curves

P1 to p8: values used to generate the duty cycles

Key: index for which values were used for each degradation paths. 1 -p1, … 8 - p8

QL: capacity loss, one line per path, one column per 100 cycles.

File GIC-LFP_duty_LLI-LAMsvalues.mat contains the values for LLI, LAMPE and LAMNE for all cycles (1line per 100 cycles) and duty cycles (columns).

Files GIC-LFP_duty_1 to _4 files contains the voltage data split into 1GB chunks (40,000 simulations). Each cell corresponds to 1 line in the key variable. Inside each cell, one colunm per 100 cycles.
A geometric shape regularity effect in the human brain: fMRI dataset
openneuro.org
Updated Mar 14, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene (2025). A geometric shape regularity effect in the human brain: fMRI dataset [Dataset]. http://doi.org/10.18112/openneuro.ds006010.v1.0.1
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds006010.v1.0.1
Dataset updated
Mar 14, 2025
Dataset provided by
OpenNeurohttps://openneuro.org/
Authors
Mathias Sablé-Meyer; Lucas Benjamin; Cassandra Potier Watkins; Chenxi He; Maxence Pajot; Théo Morfoisse; Fosca Al Roumi; Stanislas Dehaene
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
A geometric shape regularity effect in the human brain: fMRI dataset

Authors:

Mathias Sablé-Meyer*

Lucas Benjamin

Cassandra Potier Watkins

Chenxi He

Maxence Pajot

Théo Morfoisse

Fosca Al Roumi

Stanislas Dehaene

*Corresponding author: mathias.sable-meyer@ucl.ac.uk

Abstract

The perception and production of regular geometric shapes is a characteristic trait of human cultures since prehistory, whose neural mechanisms are unknown. Behavioral studies suggest that humans are attuned to discrete regularities such as symmetries and parallelism, and rely on their combinations to encode regular geometric shapes in a compressed form. To identify the relevant brain systems and their dynamics, we collected functional MRI and magnetoencephalography data in both adults and six-year-olds during the perception of simple shapes such as hexagons, triangles and quadrilaterals. The results revealed that geometric shapes, relative to other visual categories, induce a hypoactivation of ventral visual areas and an overactivation of the intraparietal and inferior temporal regions also involved in mathematical processing, whose activation is modulated by geometric regularity. While convolutional neural networks captured the early visual activity evoked by geometric shapes, they failed to account for subsequent dorsal parietal and prefrontal signals, which could only be captured by discrete geometric features or by more advanced transformer models of vision. We propose that the perception of abstract geometric regularities engages an additional symbolic mode of visual perception.

Notes about this dataset

We separately share the MEG dataset at https://openneuro.org/datasets/ds006012. Below are some notes about the fMRI dataset of N=20 adult participants (sub-2xx, numbers between 204 and 223), and N=22 children (sub-3xx, numbers between 301 and 325).

The code for the analyses is provided at https://github.com/mathias-sm/AGeometricShapeRegularityEffectHumanBrain
However, the analyses work from already preprocessed data. Since there is no custom code per se for the preprocessing, I have not included it in the repository. To preprocess the data as was done in the published article, here is the command and software information:

fMRIPrep version: 20.0.5

fMRIPrep command: /usr/local/miniconda/bin/fmriprep /data /out participant --participant-label <label> --output-spaces MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2

Defacing has been performed with bidsonym running the pydeface masking, and nobrainer brain registraction pipeline.
The published analyses have been performed on the non-defaced data. I have checked for data quality on all participants after defacing. In specific cases, I may be able to request the permission to share the original, non-defaced dataset.

sub-325 was acquired by a different experimenter and defaced before being shared with the rest of the research team, hence why the slightly different defacing mask. That participant was also preprocessed separately, and using a more recent fMRIPrep version: 20.2.6.

The data associated with the children has a few missing files. Notably:

sub-313 and sub-316 are missing one run of the localizer each

sub-316 has no data at all for the geometry

sub-308 has eno useable data for the intruder task Since all of these still have some data to contribute to either task, all available files were kept on this dataset. The analysis code reflects these inconsistencies where required with specific exceptions.
SPD24 - Student Performance Data revised Features
kaggle.com
Updated Aug 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DatasetEngineer (2024). SPD24 - Student Performance Data revised Features [Dataset]. http://doi.org/10.34740/kaggle/dsv/9083250
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9083250
Dataset updated
Aug 1, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
DatasetEngineer
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Student Performance Dataset 2024 Overview This dataset comprises detailed information about high school students in China, collected from various universities and schools. It is designed to analyze the factors influencing student performance, well-being, and engagement. The data includes a wide range of features such as demographic details, academic performance, health status, parental support, and more. The participating institutions include prominent universities such as Tsinghua University, Peking University, Fudan University, Shanghai Jiao Tong University, and Zhejiang University.

Dataset Description Features Student ID: Unique identifier for each student. Gender: Gender of the student (Male/Female). Age: Age of the student. Grade Level: The grade level of the student (e.g., 9, 10, 11, 12). Attendance Rate: The percentage of days the student attended school. Study Hours: Average number of hours the student spends studying daily. Parental Education Level: The highest level of education attained by the student's parents. Parental Involvement: The level of parental involvement in the student's education (High, Medium, Low). Extracurricular Activities: Whether the student participates in extracurricular activities (Yes/No). Socioeconomic Status: Socioeconomic status of the student's family (High, Medium, Low). Previous Academic Performance: Previous academic performance level (High, Medium, Low). Class Participation: The level of participation in class (High, Medium, Low). Health Status: General health status of the student (Good, Average, Poor). Access to Learning Resources: Whether the student has access to necessary learning resources (Yes/No). Internet Access: Whether the student has access to the internet (Yes/No). Learning Style: Preferred learning style of the student (Visual, Auditory, Kinesthetic). Teacher-Student Relationship: Quality of the relationship between the student and teachers (Positive, Neutral, Negative). Peer Influence: Influence of peers on the student's behavior and performance (Positive, Neutral, Negative). Motivation Level: Student's level of motivation (High, Medium, Low). Hours of Sleep: Average number of hours the student sleeps per night. Diet Quality: Quality of the student's diet (Good, Average, Poor). Transportation Mode: Mode of transportation used by the student to commute to school (Bus, Car, Walk, Bike). School Type: Type of school attended by the student (Public, Private). School Location: Location of the school (Urban, Rural). Homework Completion Rate: The rate at which the student completes homework assignments. Reading Proficiency: Proficiency level in reading. Math Proficiency: Proficiency level in mathematics. Science Proficiency: Proficiency level in science. Language Proficiency: Proficiency level in language. Physical Activity Level: The level of physical activity (High, Medium, Low). Screen Time: Average daily screen time in hours. Bullying Incidents: Number of bullying incidents the student has experienced. Special Education Services: Whether the student receives special education services (Yes/No). Counseling Services: Whether the student receives counseling services (Yes/No). Learning Disabilities: Whether the student has any learning disabilities (Yes/No). Behavioral Issues: Whether the student has any behavioral issues (Yes/No). Attendance of Tutoring Sessions: Whether the student attends tutoring sessions (Yes/No). School Climate: Overall perception of the school's environment (Positive, Neutral, Negative). Parental Employment Status: Employment status of the student's parents (Employed, Unemployed). Household Size: Number of people living in the student's household. Performance Score: Overall performance score of the student (Low, Medium, High).
Explainable AI (XAI) Drilling Dataset
kaggle.com
Updated Aug 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Raphael Wallsberger (2023). Explainable AI (XAI) Drilling Dataset [Dataset]. https://www.kaggle.com/datasets/raphaelwallsberger/xai-drilling-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 24, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Raphael Wallsberger
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is part of the following publication at the TransAI 2023 conference: R. Wallsberger, R. Knauer, S. Matzka; "Explainable Artificial Intelligence in Mechanical Engineering: A Synthetic Dataset for Comprehensive Failure Mode Analysis" DOI: http://dx.doi.org/10.1109/TransAI60598.2023.00032

This is the original XAI Drilling dataset optimized for XAI purposes and it can be used to evaluate explanations of such algortihms. The dataset comprises 20,000 data points, i.e., drilling operations, stored as rows, 10 features, one binary main failure label, and 4 binary subgroup failure modes, stored in columns. The main failure rate is about 5.0 % for the whole dataset. The features that constitute this dataset are as follows:

ID: Every data point in the dataset is uniquely identifiable, thanks to the ID feature. This ensures traceability and easy referencing, especially when analyzing specific drilling scenarios or anomalies.

Cutting speed vc (m/min): The cutting speed is a pivotal parameter in drilling, influencing the efficiency and quality of the drilling process. It represents the speed at which the drill bit's cutting edge moves through the material.

Spindle speed n (1/min): This feature captures the rotational speed of the spindle or drill bit, respectively.

Feed f (mm/rev): Feed denotes the depth the drill bit penetrates into the material with each revolution. There is a balance between speed and precision, with higher feeds leading to faster drilling but potentially compromising hole quality.

Feed rate vf (mm/min): The feed rate is a measure of how quickly the material is fed to the drill bit. It is a determinant of the overall drilling time and influences the heat generated during the process.

Power Pc (kW): The power consumption during drilling can be indicative of the efficiency of the process and the wear state of the drill bit.

Cooling (%): Effective cooling is paramount in drilling, preventing overheating and reducing wear. This ordinal feature captures the cooling level applied, with four distinct states representing no cooling (0%), partial cooling (25% and 50%), and high to full cooling (75% and 100%).

Material: The type of material being drilled can significantly influence the drilling parameters and outcomes. This dataset encompasses three primary materials: C45K hot-rolled heat-treatable steel (EN 1.0503), cast iron GJL (EN GJL-250), and aluminum-silicon (AlSi) alloy (EN AC-42000), each presenting its unique challenges and considerations. The three materials are represented as “P (Steel)” for C45K, “K (Cast Iron)” for cast iron GJL and “N (Non-ferrous metal)” for AlSi alloy.

Drill Bit Type: Different materials often require specialized drill bits. This feature categorizes the type of drill bit used, ensuring compatibility with the material and optimizing the drilling process. It consists of three categories, which are based on the DIN 1836: “N” for C45K, “H” for cast iron and “W” for AlSi alloy [5].

Process time t (s): This feature captures the full duration of each drilling operation, providing insights into efficiency and potential bottlenecks.

Main failure: This binary feature indicates if any significant failure on the drill bit occurred during the drilling process. A value of 1 flags a drilling process that encountered issues, which in this case is true when any of the subgroup failure modes are 1, while 0 indicates a successful drilling operation without any major failures.

Subgroup failures: - Build-up edge failure (215x): Represented as a binary feature, a build-up edge failure indicates the occurrence of material accumulation on the cutting edge of the drill bit due to a combination of low cutting speeds and insufficient cooling. A value of 1 signifies the presence of this failure mode, while 0 denotes its absence. - Compression chips failure (344x): This binary feature captures the formation of compressed chips during drilling, resulting from the factors high feed rate, inadequate cooling and using an incompatible drill bit. A value of 1 indicates the occurrence of at least two of the three factors above, while 0 suggests a smooth drilling operation without compression chips. - Flank wear failure (278x): A binary feature representing the wear of the drill bit's flank due to a combination of high feed rates and low cutting speeds. A value of 1 indicates significant flank wear, affecting the drilling operation's accuracy and efficiency, while 0 denotes a wear-free operation. - Wrong drill bit failure (300x): As a binary feature, it indicates the use of an inappropriate drill bit for the material being drilled. A value of 1 signifies a mismatch, leading to potential drilling issues, while 0 indicates the correct drill bit usage.
Z
Data from: FISBe: A real-world benchmark dataset for instance segmentation...
data.niaid.nih.gov
zenodo.org
Updated Apr 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Reinke, Annika (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
Explore at:
Dataset updated
Apr 2, 2024
Dataset provided by
Rumberger, Josef Lorenz
Ihrke, Gudrun
Kainmueller, Dagmar
Reinke, Annika
Managan, Claire
Kandarpa, Ramya
Hirsch, Peter
Mais, Lisa
Maier-Hein, Lena
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General

For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

Summary

A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains

30 completely labeled (segmented) images

71 partly labeled images

altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)

To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects

A set of metrics and a novel ranking score for respective meaningful method benchmarking

An evaluation of three baseline methods in terms of the above metrics and score

Abstract

Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

Dataset documentation:

We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

FISBe Datasheet

Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

Files

fisbe_v1.0_{completely,partly}.zip

contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

fisbe_v1.0_mips.zip

maximum intensity projections of all samples, for convenience.

sample_list_per_split.txt

a simple list of all samples and the subset they are in, for convenience.

view_data.py

a simple python script to visualize samples, see below for more information on how to use it.

dim_neurons_val_and_test_sets.json

a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

Readme.md

general information

How to work with the image files

Each sample consists of a single 3d MCFO image of neurons of the fruit fly.For each image, we provide a pixel-wise instance segmentation for all separable neurons.Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.").The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.The segmentation mask for each neuron is stored in a separate channel.The order of dimensions is CZYX.

We recommend to work in a virtual environment, e.g., by using conda:

conda create -y -n flylight-env -c conda-forge python=3.9conda activate flylight-env

How to open zarr files

Install the python zarr package:

pip install zarr

Opened a zarr file with:

import zarrraw = zarr.open(, mode='r', path="volumes/raw")seg = zarr.open(, mode='r', path="volumes/gt_instances")

optional:import numpy as npraw_np = np.array(raw)

Zarr arrays are read lazily on-demand.Many functions that expect numpy arrays also work with zarr arrays.Optionally, the arrays can also explicitly be converted to numpy arrays.

How to view zarr image files

We recommend to use napari to view the image data.

Install napari:

pip install "napari[all]"

Save the following Python script:

import zarr, sys, napari

raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

viewer = napari.Viewer(ndisplay=3)for idx, gt in enumerate(gts): viewer.add_labels( gt, rendering='translucent', blending='additive', name=f'gt_{idx}')viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')napari.run()

Execute:

python view_data.py /R9F03-20181030_62_B5.zarr

Metrics

S: Average of avF1 and C

avF1: Average F1 Score

C: Average ground truth coverage

clDice_TP: Average true positives clDice

FS: Number of false splits

FM: Number of false merges

tp: Relative number of true positives

For more information on our selected metrics and formal definitions please see our paper.

Baseline

To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al..For detailed information on the methods and the quantitative results please see our paper.

License

The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Citation

If you use FISBe in your research, please use the following BibTeX entry:

@misc{mais2024fisbe, title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures}, author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller}, year = 2024, eprint = {2404.00130}, archivePrefix ={arXiv}, primaryClass = {cs.CV} }

Acknowledgments

We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuablediscussions.P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.This work was co-funded by Helmholtz Imaging.

Changelog

There have been no changes to the dataset so far.All future change will be listed on the changelog page.

Contributing

If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

All contributions are welcome!
DTDC Courier Dataset
kaggle.com
Updated Jul 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ravender Singh Rana (2025). DTDC Courier Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/dtdc-courier-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 6, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ravender Singh Rana
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
DTDC Courier Dataset

A real-world courier shipment dataset extracted from DTDC operations across India. This dataset captures the end-to-end flow of parcel logistics, covering sender/receiver details, weight, delivery mode, charges, and status — making it a valuable resource for analytics, machine learning, and logistics research.

Dataset Overview

Each row represents a courier consignment booked via DTDC, including details like:

Pickup and delivery location (city, pincode, state)

Shipment weight (actual, volumetric, chargeable)

Delivery mode and tariff breakdown

Signatures, timestamps, and GST details (where applicable)

Nature of parcel: documents or non-docs

Sample Use Cases

Route Optimization

Delivery Time Prediction

Tariff Modeling

Last-Mile Delivery Analytics

Geospatial Heatmaps of Courier Demand

Column-Wise Schema (Descriptions)

Origin: The city from which the parcel was originally booked or dispatched.

Destination: The city where the parcel is intended to be delivered.

Pouch No: A unique internal identifier (often UUID) used for grouping shipments in logistics.

Date: The date when the shipment was booked, represented in Excel serial number format.

Sender's Name: Full name of the individual or company sending the parcel.

Sender Phone: Mobile or landline contact number of the sender.

Sender Address: Complete address of the sender for pickup and records.

Sender City: The city of the sender may duplicate the "Origin" field.

Sender State: The Indian state in which the sender is located.

Sender Pincode: The 6-digit postal code (ZIP) of the sender's location.

Sender GSTIN: GST Identification Number of the sender (if applicable for business shipments).

Total Pieces: The number of individual physical packages included in the consignment.

Actual Wt: The actual physical weight (in kilograms) of the parcel(s).

Volumetric Wt: The weight calculated based on volume dimensions, used for billing purposes.

Chargeable Wt: The weight used for billing — typically the greater of actual or volumetric weight.

Paperwork: Indicates whether paperwork was submitted along with the parcel (Yes/No).

Sender Signature: Denotes whether the sender signed the dispatch documentation (Yes/No).

Sender Date: The date the sender completed handoff or signed documents, in Excel serial format.

Recipient Name: The full name of the person or organization meant to receive the parcel.

Recipient Phone: The contact number of the intended recipient.

Recipient Address: The complete delivery address where the parcel should be sent.

Recipient City: The city of the recipient may duplicate the "Destination" field.

Receiver State: The Indian state where the recipient resides.

Receiver Pincode: The 6-digit postal or ZIP code of the delivery address.

Description: A short textual description of the parcel contents (e.g., documents, items).

Value Added Services: Optional services added to the shipment, such as fragile handling, express delivery, insurance, etc.

Consignment No: The official DTDC tracking number assigned to the shipment.

Expiry Date: The last valid date for delivery, after which the shipment may expire or be returned. Stored as an Excel serial number.

Booking Code: The unique code identifying the DTDC branch or booking counter.

Recipient GSTIN: GST Identification Number of the recipient (if applicable).

Receiver Name: An alternate or duplicate field for the recipient's name.

Relationship: Describes the relationship of the receiver to the original recipient (e.g., Self, Staff, Family).

Company Stamp: Indicates whether a company stamp was used during the receipt (Yes/No).

Receiver Signature: Indicates whether the recipient signed the delivery confirmation (Yes/No).

Receive Date: The date the parcel was received, in Excel serial format.

Tariff: The base delivery charge in Indian Rupees (₹).

VAS Charges: Additional charges for any value-added services applied to the shipment.

Total Amount: The total amount billed for the consignment, including tariff and any add-ons.

Mode: The mode of shipment chosen — Surface, Air, or Express.

Risk Surcharge: Denotes who bears risk liability — either the Owner (sender) or Carrier (DTDC).

Mode of Payment: The payment method used at booking, such as Cash, Card, UPI, Wallet, etc.

Nature of Consignment: Indicates the type of parcel — "Dox" for documents or "Non-Dox" for goods/merchandise.

Note:

Please note that the examples provided...
e
Global conversion of the M2-tide (modes 1-10) and the role of supercritical...
b2find.eudat.eu
Updated Oct 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Global conversion of the M2-tide (modes 1-10) and the role of supercritical slopes - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/bb06e8e0-baa5-5cbc-99ab-95c96e005f45
Explore at:
Dataset updated
Oct 12, 2024
Description
This data set represents an extension of earlier computations of the internal M2-tide generation (for mode 1, Pollmann and Nycander, 2023; data set: 10.17882/92304) for the modes 2-10. The methodology is based on linear theory and is explained in Pollmann et al., 2019. In addition, we address the role of slope criticality in the open-ocean and identify which conversion rate estimates fall into (i) the linear regime, where linear theory is valid; (ii) the weakly nonlinear regime, where linear theory might underestimate, but does not strongly overestimate the nonlinear conversion; (iii) the strongly nonlinear regime, where linear theory might substantially overestimate the nonlinear conversion. Details on the procedure can be found in the associated manuscript (Geoffroy, Pollmann and Nycander, currently under revision for J. Phys. Oceanogr.). Please cite this as well as the papers mentioned above when using this data set. This data set includes the global conversion rate estimates of modes 1-10 below 400m, 700m, and 1000 m depth. We include the uncorrected estimates as well as those in the linear regime (i) and in the regime combining linear and weakly nonlinear conditions (i+ii). The associated masks, also differentiating between ridges and canyons, are also included. Note that there is no weakly nonlinear regime (ii) for canyons and that there is no strongly nonlinear regime (iii) for ridges for the first mode. Conversion rate estimates from topography patches with canyons deeper than the patch mean depth are also masked. For reasons of completeness, we add the mode-1 estimates and the mask for land and continental slopes of Pollmann & Nycander, 2023 (data set: 10.17882/92304).
Z
Estimated stand-off distance between ADS-B equipped aircraft and obstacles
data.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Weinert, Andrew (2024). Estimated stand-off distance between ADS-B equipped aircraft and obstacles [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7741272
Explore at:
Dataset updated
Jul 12, 2024
Dataset authored and provided by
Weinert, Andrew
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Summary:

Estimated stand-off distance between ADS-B equipped aircraft and obstacles. Obstacle information was sourced from the FAA Digital Obstacle File and the FHWA National Bridge Inventory. Aircraft tracks were sourced from processed data curated from the OpenSky Network. Results are presented as histograms organized by aircraft type and distance away from runways.

Description:

For many aviation safety studies, aircraft behavior is represented using encounter models, which are statistical models of how aircraft behave during close encounters. They are used to provide a realistic representation of the range of encounter flight dynamics where an aircraft collision avoidance system would be likely to alert. These models currently and have historically have been limited to interactions between aircraft; they have not represented the specific interactions between obstacles and aircraft equipped transponders. In response, we calculated the standoff distance between obstacles and ADS-B equipped manned aircraft.

For robustness, this assessment considered two different datasets of manned aircraft tracks and two datasets of obstacles. For robustness, MIT LL calculated the standoff distance using two different datasets of aircraft tracks and two datasets of obstacles. This approach aligned with the foundational research used to support the ASTM F3442/F3442M-20 well clear criteria of 2000 feet laterally and 250 feet AGL vertically.

The two datasets of processed tracks of ADS-B equipped aircraft curated from the OpenSky Network. It is likely that rotorcraft were underrepresented in these datasets. There were also no considerations for aircraft equipped only with Mode C or not equipped with any transponders. The first dataset was used to train the v1.3 uncorrelated encounter models and referred to as the “Monday” dataset. The second dataset is referred to as the “aerodrome” dataset and was used to train the v2.0 and v3.x terminal encounter model. The Monday dataset consisted of 104 Mondays across North America. The other dataset was based on observations at least 8 nautical miles within Class B, C, D aerodromes in the United States for the first 14 days of each month from January 2019 through February 2020. Prior to any processing, the datasets required 714 and 847 Gigabytes of storage. For more details on these datasets, please refer to "Correlated Bayesian Model of Aircraft Encounters in the Terminal Area Given a Straight Takeoff or Landing" and “Benchmarking the Processing of Aircraft Tracks with Triples Mode and Self-Scheduling.”

Two different datasets of obstacles were also considered. First was point obstacles defined by the FAA digital obstacle file (DOF) and consisted of point obstacle structures of antenna, lighthouse, meteorological tower (met), monument, sign, silo, spire (steeple), stack (chimney; industrial smokestack), transmission line tower (t-l tower), tank (water; fuel), tramway, utility pole (telephone pole, or pole of similar height, supporting wires), windmill (wind turbine), and windsock. Each obstacle was represented by a cylinder with the height reported by the DOF and a radius based on the report horizontal accuracy. We did not consider the actual width and height of the structure itself. Additionally, we only considered obstacles at least 50 feet tall and marked as verified in the DOF.

The other obstacle dataset, termed as “bridges,” was based on the identified bridges in the FAA DOF and additional information provided by the National Bridge Inventory. Due to the potential size and extent of bridges, it would not be appropriate to model them as point obstacles; however, the FAA DOF only provides a point location and no information about the size of the bridge. In response, we correlated the FAA DOF with the National Bridge Inventory, which provides information about the length of many bridges. Instead of sizing the simulated bridge based on horizontal accuracy, like with the point obstacles, the bridges were represented as circles with a radius of the longest, nearest bridge from the NBI. A circle representation was required because neither the FAA DOF or NBI provided sufficient information about orientation to represent bridges as rectangular cuboid. Similar to the point obstacles, the height of the obstacle was based on the height reported by the FAA DOF. Accordingly, the analysis using the bridge dataset should be viewed as risk averse and conservative. It is possible that a manned aircraft was hundreds of feet away from an obstacle in actuality but the estimated standoff distance could be significantly less. Additionally, all obstacles are represented with a fixed height, the potentially flat and low level entrances of the bridge are assumed to have the same height as the tall bridge towers. The attached figure illustrates an example simulated bridge.

It would had been extremely computational inefficient to calculate the standoff distance for all possible track points. Instead, we define an encounter between an aircraft and obstacle as when an aircraft flying 3069 feet AGL or less comes within 3000 feet laterally of any obstacle in a 60 second time interval. If the criteria were satisfied, then for that 60 second track segment we calculate the standoff distance to all nearby obstacles. Vertical separation was based on the MSL altitude of the track and the maximum MSL height of an obstacle.

For each combination of aircraft track and obstacle datasets, the results were organized seven different ways. Filtering criteria were based on aircraft type and distance away from runways. Runway data was sourced from the FAA runways of the United States, Puerto Rico, and Virgin Islands open dataset. Aircraft type was identified as part of the em-processing-opensky workflow.

All: No filter, all observations that satisfied encounter conditions

nearRunway: Aircraft within or at 2 nautical miles of a runway

awayRunway: Observations more than 2 nautical miles from a runway

glider: Observations when aircraft type is a glider

fwme: Observations when aircraft type is a fixed-wing multi-engine

fwse: Observations when aircraft type is a fixed-wing single engine

rotorcraft: Observations when aircraft type is a rotorcraft

License

This dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International(CC BY-NC-ND 4.0).

This license requires that reusers give credit to the creator. It allows reusers to copy and distribute the material in any medium or format in unadapted form and for noncommercial purposes only. Only noncommercial use of your work is permitted. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Exceptions are given for the not for profit standards organizations of ASTM International and RTCA.

MIT is releasing this dataset in good faith to promote open and transparent research of the low altitude airspace. Given the limitations of the dataset and a need for more research, a more restrictive license was warranted. Namely it is based only on only observations of ADS-B equipped aircraft, which not all aircraft in the airspace are required to employ; and observations were source from a crowdsourced network whose surveillance coverage has not been robustly characterized.

As more research is conducted and the low altitude airspace is further characterized or regulated, it is expected that a future version of this dataset may have a more permissive license.

Distribution Statement

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

© 2021 Massachusetts Institute of Technology.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

This material is based upon work supported by the Federal Aviation Administration under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration.

This document is derived from work done for the FAA (and possibly others); it is not the direct product of work done for the FAA. The information provided herein may include content supplied by third parties. Although the data and information contained herein has been produced or processed from sources believed to be reliable, the Federal Aviation Administration makes no warranty, expressed or implied, regarding the accuracy, adequacy, completeness, legality, reliability or usefulness of any information, conclusions or recommendations provided herein. Distribution of the information contained herein does not constitute an endorsement or warranty of the data or information provided herein by the Federal Aviation Administration or the U.S. Department of Transportation. Neither the Federal Aviation Administration nor the U.S. Department of Transportation shall be held liable for any improper or incorrect use of the information contained herein and assumes no responsibility for anyone’s use of the information. The Federal Aviation Administration and U.S. Department of Transportation shall not be liable for any claim for any loss, harm, or other damages arising from access to or use of data or information, including without limitation any direct, indirect, incidental, exemplary, special or consequential damages, even if advised of the possibility of such damages. The Federal Aviation Administration shall not be liable to anyone for any decision made or action taken, or not taken, in reliance on the information contained
Microsoft Geolife GPS Trajectory Dataset
kaggle.com
Updated Jun 27, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Möbius (2022). Microsoft Geolife GPS Trajectory Dataset [Dataset]. https://www.kaggle.com/datasets/arashnic/microsoft-geolife-gps-trajectory-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 27, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Möbius
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This GPS trajectory dataset was collected in (Microsoft Research) Geolife project by 178 users in a period of over four years (from April 2007 to October 2011). A GPS trajectory of this dataset is represented by a sequence of time-stamped points, each of which contains the information of latitude, longitude and altitude. This dataset contains 17,621 trajectories with a total distance of 1,251,654 kilometers and a total duration of 48,203 hours. These trajectories were recorded by different GPS loggers and GPS-phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.

Content

This dataset recoded a broad range of users’ outdoor movements, including not only life routines like go home and go to work but also some entertainments and sports activities, such as shopping, sightseeing, dining, hiking, and cycling.

Data Format - Trajectory file Every single folder of this dataset stores a user’s GPS log files, which were converted to PLT format. Each PLT file contains a single trajectory and is named by its starting time. To avoid potential confusion of time zone, we use GMT in the date/time property of each point, which is different from our previous release. - PLT format: Line 1…6 are useless in this dataset, and can be ignored. Points are described in following lines, one for each line. Field 1: Latitude in decimal degrees. Field 2: Longitude in decimal degrees. Field 3: All set to 0 for this dataset. Field 4: Altitude in feet (-777 if not valid). Field 5: Date - number of days (with fractional part) that have passed since 12/30/1899. Field 6: Date as a string. Field 7: Time as a string. Note that field 5 and field 6&7 represent the same date/time in this dataset. You may use either of them. Example: 39.906631,116.385564,0,492,40097.5864583333,2009-10-11,14:04:30 39.906554,116.385625,0,492,40097.5865162037,2009-10-11,14:04:35 - Transportation mode labels Possible transportation modes are: walk, bike, bus, car, subway, train, airplane, boat, run and motorcycle. Again, we have converted the date/time of all labels to GMT, even though most of them were created in China. Example: Start Time End TimeTransportation Mode 2008/04/02 11:24:21 2008/04/02 11:50:45 bus 2008/04/03 01:07:03 2008/04/03 11:31:55 train 2008/04/03 11:32:24 2008/04/03 11:46:14 walk 2008/04/03 11:47:14 2008/04/03 11:55:07 car

First, you can regard the label of both taxi and car as driving although we set them with different labels for future usage. Second, a user could label the transportation mode of a light rail as train while others may use subway as the label. Actually, no trajectory can be recorded in an underground subway system since a GPS logger cannot receive any signal there. In Beijing, the light rails and subway systems are seamlessly connected, e.g., line 13 (a light rail) is connected with line 10 and line 2, which are subway systems. Sometimes, a line (like line 5) is comprised of partial subways and partial light rails. So, users may have a variety of understanding in their transportation modes. You can differentiate the real train trajectories (connecting two cities) from the light rail trajectory (generating in a city) according to their distances. Or, just treat them the same.

More: User Guide: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/User20Guide-1.2.pdf

Citation

Please cite the following papers when using this GPS dataset. [1] Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wild Web (WWW 2009), Madrid Spain. ACM Press: 791-800.

[2] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321. [3] Yu Zheng, Xing Xie, Wei-Ying Ma, GeoLife: A Collaborative Social Networking Service among User, location and trajectory. Invited paper, in IEEE Data Engineering Bulletin. 33, 2, 2010, pp. 32-40.

Inspiration

This trajectory dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
Retail dataset
kaggle.com
Updated Jul 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samyak (2022). Retail dataset [Dataset]. https://www.kaggle.com/datasets/braniac2000/retail-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 1, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Samyak
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Sales records for the year 2011-2014 with 3 Product, 17 sub-categories over different segments is recorded. Objective is to expand the business in profitable regions based on the growth percentage and profits.

Data Dictionary

Order ID: A unique ID given to each order placed. Order Date: The date at which the order was placed. Customer Name: Name of the customer placing the order. Country: The country to which the customer belongs to. State: The state to which the customer belongs from the country. City:Detail about the city to which the customer resides in. Region: Contains the region details. Segment:The ordered product belongs to what segment. Ship Mode: The mode of shipping of the order to the customer location. Category: Contains the details about what category the product belongs to. Sub : Category: Contains the details about what sub - category the product belongs to. Product Name:The name of the product ordered by the customer. Discount: The discount applicable on a product. Sales: The actual sales happened for a particular order. Profit: Profit earned on an order. Quantity:The total quantity of the product ordered in a single order. Feedback: The feedback given by the customer on the complete shopping experience. If feedback provided, then TRUE. If no feedback provided, then FALSE.

Inspiration

This data-set can be helpful to analyze data to develop marketing strategies and to measure parameters like customer retention rate,churn rate etc.

Up-Vote⬆️ for more such dataset
w
Synthetic Data for an Imaginary Country, Sample, 2023 - World
microdata.worldbank.org
nada-demo.ihsn.org
Updated Jul 7, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
Explore at:
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World, World
Description
Abstract

The dataset is a relational dataset of 8,000 households households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

The full-population dataset (with about 10 million individuals) is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

Analysis unit

Household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

ssd

Sampling procedure

The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

Response rate

This is a synthetic dataset; the "response rate" is 100%.
Z
Wind WAVES TDSF Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wilson III, Lynn B (2024). Wind WAVES TDSF Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3911204
Explore at:
Dataset updated
Jul 10, 2024
Dataset authored and provided by
Wilson III, Lynn B
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Wind Spacecraft:

The Wind spacecraft (https://wind.nasa.gov) was launched on November 1, 1994 and currently orbits the first Lagrange point between the Earth and sun. A comprehensive review can be found in Wilson et al. [2021]. It holds a suite of instruments from gamma ray detectors to quasi-static magnetic field instruments, Bo. The instruments used for this data product are the fluxgate magnetometer (MFI) [Lepping et al., 1995] and the radio receivers (WAVES) [Bougeret et al., 1995]. The MFI measures 3-vector Bo at ~11 samples per second (sps); WAVES observes electromagnetic radiation from ~4 kHz to >12 MHz which provides an observation of the upper hybrid line (also called the plasma line) used to define the total electron density and also takes time series snapshot/waveform captures of electric and magnetic field fluctuations, called TDS bursts herein.

WAVES Instrument:

The WAVES experiment [Bougeret et al., 1995] on the Wind spacecraft is composed of three orthogonal electric field antenna and three orthogonal search coil magnetometers. The electric fields are measured through five different receivers: Low Frequency FFT receiver called FFT (0.3 Hz to 11 kHz), Thermal Noise Receiver called TNR (4-256 kHz), Radio receiver band 1 called RAD1 (20-1040 kHz), Radio receiver band 2 called RAD2 (1.075-13.825 MHz), and the Time Domain Sampler (TDS). The electric field antenna are dipole antennas with two orthogonal antennas in the spin plane and one spin axis stacer antenna.

The TDS receiver allows one to examine the electromagnetic waves observed by Wind as time series waveform captures. There are two modes of operation, TDS Fast (TDSF) and TDS Slow (TDSS). TDSF returns 2048 data points for two channels of the electric field, typically Ex and Ey (i.e. spin plane components), with little to no gain below ~120 Hz (the data herein has been high pass filtered above ~150 Hz for this reason). TDSS returns four channels with three electric(magnetic) field components and one magnetic(electric) component. The search coils show a gain roll off ~3.3 Hz [e.g., see Wilson et al., 2010; Wilson et al., 2012; Wilson et al., 2013 and references therein for more details].

The original calibration of the electric field antenna found that the effective antenna lengths are roughly 41.1 m, 3.79 m, and 2.17 m for the X, Y, and Z antenna, respectively. The +Ex antenna was broken twice during the mission as of June 26, 2020. The first break occurred on August 3, 2000 around ~21:00 UTC and the second on September 24, 2002 around ~23:00 UTC. These breaks reduced the effective antenna length of Ex from ~41 m to 27 m after the first break and ~25 m after the second break [e.g., see Malaspina et al., 2014; Malaspina & Wilson, 2016].

TDS Bursts:

TDS bursts are waveform captures/snapshots of electric and magnetic field data. The data is triggered by the largest amplitude waves which exceed a specific threshold and are then stored in a memory buffer. The bursts are ranked according to a quality filter which mostly depends upon amplitude. Due to the age of the spacecraft and ubiquity of large amplitude electromagnetic and electrostatic waves, the memory buffer often fills up before dumping onto the magnetic tape drive. If the memory buffer is full, then the bottom ranked TDS burst is erased every time a new TDS burst is sampled. That is, the newest TDS burst sampled by the instrument is always stored and if it ranks higher than any other in the list, it will be kept. This results in the bottom ranked burst always being erased. Earlier in the mission, there were also so called honesty bursts, which were taken periodically to test whether the triggers were working properly. It was found that the TDSF triggered properly, but not the TDSS. So the TDSS was set to trigger off of the Ex signals.

A TDS burst from the Wind/WAVES instrument is always 2048 time steps for each channel. The sample rate for TDSF bursts ranges from 1875 samples/second (sps) to 120,000 sps. Every TDS burst is marked a unique set of numbers (unique on any given date) to help distinguish it from others and to ensure any set of channels are appropriately connected to each other. For instance, during one spacecraft downlink interval there may be 95% of the TDS bursts with a complete set of channels (i.e., TDSF has two channels, TDSS has four) while the remaining 5% can be missing channels (just example numbers, not quantitatively accurate). During another downlink interval, those missing channels may be returned if they are not overwritten. During every downlink, the flight operations team at NASA Goddard Space Fligth Center (GSFC) generate level zero binary files from the raw telemetry data. Those files are filled with data received on that date and the file name is labeled with that date. There is no attempt to sort chronologically the data within so any given level zero file can have data from multiple dates within. Thus, it is often necessary to load upwards of five days of level zero files to find as many full channel sets as possible. The remaining unmatched channel sets comprise a much smaller fraction of the total.

All data provided here are from TDSF, so only two channels. Most of the time channel 1 will be associated with the Ex antenna and channel 2 with the Ey antenna. The data are provided in the spinning instrument coordinate basis with associated angles necessary to rotate into a physically meaningful basis (e.g., GSE).

TDS Time Stamps:

Each TDS burst is tagged with a time stamp called a spacecraft event time or SCET. The TDS datation time is sampled after the burst is acquired which requires a delay buffer. The datation time requires two corrections. The first correction arises from tagging the TDS datation with an associated spacecraft major frame in house keeping (HK) data. The second correction removes the delay buffer duration. Both inaccuracies are essentially artifacts of on ground derived values in the archives created by the WINDlib software (K. Goetz, Personal Communication, 2008) found at https://github.com/lynnbwilsoniii/Wind_Decom_Code.

The WAVES instrument's HK mode sends relevant low rate science back to ground once every spacecraft major frame. If multiple TDS bursts occur in the same major frame, it is possible for the WINDlib software to assign them the same SCETs. The reason being that this top-level SCET is only accurate to within +300 ms (in 120,000 sps mode) due to the issues described above (at lower sample rates, the error can be slightly larger). The time stamp uncertainty is a positive definite value because it results from digitization rounding errors. One can correct these issues to within +10 ms if using the proper HK data.

*** The data stored here have not corrected the SCETs! ***

The 300 ms uncertainty, due to the HK corrections mentioned above, results from WINDlib trying to recreate the time stamp after it has been telemetered back to ground. If a burst stays in the TDS buffer for extended periods of time (i.e., >2 days), the interpolation done by WINDlib can make mistakes in the 11th significant digit. The positive definite nature of this uncertainty is due to rounding errors associated with the onboard DPU (digital processing unit) clock rollover. The DPU clock is a 24 bit integer clock sampling at ∼50,018.8 Hz. The clock rolls over at ∼5366.691244092221 seconds, i.e., (16*224)/50,018.8. The sample rate is a temperature sensitive issue and thus subject to change over time. From a sample of 384 different points on 14 different days, a statistical estimate of the rollover time is 5366.691124061162 ± 0.000478370049 seconds (calculated by Lynn B. Wilson III, 2008). Note that the WAVES instrument team used UR8 times, which are the number of 86,400 second days from 1982-01-01/00:00:00.000 UTC.

The method to correct the SCETs to within +10 ms, were one to do so, is given as follows:

Retrieve the DPU clock times, SCETs, UR8 times, and DPU Major Frame Numbers from the WINDlib libraries on the VAX/ALPHA systems for the TDSS(F) data of interest.

Retrieve the same quantities from the HK data.

Match the HK event number with the same DPU Major Frame Number as the TDSS(F) burst of interest.

Find the difference in DPU clock times between the TDSS(F) burst of interest and the HK event with matching major frame number (Note: The TDSS(F) DPU clock time will always be greater than the HK DPU clock if they are the same DPU Major Frame Number and the DPU clock has not rolled over).

Convert the difference to a UR8 time and add this to the HK UR8 time. The new UR8 time is the corrected UR8 time to within +10 ms.

Find the difference between the new UR8 time and the UR8 time WINDlib associates with the TDSS(F) burst. Add the difference to the DPU clock time assigned by WINDlib to get the corrected DPU clock time (Note: watch for the DPU clock rollover).

Convert the new UR8 time to a SCET using either the IDL WINDlib libraries or TMLib (STEREO S/WAVES software) libraries of available functions. This new SCET is accurate to within +10 ms.

One can find a UR8 to UTC conversion routine at https://github.com/lynnbwilsoniii/wind_3dp_pros in the ~/LYNN_PRO/Wind_WAVES_routines/ folder.

Examples of good waveforms can be found in the notes PDF at https://wind.nasa.gov/docs/wind_waves.pdf.

Data Set Description

Each Zip file contains 300+ IDL save files; one for each day of the year with available data. This data set is not complete as the software used to retrieve and calibrate these TDS bursts did not have sufficient error handling to handle some of the more nuanced bit errors or major frame errors in some of the level zero files. There is currently (as of June 27, 2020) an effort (by Keith Goetz et al.) to generate the entire TDSF and TDSS data set in one repository to be put on SPDF/CDAWeb as CDF files. Once that data set is available, it will supercede
Trips by Distance
catalog.data.gov
s.cnmilf.com
Updated Feb 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bureau of Transportation Statistics (2023). Trips by Distance [Dataset]. https://catalog.data.gov/dataset/trips-by-distance
Explore at:
Dataset updated
Feb 1, 2023
Dataset provided by
Bureau of Transportation Statisticshttp://www.rita.dot.gov/bts
Description
Updates are delayed due to technical difficulties. How many people are staying at home? How far are people traveling when they don’t stay home? Which states and counties have more people taking trips? The Bureau of Transportation Statistics (BTS) now provides answers to those questions through our new mobility statistics. The Trips by Distance data and number of people staying home and not staying home are estimated for the Bureau of Transportation Statistics by the Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland. The travel statistics are produced from an anonymized national panel of mobile device data from multiple sources. All data sources used in the creation of the metrics contain no personal information. Data analysis is conducted at the aggregate national, state, and county levels. A weighting procedure expands the sample of millions of mobile devices, so the results are representative of the entire population in a nation, state, or county. To assure confidentiality and support data quality, no data are reported for a county if it has fewer than 50 devices in the sample on any given day. Trips are defined as movements that include a stay of longer than 10 minutes at an anonymized location away from home. Home locations are imputed on a weekly basis. A movement with multiple stays of longer than 10 minutes before returning home is counted as multiple trips. Trips capture travel by all modes of transportation. including driving, rail, transit, and air. The daily travel estimates are from a mobile device data panel from merged multiple data sources that address the geographic and temporal sample variation issues often observed in a single data source. The merged data panel only includes mobile devices whose anonymized location data meet a set of data quality standards, which further ensures the overall data quality and consistency. The data quality standards consider both temporal frequency and spatial accuracy of anonymized location point observations, temporal coverage and representativeness at the device level, spatial representativeness at the sample and county level, etc. A multi-level weighting method that employs both device and trip-level weights expands the sample to the underlying population at the county and state levels, before travel statistics are computed. These data are experimental and may not meet all of our quality standards. Experimental data products are created using new data sources or methodologies that benefit data users in the absence of other relevant products. We are seeking feedback from data users and stakeholders on the quality and usefulness of these new products. Experimental data products that meet our quality standards and demonstrate sufficient user demand may enter regular production if resources permit.
Fuτure - dataset for studies, development, and training of algorithms for...
data.europa.eu
unknown
Updated Jul 17, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2024). Fuτure - dataset for studies, development, and training of algorithms for reconstructing and identifying hadronically decaying tau leptons [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-12664634?locale=lt
Explore at:
unknown(37947)Available download formats
Dataset updated
Jul 17, 2024
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data description MC Simulation The Fuτure dataset is intended for studies, development, and training of algorithms for reconstructing and identifying hadronically decaying tau leptons. The dataset is generated with Pythia 8, with the full detector simulation being performed by Geant4 with the CLIC-like detector setup CLICdet (CLIC_o3_v14) setup. Events are reconstructed using the Marlin reconstruction framework and interfaced with Key4HEP. Particle candidates in the reconstructed events are reconstructed using the PandoraPF algorithm. In this version of the dataset no γγ -> hadrons background is included. Samples This dataset contains e+e- samples with Z->ττ, ZH->Zττ and Z->qq events, with approximately 2 million events simulated in each category. The following processes e+e- were simulated with Pythia 8 at sqrt(s) = 380 GeV: p8_ee_qq_ecm380 [Z -> qq events] p8_ee_ZH_Htautau [ZH -> Ztautau] p8_ee_Z_Ztautau_ecm380 [ZH -> Ztautau] The .root files from the MC simulation chain are eventually processed by the software found in Github in order to create flat ntuples as the final product. Features The basis of the ntuples are the particle flow (PF) candidates from PandoraPF. Each PF candidate has four momenta, charge and particle label (electron / muon / photon / charged hadron / neutral hadron). The PF candidates in a given event are clustered into jets using generalized kt algorithm for ee collisions, with parameters p=-1 and R=0.4. The minimum pT is set to be 0 GeV for genJets and 5 GeV for reconstructed jets. The dataset contains the four momenta of the jets, with the PF candidates in the jets with the above listed properties. Additionally, a set of variables describing the tau lifetime are calculated using the software in Github. As tau lifetime is very short, these variables are sensitive to true tau decays. In the calculation of these lifetime variables, we use a linear approximation. In summary, the features found in the flat ntuples are: Name Description reco_cand_p4s 4-momenta per particle in the reco jet. reco_cand_charge Charge per particle in the jet. reco_cand_pdg PDGid per particle in the jet. reco_jet_p4s RecoJet 4-momenta. reco_cand_dz Longitudinal impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. reco_cand_dz_err Uncertainty of the longitudinal impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. reco_cand_dxy Transverse impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. reco_cand_dxy_err Uncertainty of the transverse impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. gen_jet_p4s GenJet 4-momenta. Matched with RecoJet within a cone of radius dR < 0.3. gen_jet_tau_decaymode Decay mode of the associated genTau. Jets that have associated leptonically decaying taus are removed, so there are no DM=16 jets. If no GenTau can be matched to GenJet within dR < 0.4, a fill value is used. gen_jet_tau_p4s Visible 4-momenta of the genTau. If no GenTau can be matched to GenJet within dR<0.4, a fill value is used. The ground truth is based on stable particles at the generator level, before detector simulation. These particles are clustered into generator-level jets and are matched to generator-level τ leptons as well as reconstructed jets. In order for a generator-level jet to be matched to generator-level τ lepton, the τ lepton needs to be inside a cone of dR = 0.4. The same applies for the reconstructed jet, with the requirement on dR being set to dR = 0.3. For each reconstructed jet, we define three target values related to τ lepton reconstruction: a binary flag isTau if it was matched to a generator-level hadronically decaying τ lepton. gen_jet_tau_decaymode of value -1 indicates no match to generator-level hadronically decaying τ. the categorical decay mode of the τ gen_jet_tau_decaymode in terms of the number of generator level charged and neutral hadrons. Possible gen_jet_tau_decaymode are {0, 1, . . . , 15}. if matched, the visible (neglecting neutrinos), reconstructable pT of the τ lepton. This is inferred from the gen_jet_tau_p4s Contents: qq_test.parquet qq_train.parquet zh_test.parquet zh_train.parquet z_test.parquet z_train.parquet data_intro.ipynb Dataset characteristics File # Jets Size z_test.parquet 460 382 101.00 MB z_train.parquet 1 841 526 404.01M B zh_test.parquet 521 977 116.44 MB zh_train.parquet 2 087 907 466.26 MB qq_test.parquet 949 958 496.89 MB qq_train.parquet 3 799 829 1.99 GB The dataset consists of 6 files of 3.4 GB in total. How can you use these data? The .parquet files can be directly loaded with the Awkward Array Python library.An example how one might use the dataset and the features is given in data_intro.ipynb
Bank dataset
kaggle.com
Updated Aug 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Santhosh (2023). Bank dataset [Dataset]. https://www.kaggle.com/datasets/santhoshs623/bank-dataset/versions/1
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 9, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Santhosh
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Description: The dataset is intentionally provided for data cleansing and applying EDA techniques. This brings fun exploring and wrangling for data geeks. The data is very original so dive-in and Happy Exploring.

Features: In total the dataset contains 121 Features. Details given below.

SK_ID_CURR ID of loan in our sample TARGET Target variable (1 - client with payment difficulties: he/she had late payment more than X days on at least one of the first Y installments of the loan in our sample, 0 - all other cases) NAME_CONTRACT_TYPE Identification if loan is cash or revolving CODE_GENDER Gender of the client FLAG_OWN_CAR Flag if the client owns a car FLAG_OWN_REALTY Flag if client owns a house or flat CNT_CHILDREN Number of children the client has AMT_INCOME_TOTAL Income of the client AMT_CREDIT Credit amount of the loan AMT_ANNUITY Loan annuity AMT_GOODS_PRICE For consumer loans it is the price of the goods for which the loan is given NAME_TYPE_SUITE Who was accompanying client when he was applying for the loan NAME_INCOME_TYPE Clients income type (businessman, working, maternity leave,…) NAME_EDUCATION_TYPE Level of highest education the client achieved NAME_FAMILY_STATUS Family status of the client NAME_HOUSING_TYPE What is the housing situation of the client (renting, living with parents, ...) REGION_POPULATION_RELATIVE Normalized population of region where client lives (higher number means the client lives in more populated region) DAYS_BIRTH Client's age in days at the time of application DAYS_EMPLOYED How many days before the application the person started current employment DAYS_REGISTRATION How many days before the application did client change his registration DAYS_ID_PUBLISH How many days before the application did client change the identity document with which he applied for the loan OWN_CAR_AGE Age of client's car FLAG_MOBIL Did client provide mobile phone (1=YES, 0=NO) FLAG_EMP_PHONE Did client provide work phone (1=YES, 0=NO) **FLAG_WORK_PHONE ** Did client provide home phone (1=YES, 0=NO) FLAG_CONT_MOBILE Was mobile phone reachable (1=YES, 0=NO) FLAG_PHONE Did client provide home phone (1=YES, 0=NO) FLAG_EMAIL Did client provide email (1=YES, 0=NO) OCCUPATION_TYPE What kind of occupation does the client have CNT_FAM_MEMBERS How many family members does client have REGION_RATING_CLIENT Our rating of the region where client lives (1,2,3) REGION_RATING_CLIENT_W_CITY Our rating of the region where client lives with taking city into account (1,2,3) WEEKDAY_APPR_PROCESS_START On which day of the week did the client apply for the loan HOUR_APPR_PROCESS_START Approximately at what hour did the client apply for the loan REG_REGION_NOT_LIVE_REGION Flag if client's permanent address does not match contact address (1=different, 0=same, at region level) REG_REGION_NOT_WORK_REGION Flag if client's permanent address does not match work address (1=different, 0=same, at region level) LIVE_REGION_NOT_WORK_REGION Flag if client's contact address does not match work address (1=different, 0=same, at region level) REG_CITY_NOT_LIVE_CITY Flag if client's permanent address does not match contact address (1=different, 0=same, at city level) REG_CITY_NOT_WORK_CITY Flag if client's permanent address does not match work address (1=different, 0=same, at city level) LIVE_CITY_NOT_WORK_CITY Flag if client's contact address does not match work address (1=different, 0=same, at city level) ORGANIZATION_TYPE Type of organization where client works EXT_SOURCE_1 Normalized score from external data source EXT_SOURCE_2 Normalized score from external data source EXT_SOURCE_3 Normalized score from external data source APARTMENTS_AVG Normalized information about building where the client lives, What is average (_AVG suffix), modus (_MODE suffix), median (_MEDI suffix) apartment size, common area, living area, age of building, number of elevators, number of entrances, state of the building, number of floor BASEMENTAREA_AVG Normalized information about building where the client lives, What is average (_AVG suffix), modus (_MODE suffix), median (_MEDI suffix) apartment size, common area, living area, age of building, number of elevators, number of entrances, state of the building, number of floor YEARS_BEGINEXPLUATATION_AVG Normalized information about building where the client lives, What is average (_AVG suffix), modus (_MODE suffix), median (_MEDI suffix) apartment size, common area, living area, age of building, number of elevators, number of entrances, state of the building, number of floor YEARS_BUILD_AVG Normalized information about building where the client lives, What is average (_AVG suffix), modus (_MODE suffix), median (_MED...
ECOSTRESS Gridded Evaporative Stress Index PT-JPL Instantaneous L4 Global 70...
data.nasa.gov
Updated May 10, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2024). ECOSTRESS Gridded Evaporative Stress Index PT-JPL Instantaneous L4 Global 70 m - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/ecostress-gridded-evaporative-stress-index-pt-jpl-instantaneous-l4-global-70-m
Explore at:
Dataset updated
May 10, 2024
Dataset provided by
NASAhttp://nasa.gov/
Description
The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station (ECOSTRESS) mission measures the temperature of plants to better understand how much water plants need and how they respond to stress. ECOSTRESS is attached to the International Space Station (ISS) and collects data globally between 52° N and 52° S latitudes. A map of the acquisition coverage can be found on the ECOSTRESS website. The ECOSTRESS Gridded Evaporative Stress Index PT-JPL Instantaneous L4 Global 70 m (ECO_L4G_ESI) Version 2 data product uses the Priestley-Taylor Jet Propulsion Laboratory Soil Moisture (PT-JPL-SM) model to generate estimates of both actual and potential instantaneous evapotranspiration (ET). The potential evapotranspiration (PET) estimate represents the maximum expected ET if there were no water stress to plants on the ground. The ratio of the actual ET estimate to the PET estimate forms an index representing the water stress of plants. The ECO_L4G_ESI Version 2 data product is available globally and is projected to a globally snapped 0.0006° grid with a 70 meter spatial resolution and is distributed in HDF5. Each granule contains layers of Evaporative Stress Index (ESI), PET, cloud mask, and water mask. A low-resolution browse is also available showing daily ESI as a stretched image with a color ramp in JPEG format. Known Issues: Data acquisition gap: ECOSTRESS was launched on June 29, 2018, and moved to autonomous science operations on August 20, 2018, following a successful in-orbit checkout period. On September 29, 2018, ECOSTRESS experienced an anomaly with its primary mass storage unit (MSU). ECOSTRESS has a primary and secondary MSU (A and B). On December 5, 2018, the instrument was switched to the secondary MSU, and science operations resumed. On March 14, 2019, the secondary MSU experienced a similar anomaly, temporarily halting science acquisitions. On May 15, 2019, a new data acquisition approach was implemented, and science acquisitions resumed. To optimize the new acquisition approach, only Thermal Infrared (TIR) bands 2, 4, and 5 are being downloaded. The data products are the same as before, but the bands not downloaded contain fill values (L1 radiance and L2 emissivity). This approach was implemented from May 15, 2019, through April 28, 2023. Data acquisition gap: From February 8 to February 16, 2020, an ECOSTRESS instrument issue resulted in a data anomaly that created striping in band 4 (10.5 micron). These data products have been reprocessed and are available for download. No ECOSTRESS data were acquired on February 17, 2020, due to the instrument being in SAFEHOLD. Data acquired following the anomaly have not been affected. *Data acquisition: ECOSTRESS has now successfully returned to 5-band mode after being in 3-band mode since 2019. This feature was successfully enabled following a Data Processing Unit firmware update (version 4.1) to the payload on April 28, 2023. To better balance contiguous science data scene variables, 3-band collection is currently being interleaved with 5-band acquisitions over the orbital day/night periods.
E
maracoos_02-20210503T1937 Delayed Mode Raw Time Series
slocum-data.marine.rutgers.edu
Updated Jun 13, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Kerfoot (2021). maracoos_02-20210503T1937 Delayed Mode Raw Time Series [Dataset]. https://slocum-data.marine.rutgers.edu/erddap/info/maracoos_02-20210503T1937-trajectory-raw-delayed/index.html
Explore at:
Dataset updated
Jun 13, 2021
Authors
John Kerfoot
Time period covered
May 3, 2021 - May 13, 2021
Area covered

Variables measured
crs, time, c_fin, depth, m_fin, m_lat, m_lon, c_roll, m_roll, c_pitch, and 136 more
Description
This project supports the deployment and realtime data delivery of autonomous underwater gliders in the coastal ocean to better resolve and understand essential ocean features and processes that contribute to hurricane intensification or weakening prior to making landfall. This is a partnership between NOAA Ocean and Atmospheric Research (OAR) through the Atlantic Oceanographic and Meteorological Laboratory (AOML) and Integrated Ocean Observing System (IOOS) regional associations such as MARACOOS, SECOORA, CariCOOS and institutions including the University of Puerto Rico, University of the Virgin Islands, Skidaway Institute of Oceanography, University of Delaware, and Rutgers University. The goal of the project is to provide realtime data for ocean model validation and assimilation throughout hurricane season. This project is supported by the Disaster Recovery Act. The glider was deployed out of Tuckerton, NJ and will transect offshore along the Endurance line to the shelf break, south along the break then back toward Tuckerton, NJ with an inshore point at about 30 meters depth. This dataset contains CTD measurements from both a Seabird Scientific pumped CTD and an RBRlegato3 inductive CTD. This is the fourth deployment of this glider of the 2020 Atlantic hurricane season with the same mission objectives as the previous. This is a ‘check-out’ deployment for the newly purchased glider to determine if there are any hardware issues that need to be addressed ahead of hurricane season. Delayed mode dataset _NCProperties=version=1|netcdflibversion=4.6.1|hdf5libversion=1.10.3 acknowledgment=This deployment is supported by funding from NOAA Award #NA19OAR0220189, and is coordinated between a partnership of IOOS, OAR, AOML, Navy, academic researchers, and private industry. cdm_data_type=Trajectory cdm_trajectory_variables=trajectory comment=Deployed by Nicole Waite and Chip Haldeman aboard the R/V Rutgers out of Tuckerton, NJ, USA contributor_name=Travis Miles,Scott Glenn,John Kerfoot contributor_role=Principal Investigator,Principal Investigator,Data Management Conventions=CF-1.6, COARDS, ACDD-1.3 defaultGraphQuery=longitude,latitude,time&.draw=markers&.marker=6%7C3&.color=0xFFFFFF&.colorBar=Rainbow2%7C%7C%7C%7C%7C&.bgColor=0xffccccff deployment_name=maracoos_02-20210503T1937 Easternmost_Easting=-73.852945 featureType=Trajectory geospatial_bounds=POLYGON ((39.52076166666667 -74.29428, 39.52076166666667 -74.28723333333333, 39.51332 -74.28723333333333, 39.51332 -74.29428, 39.52076166666667 -74.29428)) geospatial_bounds_crs=EPSG:4326 geospatial_bounds_vertical_crs=EPSG:5831 geospatial_lat_max=39.520761666666665 geospatial_lat_min=39.241443333333336 geospatial_lat_resolution=0.00001 degree geospatial_lat_units=degrees_north geospatial_lon_max=-73.852945 geospatial_lon_min=-74.29428 geospatial_lon_resolution=0.00001 degree geospatial_lon_units=degrees_east geospatial_verical_resolution=0 geospatial_vertical_max=35.57464 geospatial_vertical_min=-0.07939248 geospatial_vertical_positive=down geospatial_vertical_units=m gts_ingest=True history=2021-06-13T00:41:03Z: /tmp/tmpwfl4ghpe/TrajectoryNetCDFWriter.py8tcj45wf.nc created 2021-06-13T00:41:03Z: /home/kerfoot/code/glider-proc/scripts/proc_deployment_trajectories_to_nc.py /home/coolgroup/slocum/deployments/2021/maracoos_02-20210503T1937/data/in/ascii/dbd/maracoos_02_2021_132_0_0_dbd.dat

id=maracoos_02-20210503T1937 infoUrl=https://www.udel.edu/academics/colleges/ceoe/,https://rucool.marine.rutgers.edu institution=Rutgers University instrument=In Situ/Laboratory Instruments > Profilers/Sounders > CTD instrument_vocabulary=NASA/GCMD Instrument Keywords Version 8.5 keywords_vocabulary=NASA/GCMD Earth Sciences Keywords Version 8.5 naming_authority=edu.rutgers.rucool ncei_template_version=NCEI_NetCDF_Trajectory_Template_v2.0 Northernmost_Northing=39.520761666666665 platform=In Situ Ocean-based Platforms > AUVS > Autonomous Underwater Vehicles platform_type=Slocum Glider platform_vocabulary=NASA/GCMD Platforms Keywords Version 8.5 processing_level=Raw Slocum glider time-series dataset from the native data file format. No quality control provided. Delayed mode dataset program=Hurricane Glider Program project=Sustained Underwater Glider Observations for Improving Atlantic Tropical Cyclone Intensity Forecast references=https://sites.udel.edu/ceoe-moliver/,https://www.udel.edu/academics/colleges/ceoe/departments/smsp/,https://rucool.marine.rutgers.edu sea_name=Mid-Atlantic Bight source=Observational Slocum glider data from source dba file maracoos_02-2021-132-0-0-dbd(00300000) sourceUrl=(local files) Southernmost_Northing=39.241443333333336 standard_name_vocabulary=CF Standard Name Table v27 subsetVariables=source_file time_coverage_duration=PT03M07.487S time_coverage_end=2021-05-13T16:40:30Z time_coverage_resolution=PTS time_coverage_start=2021-05-03T19:37:44Z uuid=b6c49372-d166-4ccb-95a4-087c804188a0 Westernmost_Easting=-74.29428 wmo_id=4802995 wmo_platform_code=4802995
d
Data from: Modeling of failure mode in knee ligaments depending on the...
catalog.data.gov
odgavaprod.ogopendata.com
+1more
Updated Jul 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institutes of Health (2025). Modeling of failure mode in knee ligaments depending on the strain rate [Dataset]. https://catalog.data.gov/dataset/modeling-of-failure-mode-in-knee-ligaments-depending-on-the-strain-rate
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
National Institutes of Health
Description
Background The failure mechanism of the knee ligament (bone-ligament-bone complex) at different strain rates is an important subject in the biomechanics of the knee. This study reviews and summarizes the literature describing ligament injury as a function of stain rate, which has been published during the last 30 years. Methods Three modes of injury are presented as a function of strain rate, and they are used to analyze the published cases. The number of avulsions is larger than that of ligament tearing in mode I. There is no significant difference between the number of avulsions and ligament tearing in mode II. Ligament tearing happens more frequently than avulsion in mode III. Results When the strain rate increases, the order of mode is mode I, II, III, I, and II. Analytical models of ligament behavior as a function of strain rate are also presented and used to provide an integrated framework for describing all of the failure regimes. In addition, this study showed the failure mechanisms with different specimens, ages, and strain rates. Conclusion There have been several a numbers of studies of ligament failure under various conditions including widely varying strain rates. One issue in these studies is whether ligament failure occurs mid-ligament or at the bone attachment point, with assertions that this is a function of the strain rate. However, over the range of strain rates and other conditions reported, there has appeared to be discrepancies in the conclusions on the effect of strain rate. The analysis and model presented here provides a unifying assessment of the previous disparities, emphasizing the differential effect of strain rate on the relative strengths of the ligament and the attachment.
i
Vadu HDSS INDEPTH Core Dataset 2009 - 2015 (Release 2017) - India
datacatalog.ihsn.org
catalog.ihsn.org
Updated Mar 29, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dr. Siddhivinayak Hirve (Founding Investigator: from 2002-2009) (2019). Vadu HDSS INDEPTH Core Dataset 2009 - 2015 (Release 2017) - India [Dataset]. https://datacatalog.ihsn.org/catalog/study/IND_2009-2015_INDEPTH-VHDSS_v01_M
Explore at:
Dataset updated
Mar 29, 2019
Dataset provided by
Dr. Siddhivinayak Hirve (Founding Investigator: from 2002-2009)
Dr. Sanjay Juvekar (Founding Co-Investigator and presently Investigator: 2002 to date)
Time period covered
2009 - 2015
Area covered
India
Description
Abstract

Vadu Rural Health Program, KEM Hospital Research Centre Pune has a rich tradition in health care and development being in the forefront of needs-based, issue-driven research over almost 35 years. During the decades of 1980 and 1990 the research at Vadu focused on mother and child with epidemiological and social science research exploring low birth weight, child survival, maternal mortality, safe abortion and domestic violence. The research portfolio has ever since expanded to include adult health and aging, non-communicable and communicable diseases and to clinical trials in recent years. It started with establishment of Health and Demographic Surveillance System at Vadu (HDSS Vadu) in August, 2002 that seeks to establish a quasi-experimental design setting to allow evaluation of impact of health interventions as well as monitor secular trends in diseases, risk factors and health behavior of humans.

The term "demographic surveillance" means to keep close track of the population dynamics. Vadu HDSS deals with keeping track of health issues and demographic changes in Vadu rural health program (VRHP) area. It is one of the most promising projects of national relevance that aims at establishing a quasi-experimental intervention research setting with the following objectives: 1) To create a longitudinal data base for efficient service delivery, future research, and linking all past micro-studies in Vadu area 2) Monitoring trends in public health problems 3) Keeping track of population dynamics 4) Evaluating intervention services

This dataset contains the events of all individuals ever resident during the study period (1 Jan. 2009 to 31 Dec. 2015).

Geographic coverage

Vadu HDSS falls in two administrative blocks: (1) Shirur and (2) Haweli of Pune district in Maharashtra in western India. It covers an area of approximately 232 square kilometers.

Analysis unit

Individual

Universe

Vadu HDSS covers as many as 50,000 households having 140,000 population spread across 22 villages.

Kind of data

Event history data

Frequency of data collection

Two rounds per year

Sampling procedure

Vadu area including 22 villages in two administrative blocks is the study area. This area was selected as this is primarily coverage area of Vadu Rural Health Program which is in function since more than four decade. Every individual household is included in HDSS. There is no sampling strategy employed as 100% population coverage in the area is expected.

Mode of data collection

Proxy Respondent [proxy]

Research instrument

Language of communication is in Marath or Hindi. The form labels are multilingual - in English and Marathi, but the data entered through the forms are in English only.

The following forms were used: - Field Worker Checklist Form - The checklist provides a guideline to ensure that all the households are covered during the round and the events occurred in each household are captured. - Enumeration Form: To capture the population details at the start of the HDSS or any addition of villages afterwards. - Pregnancy Form: To capture pregnancy details of women in the age group 15 to 49. - Birth Form: To capture the details of the birth events.
- Inmigration Form: To capture inward population movement from outside the HDSS area and also for movement within the HDSS area. - Outmigration Form: To capture outward population movement from inside the HDSS area and also for movement within the HDSS area. - Death Form: To capture death events.

Cleaning operations

Entered data undergo a data cleaning process. During the cleaning process all error data are either corrected in consultaiton with the data QC team or the respective forms are sent back to the field for re collection of correct data. Data editors have the access to the raw dataset for making necessary editing after corrected data are bought from the field.

For all individuals whose enumeration (ENU), Inmigration (IMG) or Birth (BTH) have occurred before the left censoring date (2009-01-01) and have not outmigrated (OMG) or not died (DTH) before the left censoring date (2009-01-01) are included in the dataset as Enumeration (ENU) with EventDate as the left censored date (2009-01-01). But the actual date of observation of the event (ENU, BTH, IMG) is retained in the dataset as observation date for these left censored ENU events. The individual is dropped from the dataset if their end event (OMG or DTH) is prior to the left censoring date (2009-01-01)

Response rate

On an average the response rate is 99.99% in all rounds over the years.

Sampling error estimates

Not Applicable

Data appraisal

Data is cleaned to an acceptable level against the standard data rules using Pentaho Data Integration Comminity Edition (PDI CE) tool. After the cleaning process, quality metrics were as follows:

CentreId MetricTable QMetric Illegal Legal Total Metric RunDate IN021 MicroDataCleaned Starts 1 301112 301113 0. 2017-05-31 20:06
IN021 MicroDataCleaned Transitions 0 667010 667010 0. 2017-05-31 20:07
IN021 MicroDataCleaned Ends 301113 2017-05-31 20:07
IN021 MicroDataCleaned SexValues 29 666981 667010 0. 2017-05-31 20:07
IN021 MicroDataCleaned DoBValues 575 666435 667010 0. 2017-05-31 20:07

Note: Except lower under five mortality in 2012 and lower adult mortality among females in 2013, all other estimates are fairly within expected range. Data underwent additional review in terms of electronic data capture, data cleaning and management to look for reasons for lower under five mortality rates in 2013 and lower female adult mortality in 2013. The additional review returned marginally higher rates and this supplements the validity of collected data. Further field related review of 2012 and 2013 data are underway and any revisions to published data/figures will be shared at a later stage.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2024). Swift XRT Combined Intensity Images - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/5318a3b8-6f78-5681-b5c9-0c5d1c79d8f6

Swift XRT Combined Intensity Images - Dataset - B2FIND

Explore at:

Dataset updated

Aug 13, 2024

Description

The Swift XRT (Burrows et al 2005, SSRv, 120, 165) is a sensitive, broad-band (0.2 - 10 keV) X-ray imager with an effective area of about 125 cm**2 at 1.5 keV. The 600 x 600 pixel CCD at the focus provides a 23.6' x 23.6' field of view with a pixel scale of 2.36". The point spread function is 18" (HPD) at 1.5 keV. These XRT surveys represent the data from the first 12.5 years of Swift X-ray observations. They include all data taken in photon counting mode. A total of just over 8% of the sky has some non-zero exposure. The fraction of sky exposed as a function of the exposure is given in the following table: Exposure>0 10 30 100 300 1000 3000 1000 30000 100000300000 Coverage 8.42 8.37 8.29 7.67 7.29 5.68 3.40 1.26 0.35 0.044 0.00118 The individual exposure and counts maps have been combined into a Hierarchical Progressive Survey (HiPS) where the data are stored in tiles in the HEALPix projection at a number of different resulutions. The highest resolution pixels (HEALPix order 17) have a size of roughly 1.6". Data are also stored at lower resolutions at factors of 1/2, 1/4, 1/8, 1/16, and 1/32, and in an all sky image with a resolution 1/256 of the higest resolution. An intensity map has been created as the ratio of the counts and exposure maps. These surveys combine the basic count and exposure maps provided as standard products in the Swift XRT archive in obsid/xrt/products/*xpc_(sk|ex).img.gz. The surveys were created as follows: All of the exposure maps available in the archive in mid-May 2017 were combined using the CDS-developed Hipsgen tool. This includes 129,063 observations for which both count and exposure files were found in PC mode. Three exposures where there was a counts map but no exposure map were ignored. A few exposure files had more than one exposure extension. 1,082 files had two extensions and 1 file had 3 extensions. The 1084 HDUs in extensions were extracted as separate files and included in the total exposure. The value of 0 was given to the Hipsgen software as the null value for the FITS files. This caused the CDS software to treat such pixels as missing rather than 0 exposure. The counts data was extracted from the counts maps for each observation using SkyView developed software. For any pixel in which a count was recorded, the corresponding exposure file was checked and if there was any exposure (in any of the associated extensions), then the count was retained. If there was no exposure in any of the extensions of the corresponding exposure file, the counts in the pixel were omitted. Once a count was accepted, the overlap between the counts map pixel and the pixels of the corresponding HiPS tile (or tiles) was computed. Each count was then assigned entirely to a single pixel in the HiPS tile randomly but with the destination pixel probabilities weighted by area of the overlap. Thus if several pixels were found in a given counts map pixel they might be assigned to different pixels in the output image. The HiPS pixels (~1.6") used were of substantially higher resolution than the XRT resolution of 18" and somewhat higher than the counts map resolution of 2.36". A total of 183,750,428 photons were extracted from the counts maps while 15,226 were rejected as being from pixels with 0 exposure. There were 501 pixels which required special treatment as straddling the boundaries of the HEALPix projection. The resulting counts tiles were then clipped using the exposure tiles that had been previously generated. Basically this transferred the coverage of the exposure tiles to the counts tiles. Any counts pixel where the corresponding exposure pixel was a NaN was changed to a NaN to indicate that there was no coverage in this region. During the clipping process 137,730 HiPS level 8 were clipped (of 786,432 over the entire sky). There were 12,236 tiles for which there was some exposure but no counts found. During the clipping process 2 photons were found on pixels where there was no corresponding exposure in the exposure tiles. This can happen when the pixel assignment process noted above shifts a photon just outside the exposed region but should be -- as it was -- rare. These photons were deleted. After creating the clipped level 8 counts maps, level 7 to 3 tiles and an all sky map where generated by averaging pixels 2x2 to decrease each level. When adding the four pixels in the level N map together only pixels whose value was not NaN were considered. Finally an intensity map was created by dividing the counts tiles by the exposure tiles. To eliminate gross fluctuations due to rare counts in regions with very low exposure, only regions with exposure > 1 second were retained. A total of 30 photons were deleted due to this criterion. Note that while any sampler may in principle be used with these data, the Spline sampler may give unexpected results. The spline computation propogates NaNs thought the image and means that even occasional NaNs can corrupt the output image completely. NaNs are very common in this dataset. Also, if the region straddles a boundary in the HEALPix projection, the size of the requested input region is likely to exceed memory limits since the HiPS data are considered a single very large image. Provenance: Data generated from public images at HEASARC archive. This is a service of NASA HEASARC.

Clear search

Close search

Google apps

Main menu

Swift XRT Combined Intensity Images - Dataset - B2FIND

Graphite//LFP synthetic training prognosis dataset

A geometric shape regularity effect in the human brain: fMRI dataset

A geometric shape regularity effect in the human brain: fMRI dataset

Abstract

Notes about this dataset

SPD24 - Student Performance Data revised Features

Explainable AI (XAI) Drilling Dataset

Data from: FISBe: A real-world benchmark dataset for instance segmentation...

optional:import numpy as npraw_np = np.array(raw)

DTDC Courier Dataset

DTDC Courier Dataset

Dataset Overview

Sample Use Cases

Column-Wise Schema (Descriptions)

Note:

Global conversion of the M2-tide (modes 1-10) and the role of supercritical...

Estimated stand-off distance between ADS-B equipped aircraft and obstacles

Microsoft Geolife GPS Trajectory Dataset

Context

Content

Citation

Inspiration

Retail dataset

Context

Data Dictionary

Inspiration

Up-Vote⬆️ for more such dataset

Synthetic Data for an Imaginary Country, Sample, 2023 - World

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Wind WAVES TDSF Dataset

Trips by Distance

Fuτure - dataset for studies, development, and training of algorithms for...

Bank dataset

ECOSTRESS Gridded Evaporative Stress Index PT-JPL Instantaneous L4 Global 70...

maracoos_02-20210503T1937 Delayed Mode Raw Time Series

Data from: Modeling of failure mode in knee ligaments depending on the...

Vadu HDSS INDEPTH Core Dataset 2009 - 2015 (Release 2017) - India

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Frequency of data collection

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Sampling error estimates

Data appraisal

Swift XRT Combined Intensity Images - Dataset - B2FIND