7 datasets found

Synthetic Data for Neutrophil Analysis: Sets with irregular shapes and...
data.niaid.nih.gov
Updated Aug 2, 2024
Cite
Reyes-Aldasoro, Constantino Carlos (2024). Synthetic Data for Neutrophil Analysis: Sets with irregular shapes and Poisson noise [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1221161
Dataset updated
Aug 2, 2024
Dataset provided by
City, University of London
Authors
Reyes-Aldasoro, Constantino Carlos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Synthetic Datasets with irregular shapes and Poisson noise.

Part of the PhagoSight neutrophil tracking and analysis package (Henry, et al., PLOS ONE, 2013):

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0072636

http://www.phagosight.org

https://github.com/phagosight/phagosight

A series of synthetic data sets that reproduce different behaviour characteristics of migrating neutrophils were generated in MATLAB. The data sets consisted of six artificial neutrophils that travelled along paths that presented different conditions of tortuosity, times to activation and proximity to other neutrophils during 98 time frames.

Numerous data sets of neutrophils in zebrafish were carefully observed before setting the characteristics. Six trajectories were manually determined by setting the row, column positions of the centroids at every time point for 98 time frames. Each trajectory was designed so that it would represent different neutrophil behaviours: some trajectories were very oriented and had movements with uniform distance between time frames, whilst others were less uniform and would move at different velocities, some were tortuous whilst others were straight. The trajectories of cells 1 and 2 collided several times in the second half of the time frames whilst cells 3 and 4 collided at the beginning of the movement. Cell 6 migrated without meandering and then stopped at the end (which represents the wound area of an inflammation-based experiment) whilst 5 presented a delayed activation.

Each time frame consisted of a z-stack of 11 slices, each with 275 x 275 pixels, in which the neutrophils were formed by irregular shapes (sums of Gaussians) with Poisson noise, with distributions of higher intensity than the background. (See also the corresponding sets with regular shapes, i.e. Gaussians with Gaussian noise, plus another set with a single large neutrophil and Poisson noise.) The orientation of the shapes varied according to the displacement of the artificial neutrophils, i.e. they were round when the cells were static and elongated when in movement. The tracks with the shapes were saved as the gold standard, and five different data sets were generated by adding increasing levels of white Poisson noise. The resulting intensity distributions of the neutrophils and the background become increasingly similar, as reflected by the decreasing values of the Bhattacharyya distance (1.61, 1.25, 1, 0.66, 0.45) as defined by Coleman 1979.
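As an illustration of the separability measure mentioned above, the following is a minimal sketch of the standard histogram-based Bhattacharyya distance between two sets of integer pixel intensities (for example, neutrophil pixels versus background pixels). The function name is hypothetical, and whether this discrete form matches the exact computation behind the reported values is an assumption; the description only cites Coleman 1979.

```python
import math
from collections import Counter


def bhattacharyya_distance(sample_a, sample_b):
    """Bhattacharyya distance between the empirical distributions of two
    integer-valued samples (e.g. foreground and background intensities).

    Hypothetical helper: D = -ln(sum_v sqrt(p(v) * q(v))).
    """
    counts_a = Counter(sample_a)
    counts_b = Counter(sample_b)
    n_a = len(sample_a)
    n_b = len(sample_b)
    # Bhattacharyya coefficient: only intensity values present in both
    # samples contribute a non-zero term.
    coefficient = sum(
        math.sqrt((counts_a[v] / n_a) * (counts_b[v] / n_b))
        for v in counts_a.keys() & counts_b.keys()
    )
    if coefficient == 0.0:
        return math.inf  # the distributions do not overlap at all
    return -math.log(coefficient)
```

Identical distributions give a distance of 0 and the distance grows as the overlap shrinks, which is consistent with the noisier sets above having the smaller values.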

Files corresponding to the sets with irregular shapes and Poisson noise (noise increases from 1 to 5):

x,y,t trajectories: ThreeDTracks
Ground Truth: syntheticData_P_mat_La
First data set: syntheticData_P1_mat_Re
Second data set: syntheticData_P2_mat_Re
Third data set: syntheticData_P3_mat_Re
Fourth data set: syntheticData_P4_mat_Re
Fifth data set: syntheticData_P5_mat_Re

Corresponding GIF files are also included as illustrations of the cells in motion.

Main Reference:

Henry KM, Pase L, Ramos-Lopez CF, Lieschke GJ, Renshaw SA, Reyes-Aldasoro CC. (2013) PhagoSight: An Open-Source MATLAB® Package for the Analysis of Fluorescent Neutrophil and Macrophage Migration in a Zebrafish Model. PLOS ONE 8(8): e72636. https://doi.org/10.1371/journal.pone.0072636
Three variants of synthetic benchmarks time series of GPS and ERA-Interim...
figshare.com
zip
Updated Jan 28, 2020
Cite
Anna Kłos; Eric Pottiaux; Roeland Van Malderen (2020). Three variants of synthetic benchmarks time series of GPS and ERA-Interim IWV differences [Dataset]. http://doi.org/10.6084/m9.figshare.11733615.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.11733615.v1
Dataset updated
Jan 28, 2020
Dataset provided by
Figshare http://figshare.com/
Authors
Anna Kłos; Eric Pottiaux; Roeland Van Malderen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description of the synthetic datasets

Daily synthetic series of 6000 samples (i.e. length of 16y*365d) were simulated for 120 IGS sites, both for GPS-retrieved (IGS repro1) Integrated Water Vapour (IWV) values and for IWV differences between the ERA-Interim model and GPS (IGS repro1), based on the characterisation (signal and noise) derived from the real datasets. The real ERAI-GPS IWV differences were first homogenized with a manual detection of breaks to provide the most consistent series. All manually detected epochs of breaks were cross-validated with information included in the log-files of the stations. If a manually detected break was not reported as a change in a log-file, it was not corrected for, unless the offset was clearly seen in the differences (ERAI-GPS). We assume the ERA-Interim model to be an absolute reference with no artificial breaks. Under this assumption, only climate signals should be responsible for jumps in the time series.

We tested different approaches, generating synthetic datasets for the IGS repro1, the ERA-Interim, and their differences. We found that synthetic differences generated directly are closer to the real differences than differences built afterwards from separately generated synthetic ERA-Interim and synthetic IGS repro1 IWV time series, so we proceeded with the directly generated synthetic differences.
Specifications of the synthetic datasets available

Three different variants of those synthetic datasets were constructed:

Variant 1: The ‘Easy’ dataset: seasonal signals (annual, semi-annual, 3 & 4 months if present for a particular station) + offsets + white noise.
Variant 2: The ‘Moderate’ dataset: seasonal signals (annual, semi-annual, 3 & 4 months) + offsets + autoregressive process of the first order + white noise (AR(1)+WH).
Variant 3: The ‘Complex’ dataset: trend + seasonal signals (annual, semi-annual, 3 & 4 months) + offsets + AR(1)+WH + gaps.

Variant 1 was created only for the ERA-Interim - GPS differences, while Variants 2 and 3 were created both for 1) differences of IWV between ERAI and GPS (ERAI-GPS), and 2) GPS itself. The values of trends, amplitudes of seasonal signals, noise process and percentage of gaps were modelled directly from the characteristics derived from the real datasets. The epochs of offsets were simulated randomly, separately for each variant, but the number and amplitudes of the offsets are characteristic of the real datasets.

File format of the synthetic datasets

Each simulated series is stored in a separate file. As for the real dataset, each file includes three columns: “year, y-x, x”. “Year” is a date formatted as YYMMDD.HHMMSS (e.g. 950101.120000 for 1st January 1995 at 12:00 UTC), column “y-x” contains the differences between the ERAI and GPS synthetic values (in that order), and “x” contains the ‘GPS-retrieved’ IWV synthetic values.
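The three-column layout described above can be read with a few lines of Python. This is a minimal sketch, not a reader shipped with the dataset: the function names are hypothetical, the columns are assumed to be whitespace-separated, and the zero-padding step is an assumption made in case leading or trailing zeros of the YYMMDD.HHMMSS stamp were dropped when the value was written as a number.

```python
from datetime import datetime


def parse_epoch(token: str) -> datetime:
    """Convert a YYMMDD.HHMMSS stamp such as '950101.120000' to a datetime."""
    date_part, _, time_part = token.strip().partition(".")
    date_part = date_part.zfill(6)            # restore dropped leading zeros
    time_part = time_part.ljust(6, "0")[:6]   # restore dropped trailing zeros
    # %y follows the POSIX pivot: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
    return datetime.strptime(date_part + time_part, "%y%m%d%H%M%S")


def parse_line(line: str):
    """Split one record into (epoch, ERAI-GPS difference, GPS IWV).

    Assumes whitespace-separated columns in the order 'year, y-x, x'.
    """
    epoch, difference, gps_iwv = line.split()
    return parse_epoch(epoch), float(difference), float(gps_iwv)
```

For example, `parse_epoch("950101.120000")` yields 1 January 1995 at 12:00, matching the example given in the description.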
🌍 Air Quality and Health Impact Dataset🌍
kaggle.com
zip
Updated Jun 12, 2024
Cite
Rabie El Kharoua (2024). 🌍 Air Quality and Health Impact Dataset🌍 [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/air-quality-and-health-impact-dataset
Available download formats: zip (523152 bytes)
Dataset updated
Jun 12, 2024
Authors
Rabie El Kharoua
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains comprehensive information on the air quality and its impact on public health for 5,811 records. It includes variables such as air quality index (AQI), concentrations of various pollutants, weather conditions, and health impact metrics. The target variable is the health impact class, which categorizes the health impact based on the air quality and other related factors.

Table of Contents

Record Information

Record ID

Air Quality Metrics

Weather Conditions

Health Impact Metrics

Target Variable: Health Impact Class

Record Information

Record ID

RecordID: A unique identifier assigned to each record (1 to 5811).

Air Quality Metrics

AQI: Air Quality Index, a measure of how polluted the air currently is or how polluted it is forecast to become.

PM10: Concentration of particulate matter less than 10 micrometers in diameter (μg/m³).

PM2_5: Concentration of particulate matter less than 2.5 micrometers in diameter (μg/m³).

NO2: Concentration of nitrogen dioxide (ppb).

SO2: Concentration of sulfur dioxide (ppb).

O3: Concentration of ozone (ppb).

Weather Conditions

Temperature: Temperature in degrees Celsius (°C).

Humidity: Humidity percentage (%).

WindSpeed: Wind speed in meters per second (m/s).

Health Impact Metrics

RespiratoryCases: Number of respiratory cases reported.

CardiovascularCases: Number of cardiovascular cases reported.

HospitalAdmissions: Number of hospital admissions reported.

Target Variable: Health Impact Class

HealthImpactScore: A score indicating the overall health impact based on air quality and other related factors, ranging from 0 to 100.

HealthImpactClass: Classification of the health impact based on the health impact score:

0: 'Very High' (HealthImpactScore >= 80)

1: 'High' (60 <= HealthImpactScore < 80)

2: 'Moderate' (40 <= HealthImpactScore < 60)

3: 'Low' (20 <= HealthImpactScore < 40)

4: 'Very Low' (HealthImpactScore < 20)
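The score-to-class mapping listed above can be written as a small function. This is a sketch with a hypothetical function name; the thresholds are taken directly from the list, and the range check reflects the stated 0 to 100 range of HealthImpactScore.

```python
def health_impact_class(score: float) -> int:
    """Map a HealthImpactScore (0 to 100) to its HealthImpactClass code."""
    if not 0 <= score <= 100:
        raise ValueError("HealthImpactScore is expected to lie in [0, 100]")
    if score >= 80:
        return 0  # 'Very High'
    if score >= 60:
        return 1  # 'High'
    if score >= 40:
        return 2  # 'Moderate'
    if score >= 20:
        return 3  # 'Low'
    return 4      # 'Very Low'
```

Note that the class codes run in the opposite direction to the score: a higher score maps to a lower class number.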

Conclusion

This dataset offers a comprehensive view of the relationship between air quality and public health, making it ideal for research, predictive modeling, and statistical analysis.

Dataset Usage and Attribution Notice

This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.

Exclusive Synthetic Dataset

This dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects. It is an original dataset, owned by Mr. Rabie El Kharoua, and has not been previously shared. You are free to use it under the license outlined on the data card. The dataset is offered without any guarantees. Details about the data provider will be shared soon.
Synthetic matrix ensemble for nestedness analysis
figshare.com
zip
Updated Jan 19, 2016
Cite
Stephen Beckett; Hywel Williams (2016). Synthetic matrix ensemble for nestedness analysis [Dataset]. http://doi.org/10.6084/m9.figshare.1320818.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.1320818.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare http://figshare.com/
Authors
Stephen Beckett; Hywel Williams
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
README

The data and code in this dataset were used to evaluate nestedness measures and null models (Beckett and Williams, submitted). 500 initial 'perfectly nested' matrices were created using Latin hypercube sampling to choose the number of rows [5,60], columns [5,60] and curvature. We 'rewired' each of these matrices to evaluate how significance testing of nestedness changes between highly nested (low rewiring) and less nested (high rewiring) networks. In rewiring, a probability of rewiring occurring is assigned to each element in a matrix. If rewiring is judged to occur in a matrix element that has an edge (is a 1), this edge is removed (turned to 0) and then randomly repositioned in one of the empty positions (one of the 0's becomes a 1), such that the total number of edges is conserved. We used 6 rewiring levels, with rewiring probabilities of 0.01, 0.05, 0.1, 0.15, 0.2 and 0.5. Ten replicates of the initial 500 matrices were made for each rewiring level. The entire ensemble is then 500x10x6 = 30,000 networks. Each of these 30,000 networks was then analysed for nestedness using FALCON (Beckett et al., 2014). Six nestedness measures and five null models were used. Details of these analyses can be found in Beckett and Williams (submitted).

The dataset contains:

code: MATLAB code used to create the synthetic ensemble.
- SHAPE_MATRIX.m: a MATLAB function for creating a 'perfectly nested' bipartite network with given rows, columns and curvature parameters.
- makeBenchmarkEnsemble.m: a MATLAB function for creating a set of X matrices rewired from an initial matrix with probability P.
- randomiseMatrix.m: a MATLAB function for rewiring a given input matrix with probability P.

networks: The set of networks used in the synthetic ensemble.
- A total of 30,000 binary matrices, each saved as a separate csv file.

output: The output data from nestedness analysis for each measure.
- Five csv files corresponding to output from the five null models used (SS, FF, CC, DD, EE).
- For each measure (NODF, MD, SR, JDM, BR, NTC) the measure score, p-value, z-score and adjusted normalised temperature (AnT) scores are given.

Beckett S.J., Williams H.T.P. Brooding on nestedness: nestedness analyses are confounded by sensitivity to measurement choices and network properties. submitted.

Beckett SJ, Boulton CA and Williams HTP. FALCON: a software package for analysis of nestedness in bipartite networks [v1; ref status: indexed, http://f1000r.es/3z8] F1000Research 2014, 3:185 (doi: 10.12688/f1000research.4831.1)
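The rewiring step described in the README can be sketched in Python as follows. This is an illustrative reading of the prose, not a port of the dataset's randomiseMatrix.m: the function name and the list-of-lists representation are assumptions, and so is the choice to pick the new position among cells that were empty before the edge was removed (so a rewired edge always moves).

```python
import random


def rewire(matrix, p, rng=None):
    """Return a rewired copy of a binary matrix (list of lists of 0/1).

    Each existing edge is selected with probability p, removed, and placed
    in a randomly chosen empty cell, so the edge count is conserved.
    """
    rng = rng or random.Random()
    rows, cols = len(matrix), len(matrix[0])
    out = [row[:] for row in matrix]
    for r in range(rows):
        for c in range(cols):
            # Only edges of the original matrix are candidates for rewiring.
            if matrix[r][c] == 1 and rng.random() < p:
                empties = [
                    (i, j)
                    for i in range(rows)
                    for j in range(cols)
                    if out[i][j] == 0
                ]
                if not empties:
                    continue  # fully connected matrix: nowhere to move
                out[r][c] = 0
                i, j = rng.choice(empties)
                out[i][j] = 1
    return out
```

With p = 0 the matrix is returned unchanged, and for any p the number of 1s in the output equals that of the input, which is the conservation property the README states.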
VoroCrack3d: An annotated data set of 3d CT concrete images with synthetic...
zenodo.org
data.niaid.nih.gov
zip
Updated May 28, 2025
Cite
Christian Jung; Claudia Redenbach; Katja Schladitz (2025). VoroCrack3d: An annotated data set of 3d CT concrete images with synthetic crack structures [Dataset]. http://doi.org/10.5281/zenodo.10518576
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10518576
Dataset updated
May 28, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Christian Jung; Claudia Redenbach; Katja Schladitz
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2023
Description
VoroCrack3d is an annotated data set of 3d CT images of concrete with synthetic crack structures. Its main purpose is the training and testing of machine learning models for 3d crack segmentation. The data set comprises 1344 images together with their corresponding ground truths. The concrete backgrounds are cropped-out sections of size 400x400x400 voxels of CT images of concrete. To this end, several different concrete samples were scanned: normal concrete (NC), high-performance concrete (HPC), ultra-high-performance concrete (UHPC) and air pore concrete, without and with reinforcements (straight steel fibers, crimped steel fibers, hooked-end steel fibers, polypropylene fibers, fibers made of glass fiber-reinforced polymer). The original concrete images have a resolution between 2.8 and 106 micrometers.

The crack structures are modeled via minimum-weight surfaces in Voronoi diagrams according to the paper

[1] C. Jung, C. Redenbach, Crack Modeling via Minimum-Weight Surfaces in 3d Voronoi Diagrams, Journal of Mathematics in Industry, 13, 10 (2023). https://doi.org/10.1186/s13362-023-00138-1.

The surfaces are discretized, dilated and superimposed on the concrete backgrounds.

The data set offers a high variety regarding concrete types, noise levels and crack widths, shapes, regularity and branching. This makes it suitable for studying the generalizability and robustness of 3d crack segmentation methods.


The folder 'data' contains seven subfolders, each containing the data generated from a specific concrete type (NC, HPC, air pore concrete, polypropylene fiber-reinforced concrete, steel fiber-reinforced concrete (straight, crimped and hooked-end steel fibers)).

Each subfolder again contains four subfolders according to the point process model that was used for generating the 3d Voronoi diagrams. The point processes and Voronoi diagrams are restricted to windows of size 400x150x400.

- 'hc': Hard core point process with 60% volume density and intensity 0.000025 obtained from force-biased sphere packing.
- 'matclust': Matérn cluster process with parent intensity 0.0002/50, offspring intensity 50 and cluster radius 20.
- 'ppp': Poisson point process with intensity 0.0002.
- 'ppp-scaled': Poisson point process with intensity 0.0002 (but inside 200x150x200 window). The resulting Voronoi diagram is stretched in x- and z- direction by a factor of 2.

Each of these contains five subfolders: one for the 3d input images, two for the corresponding labels (ground truths; one with and one without pores/fibers), one for the input and label previews (slice z=200 for each of the images) and a misc folder containing the concrete background without crack and, if applicable, the pore/fiber segmentation image.

The data itself then contains 48 images:
1a-1d: crack with up to seven branches; fixed crack width (~1 voxel).
2a-2d: crack with up to four branches; fixed crack width (~1 voxel).
3a-3d: crack with up to one branch; fixed crack width (~1 voxel).
4a-4d: crack with no branches; fixed crack width (~1 voxel).
5a-5d: crack with no branches; fixed crack width (~3 voxels).
6a-6d: crack with no branches; fixed crack width (~5 voxels).
7a-7d: crack with no branches; fixed crack width (~7 voxels).
8a-8d: crack with up to seven branches; multiscale crack (Bernoulli parameter 0.01).
9a-9d: crack with up to seven branches; multiscale crack (Bernoulli parameter 0.02).
10a-10d: crack with up to seven branches; multiscale crack (Bernoulli parameter 0.05).
11a-11d: crack with up to seven branches; multiscale crack (Bernoulli parameter 0.1).
12a-12d: crack with up to seven branches; multiscale crack (Bernoulli parameter 0.2).

The names 'a'-'d' indicate the level of noise added to the image:
a: None.
b: Uniformly on [-sigma,sigma]
c: Uniformly on [-2*sigma,2*sigma]
d: Uniformly on [-4*sigma,4*sigma]
Negative values are mapped to 0.
For inputs of type int, noise values are rounded to the nearest integer.
(sigma = standard deviation of voxel greyvalues in image)
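The noise levels 'a' to 'd' can be illustrated with a short sketch that operates on a flat list of voxel grey values. It is not the authors' code: the function name is hypothetical, the use of the population standard deviation is an assumption (the description only says "standard deviation"), and Python's `round` rounds exact halves to the nearest even integer, which may differ from the rounding used for the data set.

```python
import random
import statistics

# Multiples of sigma that bound the uniform noise for each level.
NOISE_FACTORS = {"a": 0, "b": 1, "c": 2, "d": 4}


def add_uniform_noise(voxels, level, integer=True, rng=None):
    """Add uniform noise on [-k*sigma, k*sigma] to a flat list of grey values.

    sigma is the standard deviation of the input grey values; results
    below zero are mapped to 0. For integer inputs the noise value is
    rounded before it is added.
    """
    rng = rng or random.Random()
    factor = NOISE_FACTORS[level]
    half_width = factor * statistics.pstdev(voxels)
    noisy = []
    for value in voxels:
        noise = rng.uniform(-half_width, half_width) if factor else 0.0
        if integer:
            noise = round(noise)
        noisy.append(max(0, value + noise))
    return noisy
```

A real 3d image would normally be handled as an array rather than a Python list; the list form is used here only to keep the sketch free of third-party dependencies.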

Note that the grey values in the ground truths correspond to the local crack width. They can be thresholded to obtain binary masks.

For more details, we refer to [1].
Gas sensor array temperature modulation
kaggle.com
zip
Updated Apr 15, 2019
Cite
Javi (2019). Gas sensor array temperature modulation [Dataset]. https://www.kaggle.com/javi2270784/gas-sensor-array-temperature-modulation
Available download formats: zip (382968146 bytes)
Dataset updated
Apr 15, 2019
Authors
Javi
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Gas sensor array exposed to dynamic mixtures of carbon monoxide and humidity

Data Set Information

A chemical detection platform composed of 14 temperature-modulated metal oxide semiconductor (MOX) gas sensors was exposed to dynamic mixtures of carbon monoxide (CO) and humid synthetic air in a gas chamber.

The acquired time series of the sensors and the measured values of CO concentration, humidity and temperature inside the gas chamber are provided.

a) Chemical detection platform. The chemical detection platform is composed of 14 MOX gas sensors that generate a time-dependent multivariate response to the different gas stimuli. The sensors used are commercially available from Figaro Engineering (7 units of TGS 3870-A04) and FIS (7 units of SB-500-12). The operating temperature of the sensors is controlled by modulating the voltage of the built-in heater in the range 0.2-0.9 V in cycles of 20 and 25 s, following the manufacturer's recommendations (0.9 V for 5 s, 0.2 V for 20 s, 0.9 V for 5 s, 0.2 V for 25 s, ...). The sensors were pre-heated for one week before starting the experiments. The MOX read-out circuits consist of voltage dividers with 1 MOhm load resistors, powered at 5 V. The output voltage of the sensors is sampled at 3.5 Hz using an Agilent HP34970A/34901A DAQ configured at 15 bits of precision and input impedance greater than 10 GOhm.
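The heater modulation pattern quoted above (0.9 V for 5 s, 0.2 V for 20 s, 0.9 V for 5 s, 0.2 V for 25 s, then repeating) can be expressed as a periodic function of time. This is a sketch for illustration only: the names are hypothetical, and the assumption that the pattern starts with the first 0.9 V pulse at t = 0 is not stated in the description; the recorded "Heater voltage (V)" column should be used when working with the actual files.

```python
# (duration in seconds, heater voltage in volts) for one full pattern.
HEATER_SEGMENTS = ((5.0, 0.9), (20.0, 0.2), (5.0, 0.9), (25.0, 0.2))
PERIOD_S = sum(duration for duration, _ in HEATER_SEGMENTS)  # 55 s


def heater_voltage(t_s: float) -> float:
    """Nominal heater voltage at time t_s, assuming the pattern starts at t = 0."""
    phase = t_s % PERIOD_S
    for duration, voltage in HEATER_SEGMENTS:
        if phase < duration:
            return voltage
        phase -= duration
    return HEATER_SEGMENTS[-1][1]  # guard against floating-point edge cases
```

Under this reading the full pattern repeats every 55 s, with the low-voltage phases alternating between 20 s and 25 s.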

b) Generator of dynamic gas mixtures. Dynamic mixtures of CO and humid synthetic air were delivered from high purity gases in cylinders to a small-sized polytetrafluoroethylene (PTFE) test chamber (250 cm³ internal volume), by means of a piping system and mass flow controllers (MFCs). Gas mixing was performed using the MFCs, which controlled three different gas streams (CO, wet air and dry air). These streams were delivered from high quality pressurized gases in cylinders. The selected MFCs (EL-FLOW Select, Bronkhorst) had full scale flow rates of 1000 mln/min for the dry and wet air streams and 3 mln/min for the CO channel. The CO bottle contained 1600 ppm of CO diluted in synthetic air with 21 ± 1% O2. The relative uncertainty in the generated CO concentration was below 5.5%. The wet and dry air streams were both delivered from a synthetic air bottle with 99.995% purity and 21 ± 1% O2. Humidification of the wet stream was based on the saturation method using a glass bubbler (Drechsler bottles).

c) Temperature/humidity values. A temperature/humidity sensor (SHT75, from Sensirion) with tolerance below 1.8% r.h. and 0.5 ºC provides reference humidity and temperature values inside the test chamber every 5 s. The temperature variations inside the gas chamber, for each experiment, were below 3 ºC.

d) Experimental protocol. Each experiment consists of 100 measurements: 10 experimental concentrations uniformly distributed in the range 0-20 ppm and 10 replicates per concentration. Each replicate has a relative humidity randomly chosen from a uniform distribution between 15% and 75% r.h. At the beginning of each experiment, the gas chamber is cleaned for 15 min using a stream of synthetic air at a flow rate of 240 mln/min. After that, the gas mixtures are released in random order at a constant flow rate of 240 mln/min for 15 min each. A single experiment lasts 25 hours (100 samples x 15 minutes/sample) and was replicated on 13 working days spanning a natural period of 17 days.

Attribute Information

The dataset is presented in 13 text files, where each file corresponds to a different measurement day. The filenames indicate the timestamp (yyyymmdd_HHMMSS) of the start of the measurements. Each file includes the acquired time series, presented in 20 columns: Time (s), CO concentration (ppm), Humidity (%r.h.), Temperature (ºC), Flow rate (mL/min), Heater voltage (V), and the resistance of the 14 gas sensors: R1 (MOhm), R2 (MOhm), R3 (MOhm), R4 (MOhm), R5 (MOhm), R6 (MOhm), R7 (MOhm), R8 (MOhm), R9 (MOhm), R10 (MOhm), R11 (MOhm), R12 (MOhm), R13 (MOhm), R14 (MOhm). Resistance values R1-R7 correspond to FIGARO TGS 3870 A-04 sensors, whereas R8-R14 correspond to FIS SB-500-12 units. The time series are sampled at 3.5 Hz.

Relevant Papers

The description of the experimental setup and chemical detection platform can be found in [1-2]. The dataset has been used also in [3].

[1] Burgués, Javier, Juan Manuel Jiménez-Soto, and Santiago Marco. "Estimation of the limit of detection in semiconductor gas sensors through linearized calibration models." Analytica chimica acta 1013 (2018): 13-25.

[2] Burgués, Javier, and Santiago Marco. "Multivariate estimation of the limit of detection by orthogonal partial least squares in temperature-modulated MOX sensors." Analytica chimica acta 1019 (2018): 49-64.

[3] Fernandez, Luis, Jia Yan, Jordi Fonollosa, Javier Burgués, Agustin Gutierrez, and Santiago Marco. "A practical method to estimate ...
SD4EO: AI-based synthetic satellite multispectral agricultural textures in...
data.niaid.nih.gov
Updated Oct 18, 2024
Cite
Miraut, David (2024). SD4EO: AI-based synthetic satellite multispectral agricultural textures in Spain (Oct 2017 - Sep 2018) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11220859
Dataset updated
Oct 18, 2024
Dataset provided by
GMV
Authors
Miraut, David
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Spain
Description
This dataset has been created as part of the deliverables for ESA’s SD4EO project. It consists of textures generated using a multispectral variant of a still unpublished high-order statistical constraint synthesis method for each of the following crop types:

Barley.

Wheat.

Other grain leguminous.

Peas.

Fallow & Bare soil.

Vetch.

Alfalfa.

Sunflower.

Oats.

The initial data was sampled from satellite images, specifically from Copernicus’ Sentinel-1 and Sentinel-2 satellites. The images were acquired over a period from October 2017 to September 2018 on the central-east region of northern Spain (Castile and León and Catalonia). From these images, the corresponding crops were extracted and used as samples for assembling large puzzles that have been applied as input reference images to generate the synthetic images that make up this dataset.

The datasets of assembled crop field "puzzles" used as reference images combine the largest crop areas to create a square multispectral texture of the largest possible size that is a power of 2 (or nearly a power of 2). Each base image combines data from all available Sentinel-2 satellite passes for the same month and a previous monthly composition from Sentinel-1. Because of the influence of cloud masks, the shape and number of crops vary for each time sample, preventing the reuse of the element layout in the “puzzles” across different months. Therefore, we have a base image (puzzle) for each month and crop type, with a size dependent on the number and area of crops not covered by clouds. The base images have side lengths of 256, 384, 512, 768, 1024, 1536 or 2048 pixels, depending on the weather conditions and crop type in each season of the year.

In this dataset, the synthetic texture sizes match the corresponding base image sizes to facilitate debugging the method implementation and enable subsequent comparisons. For crops with a base image size of 1536 pixels or larger, the generated synthetic images have been reduced to half their size to reduce computational costs and RAM requirements, thereby completing the synthesis faster. Consequently, there remains some diversity in file sizes, generally smaller for crop types with less cultivated area.

Additionally, to increase the amount of available data, six variants have been synthesized from each base multispectral image. This number can be arbitrarily increased, as initialization with noise (random numbers) ensures the distinction among the generated data.

File names are structured as follows:

Prefix "HO" indicating the synthesis method

The crop type name:

Barley

Wheat

OtherGrainLeguminous

Peas

FallowAndBareSoil

Vetch

Alfalfa

Sunflower

Oats

Year/Month/01 (representing the start of the month period)

Side length of the multispectral texture in pixels (based on the highest precision instrument of Sentinel-2: 10m x 10m)

Number of the synthesis variant
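The file-name structure listed above can be parsed mechanically. The sketch below assumes the underscore-separated form seen in the file names quoted elsewhere in this description (for example HO_Alfalfa_20180801_768_1.nc); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from pathlib import PurePath


@dataclass(frozen=True)
class TextureName:
    method: str         # synthesis method prefix, e.g. "HO"
    crop: str           # crop type name, e.g. "Alfalfa"
    period_start: date  # first day of the monthly period
    side_px: int        # side length of the square texture in pixels
    variant: int        # number of the synthesis variant


def parse_texture_name(filename: str) -> TextureName:
    """Split a name such as 'HO_Alfalfa_20180801_768_1.nc' into its fields."""
    stem = PurePath(filename).stem
    method, crop, day, side, variant = stem.split("_")
    return TextureName(
        method=method,
        crop=crop,
        period_start=datetime.strptime(day, "%Y%m%d").date(),
        side_px=int(side),
        variant=int(variant),
    )
```

Because the crop names are written without underscores (e.g. FallowAndBareSoil), a plain split on "_" is sufficient here.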

The generation parameters for all images include:

Normalized and weighted bands (VH band influence increased by a factor of 3 compared to others)

4 levels of depth in the Steerable pyramid

6 orientations in the Steerable pyramid

14 joint statistics of the wavelet coefficients corresponding to basis functions at adjacent spatial locations, orientations, and scales. This parameter is crucial for capturing local dependencies between wavelet coefficients, essential for the visual perception of texture.

30 iterations

A significant effort has been made to stabilize the algorithm, and to eliminate artifacts in the generated textures, resulting in much more robust outcomes. However, in rare cases, the initial white noise distribution can be statistically unfavorable, leading to instabilities. Files have been left as generated, without correcting these effects, to make them visible despite their low frequency. Specifically, among the 657 generated multispectral textures, this phenomenon has occurred prominently in only two and is relatively noticeable in another two, leaving the rest free of this effect (affecting less than 1% of the syntheses).

Thus, the following files can be considered partially failed syntheses:

HO_Alfalfa_20180801_768_1.nc

HO_FallowAndBareSoil_20180101_768_3.nc

HO_OtherGrainLeguminous_20171201_256_4.nc

HO_Vetch_20180301_384_3.nc

Files are encoded in the standardized netCDF4 format, each containing a single xarray with metadata corresponding to a 3D array with the synthesized texture of the indicated crop type and satellite passes for the regions of Castilla y León and Catalonia for the corresponding monthly period.

The most important data structure is the 3D array, where the first two dimensions correspond to the pixel extent indicated in the file name as square textures ('x' and 'y' labels in the xarray). The third dimension denotes the spectral band of the satellite, ordered by constellation and pixel size:

'B02' 10m (Sentinel-2)

'B03' 10m (Sentinel-2)

'B04' 10m (Sentinel-2)

'B08' 10m (Sentinel-2)

'B05' originally 20m, resampled to 10m (Sentinel-2)

'B06' originally 20m, resampled to 10m (Sentinel-2)

'B07' originally 20m, resampled to 10m (Sentinel-2)

'B11' originally 20m, resampled to 10m (Sentinel-2)

'B12' originally 20m, resampled to 10m (Sentinel-2)

'B8A' originally 20m, resampled to 10m (Sentinel-2)

'VH' also resampled to 10m (Sentinel-1)

The original dynamic range is preserved in all bands, and they have been synthesized together using our multispectral algorithm variant. The new band combination may result in slightly unusual values in vegetation indices since restrictions were not considered in their transformed space, but in the latent space of the decorrelated Steerable pyramid.

Additionally, the following metadata are stored as xarray attributes:

"long_name": corresponding to the crop type name

"date": the period of the original data used as the base image for synthesis

"dataset": denotes the combination of the initial Castilla y León dataset and the extended 6 Tiles from Catalonia

"synthetic_method": corresponds to the high-order constrained method

"max_visible_value": a reference value to maintain the same dynamic range when comparing with base images, avoiding distortions in color space and contrast

A total of:

9 types of crops x 12 months x 6 variants = 648 synthesized multispectral textures

occupying 34.5GB, have been organized and uploaded into 9 ZIP files (one per crop type) on the Zenodo website for distribution under Creative Commons Attribution 4.0 International license.

The SD4EO Project is funded by the ESA’s FutureEO programme under contract no. 4000142334/23/I-DT and supervised by ESA Φ-lab.

Synthetic Data for Neutrophil Analysis: Sets with irregular shapes and Poisson noise



Each time frame consisted of a z-stack of 11 slices, each with 275 x 275 pixels, in which the neutrophils were formed by irregular shapes (sums of Gaussians) of higher intensity than the background, with Poisson noise. (See also the corresponding sets with regular shapes, i.e. Gaussians with Gaussian noise, and another set with a single large neutrophil and Poisson noise.) The orientation of the shapes varied according to the displacement of the artificial neutrophils, i.e. they were round when the cells were static and elongated when in movement. The tracks with the shapes were saved as the gold standard, and five different data sets were generated by adding varying levels of white Poisson noise. The resulting data sets have intensity distributions with increasing similarity between the neutrophils and the background, reflected by the decreasing values of the Bhattacharyya distance (1.61, 1.25, 1, 0.66, 0.45) as defined by Coleman 1979.
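
To make the role of the Bhattacharyya distance concrete, the sketch below computes it for two univariate Gaussian distributions, for which a closed form exists. Treating the neutrophil and background intensities as Gaussians is an assumption made for illustration only; the means and standard deviations are hypothetical and are not the values used to generate these data sets.

```python
import math


def bhattacharyya_gaussian(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    spread_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term


# Hypothetical intensities: cells at 100, background at 60.
# A larger noise level (sigma) lowers the distance, i.e. the two
# classes become harder to separate.
for sigma in (10, 15, 20, 30):
    print(sigma, round(bhattacharyya_gaussian(100, sigma, 60, sigma), 3))
```

A distance of zero means the two distributions are identical, so lower values correspond to harder segmentation problems.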

Files corresponding to the sets with irregular shapes and Poisson noise (noise increases from 1 to 5):

x,y,t trajectories: ThreeDTracks

Ground truth: syntheticData_P_mat_La

First data set: syntheticData_P1_mat_Re

Second data set: syntheticData_P2_mat_Re

Third data set: syntheticData_P3_mat_Re

Fourth data set: syntheticData_P4_mat_Re

Fifth data set: syntheticData_P5_mat_Re

Corresponding GIF files are also included as illustrations of the cells in motion.

Main Reference:

Henry KM, Pase L, Ramos-Lopez CF, Lieschke GJ, Renshaw SA, Reyes-Aldasoro CC. (2013) PhagoSight: An Open-Source MATLAB® Package for the Analysis of Fluorescent Neutrophil and Macrophage Migration in a Zebrafish Model. PLOS ONE 8(8): e72636. https://doi.org/10.1371/journal.pone.0072636
