Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Community homogeneity measures for θ varying.
This repository contains scripts, input files, and some example output files for the Residential Population Generator (RPGen), an R-based tool for generating synthetic human residential populations used in estimating near-field chemical exposures. The tool is most readily adapted for use in the workflow for CHEM, the Combined Human Exposure Model, available in two other GitHub repositories in the HumanExposure project: ProductUseScheduler and source2dose. CHEM is currently best suited to estimating exposure from product use. Outputs from RPGen are translated into ProductUseScheduler, whose outputs are in turn used in source2dose.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed subset of raw HEFS hindcast data for Lake Mendocino (LAM), configured for compatibility with the repository structure of the version 1 and version 2 synthetic forecast models available here: https://github.com/zpb4/Synthetic-Forecast-v1-FIRO-DISES and here: https://github.com/zpb4/Synthetic-Forecast-v2-FIRO-DISES. The data are pre-structured for the repository setup, and the README files of both GitHub repos include instructions on how to set up the data contained in this resource.
Contains HEFS hindcast .csv files and observed full-natural-flow files for the following sites:
LAMC1 - main reservoir inflow to Lake Mendocino
UKAC1 - downstream flows at Ukiah junction
HOPC1L - downstream local flows at Hopland junction
The resource also contains the R scripts used to preprocess the raw HEFS data contained in the associated public HydroShare resource here: https://www.hydroshare.org/resource/ccffddde118f4145854c960295f520cb/
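As a minimal sketch of loading these files in R (the file names below are hypothetical placeholders, not the actual names in the resource; see the repo README files for the real structure):

hefs <- read.csv("LAMC1_hefs_hindcast.csv")   # hypothetical file name for the LAMC1 hindcast ensemble
obs  <- read.csv("LAMC1_observed_fnf.csv")    # hypothetical file name for the observed full-natural-flow series
head(hefs)
head(obs)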
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The synthetic event log described in this document consists of 25,000 traces, generated using the process model outlined in Geyer et al. (2024) [1] and the DALG tool [2]. This event log simulates the treatment process of malignant melanoma patients, adhering to clinical guidelines. Each trace in the log represents a unique patient journey through various stages of melanoma treatment, providing detailed insights into decision points, treatments, and outcomes.
The DALG tool [2] was employed to generate this data-aware event log, ensuring realistic data distribution and variability.
DALG: https://github.com/DavidJilg/DALG
[1] Geyer, T., Grüger, J., & Kuhn, M. (2024). Clinical Guideline-based Model for the Treatment of Malignant Melanoma (Data Petri Net) (1.0). Zenodo. https://doi.org/10.5281/zenodo.10785431
[2] Jilg, D., Grüger, J., Geyer, T., Bergmann, R.: DALG: the data aware event log generator. In: BPM 2023 - Demos & Resources. CEUR Workshop Proceedings, vol. 3469, pp. 142–146. CEUR-WS.org (2023)
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Reference Paper:
General Description:
The "Tango Spacecraft Dataset for Region of Interest Estimation and Semantic Segmentation" dataset here published should be used for Region of Interest (ROI) and/or semantic segmentation tasks. It is split into 30002 train images and 3002 test images representing the Tango spacecraft from Prisma mission, being the largest publicly available dataset of synthetic space-borne noise-free images tailored to ROI extraction and Semantic Segmentation tasks (up to our knowledge). The label of each image gives, for the Bounding Box annotations, the filename of the image, the ROI top-left corner (minimum x, minimum y) in pixels, the ROI bottom-right corner (maximum x, maximum y) in pixels, and the center point of the ROI in pixels. The annotation are taken in image reference frame with the origin located at the top-left corner of the image, positive x rightward and positive y downward. Concerning the Semantic Segmentation, RGB masks are provided. Each RGB mask correspond to a single image in both train and test dataset. The RGB images are such that the R channel corresponds to the spacecraft, the G channel corresponds to the Earth (if present), and the B channel corresponds to the background (deep space). Per each channel the pixels have non-zero value only in correspondence of the object that they represent (Tango, Earth, Deep Space). More information on the dataset split and on the label format are reported below.
Images Information:
The train set comprises 30002 synthetic grayscale images of the Tango spacecraft from the Prisma mission, and the test set comprises 3002 such images, all in PNG format. About 1/6 of the images in both the train and test sets have a non-black background, obtained by rendering an Earth-like model in the raytracing process used to generate the images. The images are noise-free to increase the flexibility of the dataset. The illumination direction of the spacecraft in the scene is uniformly distributed in 3D space, in agreement with the Sun position constraints.
Labels Information:
Labels for the bounding-box extraction are provided in separate JSON files. The files are formatted per image as in the following example:
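A minimal R sketch of reading one of the label files with jsonlite (the file name below is a hypothetical placeholder, and the fields referenced in the comments are those listed in the general description above, not necessarily the actual JSON key names):

library(jsonlite)
labels <- fromJSON("train_labels.json")   # hypothetical file name
# Per the description above, each entry is expected to provide: the image
# filename, the ROI top-left corner (min x, min y), the ROI bottom-right
# corner (max x, max y), and the ROI center, all in pixels in the image
# reference frame (origin at the top-left corner).
head(labels)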
Notice that the annotations are expressed in the image reference frame, with the origin at the top-left corner of the image, positive x rightward and positive y downward. To make the dataset easier to use, both the training set and the test set are split into two folders, one containing the images with the Earth as background and one containing the images without background.
The semantic segmentation labels are provided as RGB masks named "filename_mask.png", where "filename" is the filename of the training or test image to which the mask refers. The masks are such that the R channel corresponds to the spacecraft, the G channel to the Earth (if present), and the B channel to the background (deep space). In each channel, pixels have non-zero values only where the corresponding object (Tango, Earth, deep space) appears.
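As a minimal R sketch of separating the mask channels (the file name is a hypothetical placeholder; only the channel meaning is taken from the description above):

library(png)
mask <- readPNG("img000001_mask.png")   # hypothetical instance of "filename_mask.png"
spacecraft <- mask[, , 1] > 0           # R channel: Tango spacecraft
earth      <- mask[, , 2] > 0           # G channel: Earth (if present)
background <- mask[, , 3] > 0           # B channel: deep-space background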
VERSION CONTROL
Note: this dataset contains the same images as the "Tango Spacecraft Wireframe Dataset Model for Line Segments Detection" v2.0 full-scale (DOI: https://doi.org/10.5281/zenodo.6372848) and the "Tango Spacecraft Dataset for Monocular Pose Estimation" v1.0 (DOI: https://doi.org/10.5281/zenodo.6499007), and the three can be used together by combining the annotations of the relative pose, the reprojected wireframe model of Tango, and the ROI. Together, these three datasets form, to the best of our knowledge, the most comprehensive collection of space-borne synthetic images published to date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Algorithm 2.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the synthetic gene expression data along with the underlying gene network used in the simulation studies of Hu and Szymczak (2023) for evaluating network-guided random forest.
In this dataset we consider the situation of 1000 genes and 1000 samples each for the training and testing sets. Each file contains a list of 100 replications of the considered scenario, which can be identified via the file name. In particular, we consider 6 different scenarios depending on the number of disease modules and on how the effects of disease genes are distributed within the disease module. When there are disease genes, we also consider 3 different levels of effect sizes. The binary responses are then generated via a logistic regression model. More details on these scenarios and the data generation mechanism can be found in Hu and Szymczak (2023).
The data are generated by the function gen_data in the R package networkRF, which can be accessed at https://github.com/imbs-hl/networkRF. To obtain the datasets with 3000 genes, which form the other part of the data used in the simulation studies of Hu and Szymczak (2023), simply modify the num.var argument of gen_data. More details on the implementation and the format of the output can be found in the help pages of the R package.
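A minimal sketch of the corresponding R call (only gen_data and its num.var argument are taken from the description above; any further arguments, their defaults, and the output format are documented in the package help page):

# install.packages("devtools")  # if not already installed
devtools::install_github("imbs-hl/networkRF")
library(networkRF)
sim <- gen_data(num.var = 3000)   # 3000 genes; see ?gen_data for the remaining arguments and output format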
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed subset of raw HEFS hindcast data for New Hogan Lake (NHG), configured for compatibility with the repository structure of the version 1 and version 2 synthetic forecast models available here: https://github.com/zpb4/Synthetic-Forecast-v1-FIRO-DISES and here: https://github.com/zpb4/Synthetic-Forecast-v2-FIRO-DISES. The data are pre-structured for the repository setup, and the README files of both GitHub repos include instructions on how to set up the data contained in this resource.
Contains HEFS hindcast .csv files and observed full-natural-flow files for the following sites:
NHGC1 - main reservoir inflow to New Hogan Lake
MSGC1L - downstream local flows from Mud Slough
The resource also contains the R scripts used to preprocess the raw HEFS data contained in the associated public HydroShare resource here: https://www.hydroshare.org/resource/f63ead2d62414940a7d90acdc234a5d1/
Introduction
Vessel segmentation in fundus images is essential in the diagnosis and prognosis of retinal diseases and the identification of image-based biomarkers. However, creating a vessel segmentation map can be a tedious and time-consuming process requiring careful delineation of the vasculature, which is especially hard for microcapillary plexi in fundus images. Optical coherence tomography angiography (OCT-A) is a relatively novel modality that visualizes blood flow and microcapillary plexi not clearly observed in fundus photography. Unfortunately, current commercial OCT-A cameras have various limitations due to their complex optics, making them more expensive, less portable, and limited to a reduced field of view (FOV) compared to fundus cameras. Moreover, the vast majority of population health data collection efforts do not include OCT-A data.
We believe that strategies able to map fundus images to en-face OCT-A can create precise vascular vessel segmentation with less effort.
In this dataset, called UTHealth - Fundus and Synthetic OCT-A Dataset (UT-FSOCTA), we include fundus images and en-face OCT-A images for 112 subjects. The two modalities have been manually aligned to allow for training of medical imaging machine learning pipelines. This dataset is accompanied by a manuscript that describes an approach to generate fundus vessel segmentations using OCT-A for training (Coronado et al., 2022). We refer to this approach as "Synthetic OCT-A".
Fundus Imaging
We include 45-degree macula-centered fundus images that cover both the macula and the optic disc. All images were acquired using an OptoVue iVue fundus camera without pupil dilation.
The full images are available in the fov45/fundus directory. In addition, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/fundus/disc and cropped/fundus/macula.
En-face OCT-A
We include the en-face OCT-A images of the superficial capillary plexus. All images were acquired using an OptoVue Avanti OCT camera with OCT-A reconstruction software (AngioVue). Low quality images with errors in the retina layer segmentations were not included.
En-face OCT-A images are located in cropped/octa/disc and cropped/octa/macula. In addition, we include a denoised version of these images in which only vessels are retained; this was performed automatically using the ROSE algorithm (Ma et al. 2021). These can be found in cropped/GT_OCT_net/noThresh and cropped/GT_OCT_net/Thresh, where the former contains the probabilities produced by the ROSE algorithm and the latter a binary map.
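As a small illustration of how the two folders relate (a sketch only; the file name and the 0.5 cutoff are assumptions, not the values used to produce the Thresh folder):

library(png)
probs  <- readPNG("cropped/GT_OCT_net/noThresh/subject_001.png")   # hypothetical file name
binary <- probs > 0.5   # the actual threshold used for the Thresh folder is not specified here
writePNG(binary * 1.0, "subject_001_binary.png")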
Synthetic OCT-A
We train a custom conditional generative adversarial network (cGAN) to map a fundus image to an en face OCT-A image. Our model consists of a generator synthesizing en face OCT-A images from corresponding areas in fundus photographs and a discriminator judging the resemblance of the synthesized images to the real en face OCT-A samples. This allows us to avoid the use of manual vessel segmentation maps altogether.
The full images are available in the fov45/synthetic_octa directory. Then, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/synthetic_octa/disc and cropped/synthetic_octa/macula. In addition, we applied the same ROSE denoising algorithm (Ma et al. 2021) used for the original en-face OCT-A images; the results are available in cropped/denoised_synthetic_octa/noThresh and cropped/denoised_synthetic_octa/Thresh, where the former contains the probabilities produced by the ROSE algorithm and the latter a binary map.
Other Fundus Vessel Segmentations Included
In this dataset, we have also included the output of two recent vessel segmentation algorithms trained on external datasets with manual vessel segmentations: SA-UNet (Guo et al., 2021) and IterNet (Li et al., 2020).
SA-UNet. The full images are available in the fov45/SA_Unet directory. Then, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/SA_Unet/disc and cropped/SA_Unet/macula.
IterNet. The full images are available in the fov45/Iternet directory. Then, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/Iternet/disc and cropped/Iternet/macula.
Train/Validation/Test Replication
To replicate the results of our paper or compare your model against them, use the data split reported below.
Training subjects IDs: 1 - 25
Validation subjects IDs: 26 - 30
Testing subjects IDs: 31 - 112
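A minimal R sketch of reproducing this split by subject ID (assuming subjects are indexed by the numeric IDs listed above):

subject_ids <- 1:112
train_ids <- subject_ids[subject_ids <= 25]
val_ids   <- subject_ids[subject_ids >= 26 & subject_ids <= 30]
test_ids  <- subject_ids[subject_ids >= 31]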
Data Acquisition
This dataset was acquired at the Texas Medical Center - Memorial Hermann Hospital in accordance with the guidelines from the Helsinki Declaration and it was approved by the UTHealth IRB with protocol HSC-MS-19-0352.
User Agreement
The UT-FSOCTA dataset is free to use for non-commercial scientific research only. In case of any publication, the following paper needs to be cited:
Coronado I, Pachade S, Trucco E, Abdelkhaleq R, Yan J, Salazar-Marioni S, Jagolino-Cole A, Bahrainian M, Channa R, Sheth SA, Giancardo L. Synthetic OCT-A blood vessel maps using fundus images and generative adversarial networks. Sci Rep 2023;13:15325. https://doi.org/10.1038/s41598-023-42062-9.
Funding
This work is supported by the Translational Research Institute for Space Health through NASA Cooperative Agreement NNX16AO69A.
Research Team and Acknowledgements
Here are the people behind this data acquisition effort:
Ivan Coronado, Samiksha Pachade, Rania Abdelkhaleq, Juntao Yan, Sergio Salazar-Marioni, Amanda Jagolino, Mozhdeh Bahrainian, Roomasa Channa, Sunil Sheth, Luca Giancardo
We would also like to acknowledge, for their support, the Institute for Stroke and Cerebrovascular Diseases at UTHealth, the VAMPIRE team at the University of Dundee, UK, and the Memorial Hermann Hospital System.
References
Coronado I, Pachade S, Trucco E, Abdelkhaleq R, Yan J, Salazar-Marioni S, Jagolino-Cole A, Bahrainian M, Channa R, Sheth SA, Giancardo L. Synthetic OCT-A blood vessel maps using fundus images and generative adversarial networks. Sci Rep 2023;13:15325. https://doi.org/10.1038/s41598-023-42062-9.
C. Guo, M. Szemenyei, Y. Yi, W. Wang, B. Chen, and C. Fan, "SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation," in 2020 25th International Conference on Pattern Recognition (ICPR), Jan. 2021, pp. 1236–1242. doi: 10.1109/ICPR48806.2021.9413346.
L. Li, M. Verma, Y. Nakashima, H. Nagahara, and R. Kawasaki, "IterNet: Retinal Image Segmentation Utilizing Structural Redundancy in Vessel Networks," 2020 IEEE Winter Conf. Appl. Comput. Vis. WACV, 2020, doi: 10.1109/WACV45572.2020.9093621.
Y. Ma et al., "ROSE: A Retinal OCT-Angiography Vessel Segmentation Dataset and New Model," IEEE Trans. Med. Imaging, vol. 40, no. 3, pp. 928–939, Mar. 2021, doi: 10.1109/TMI.2020.3042802.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Algorithm 1.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structural measures for Ewthmax varying.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structural measures for Ebtwmax varying and Ewthmax=14.
The MATHWELL Human Annotation Dataset contains 5,084 synthetic word problems and answers generated by MATHWELL, a reference-free educational grade school math word problem generator released in "MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations", and by comparison models (GPT-4, GPT-3.5, Llama-2, MAmmoTH, and LLEMMA), together with expert human annotations for solvability, accuracy, appropriateness, and meets-all-criteria (MaC). Solvability means the problem is mathematically possible to solve; accuracy means the Program of Thought (PoT) solution arrives at the correct answer; appropriateness means that the mathematical topic is familiar to a grade school student and the question's context is suitable for a young learner; and MaC denotes questions labeled as solvable, accurate, and appropriate. Null values for accuracy and appropriateness indicate a question labeled as unsolvable, which cannot have an accurate solution and is automatically inappropriate. Based on our annotations, 82.2% of the question/answer pairs are solvable, 87.3% have accurate solutions, 78.1% are appropriate, and 58.4% meet all criteria.
This dataset is designed to train text classifiers to automatically label word problem generator outputs for solvability, accuracy, and appropriateness. More details about the dataset can be found in our paper.
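As a small illustration of how the MaC label relates to the other annotations (a sketch; the file and column names are hypothetical placeholders, not the dataset's actual field names):

ann <- read.csv("mathwell_annotations.csv")   # hypothetical file name
# Unsolvable questions have missing accuracy/appropriateness labels and can never meet all criteria.
ann$mac <- with(ann, solvable == 1 &
                     !is.na(accurate)    & accurate == 1 &
                     !is.na(appropriate) & appropriate == 1)
mean(ann$mac)   # should roughly match the 58.4% MaC rate reported above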