Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Community homogeneity measures for θ varying.
This repository contains scripts, input files, and some example output files for the Residential Population Generator (RPGen), an R-based tool for generating synthetic human residential populations used in estimating near-field chemical exposures. The tool is most readily adapted for use in the workflow for CHEM, the Combined Human Exposure Model, available in two other GitHub repositories in the HumanExposure project: ProductUseScheduler and source2dose. CHEM is currently best suited to estimating exposure from product use. Outputs from RPGen are translated into ProductUseScheduler, whose outputs are in turn used in source2dose.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed subset of raw HEFS hindcast data for Lake Mendocino (LAM), configured for compatibility with the repository structure of the version 1 and version 2 synthetic forecast models available here: https://github.com/zpb4/Synthetic-Forecast-v1-FIRO-DISES and here: https://github.com/zpb4/Synthetic-Forecast-v2-FIRO-DISES. The data are pre-structured for the repository setup, and the README files of both GitHub repos include instructions on how to set up the data contained in this resource.
Contains HEFS hindcast .csv files and observed full-natural-flow files for the following sites:
LAMC1 - main reservoir inflow to Lake Mendocino
UKAC1 - downstream flows at Ukiah junction
HOPC1L - downstream local flows at Hopland junction
The resource also contains the R scripts used to preprocess the raw HEFS data contained in the associated public HydroShare resource here: https://www.hydroshare.org/resource/ccffddde118f4145854c960295f520cb/
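As a minimal sketch of loading these files in R (the file names below are hypothetical placeholders, not the actual names in the resource; see the repo README files for the real structure):

hefs <- read.csv("LAMC1_hefs_hindcast.csv")   # hypothetical file name for the LAMC1 hindcast ensemble
obs  <- read.csv("LAMC1_observed_fnf.csv")    # hypothetical file name for the observed full-natural-flow series
head(hefs)
head(obs)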
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The synthetic event log described in this document consists of 25,000 traces, generated using the process model outlined in Geyer et al. (2024) [1] and the DALG tool [2]. This event log simulates the treatment process of malignant melanoma patients, adhering to clinical guidelines. Each trace in the log represents a unique patient journey through various stages of melanoma treatment, providing detailed insights into decision points, treatments, and outcomes.
The DALG tool [2] was employed to generate this data-aware event log, ensuring realistic data distribution and variability.
DALG: https://github.com/DavidJilg/DALG
[1] Geyer, T., Grüger, J., & Kuhn, M. (2024). Clinical Guideline-based Model for the Treatment of Malignant Melanoma (Data Petri Net) (1.0). Zenodo. https://doi.org/10.5281/zenodo.10785431
[2] Jilg, D., Grüger, J., Geyer, T., Bergmann, R.: DALG: the data aware event log generator. In: BPM 2023 - Demos & Resources. CEUR Workshop Proceedings, vol. 3469, pp. 142–146. CEUR-WS.org (2023)
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Reference Paper:
General Description:
The "Tango Spacecraft Dataset for Region of Interest Estimation and Semantic Segmentation" dataset here published should be used for Region of Interest (ROI) and/or semantic segmentation tasks. It is split into 30002 train images and 3002 test images representing the Tango spacecraft from Prisma mission, being the largest publicly available dataset of synthetic space-borne noise-free images tailored to ROI extraction and Semantic Segmentation tasks (up to our knowledge). The label of each image gives, for the Bounding Box annotations, the filename of the image, the ROI top-left corner (minimum x, minimum y) in pixels, the ROI bottom-right corner (maximum x, maximum y) in pixels, and the center point of the ROI in pixels. The annotation are taken in image reference frame with the origin located at the top-left corner of the image, positive x rightward and positive y downward. Concerning the Semantic Segmentation, RGB masks are provided. Each RGB mask correspond to a single image in both train and test dataset. The RGB images are such that the R channel corresponds to the spacecraft, the G channel corresponds to the Earth (if present), and the B channel corresponds to the background (deep space). Per each channel the pixels have non-zero value only in correspondence of the object that they represent (Tango, Earth, Deep Space). More information on the dataset split and on the label format are reported below.
Images Information:
The train set comprises 30002 synthetic grayscale images of the Tango spacecraft from the Prisma mission, and the test set comprises 3002 such images, all in PNG format. About 1/6 of the images in both the train and test sets have a non-black background, obtained by rendering an Earth-like model in the raytracing process used to generate the images. The images are noise-free to increase the flexibility of the dataset. The illumination direction of the spacecraft in the scene is uniformly distributed in 3D space, in agreement with the Sun position constraints.
Labels Information:
Labels for the bounding-box extraction are provided in separate JSON files. The files are formatted per image as in the following example:
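A minimal R sketch of reading one of the label files with jsonlite (the file name below is a hypothetical placeholder, and the fields referenced in the comments are those listed in the general description above, not necessarily the actual JSON key names):

library(jsonlite)
labels <- fromJSON("train_labels.json")   # hypothetical file name
# Per the description above, each entry is expected to provide: the image
# filename, the ROI top-left corner (min x, min y), the ROI bottom-right
# corner (max x, max y), and the ROI center, all in pixels in the image
# reference frame (origin at the top-left corner).
head(labels)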
Notice that the annotations are expressed in the image reference frame, with the origin at the top-left corner of the image, positive x rightward and positive y downward. To make the dataset easier to use, both the training set and the test set are split into two folders, one containing the images with the Earth as background and one containing the images without background.
The semantic segmentation labels are provided as RGB masks named "filename_mask.png", where "filename" is the filename of the training or test image to which the mask refers. The masks are such that the R channel corresponds to the spacecraft, the G channel to the Earth (if present), and the B channel to the background (deep space). In each channel, pixels have non-zero values only where the corresponding object (Tango, Earth, deep space) appears.
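As a minimal R sketch of separating the mask channels (the file name is a hypothetical placeholder; only the channel meaning is taken from the description above):

library(png)
mask <- readPNG("img000001_mask.png")   # hypothetical instance of "filename_mask.png"
spacecraft <- mask[, , 1] > 0           # R channel: Tango spacecraft
earth      <- mask[, , 2] > 0           # G channel: Earth (if present)
background <- mask[, , 3] > 0           # B channel: deep-space background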
VERSION CONTROL
Note: this dataset contains the same images as the "Tango Spacecraft Wireframe Dataset Model for Line Segments Detection" v2.0 full-scale (DOI: https://doi.org/10.5281/zenodo.6372848) and the "Tango Spacecraft Dataset for Monocular Pose Estimation" v1.0 (DOI: https://doi.org/10.5281/zenodo.6499007), and the three can be used together by combining the annotations of the relative pose, the reprojected wireframe model of Tango, and the ROI. Together, these three datasets form, to the best of our knowledge, the most comprehensive collection of space-borne synthetic images published to date.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Algorithm 2.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the synthetic gene expression data along with the underlying gene network used in the simulation studies of Hu and Szymczak (2023) for evaluating network-guided random forest.
In this dataset we consider the situation of 1000 genes and 1000 samples each for the training and testing sets. Each file contains a list of 100 replications of the considered scenario, which can be identified via the file name. In particular, we consider 6 different scenarios depending on the number of disease modules and on how the effects of disease genes are distributed within the disease module. When there are disease genes, we also consider 3 different levels of effect sizes. The binary responses are then generated via a logistic regression model. More details on these scenarios and the data generation mechanism can be found in Hu and Szymczak (2023).
The data are generated by the function gen_data in the R package networkRF, which can be accessed at https://github.com/imbs-hl/networkRF. To obtain the datasets with 3000 genes, which form the other part of the data used in the simulation studies of Hu and Szymczak (2023), simply modify the num.var argument of gen_data. More details on the implementation and the format of the output can be found in the help pages of the R package.
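A minimal sketch of the corresponding R call (only gen_data and its num.var argument are taken from the description above; any further arguments, their defaults, and the output format are documented in the package help page):

# install.packages("devtools")  # if not already installed
devtools::install_github("imbs-hl/networkRF")
library(networkRF)
sim <- gen_data(num.var = 3000)   # 3000 genes; see ?gen_data for the remaining arguments and output format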
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pre-processed subset of raw HEFS hindcast data for New Hogan Lake (NHG), configured for compatibility with the repository structure of the version 1 and version 2 synthetic forecast models available here: https://github.com/zpb4/Synthetic-Forecast-v1-FIRO-DISES and here: https://github.com/zpb4/Synthetic-Forecast-v2-FIRO-DISES. The data are pre-structured for the repository setup, and the README files of both GitHub repos include instructions on how to set up the data contained in this resource.
Contains HEFS hindcast .csv files and observed full-natural-flow files for the following sites:
NHGC1 - main reservoir inflow to New Hogan Lake
MSGC1L - downstream local flows from Mud Slough
The resource also contains the R scripts used to preprocess the raw HEFS data contained in the associated public HydroShare resource here: https://www.hydroshare.org/resource/f63ead2d62414940a7d90acdc234a5d1/
Introduction
Vessel segmentation in fundus images is essential in the diagnosis and prognosis of retinal diseases and the identification of image-based biomarkers. However, creating a vessel segmentation map can be a tedious and time-consuming process requiring careful delineation of the vasculature, which is especially hard for microcapillary plexi in fundus images. Optical coherence tomography angiography (OCT-A) is a relatively novel modality that visualizes blood flow and microcapillary plexi not clearly observed in fundus photography. Unfortunately, current commercial OCT-A cameras have various limitations due to their complex optics, making them more expensive, less portable, and limited to a reduced field of view (FOV) compared to fundus cameras. Moreover, the vast majority of population health data collection efforts do not include OCT-A data.
We believe that strategies able to map fundus images to en-face OCT-A can create precise vascular vessel segmentation with less effort.
In this dataset, called UTHealth - Fundus and Synthetic OCT-A Dataset (UT-FSOCTA), we include fundus images and en-face OCT-A images for 112 subjects. The two modalities have been manually aligned to allow for training of medical imaging machine learning pipelines. This dataset is accompanied by a manuscript that describes an approach to generate fundus vessel segmentations using OCT-A for training (Coronado et al., 2022). We refer to this approach as "Synthetic OCT-A".
Fundus Imaging
We include 45-degree macula-centered fundus images that cover both the macula and the optic disc. All images were acquired using an OptoVue iVue fundus camera without pupil dilation.
The full images are available in the fov45/fundus directory. In addition, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/fundus/disc and cropped/fundus/macula.
En-face OCT-A
We include the en-face OCT-A images of the superficial capillary plexus. All images were acquired using an OptoVue Avanti OCT camera with OCT-A reconstruction software (AngioVue). Low quality images with errors in the retina layer segmentations were not included.
En-face OCT-A images are located in cropped/octa/disc and cropped/octa/macula. In addition, we include a denoised version of these images in which only vessels are retained; this was performed automatically using the ROSE algorithm (Ma et al. 2021). These can be found in cropped/GT_OCT_net/noThresh and cropped/GT_OCT_net/Thresh, where the former contains the probabilities produced by the ROSE algorithm and the latter a binary map.
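As a small illustration of how the two folders relate (a sketch only; the file name and the 0.5 cutoff are assumptions, not the values used to produce the Thresh folder):

library(png)
probs  <- readPNG("cropped/GT_OCT_net/noThresh/subject_001.png")   # hypothetical file name
binary <- probs > 0.5   # the actual threshold used for the Thresh folder is not specified here
writePNG(binary * 1.0, "subject_001_binary.png")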
Synthetic OCT-A
We train a custom conditional generative adversarial network (cGAN) to map a fundus image to an en face OCT-A image. Our model consists of a generator synthesizing en face OCT-A images from corresponding areas in fundus photographs and a discriminator judging the resemblance of the synthesized images to the real en face OCT-A samples. This allows us to avoid the use of manual vessel segmentation maps altogether.
The full images are available in the fov45/synthetic_octa directory. Then, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/synthetic_octa/disc and cropped/synthetic_octa/macula. In addition, we applied the same ROSE denoising algorithm (Ma et al. 2021) used for the original en-face OCT-A images; the results are available in cropped/denoised_synthetic_octa/noThresh and cropped/denoised_synthetic_octa/Thresh, where the former contains the probabilities produced by the ROSE algorithm and the latter a binary map.
Other Fundus Vessel Segmentations Included
In this dataset, we have also included the output of two recent vessel segmentation algorithms trained on external datasets with manual vessel segmentations: SA-UNet (Guo et al., 2021) and IterNet (Li et al., 2020).
SA-UNet. The full images are available in the fov45/SA_Unet directory. Then, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/SA_Unet/disc and cropped/SA_Unet/macula.
IterNet. The full images are available in the fov45/Iternet directory. Then, we extracted the FOVs corresponding to the en-face OCT-A images, collected in cropped/Iternet/disc and cropped/Iternet/macula.
Train/Validation/Test Replication
To replicate the results of our paper or compare your model against them, use the data split reported below.
Training subjects IDs: 1 - 25
Validation subjects IDs: 26 - 30
Testing subjects IDs: 31 - 112
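A minimal R sketch of reproducing this split by subject ID (assuming subjects are indexed by the numeric IDs listed above):

subject_ids <- 1:112
train_ids <- subject_ids[subject_ids <= 25]
val_ids   <- subject_ids[subject_ids >= 26 & subject_ids <= 30]
test_ids  <- subject_ids[subject_ids >= 31]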
Data Acquisition
This dataset was acquired at the Texas Medical Center - Memorial Hermann Hospital in accordance with the guidelines from the Helsinki Declaration and it was approved by the UTHealth IRB with protocol HSC-MS-19-0352.
User Agreement
The UT-FSOCTA dataset is free to use for non-commercial scientific research only. In case of any publication, the following paper needs to be cited:
Coronado I, Pachade S, Trucco E, Abdelkhaleq R, Yan J, Salazar-Marioni S, Jagolino-Cole A, Bahrainian M, Channa R, Sheth SA, Giancardo L. Synthetic OCT-A blood vessel maps using fundus images and generative adversarial networks. Sci Rep 2023;13:15325. https://doi.org/10.1038/s41598-023-42062-9.
Funding
This work is supported by the Translational Research Institute for Space Health through NASA Cooperative Agreement NNX16AO69A.
Research Team and Acknowledgements
Here are the people behind this data acquisition effort:
Ivan Coronado, Samiksha Pachade, Rania Abdelkhaleq, Juntao Yan, Sergio Salazar-Marioni, Amanda Jagolino, Mozhdeh Bahrainian, Roomasa Channa, Sunil Sheth, Luca Giancardo
We would also like to acknowledge, for their support, the Institute for Stroke and Cerebrovascular Diseases at UTHealth, the VAMPIRE team at the University of Dundee, UK, and the Memorial Hermann Hospital System.
References
Coronado I, Pachade S, Trucco E, Abdelkhaleq R, Yan J, Salazar-Marioni S, Jagolino-Cole A, Bahrainian M, Channa R, Sheth SA, Giancardo L. Synthetic OCT-A blood vessel maps using fundus images and generative adversarial networks. Sci Rep 2023;13:15325. https://doi.org/10.1038/s41598-023-42062-9.
C. Guo, M. Szemenyei, Y. Yi, W. Wang, B. Chen, and C. Fan, "SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation," in 2020 25th International Conference on Pattern Recognition (ICPR), Jan. 2021, pp. 1236–1242. doi: 10.1109/ICPR48806.2021.9413346.
L. Li, M. Verma, Y. Nakashima, H. Nagahara, and R. Kawasaki, "IterNet: Retinal Image Segmentation Utilizing Structural Redundancy in Vessel Networks," 2020 IEEE Winter Conf. Appl. Comput. Vis. WACV, 2020, doi: 10.1109/WACV45572.2020.9093621.
Y. Ma et al., "ROSE: A Retinal OCT-Angiography Vessel Segmentation Dataset and New Model," IEEE Trans. Med. Imaging, vol. 40, no. 3, pp. 928–939, Mar. 2021, doi: 10.1109/TMI.2020.3042802.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Algorithm 1.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structural measures for Ewthmax varying.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Structural measures for Ebtwmax varying and Ewthmax=14.
The MATHWELL Human Annotation Dataset contains 5,084 synthetic word problems and answers generated by MATHWELL, a reference-free educational grade school math word problem generator released in "MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations", and by comparison models (GPT-4, GPT-3.5, Llama-2, MAmmoTH, and LLEMMA), together with expert human annotations for solvability, accuracy, appropriateness, and meets-all-criteria (MaC). Solvability means the problem is mathematically possible to solve; accuracy means the Program of Thought (PoT) solution arrives at the correct answer; appropriateness means that the mathematical topic is familiar to a grade school student and the question's context is suitable for a young learner; and MaC denotes questions labeled as solvable, accurate, and appropriate. Null values for accuracy and appropriateness indicate a question labeled as unsolvable, which cannot have an accurate solution and is automatically inappropriate. Based on our annotations, 82.2% of the question/answer pairs are solvable, 87.3% have accurate solutions, 78.1% are appropriate, and 58.4% meet all criteria.
This dataset is designed to train text classifiers to automatically label word problem generator outputs for solvability, accuracy, and appropriateness. More details about the dataset can be found in our paper.
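As a small illustration of how the MaC label relates to the other annotations (a sketch; the file and column names are hypothetical placeholders, not the dataset's actual field names):

ann <- read.csv("mathwell_annotations.csv")   # hypothetical file name
# Unsolvable questions have missing accuracy/appropriateness labels and can never meet all criteria.
ann$mac <- with(ann, solvable == 1 &
                     !is.na(accurate)    & accurate == 1 &
                     !is.na(appropriate) & appropriate == 1)
mean(ann$mac)   # should roughly match the 58.4% MaC rate reported above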