Deep learning has emerged as a robust tool for automating feature extraction from 3D images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation methods. However, there has been limited exploration of optimal training set sizes, including whether artificial expansion by data augmentation can achieve consistent results in less time and how consistent these benefits are across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus Menardella to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. The results show, unsurprisingly, that deep learning models improve with a larger number of training images, with eight specimens being required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting both volumetric and ...
Data collection: 50 planktonic foraminifera, comprising 4 Menardella menardii, 17 Menardella limbata, 18 Menardella exilis, and 11 Menardella pertenuis specimens, were used in our analyses (electronic supplementary material, figures S1 and S2). The taxonomic classification of these species was established from morphological characteristics observed in their shells. In this context, all species are characterised by lenticular, low-trochospiral tests with a prominent keel [13]. Discrimination among these species is achievable, as M. limbata can be distinguished from its ancestor, M. menardii, by its greater number of chambers and smaller umbilicus. Moreover, M. exilis and M. pertenuis can be discerned from M. limbata by their thinner, more polished tests and reduced trochospirality. Furthermore, M. pertenuis is identifiable by a thin plate extending over the umbilicus and a greater number of chambers in the final whorl compared to M. exilis [13]. The s...
# Data from: How many specimens make a sufficient training set for automated three dimensional feature extraction?
https://doi.org/10.5061/dryad.1rn8pk12f
All computer code and final raw data used for this research work are stored in GitHub: https://github.com/JamesMulqueeney/Automated-3D-Feature-Extraction and have been archived within the Zenodo repository: https://doi.org/10.5281/zenodo.11109348.
This data is the additional primary data used in each analysis. These include: CT Image Files, Manual Segmentation Files (used for training or analysis), Inputs and Outputs for Shape Analysis, and an example .h5 file which can be used to practice AI segmentation.
The primary data is arranged into the following:
https://spdx.org/licenses/CC0-1.0.html
Machine learning‐based behaviour classification using acceleration data is a powerful tool in bio‐logging research. Deep learning architectures such as convolutional neural networks (CNN), long short‐term memory (LSTM) and self‐attention mechanisms, as well as related training techniques, have been extensively studied in human activity recognition. However, they have rarely been used in wild animal studies. The main challenges of acceleration‐based wild animal behaviour classification include data shortages, class imbalance, various types of noise arising from differences in individual behaviour and logger attachment, and complexity due to complex animal‐specific behaviours, all of which may have limited the application of deep learning techniques in this area. To overcome these challenges, we explored the effectiveness of techniques for efficient model training: data augmentation, manifold mixup and pre‐training of deep learning models with unlabelled data, using datasets from two species of wild seabirds and state‐of‐the‐art deep learning model architectures. Data augmentation improved overall model performance when one of several techniques (none, scaling, jittering, permutation, time‐warping and rotation) was randomly applied to each sample during mini‐batch training. Manifold mixup also improved model performance, but not as much as random data augmentation. Pre‐training with unlabelled data did not improve model performance. The state‐of‐the‐art deep learning models, including a model consisting of four CNN layers, an LSTM layer and a multi‐head attention layer, as well as its modified version with a shortcut connection, showed better performance than the other comparative models. Using only raw acceleration data as inputs, these models outperformed classic machine learning approaches that used 119 handcrafted features.
Our experiments showed that deep learning techniques are promising for acceleration‐based behaviour classification of wild animals and highlighted some challenges (e.g. effective use of unlabelled data). There is scope for greater exploration of deep learning techniques in wild animal studies (e.g. advanced data augmentation, multimodal sensor data use, transfer learning and self‐supervised learning). We hope that this study will stimulate the development of deep learning techniques for wild animal behaviour classification using time‐series sensor data.
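The "randomly apply one technique per sample" scheme described above can be sketched as follows. This is an illustrative stand-in, not the study's code: the function names, parameters, and window size are assumptions, and time-warping and rotation are omitted for brevity.

```python
import random

def scale(window, sigma=0.1):
    """Scaling: multiply the whole window by one random factor."""
    f = random.gauss(1.0, sigma)
    return [[v * f for v in sample] for sample in window]

def jitter(window, sigma=0.05):
    """Jittering: add independent Gaussian noise to every value."""
    return [[v + random.gauss(0.0, sigma) for v in sample] for sample in window]

def permute(window, n_segments=4):
    """Permutation: split the window into segments and shuffle their order."""
    seg = len(window) // n_segments
    segments = [window[i * seg:(i + 1) * seg] for i in range(n_segments)]
    tail = window[n_segments * seg:]  # leftover samples stay at the end
    random.shuffle(segments)
    return [s for block in segments for s in block] + tail

AUGMENTATIONS = [None, scale, jitter, permute]

def augment(window):
    """Pick one technique (or none) at random for each mini-batch sample."""
    op = random.choice(AUGMENTATIONS)
    return window if op is None else op(window)

# A toy window of 8 tri-axial acceleration samples.
window = [[0.1 * t, 0.2, -0.3] for t in range(8)]
out = augment(window)
```

Each operation preserves the window length, so augmented samples can be batched together unchanged.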
This abstract is cited from the original article "Exploring deep learning techniques for wild animal behaviour classification using animal-borne accelerometers" in Methods in Ecology and Evolution (Otsuka et al., 2024). Please see the README for details of the datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 4. A Microsoft® Excel® workbook that details the raw data for the 8 experiments in which either the test set was augmented alone (after its allocation) or augmentation of the whole dataset was done before test-set allocation. All of the image-classification output probabilities are included.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Brain Tumor Detection Research Paper Code and Dataset
Paper title: Transforming brain tumor detection: the impact of YOLO models and MRI orientations.
Authored by: Yazan Al-Smadi, Ahmad Al-Qerem, et al. (2023)
This project contains a full version of the used brain tumor dataset and a full code version of the proposed research methodology.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The purified dataset for data augmentation for DAISM-DNNXMBD can be downloaded from this repository.
The pbmc8k dataset downloaded from 10X Genomics was processed and used for data augmentation to create training datasets for DAISM-DNN models. pbmc8k.h5ad contains 5 cell types (B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells), and pbmc8k_fine.h5ad contains 11 cell types (naive.B.cells, memory.B.cells, naive.CD4.T.cells, memory.CD4.T.cells, naive.CD8.T.cells, memory.CD8.T.cells, regulatory.T.cells, monocytes, macrophages, myeloid.dendritic.cells, NK.cells).
The RNA-seq dataset contains 5 cell types (B.cells, CD4.T.cells, CD8.T.cells, monocytic.lineage, NK.cells). Raw FASTQ reads were downloaded from the NCBI website, and transcript- and gene-level expression quantification was performed using Salmon (version 0.11.3) with Gencode v29, after quality control of the FASTQ reads with fastp. All tools were run with default parameters.
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Dataset used for data augmentation in the training phase of the Variable Misuse tool. It contains some source code files extracted from third-party repositories.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
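As a concrete illustration of one of the techniques named above, the lexicon-free operations of Easy Data Augmentation (random swap and random deletion) can be sketched in a few lines; synonym replacement and insertion need a lexical resource and are omitted. The Spanish sample sentence and the parameter values are illustrative, not taken from the study:

```python
import random

def random_swap(tokens, n=1):
    """EDA random swap: exchange two randomly chosen token positions, n times."""
    tokens = tokens[:]
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """EDA random deletion: drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

texto = "la letra de esta canción es muy triste".split()
aug = random_deletion(random_swap(texto, n=2), p=0.1)
```

Applying these perturbations to each training sentence yields extra labelled examples without changing the (assumed label-preserving) sentiment of the text.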
Automated species identification and delimitation is challenging, particularly in rare and thus often scarcely sampled species, which do not allow sufficient discrimination of infraspecific versus interspecific variation. Typical problems arising from either low or exaggerated interspecific morphological differentiation are best met by automated methods of machine learning that learn efficient and effective species identification from training samples. However, limited infraspecific sampling remains a key challenge in machine learning as well. In this study, we assessed whether a data augmentation approach may help to overcome the problem of scarce training data in automated visual species identification. The stepwise augmentation of data comprised image rotation as well as visual and virtual augmentation. The visual data augmentation applies classic approaches of data augmentation and the generation of artificial images using a generative adversarial network (GAN) approach. Descriptive featu...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this project, we aim to annotate car images captured on highways. The annotated data will be used to train machine learning models for various computer vision tasks, such as object detection and classification.
For this project, we will be using Roboflow, a powerful platform for data annotation and preprocessing. Roboflow simplifies the annotation process and provides tools for data augmentation and transformation.
Roboflow offers data augmentation capabilities, such as rotation, flipping, and resizing. These augmentations can help improve the model's robustness.
Once the data is annotated and augmented, Roboflow allows us to export the dataset in various formats suitable for training machine learning models, such as YOLO, COCO, or TensorFlow Record.
By completing this project, we will have a well-annotated dataset ready for training machine learning models. This dataset can be used for a wide range of applications in computer vision, including car detection and tracking on highways.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The research focus of this study is to generate a larger respiration dataset for the creation of an elderly respiration Digital Twin (DT) model. Initial experimental data were collected with an unobtrusive Wi-Fi sensor, using Channel State Information (CSI) characteristics to capture the subject's respiration rate.
The generation of a DT model requires extensive and diverse data. Due to limited resources and the need for extensive experimentation, the data are generated by applying a novel statistical time-series data augmentation method to single-subject respiration data. The larger synthetic respiration datasets allow testing of signal-processing methodologies for noise removal, Breaths Per Minute (BPM) estimation, and extensive Artificial Intelligence (AI) implementation.
The sensor data cover rates from 12 BPM to 25 BPM for a single subject. A normal respiration rate ranges from 12 BPM to 16 BPM; anything beyond this range is considered abnormal. A total of 14 files are present in the dataset, each labeled according to its BPM. Data for all 30 patients are present for each BPM; patients are numbered P1, P2, P3, ... up to P30.
This data can be utilized by researchers and scientists in developing novel signal-processing methodologies for the respiration DT model. These larger respiration datasets can also be used for Machine Learning (ML) and Deep Learning (DL), providing predictive analysis and classification of multi-patient respiration in the DT model for elderly respiration monitoring.
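The statistical augmentation method itself is not reproduced in this description, so as a purely hypothetical stand-in, a synthetic respiration trace at a target BPM can be sketched as a noisy sinusoid; the function name, sampling rate, and jitter parameters below are all assumptions:

```python
import math
import random

def synth_respiration(bpm, seconds=60, fs=10, amp_jitter=0.05, noise=0.02):
    """Generate a toy respiration waveform at a target BPM.

    A sinusoid at bpm/60 Hz with small per-sample amplitude jitter and
    additive Gaussian noise; a stand-in for the paper's statistical method.
    """
    f = bpm / 60.0                 # breathing frequency in Hz
    n = seconds * fs               # total number of samples
    sig = []
    for k in range(n):
        t = k / fs
        a = 1.0 + random.gauss(0.0, amp_jitter)
        sig.append(a * math.sin(2 * math.pi * f * t) + random.gauss(0.0, noise))
    return sig

sig = synth_respiration(bpm=14)    # a "normal" rate in the 12-16 BPM band
```

Sweeping `bpm` from 12 to 25 and varying the jitter parameters would produce a family of traces per virtual patient, mirroring the dataset's 14 BPM-labeled files.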
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This extensive dataset presents a meticulously curated collection of low-resolution images showcasing 20 well-established rice varieties native to diverse regions of Bangladesh. The rice samples were carefully gathered from both rural areas and local marketplaces, ensuring a comprehensive and varied representation. Serving as a visual compendium, the dataset provides a thorough exploration of the distinct characteristics of these rice varieties, facilitating precise classification.
The dataset encompasses 20 distinct classes, encompassing Subol Lota, Bashmoti (Deshi), Ganjiya, Shampakatari, Sugandhi Katarivog, BR-28, BR-29, Paijam, Bashful, Lal Aush, BR-Jirashail, Gutisharna, Birui, Najirshail, Pahari Birui, Polao (Katari), Polao (Chinigura), Amon, Shorna-5, and Lal Binni. In total, the dataset comprises 4,730 original JPG images and 23,650 augmented images.
These images were captured using an iPhone 11 camera with a 5x zoom feature. Each image capturing these rice varieties was diligently taken between October 18 and November 29, 2023. To facilitate efficient data management and organization, the dataset is structured into two variants: Original images and Augmented images. Each variant is systematically categorized into 20 distinct sub-directories, each corresponding to a specific rice variety.
The primary image set comprises 4,730 JPG images, uniformly sized at 853 × 853 pixels. Due to the initial low resolution, the file size was notably 268 MB. Employing compression through a zip program significantly optimized the dataset, resulting in a final size of 254 MB.
To address the substantial image volume requirements of deep learning models for machine vision, data augmentation techniques were implemented. A total of 23,650 images were obtained through augmentation. These augmented images, also in JPG format and uniformly sized at 512 × 512 pixels, initially amounted to 781 MB. However, post-compression, the dataset was further streamlined to 699 MB.
The raw and augmented datasets are stored in two distinct zip files, namely 'Original.zip' and 'Augmented.zip'. Both zip files contain 20 sub-folders, each representing a unique rice variety, namely 1_Subol_Lota, 2_Bashmoti, 3_Ganjiya, 4_Shampakatari, 5_Katarivog, 6_BR28, 7_BR29, 8_Paijam, 9_Bashful, 10_Lal_Aush, 11_Jirashail, 12_Gutisharna, 13_Red_Cargo, 14_Najirshail, 15_Katari_Polao, 16_Lal_Biroi, 17_Chinigura_Polao, 18_Amon, 19_Shorna5, 20_Lal_Binni.
To ease the experimenting process for researchers, we have balanced the data and split it in an 80:20 train-test ratio. The ‘Train_n_Test.zip’ folder contains two sub-directories: ‘1_TEST’, which contains 1,125 images per class, and ‘2_VALID’, which contains 225 images per class.
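A balanced per-class 80:20 split like the one shipped in ‘Train_n_Test.zip’ can be reproduced along these lines; the function, directory names, and file names below are illustrative, not the dataset's own tooling:

```python
import random
from collections import defaultdict

def split_80_20(paths_by_class, seed=42):
    """Shuffle each class independently and cut at 80%, so the resulting
    train/test sets stay balanced per class."""
    rng = random.Random(seed)
    train, test = defaultdict(list), defaultdict(list)
    for cls, paths in paths_by_class.items():
        paths = sorted(paths)          # copy; deterministic before shuffling
        rng.shuffle(paths)
        cut = int(0.8 * len(paths))
        train[cls], test[cls] = paths[:cut], paths[cut:]
    return train, test

# Toy stand-in for the 20 variety sub-directories.
data = {"1_Subol_Lota": [f"img_{i}.jpg" for i in range(100)],
        "2_Bashmoti": [f"img_{i}.jpg" for i in range(100)]}
train, test = split_80_20(data)
```

Splitting per class rather than over the pooled image list is what keeps every variety represented at the same ratio in both partitions.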
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration in existing deep learning projects based on this framework.
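As a lightweight illustration of the online-augmentation idea (not the repository's actual code, and with toy single-channel impulse responses standing in for BIRD's multichannel RIRs), a dry signal can be reverberated by convolving it with a randomly chosen room impulse response:

```python
import random

def convolve(x, h):
    """Direct-form convolution of a dry signal x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def reverberate(x, rirs):
    """Online augmentation step: pick one RIR at random and apply it.
    BIRD's RIRs are multichannel; a single channel is shown here."""
    return convolve(x, random.choice(rirs))

rirs = [[1.0, 0.5, 0.25], [1.0, 0.0, 0.1]]        # toy impulse responses
wet = reverberate([0.0, 1.0, 0.0, -1.0], rirs)
```

In a PyTorch `Dataset`, this sampling-and-convolution step would typically live in `__getitem__`, so each epoch sees freshly reverberated copies of the same dry signals.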
https://www.marketreportanalytics.com/privacy-policy
The synthetic data solution market is experiencing robust growth, driven by increasing demand for data privacy and security, coupled with the need for large, high-quality datasets for training AI and machine learning models. The market, currently estimated at $2 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an estimated market value of over $10 billion by 2033. This expansion is fueled by several key factors: stringent data privacy regulations like GDPR and CCPA, which restrict the use of real personal data; the rise of synthetic data generation techniques enabling the creation of realistic, yet privacy-preserving datasets; and the increasing adoption of AI and ML across various industries, particularly financial services, retail, and healthcare, creating a high demand for training data. The cloud-based segment is currently dominating the market, owing to its scalability, accessibility, and cost-effectiveness. The geographical distribution shows North America and Europe as leading regions, driven by early adoption of AI and robust data privacy regulations. However, the Asia-Pacific region is expected to witness significant growth in the coming years, propelled by the rapid expansion of the technology sector and increasing digitalization efforts in countries like China and India. Key players like LightWheel AI, Hanyi Innovation Technology, and Baidu are strategically investing in research and development, fostering innovation and expanding their market presence. While challenges such as the complexity of synthetic data generation and potential biases in generated data exist, the overall market outlook remains highly positive, indicating significant opportunities for growth and innovation in the coming decade. 
The "Others" application segment represents a promising area for future growth, encompassing sectors such as manufacturing, energy, and transportation, where synthetic data can address specific data challenges.
Description:
The Printed Digits Dataset is a comprehensive collection of approximately 3,000 grayscale images, specifically curated for numeric digit classification tasks. Originally created with 177 images, this dataset has undergone extensive augmentation to enhance its diversity and utility, making it an ideal resource for machine learning projects such as Sudoku digit recognition.
Dataset Composition:
Image Count: The dataset contains around 3,000 images, each representing a single numeric digit from 0 to 9.
Image Dimensions: Each image is standardized to a 28×28 pixel resolution, maintaining a consistent grayscale format.
Purpose: This dataset was developed with a specific focus on Sudoku digit classification. Notably, it includes blank images for the digit '0', reflecting the common occurrence of empty cells in Sudoku puzzles.
Augmentation Details:
To expand the original dataset from 177 images to 3,000, a variety of data augmentation techniques were applied. These include:
Rotation: Images were rotated to simulate different orientations of printed digits.
Scaling: Variations in the size of digits were introduced to mimic real-world printing inconsistencies.
Translation: Digits were shifted within the image frame to represent slight misalignments often seen in printed text.
Noise Addition: Gaussian noise was added to simulate varying print quality and scanner imperfections.
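Two of the listed operations, translation and Gaussian noise addition, can be sketched in plain Python on a 28×28 grid; the function names and parameter values are illustrative, and rotation or scaling would normally be handled by an image library:

```python
import random

def translate(img, dx, dy, fill=0):
    """Shift a grid right by dx and down by dy, padding vacated cells with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def add_noise(img, sigma=8.0):
    """Add Gaussian noise, clipping each pixel to the 0-255 grayscale range."""
    return [[min(255, max(0, int(v + random.gauss(0, sigma)))) for v in row]
            for row in img]

img = [[0] * 28 for _ in range(28)]
img[14][14] = 255                    # a single bright pixel as a toy digit
aug = add_noise(translate(img, dx=2, dy=-1))
```

Composing such operations with random parameters per image is how a 177-image seed set can be multiplied into several thousand distinct training samples.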
Applications:
Sudoku Digit Recognition: Given its design, this dataset is particularly well-suited for training models to recognize and classify digits in Sudoku puzzles.
Handwritten Digit Classification: Although the dataset contains printed digits, it can be adapted and utilized in combination with handwritten digit datasets for broader numeric classification tasks.
Optical Character Recognition (OCR): This dataset can also be valuable for training OCR systems, especially those aimed at processing low-resolution or small-scale printed text.
Dataset Quality:
Uniformity: All images are uniformly scaled and aligned, providing a clean and consistent dataset for model training.
Diversity: Augmentation has significantly increased the diversity of digit representation, making the dataset robust for training deep learning models.
Usage Notes:
Zero Representation: Users should note that the digit '0' is represented by a blank image.
This design choice aligns with the specific application of Sudoku puzzle solving but may require adjustments if the dataset is used for other numeric classification tasks.
Preprocessing Required: While the dataset is ready for use, additional preprocessing steps, such as normalization or further augmentation, can be applied based on the specific requirements of the intended machine learning model.
File Format:
The images are stored in a standardized format compatible with most machine learning frameworks, ensuring ease of integration into existing workflows.
Conclusion: The Printed Digits Dataset offers a rich resource for those working on digit classification projects, particularly within the context of Sudoku or other numeric-based puzzles. Its extensive augmentation and attention to application-specific details make it a valuable asset for both academic research and practical AI development.
This dataset is sourced from Kaggle.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Having fluids that are stable over time is important for many applications, particularly sustainable aviation fuels (SAFs) derived from various renewable sources. Being able to understand this characteristic as early as possible during the development of SAFs would facilitate the blending of renewable sources with or without fossil fuels. Oxidation stability, defined as a hydrocarbon’s resistance to reacting with oxygen at near-ambient temperatures, is one of the most important hydrocarbon-stability-related properties. Indeed, the accumulation of byproducts of oxidation reactions may result in system failures. Assessing this property experimentally remains time-consuming; thus, developing fast and accurate predictive models becomes relevant, and approaches based on machine learning appear as valuable alternatives. The development of quantitative structure–property relationships (QSPRs) is subject to the availability of reference data, and unfortunately, these are currently lacking in the literature. In this study, we first built a database containing consistent experimental results from accelerated oxidation tests conducted on diverse pure hydrocarbons, within the carbon atom number range of SAFs, using the PetroOxy/RapidOxy test method; second, we applied two machine-learning-based techniques (SVM and XGBoost) to the generated data set to derive QSPR-based models. The contribution of techniques such as data augmentation applied to our data set was also investigated and compared to more classical approaches. The best model (RMSEP = 2.7 h) was obtained after log-transforming the reference Induction Period, performing Smart Data Augmentation to enrich the database content, and using XGBoost with linear learners. While the model’s accuracy is not adequate for quantitative predictions, it allows fast and semiquantitative predictions.
Here, we provide a dataset of images of interfaces from household appliances, where all interface elements are labeled with one of five different types of interface elements. Further, we provide auxiliary materials to use and extend the dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Item response probabilities of DINA, DINO and ACDM models.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset, sourced from Vimruli Guava Garden and Floating Market in Jhalakathi, Barisal, categorizes guava leaf and fruit conditions for better crop management. It includes images of healthy and diseased samples, making it a valuable resource for researchers and practitioners working on machine learning models to identify plant diseases. The dataset includes six classes for robust model training.
Dataset Summary: Location: Vimruli Guava Garden & Floating Market, Jhalakathi, Barisal. Subjects: Guava leaves and fruits. Purpose: Classification and detection of guava plant conditions.
Data Distribution:
1. Algal Leaves Spot: 100 original, 1,320 augmented, 1,420 total
2. Dry Leaves: 52 original, 676 augmented, 728 total
3. Healthy Fruit: 50 original, 650 augmented, 700 total
4. Healthy Leaves: 150 original, 1,600 augmented, 1,750 total
5. Insects Eaten: 164 original, 1,720 augmented, 1,884 total
6. Red Rust: 90 original, 1,170 augmented, 1,260 total
Total Samples: 606 original, 7,136 augmented, 7,742 overall
Class Details: 1. Algal Leaves Spot: Fungal spots on leaves. 2. Dry Leaves: Leaves dried from environmental/nutrient factors. 3. Healthy Fruit/Leaves: Free of diseases/damage. 4. Insects Eaten: Insect-caused damage on leaves. 5. Red Rust: Reddish spots due to fungal infection.
This dataset is well-suited for training and evaluating machine learning models to detect and classify various conditions of guava plants, aiding in automated disease identification and better agricultural management.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This "Ripen Banana" dataset explores the stages of ripening in Musa sapientum, commonly known as the Sabri banana, under two conditions: natural ripening and accelerated ripening using calcium carbide. The controlled seven-day experiment ran from August 26, 2024 to September 2, 2024. We took 1,404 original photos with a phone camera at two-hour intervals, following calcium carbide treatment and at specified times for the naturally ripened bananas. The collection includes 1,093 photos of naturally ripened bananas and 311 photos of carbide-treated bananas, which attained full ripeness by the second day. Data augmentation was then used to achieve class balance, producing 2,814 augmented photos for the naturally ripened batch and 3,596 augmented images for the carbide-treated batch.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets (Tr0, Va0, Te0, Tr1, Va1, Te1, Te2) consisting of partial discharge (PD) and noise signals (NonPD) from electrical machines referred to in the publication "Deep learning and data augmentation for partial discharge detection in electrical machines" (DOI: https://doi.org/10.1016/j.engappai.2024.108074 )