26 datasets found

m
A dataset for conduction heat transer and deep learning
data.mendeley.com
Updated Jun 25, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mohammad Edalatifar (2020). A dataset for conduction heat transer and deep learning [Dataset]. http://doi.org/10.17632/rw9yk3c559.1
Explore at:
Unique identifier
https://doi.org/10.17632/rw9yk3c559.1
Dataset updated
Jun 25, 2020
Authors
Mohammad Edalatifar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Big data images for conduction heat transfer The related paper has been published here: M. Edalatifar, M.B. Tavakoli, M. Ghalambaz, F. Setoudeh, Using deep learning to learn physics of conduction heat transfer, Journal of Thermal Analysis and Calorimetry; 2020. https://doi.org/10.1007/s10973-020-09875-6 Steps to reproduce: The dataset is saved in two format, .npz for python and .mat for matlab. *.mat has large size, then it is compressed with winzip. ReadDataset_Python.py and ReadDataset_Matlab.m are examples of read data using python and matlab respectively. For use dataset in matlab download Dataset/HeatTransferPhenomena_35_58.zip, unzip it and then use ReadDataset_Matlab.m as an example. In case of python, download Dataset/HeatTransferPhenomena_35_58.npz and run ReadDataset_Python.py.
Physical properties of natural coarse aggregate.
plos.figshare.com
xls
Updated May 13, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
P. Jagadesh; Afzal Hussain Khan; B. Shanmuga Priya; A. Asheeka; Zineb Zoubir; Hassan M. Magbool; Shamshad Alam; Omer Y. Bakather (2024). Physical properties of natural coarse aggregate. [Dataset]. http://doi.org/10.1371/journal.pone.0303101.t004
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0303101.t004
Dataset updated
May 13, 2024
Dataset provided by
PLOShttp://plos.org/
Authors
P. Jagadesh; Afzal Hussain Khan; B. Shanmuga Priya; A. Asheeka; Zineb Zoubir; Hassan M. Magbool; Shamshad Alam; Omer Y. Bakather
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This research study aims to understand the application of Artificial Neural Networks (ANNs) to forecast the Self-Compacting Recycled Coarse Aggregate Concrete (SCRCAC) compressive strength. From different literature, 602 available data sets from SCRCAC mix designs are collected, and the data are rearranged, reconstructed, trained and tested for the ANN model development. The models were established using seven input variables: the mass of cementitious content, water, natural coarse aggregate content, natural fine aggregate content, recycled coarse aggregate content, chemical admixture and mineral admixture used in the SCRCAC mix designs. Two normalization techniques are used for data normalization to visualize the data distribution. For each normalization technique, three transfer functions are used for modelling. In total, six different types of models were run in MATLAB and used to estimate the 28th day SCRCAC compressive strength. Normalization technique 2 performs better than 1 and TANSING is the best transfer function. The best k-fold cross-validation fold is k = 7. The coefficient of determination for predicted and actual compressive strength is 0.78 for training and 0.86 for testing. The impact of the number of neurons and layers on the model was performed. Inputs from standards are used to forecast the 28th day compressive strength. Apart from ANN, Machine Learning (ML) techniques like random forest, extra trees, extreme boosting and light gradient boosting techniques are adopted to predict the 28th day compressive strength of SCRCAC. Compared to ML, ANN prediction shows better results in terms of sensitive analysis. The study also extended to determine 28th day compressive strength from experimental work and compared it with 28th day compressive strength from ANN best model. Standard and ANN mix designs have similar fresh and hardened properties. The average compressive strength from ANN model and experimental results are 39.067 and 38.36 MPa, respectively with correlation coefficient is 1. It appears that ANN can validly predict the compressive strength of concrete.
Dataset for Crack Detection in Images of Bricks and Masonry Using CNNs
zenodo.org
data.niaid.nih.gov
+2more
Updated Jul 21, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Artur Krukowski; Artur Krukowski (2022). Dataset for Crack Detection in Images of Bricks and Masonry Using CNNs [Dataset]. http://doi.org/10.5281/zenodo.6870108
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.6870108
Dataset updated
Jul 21, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Artur Krukowski; Artur Krukowski
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset for training CNN built from aerial drone images of buildings in Hamburg

This dataset contains images extracted from aerial surveillance photos of the Speicherstadt and Kesselhaus buildings in Hamburg, provided by the City of Hamburg. Original 834 high resolution images (5472 x 3648 pixels) have been separated into smaller images (227 x 227 pixels) of the size that could be processed using SqueezeNet, a deep Convolutional Neural Network (CNN). This resulted in more than 350 thousand images that had to be subsequently processed automatically to retain images containing solely bricks and mortar and concrete. The final stage contained tedious manual/visual verification of images and their separation into positive (containing cracks) and negative (clear bricks and mortars) sets of images. The final set contains nearly 40 thousand images.

Since images extracted from Hamburg buildings contained only specific type of bricks and our intention was to extend the CNN to be able to deal with wider range of brick types as well as concrete surfaces, we added to our training set also images from the following Open Access databases (note that such images required resizing to 227 x 227 pixel size before use):

Concrete Crack Images for Classification (Mendeley Data)

Dataset for Crack Detection in Images of Masonry Using CNNs

Such a combined data set resulted in over 80 thousand of images.

Matlab WebApp Server application based on trained SqueezeNet CNN

The integrated database of images has been used to train the SqueezeNet CNN using a method proposed by Kenta Itakura in his article published on Matlab Central: Classify crack image using deep learning and explain "WHY", which in turn is based on the work of Lei Zhang reported in his IEEE article: Road crack detection using deep convolutional neural network published at 2016 IEEE International Conference on Image Processing (ICIP).

The "Matlab" subfolder contains the complete software to allow building the application to run under Matlab WebApps Server. The provided version of the "netTransfer.mat" file has been compiled for Matlab revision 2020b, but it should also work when compiled for other revisions from 2019a onwards. BTW, the original location of the files was "D:\Cracks (2-class)\". For instructions how to use the provided Matlab files, refer to Matlab instructions at MATLAB Web App Server and Get Started with MATLAB Web App Server.

After producing and uploading the application to the Matlab WebApps Server, the application can be found at http://localhost:9988/webapps/home/ if deployed locally. It can be also deployed on a WEB server, subject to installation of the compliant Matlab Runtime package on the custom server, whcih can be found at MATLAB Runtimes (mathworks.com).

The important function included in the package is "unscramble.m", which corrects the error that exists in all known revisions of Matlab in uploading images selected by open file function in the Matlab App Designer. The effect is that image is "scrambled beyond recognition" after uploading to the Matlab WebApps Server. Our function de-scrambles such images, converting them into their original form.
m
Ultimate_Analysis
data.mendeley.com
Updated Jan 28, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Akara Kijkarncharoensin (2022). Ultimate_Analysis [Dataset]. http://doi.org/10.17632/t8x96g88p3.2
Explore at:
Unique identifier
https://doi.org/10.17632/t8x96g88p3.2
Dataset updated
Jan 28, 2022
Authors
Akara Kijkarncharoensin
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This database studies the performance inconsistency on the biomass HHV ultimate analysis. The research null hypothesis is the consistency in the rank of a biomass HHV model. Fifteen biomass models are trained and tested in four datasets. In each dataset, the rank invariability of these 15 models indicates the performance consistency.

The database includes the datasets and source codes to analyze the performance consistency of the biomass HHV. These datasets are stored in tabular on an excel workbook. The source codes are the biomass HHV machine learning model through the MATLAB Objected Orient Program (OOP). These machine learning models consist of eight regressions, four supervised learnings, and three neural networks.

An excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV data from 20 pieces of literature. The names of the worksheet column indicate the elements of the ultimate analysis on a % dry basis. The HHV column refers to the higher heating value in MJ/kg. The following worksheet, "Full Residuals," backups the model testing's residuals based on the 20-fold cross-validations. The article (Kijkarncharoensin & Innet, 2021) verifies the performance consistency through these residuals. The other worksheets present the literature datasets implemented to train and test the model performance in many pieces of literature.

A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The list of the folders in this file is the class structure of the machine learning models. These classes extend the features of the original MATLAB's Statistics and Machine Learning Toolbox to support, e.g., the k-fold cross-validation. The MATLAB script, name "runStudyUltimate.m," is the article's main program to analyze the performance consistency of the biomass HHV model through the ultimate analysis. The script instantly loads the datasets from the excel workbook and automatically fits the biomass model through the OOP classes.

The first section of the MATLAB script generates the most accurate model by optimizing the model's higher parameters. It takes a few hours for the first run to train the machine learning model via the trial and error process. The trained models can be saved in MATLAB .mat file and loaded back to the MATLAB workspace. The remaining script, separated by the script section break, performs the residual analysis to inspect the performance consistency. Furthermore, the figure of the biomass data in the 3D scatter plot, and the box plots of the prediction residuals are exhibited. Finally, the interpretations of these results are examined in the author's article.

Reference : Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.
The codes and dataset for paper Estimation of Global Horizontal Irradiance...
figshare.com
rar
Updated Jun 6, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Weipeng Xing; Guangyuan Zhang; Stefan Poslad (2020). The codes and dataset for paper Estimation of Global Horizontal Irradiance in China Using A Deep Learning Method [Dataset]. http://doi.org/10.6084/m9.figshare.12440447.v1
Explore at:
rarAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12440447.v1
Dataset updated
Jun 6, 2020
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Weipeng Xing; Guangyuan Zhang; Stefan Poslad
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file includes the dataset and codes as well. All the programs are run on MATLAB, only the trained model and data are provided. The deep learning package used is deebnet (deep belief networks) toolbox in MATLAB and octave v3.2 disclosed by M. A. keyvanrad and M. M. homayounpour, which can be found in http://ceit.aut.ac.ir/~keyvanrad/DeeBNet Toolbox.html DownloadThis test needs to download the software package first.1. Ready to run environment1). Input "addpath ('deebnet ') in the command line window of MATLAB to indicate the introduction of the software package.2). Enter "load ('dbn) on the command line_ for_ ghi.mat '), load the trained deep training model. You will see that three kinds of DBNs correspond to clearday / icecloud / watercloud respectively.2. Only input parameter data after processing is provided. The step of how to convert satellite image into input parameter file is omitted.3. First execute to compute GHI. M and then run convertoraster. M. It is recommended to run the code in blocks instead of directly executing the whole file.See readme and code Notes for details.
Phase unwrapping using Deep Learning in Holographic Tomography - dataset.
zenodo.org
bin
Updated Mar 27, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michał Gontarz; Michał Gontarz; Vibekananda Dutta; Vibekananda Dutta; Małgorzata Kujawińska; Małgorzata Kujawińska; Wojciech Krauze; Wojciech Krauze (2023). Phase unwrapping using Deep Learning in Holographic Tomography - dataset. [Dataset]. http://doi.org/10.5281/zenodo.7589387
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7589387
Dataset updated
Mar 27, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Michał Gontarz; Michał Gontarz; Vibekananda Dutta; Vibekananda Dutta; Małgorzata Kujawińska; Małgorzata Kujawińska; Wojciech Krauze; Wojciech Krauze
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains two types of data: phase images and trained model files.

Real phase images - these phase images are contained with the files named with the prefix "real_". The type of the data files is ".npz", to be loaded with NumPy (np.load()), as a dictionary. The data is stored within the key ["arr_0"]. The images depict cells [1], organoids [2], phantoms [3-4] and regular 3D printed structures with high scattering properties [5]. The images have been augmented in order to expand the volume of the training dataset. All images are of shape (256,256,1). It is a big dataset containing 27,189 images of each type for training the unwrapping model:

unwrapped - continuous phase distribution (float32)

wrapped - phase wrapped into mod2\(\pi\) (float32)

wrapcount - wrap count phase maps coded in the integer form (0,1,2...) (uint8)

Synthetic phase images - phase images in these files were generated algorithmically in the MATLAB programming language. The files containing this dataset have a prefix "synthetic_". The type of the data files is ".npz", to be loaded with NumPy (np.load()), as a dictionary. The data is stored within the key ["arr_0"]. Phase images contained in the synthetic dataset can be split into 3 types by their type: spherical distribution, simulated cells w/ spherical background and simulated cells w/ introduced linear tilt. All images are of shape (256,256,1). This dataset contains 10,000 images of each type for training the unwrapping and denoising models:

unwrapped - continuous phase distribution (float32)

wrapped - phase wrapped into mod2\(\pi\) (float32)

wrapcount - wrap count phase maps coded in the integer form (0,1,2...) (uint8)

noised - wrapped phase images w/ synthetic noise (float32).

Trained models - trained model files. These model files are in the format ".h5", which contains the model architecture and the weights. They have been developed and saved with the keras library, and are loaded with the keras.models.load_model() function. The models list:

Unet_Denoising_1.h5 - U-Net model used for denoising as an image translation task. The input is a wrapped phase image with noise and the output is the same wrapped phase distribution, but denoised. Model is trained on the synthetic phase dataset.

Unet_Denoising_2.h5 - Similar model to the Unet_Denoising_1.h5, which denoises wrapped phase images with equally good performance.

Attn_Unet_Unwrapping.h5 - U-Net model with Attention Gates and Residual Blocks trained for the semantic segmentation task. The input of the model is the wrapped phase image and its output is the wrap count map. Model is trained on the real phase dataset.

[1] M. Baczewska, W. Krauze, A. Kuś, P. Stępień, K. Tokarska, K. Zukowski, E. Malinowska, Z. Brzózka, and M. Kujawińska, “On-chip holographic tomography for quantifying refractive index changes of cells’ dynamics,” in Quantitative Phase Imaging VIII, vol. 11970 Y. Liu, G. Popescu, and Y. Park, eds., International Society for Optics and Photonics (SPIE, 2022), p. 1197008.
[2] P. Stępień, M. Ziemczonok, M. Kujawińska, M. Baczewska, L. Valenti, A. Cherubini, E. Casirati, and W. Krauze, “Numerical refractive index correction for the stitching procedure in tomographic quantitative phase imaging,” Biomed. Opt. Express 13, 5709–5720 (2022).
[3] M. Ziemczonok, A. Kuś, P. Wasylczyk, and M. Kujawińska, “3d-printed biological cell phantom for testing 3d quantitative phase imaging systems,” Sci. Reports 9, 1–9 (2019).
[4] M. Ziemczonok, A. Kuś, and M. Kujawińska, “Optical diffraction tomography meets metrology — measurement accuracy on cellular and subcellular level,” Measurement 195, 111106 (2022).
[5] W. Krauze, A. Kuś, M. Ziemczonok, M. Haimowitz, S. Chowdhury, and M. Kujawińska, “3d scattering microphantom sample to assess quantitative accuracy in tomographic phase microscopy techniques,” Sci. Reports 12, 1–9 (2022).
QLKNN11D training set
zenodo.org
data.niaid.nih.gov
+1more
bin, text/x-python
Updated Jun 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Karel Lucas van de Plassche; Karel Lucas van de Plassche; Jonathan Citrin; Jonathan Citrin (2023). QLKNN11D training set [Dataset]. http://doi.org/10.5281/zenodo.8017522
Explore at:
bin, text/x-pythonAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8017522
Dataset updated
Jun 8, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Karel Lucas van de Plassche; Karel Lucas van de Plassche; Jonathan Citrin; Jonathan Citrin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
QLKNN11D training set

This dataset contains a large-scale run of ~1 billion flux calculations of the quasilinear gyrokinetic transport model QuaLiKiz. QuaLiKiz is applied in numerous tokamak integrated modelling suites, and is openly available at https://gitlab.com/qualikiz-group/QuaLiKiz/. This dataset was generated with the 'QLKNN11D-hyper' tag of QuaLiKiz, equivalent to 2.8.1 apart from the negative magnetic shear filter being disabled. See https://gitlab.com/qualikiz-group/QuaLiKiz/-/tags/QLKNN11D-hyper for the in-repository tag.

The dataset is appropriate for the training of learned surrogates of QuaLiKiz, e.g. with neural networks. See https://doi.org/10.1063/1.5134126 for a Physics of Plasmas publication illustrating the development of a learned surrogate (QLKNN10D-hyper) of an older version of QuaLiKiz (2.4.0) with a 300 million point 10D dataset. The paper is also available on arXiv https://arxiv.org/abs/1911.05617 and the older dataset on Zenodo https://doi.org/10.5281/zenodo.3497066. For an application example, see Van Mulders et al 2021 https://doi.org/10.1088/1741-4326/ac0d12, where QLKNN10D-hyper was applied for ITER hybrid scenario optimization. For any learned surrogates developed for QLKNN11D, the effective addition of the alphaMHD input dimension through rescaling the input magnetic shear (s) by s = s - alpha_MHD/2, as carried out in Van Mulders et al., is recommended.

Related repositories:

General QuaLiKiz documentation https://qualikiz.com

QuaLiKiz/QLKNN input/output variables naming scheme https://qualikiz.com/QuaLiKiz/Input-and-output-variables

Training, plotting, filtering, and auxiliary tools https://gitlab.com/Karel-van-de-Plassche/QLKNN-develop

QuaLiKiz related tools https://gitlab.com/qualikiz-group/QuaLiKiz-pythontools

FORTRAN QLKNN implementation with wrapper for Python and MATLAB https://gitlab.com/qualikiz-group/QLKNN-fortran

Weights and biases of 'hyperrectangle style' QLKNN https://gitlab.com/qualikiz-group/qlknn-hype

Data exploration

The data is provided in 43 netCDF files. We advise opening single datasets using xarray or multiple datasets out-of-core using dask. For reference, we give the load times and sizes of a single variable that just depends on the scan size `dimx` below. This was tested single-core on a Intel Xeon 8160 CPU at 2.1 GHz and 192 GB of DDR4 RAM. Note that during loading, more memory is needed than the final number.

Timing of dataset loading
Amount of datasets Final in-RAM memory (GiB)
Loading time single var
(M:SS)
1 10.3 0:09
5 43.9 1:00
10 63.2 2:01
16 98.0 3:25
17 Out Of Memory x:xx

Full dataset

The full dataset of QuaLiKiz in-and-output data is available on request. Note that this is 2.2 TiB of netCDF files!
m
Data for: Automated detection of sigmatism using deep learning applied to...
data.mendeley.com
Updated Apr 20, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pawel Badura (2021). Data for: Automated detection of sigmatism using deep learning applied to multichannel speech signal [Dataset]. http://doi.org/10.17632/7cpgytgt6s.1
Explore at:
Unique identifier
https://doi.org/10.17632/7cpgytgt6s.1
Dataset updated
Apr 20, 2021
Authors
Pawel Badura
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The file contains three CNN 5-CH models in Matlab Deep Learning Toolbox format. The trained networks correspond to Experiments #1, #2, and #3b as described in the paper.
r
Dataset for Air-writing recognition based on Ultra-wide band Radar Sensor
research-repository.rmit.edu.au
researchdata.edu.au
zip
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nermine Hendy; Haytham Fayek; Akram Al-Hourani (2023). Dataset for Air-writing recognition based on Ultra-wide band Radar Sensor [Dataset]. http://doi.org/10.25439/rmt.20225907.v2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.25439/rmt.20225907.v2
Dataset updated
Jun 1, 2023
Dataset provided by
RMIT University
Authors
Nermine Hendy; Haytham Fayek; Akram Al-Hourani
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset for Air-writing recognition based on Ultrawideband Radar Sensor (Xethru X4-M03 from Novelda). The collected dataset includes air-written numbers from 0 to 9 using a uni-stroke writing technique. The Dataset contains 4 sub-folders and a MATLAB script. The dataset ia provided in CSV, MAT, and PNG formats. Please read the attached Dataset Description pdf file to understand the radar dataset collection setup, the included folders, the data format, and how to use the data for the air-writing recognition application.
Microgrid Optimal Energy Scheduling with Battery Degradation
figshare.com
zip
Updated Jan 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cunzhi Zhao; Xingpeng Li (2023). Microgrid Optimal Energy Scheduling with Battery Degradation [Dataset]. http://doi.org/10.6084/m9.figshare.21959582.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.21959582.v1
Dataset updated
Jan 26, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Cunzhi Zhao; Xingpeng Li
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The uploaded package includes 3 parts: 1. Dataset and Matlab Simulator for Battery Aging Tests 2. Learning-ready Dataset and Python Codes for Training a Battery Degradation Neural Network Model 3. Microgrid Optimal Energy Scheduling with Battery Degradation Neural Network in Python

If you use these codes for your work, please cite the following paper: Cunzhi Zhao and Xingpeng Li, “Microgrid Optimal Energy Scheduling Considering Neural Network based Battery Degradation”, IEEE Transactions on Power Systems, early access, Jan. 2023. Paper website: https://rpglab.github.io/papers/CunzhiZhao-NNBD-MDS/
Rescaled Fashion-MNIST dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled Fashion-MNIST dataset [Dataset]. http://doi.org/10.5281/zenodo.15187793
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15187793
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Time period covered
Apr 10, 2025
Description
Motivation

The goal of introducing the Rescaled Fashion-MNIST dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled Fashion-MNIST dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled Fashion-MNIST dataset is more challenging than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.

Access and rights

The Rescaled Fashion-MNIST dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:

[4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled FashionMNIST dataset is generated by rescaling 28×28 gray-scale images of clothes from the original FashionMNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72x72, with the object in the frame always centred. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.

The h5 files containing the dataset

The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5

Additionally, for the Rescaled FashionMNIST dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p595.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p707.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p841.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p189.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p414.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p682.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte2p000.h5

These dataset files were used for the experiments presented in Figures 6, 7, 14, 16, 19 and 23 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.

There is also a closely related Fashion-MNIST with translations dataset, which in addition to scaling variations also comprises spatial translations of the objects.
d
Data from: Machine learning without a processor: Emergent learning in a...
dataone.org
data.niaid.nih.gov
+1more
Updated Jul 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sam Dillavou; Benjamin Beyer; Menachem Stern; Andrea Liu; Marc Miskin; Douglas Durian (2024). Machine learning without a processor: Emergent learning in a nonlinear analog network [Dataset]. http://doi.org/10.5061/dryad.8w9ghx3vx
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.8w9ghx3vx
Dataset updated
Jul 3, 2024
Dataset provided by
Dryad Digital Repository
Authors
Sam Dillavou; Benjamin Beyer; Menachem Stern; Andrea Liu; Marc Miskin; Douglas Durian
Description
The capabilities of digital artificial neural networks grow rapidly with their size, however the time and energy required to train them does as well. The tradeoff is far better for Brains, where the constituent parts (neurons) update their analog connections in ignorance of the actions of other neurons, eschewing centralized processing. Recently introduced analog electronic contrastive local learning networks (CLLNs) share this important decentralized property. However their capabilities were limited because existing implementations are linear. In this dataset we include experimental demonstrations of a nonlinear CLLN, establishing a new paradigm for scalable learning.Â Included here are data and scripts required to generate figures 2-6 of the manuscript titled "Machine learning without a processor: Emergent learning in a nonlinear analog network"., Methods are described in detail the associated manuscript., , # Data for "Machine learning without a processor: Emergent learning in a nonlinear analog network"

https://doi.org/10.5061/dryad.8w9ghx3vx

Data from experiments using a nonlinear Contrastive Local Learning Network (CLLN).

Description of the data and file structure

Figures are generated using numbered MATLAB scripts. Data and helper scripts are contained in separate zip folders and should be uncompressed to allow scripts to run properly: unzip and place in the following file structure, and ensure the MATLAB filepath includes both folders.

>helper_scripts/ {all .m files: Experiment2.m, ExperimentGroup.m, logic_truth_table3.m, makeOrthonormalModes.m, makePlotPrettyNow.m, network_project_superclass2.m}Â

>data/{all .mat files}

Data stored inÂ OscilloscopeData.mat: single array T2 with columns: Time (sec), Input Voltage (V), Output Voltage (V).

Data stored in all other mat files is within an "experiment" object (class in Experiment2...
Data from: Dataset of in-situ and remote sensing measurements of...
data.europa.eu
data.niaid.nih.gov
+1more
unknown
Updated Oct 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2023). Dataset of in-situ and remote sensing measurements of precipitation in the inner Antarctic (Dome-C 75°S 123°E) for the years 2014-2021 [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8427614?locale=en
Explore at:
unknown(19278)Available download formats
Dataset updated
Oct 9, 2023
Dataset authored and provided by
Zenodohttp://zenodo.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Dome C
Description
The database contains original data collected between 2014 and 2021 at the Concordia Station (Dome-C, Antarctica, 75°S, 123°E) using three automatic instruments: 1) A flatbed scanner (ICECAMERA) provides information on the shape and size of precipitation on an hourly basis. 2) An automatic depolarization LIDAR provides the height, structure and phase of the cloud that originated the precipitation on a 5-minute basis. The height range for the LIDAR is between 20 and 7000 meters. 3) A microwave radiometer (HAMSTRAD) provides the local temperature at the altitude where precipitation is formed. The combination of three instruments made it possible to 'label' each precipitation grain with its size, shape parameters, temperature, altitude of formation, and surface meteorological data. Each yearly DATA_YYYY.rar data set is organized into daily directories, where all valid LIDAR false color plots, HAMSTRAD data, and ICECAMERA images are collected, along with processed numerical data for all the ice grains collected. HYSPLIT back trajectories with Dome-C as the final point are also included. The dataset's content is explained in the data legend.doc file MATLAB models.rar includes multiple MATLAB Canonical and SVM models that automatically classify the type of cloud-originating precipitation (at Dome C) based on the relative abundance of different ice grain shapes. Details and instructions for their use can be found in the Legend for MATLAB classifiers.docx document.
Hydatid Cyst
kaggle.com
zip
Updated Jan 28, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Taha Mu (2023). Hydatid Cyst [Dataset]. https://www.kaggle.com/datasets/tahamu/hydatid-cyst
Explore at:
zip(88890917 bytes)Available download formats
Dataset updated
Jan 28, 2023
Authors
Taha Mu
Description
This matlab code has been used.

clc,clear all,close all dosya=dir('*.*g'); net{1}=resnet18; net{2}=resnet50; net{3}=resnet101; net{4}=darknet19; net{5}=mobilenetv2; net{6}=darknet53; net{7}=xception(); net{8}=shufflenet(); net{9}=nasnetmobile(); net{10}=nasnetlarge(); net{11}=densenet201(); net{12}= inceptionv3(); net{13} = inceptionresnetv2(); net{14}=googlenet(); net{15}=alexnet(); net{16}=vgg16(); net{17}=vgg19(); net{18}=squeezenet();

for od=1:18 layer{od}=net{od}.Layers(end-2).Name; if od==14 || od==18 layer{18+od}=net{od}.Layers(end-4).Name; elseif (od>=15 && od<=18) layer{18+od}=net{od}.Layers(end-5).Name; elseif od==4 layer{18+od}=net{od}.Layers(end-1).Name; else layer{18+od}=net{od}.Layers(end-3).Name; end inputSizeA = net{od}.Layers(1).InputSize; for k=1:length(dosya) res=imread(dosya(k).name); [ww,hh,ll]=size(res); res=imresize(res,[256,256]); if ll==1 res(:,:,1)=res(:,:,1); res(:,:,2)=res(:,:,1); res(:,:,3)=res(:,:,1); end augimdsTrainA = augmentedImageDatastore(inputSizeA(1:2),res);
fm = activations(net{od},augimdsTrainA,layer{od},'OutputAs','rows'); fm1 = activations(net{od},augimdsTrainA,layer{18+od},'OutputAs','rows'); X(k,:)=[fm fm1]; y(k)=str2num(dosya(k).name(1)); end idx=relieff(X,y',10); for i=1:1000 son{od}(:,i)=X(:,idx(i)); end

template = templateSVM(... 'KernelFunction', 'polynomial', ... 'PolynomialOrder', 3, ...m 'KernelScale', 'auto', ... 'BoxConstraint', 1, ... 'Standardize', true); mdl = fitcecoc(... son{od}, ... y, ... 'Learners', template, ... 'Coding', 'onevsone', ... 'ClassNames', [1:4]'); kk=crossval(mdl,'KFold',10); loss(od) = kfoldLoss(kk); clear X mdl xx idx end

save test_sonuclari.mat son loss
Time Series in Risk Assessment
kaggle.com
zip
Updated Jun 13, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Saurabh Shahane (2021). Time Series in Risk Assessment [Dataset]. https://www.kaggle.com/saurabhshahane/time-series-in-risk-assessment
Explore at:
zip(9303646 bytes)Available download formats
Dataset updated
Jun 13, 2021
Authors
Saurabh Shahane
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

This dataset is generated via simulation of a probabilistic risk assessment model and a Linear Programming models to evaluate Risk Index and Total supply chain cost using MATLAB simulink. This dataset can be utilized for sequence-to-sequence forecasting and multi-labelled classification problems related to Supply Chain Risk Management (SCRM). This dataset can also be used for evaluating Non-linear Autoregressive models and Deep Neural Networks for supply chain networks. The dataset is published for academic purposes only, with limited usage after careful validation of norms and permissions as stated by the owner.

Acknowledgements

Banerjee, Heerok; Saparia, Grishma; Ganapathy, Velappa; Garg, Priyanshi; Shenbagaraman, V. M. (2019), “Time Series Dataset for Risk Assessment in Supply Chain Networks”, Mendeley Data, V2, doi: 10.17632/gystn6d3r4.2
t
Data from: 3d µct images of specimens of carbon fiber reinforced polyamide 6...
service.tib.eu
radar.kit.edu
+1more
Updated Aug 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). 3d µct images of specimens of carbon fiber reinforced polyamide 6 plaque, fiber orientation tensor data of these images, and three python code files for two different algebraic and one machine learning based tensor interpolation algorithms [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1409
Explore at:
Dataset updated
Aug 4, 2023
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Abstract: This dataset includes 3D µCT images of nine different specimen of 10 mm \times 10 mm of a carbon fiber reinforced polyamide 6 plaque produced in the long fiber reinforced thermoplastic direct (LFT-D) process. The position of the specimen in the plaque can be learned from the referenced publication (Blarr et al., Implementation and comparison of algebraic and machine learning based tensor interpolation methods applied to fiber orientation tensor fields obtained from CT images, Computational Materials Science, 2022). After small pre-processing steps, the fiber orientation tensor of each of the image stacks is determined with the help of the structure tensor based implementation of Pinter et al. The code can be found here: https://sourceforge.net/p/composight/code/HEAD/tree/trunk/SiOTo/StructureTensorOrientation/FibreOrientation/StructureTensorOrientationFilter.cxx#l186. Hence, nine .dat-files containing the fiber orientation tensor of second order are also included in this dataset. Most importantly, this dataset contains three different Python codes. The author implemented a different interpolation method in each of those codes; two algebraic and one machine learning based one. The component averaging method is the simplest; the decomposition method is mathematically more difficult. It works with the decomposition of the tensor into shape and orientation and subsequent separate invariant and quaternion weighting, before reassembling the then interpolated tensor. The deep learning based method is the only Jupyter notebook in this dataset, where an ANN is implemented for the same interpolation task. Please consider the reference paper mentioned before for details. For the visualization of the tensor glyphs, a Matlab function by Barmpoutis is used, which can be found here: https://de.mathworks.com/matlabcentral/fileexchange/27462-diffusion-tensor-field-dti-visualization. TechnicalRemarks: In the folder "code" there are three Python scripts. The "component_averaging_method.py" and the "decomposition_method.py" work the same: The script needs an input .txt-file with coordinates and the corresponding fiber orientation tensors (the example used in the publication is given (file "Input_file_FOT.txt")). After running the code you are asked in the console for the name of the output file and for lower and upper x and y limit, which are 1 and 13, respectively, in the given case. The scripts then calculate the fiber orientation tensors at all missing positions with the respective method, which are then written into a MATLAB file (which is named the way you input in the console). This MATLAB file is structured in a way that the fiber orientation tensors can be plotted directly with the tensor glyph visualization function of Barmpoutis ("plotDTI") given in the abstract. The Jupyter Notebook "ANN_method.ipynb" works a bit differently as it is an artificial neural network. There, .csv-files are needed as input data. The components of the tensors are given to the network in separate files and the coordinates of the positions in another separate .csv-file. This is all documented in the paper as well. The output again is a .csv-file that has to be transferred into MATLAB if users want to use the same visualization function. The folder "scans_and_FOT" includes all nine scans and respective fiber orientation tensors used for the publication. The scans are given as .mhd- and .raw-files, the orientation tensors are given in the .dat-files. To generate the fiber orientation tensors from the images, the code by Pinter et al., which is given in the abstract, was used. This C++ code writes out a vector valued image with the orientations per voxel. From this, again with another MATLAB file, which composes the orientation tensor from the vector-valued image, these .dat files can be generated. As this is not the main focus of the publication, and the functionality of the python scripts can be verified with the given orientation tensors, this MATLAB script is not part of this dataset. Please consider the paper or contact the author Juliane Blarr for further questions.
Rescaled CIFAR-10 dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled CIFAR-10 dataset [Dataset]. http://doi.org/10.5281/zenodo.15188748
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15188748
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Description
Motivation

The goal of introducing the Rescaled CIFAR-10 dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled CIFAR-10 dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled CIFAR-10 dataset contains substantially more natural textures and patterns than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2

and is therefore significantly more challenging.

Access and rights

The Rescaled CIFAR-10 dataset is provided on the condition that you provide proper citation for the original CIFAR-10 dataset:

[4] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., University of Toronto.

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled CIFAR-10 dataset is generated by rescaling 32×32 RGB images of animals and vehicles from the original CIFAR-10 dataset [4]. The scale variations are up to a factor of 4. In order to have all test images have the same resolution, mirror extension is used to extend the images to size 64x64. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 distinct classes in the dataset: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship” and “truck”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 40 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 40 000 samples from the original CIFAR-10 training set. The validation dataset, on the other hand, is formed from the final 10 000 image batch of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original CIFAR-10 test set.

The h5 files containing the dataset

The training dataset file (~5.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

cifar10_with_scale_variations_tr40000_vl10000_te10000_outsize64-64_scte1p000_scte1p000.h5

Additionally, for the Rescaled CIFAR-10 dataset, there are 9 datasets (~1 GB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

cifar10_with_scale_variations_te10000_outsize64-64_scte0p500.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p595.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p707.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p841.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p000.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p189.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p414.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p682.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte2p000.h5

These dataset files were used for the experiments presented in Figures 9, 10, 15, 16, 20 and 24 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
l
Data set for article: Fusing ToF-SIMS images for spatial-spectral resolution...
opal.latrobe.edu.au
researchdata.edu.au
bin
Updated Mar 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wil Gardner (2024). Data set for article: Fusing ToF-SIMS images for spatial-spectral resolution enhancement using a convolutional neural network [Dataset]. http://doi.org/10.26181/20220603.v1
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.26181/20220603.v1
Dataset updated
Mar 7, 2024
Dataset provided by
La Trobe
Authors
Wil Gardner
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This data set is uploaded as supporting information for the publication entitled: Fusing ToF-SIMS images for spatial-spectral resolution enhancement using a convolutional neural network

Files are as follows: gold_mesh_data.mat - MATLAB workspace file containing peak-picked ToF-SIMS data (hyperspectral array) for the gold mesh sample. tumor_tissue_data.mat - MATLAB workspace file containing peak-picked ToF-SIMS data (hyperspectral array) for the tumor tissue sample. Additional details about the datasets can be found in the published article.

If you use this data set in your work, please cite our work as follows:

Gardner, W., Winkler, D. A., Maliki, R., Cutts, S. M., Ellis, S., Anderson, R. L., Muir, B. W., Pigram, P. J., Fusing ToF-SIMS Images for Spatial-Spectral Resolution Enhancement using a Convolutional Neural Network. Adv. Mater. Interfaces 2022, 2201464. https://doi.org/10.1002/admi.202201464
Oncotype-DX Breast Cancer Dataset
kaggle.com
zip
Updated Aug 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ryad Z (2022). Oncotype-DX Breast Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/rzemouri/oncotypedx-breast-cancer-dataset
Explore at:
zip(148884 bytes)Available download formats
Dataset updated
Aug 28, 2022
Authors
Ryad Z
Description
Oncotype DX (ODX) is a multi-gene expression signature designed for estrogen receptor (ER)-positive and human epidermal growth factor receptor 2 (HER2)-negative breast cancer patients to predict the recurrence score (RS) and chemotherapy (CT) benefit. The aim of our study is to develop a prediction tool for the three RS’s categories based on deep multi-layer perceptrons (DMLP) and using only the morphoimmunohistological variables. We performed a retrospective cohort of 320 patients who underwent ODX testing from three French hospitals. Clinico-pathological characteristics were recorded. We built a supervised machine learning classification model using Matlab software with 152 cases for the training and 168 cases for the testing. Three classifiers were used to learn the three risk categories of the ODX, namely the low, intermediate, and high risk. Experimental results provide the area under the curve (AUC), respectively, for the three risk categories: 0.63 [95% confidence interval: (0.5446, 0.7154), p < 0.001], 0.59 [95% confidence interval: (0.5031, 0.6769), p < 0.001], 0.75 [95% confidence interval: (0.6184, 0.8816), p < 0.001]. Concordance rate between actual RS and predicted RS ranged from 53 to 56% for each class between DMLP and ODX. The concordance rate of low and intermediate combined risk group was 85%.We developed a predictive machine learning model that could help to define patient’s RS. Moreover, we integrated histopathological data and DMLP results to select tumor for ODX testing. Thus, this process allows more relevant use of histopathological data, and optimizes and enhances this information.

Relevant Papers and Citation Request:

Prediction of Oncotype DX recurrence score using deep multi-layer perceptrons in estrogen receptor-positive, HER2-negative breast cancer May 2020 Breast Cancer 27(5) DOI: 10.1007/s12282-020-01100-4

Breast cancer diagnosis based on joint variable selection and Constructive Deep Neural Network, March 2018, Conference: 2018 IEEE 4th Middle East Conference on Biomedical Engineering (MECBME), DOI:10.1109/MECBME.2018.8402426

Constructive Deep Neural Network for Breast Cancer Diagnosis, January 2018, IFAC-PapersOnLine 51(27):98-103, DOI:10.1016/j.ifacol.2018.11.660
Data from: Large-scale statistical learning for mass transport prediction in...
zenodo.org
txt, zip
Updated Nov 22, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Benedikt Prifling; Magnus Röding; Philip Townsend; Matthias Neumann; Volker Schmidt; Benedikt Prifling; Magnus Röding; Philip Townsend; Matthias Neumann; Volker Schmidt (2021). Large-scale statistical learning for mass transport prediction in porous materials using 90,000 artificially generated microstructures [Dataset]. http://doi.org/10.5281/zenodo.4047774
Explore at:
zip, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4047774
Dataset updated
Nov 22, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Benedikt Prifling; Magnus Röding; Philip Townsend; Matthias Neumann; Volker Schmidt; Benedikt Prifling; Magnus Röding; Philip Townsend; Matthias Neumann; Volker Schmidt
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset and code used in B Prifling, et al, "Large-scale statistical learning for mass transport prediction in porous materials using 90,000 artificially generated microstructures", published in Frontiers in Materials. In this work, we investigate relationships between 3D microstructure and effective diffusivity and permeability, based on a dataset of 90,000 structures and using analytical formulas, artificial neural networks (ANNs), and convolutional neural networks (CNNs). Herein, the codes in Matlab and Python/Tensorflow necessary to investigate the prediction models and reproduce the results of the paper are supplied. Also, microstructures together with their computed geometrical descriptors and effective properties are included.

Timing of dataset loading
Amount of datasets	Final in-RAM memory (GiB)	Loading time single var (M:SS)
1	10.3	0:09
5	43.9	1:00
10	63.2	2:01
16	98.0	3:25
17	Out Of Memory	x:xx

Facebook

Twitter

Click to copy link

Link copied

Cite

Mohammad Edalatifar (2020). A dataset for conduction heat transer and deep learning [Dataset]. http://doi.org/10.17632/rw9yk3c559.1

A dataset for conduction heat transer and deep learning

Explore at:

Unique identifier

https://doi.org/10.17632/rw9yk3c559.1

Dataset updated

Jun 25, 2020

Authors

Mohammad Edalatifar

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Big data images for conduction heat transfer The related paper has been published here: M. Edalatifar, M.B. Tavakoli, M. Ghalambaz, F. Setoudeh, Using deep learning to learn physics of conduction heat transfer, Journal of Thermal Analysis and Calorimetry; 2020. https://doi.org/10.1007/s10973-020-09875-6 Steps to reproduce: The dataset is saved in two format, .npz for python and .mat for matlab. *.mat has large size, then it is compressed with winzip. ReadDataset_Python.py and ReadDataset_Matlab.m are examples of read data using python and matlab respectively. For use dataset in matlab download Dataset/HeatTransferPhenomena_35_58.zip, unzip it and then use ReadDataset_Matlab.m as an example. In case of python, download Dataset/HeatTransferPhenomena_35_58.npz and run ReadDataset_Python.py.

Clear search

Close search

Google apps

Main menu

A dataset for conduction heat transer and deep learning

Physical properties of natural coarse aggregate.

Dataset for Crack Detection in Images of Bricks and Masonry Using CNNs

Ultimate_Analysis

The codes and dataset for paper Estimation of Global Horizontal Irradiance...

Phase unwrapping using Deep Learning in Holographic Tomography - dataset.

QLKNN11D training set

Data for: Automated detection of sigmatism using deep learning applied to...

Dataset for Air-writing recognition based on Ultra-wide band Radar Sensor

Microgrid Optimal Energy Scheduling with Battery Degradation

Rescaled Fashion-MNIST dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

Data from: Machine learning without a processor: Emergent learning in a...

Description of the data and file structure

Data from: Dataset of in-situ and remote sensing measurements of...

Hydatid Cyst

This matlab code has been used.

Time Series in Risk Assessment

Context

Acknowledgements

Data from: 3d µct images of specimens of carbon fiber reinforced polyamide 6...

Rescaled CIFAR-10 dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

Data set for article: Fusing ToF-SIMS images for spatial-spectral resolution...

Oncotype-DX Breast Cancer Dataset

Data from: Large-scale statistical learning for mass transport prediction in...

A dataset for conduction heat transer and deep learningSee More Versions

A dataset for conduction heat transer and deep learning