The MNIST database of handwritten digits.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png
The Places dataset is designed following principles of human visual cognition. Our goal is to build a core of visual knowledge that can be used to train artificial systems for high-level visual understanding tasks, such as scene context, object recognition, action and event prediction, and theory-of-mind inference.
The semantic categories of Places are defined by their function: the labels represent the entry-level categories of an environment. To illustrate, the dataset has different categories of bedrooms, streets, etc., because one does not act the same way, and does not make the same predictions of what can happen next, in a home bedroom, a hotel bedroom, or a nursery. In total, Places contains more than 10 million images comprising 400+ unique scene categories. The dataset features 5,000 to 30,000 training images per class, consistent with real-world frequencies of occurrence. Using convolutional neural networks (CNNs), the Places dataset allows learning of deep scene features for various scene recognition tasks, with the goal of establishing new state-of-the-art performance on scene-centric benchmarks.
Here we provide the Places Database and the trained CNNs for academic research and education purposes.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('placesfull', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/placesfull-1.0.0.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The zip file contains three folders: "Data for Illapel earthquake", "HVCE and ABIC method implement on matlab" and "GDED method implement on tensorflow". Taking simulation experiment 1.2 and the actual Illapel earthquake as examples, the code for the GDED method is placed in the "GDED method implement on tensorflow" folder, and the code for the ABIC and HVCE methods is placed in the "HVCE and ABIC method implement on matlab" folder.
In the folder "HVCE and ABIC method implement on matlab", the purpose of each script is as follows:
ABIC_SIM.m: slip distribution inversion with the relative weight ratios determined by the ABIC method, for the simulation experiments and the Illapel earthquake.
HVCE.m: slip distribution inversion with the relative weight ratios determined by the HVCE method, for the simulation experiments and the Illapel earthquake.
GDED.m: slip distribution inversion with the relative weight ratios determined by the GDED method, for the simulation experiments and the Illapel earthquake (the relative weight ratios come from "GDED method implement on tensorflow").
savedata.m: saves matrices and data for the GDED method implemented in TensorFlow.
In the folder "GDED method implement on tensorflow", the purpose of each script is as follows:
joint_inver_tensor_ex_1.0(1.1).py: determines the relative weight ratios by the GDED method with (without) plotting figures, implemented on the TensorFlow platform.
The InSAR and GPS data of the Illapel earthquake are placed in the folder "Data for Illapel earthquake" (GPS_ori.txt and InSAR_ori.txt).
The Drug Cardiotoxicity dataset [1-2] is a molecule classification task to detect cardiotoxicity caused by binding to the hERG target, a protein associated with heart rhythm. The data covers over 9,000 molecules with hERG activity.
Note:
The data is split into four splits: train, test-iid, test-ood1, test-ood2.
Each molecule in the dataset has 2D graph annotations, which are designed to facilitate graph neural network modeling. Nodes are the atoms of the molecule and edges are the bonds. Each atom is represented as a vector encoding basic atom information such as atom type; similar logic applies to bonds.
We include the Tanimoto fingerprint distance (to the training data) for each molecule in the test sets to facilitate research on distributional shift in the graph domain.
For each example, the features include:
atoms: a 2D tensor with shape (60, 27) storing node features. Molecules with fewer than 60 atoms are padded with zeros. Each atom has 27 atom features.
pairs: a 3D tensor with shape (60, 60, 12) storing edge features. Each edge has 12 edge features.
atom_mask: a 1D tensor with shape (60,) storing node masks. 1 indicates the corresponding atom is real; otherwise it is a padded one.
pair_mask: a 2D tensor with shape (60, 60) storing edge masks. 1 indicates the corresponding edge is real; otherwise it is a padded one.
active: a one-hot vector indicating whether the molecule is toxic. [0, 1] indicates toxic, [1, 0] non-toxic.
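As an illustration, here is a minimal sketch (using the feature names above; not part of the official dataset tooling) that loads one example and inspects the padding mask and the label:
import tensorflow as tf
import tensorflow_datasets as tfds

ds = tfds.load('cardiotox', split='train')
for ex in ds.take(1):
  n_real_atoms = tf.reduce_sum(ex['atom_mask'])  # number of non-padded atoms
  is_toxic = tf.argmax(ex['active']) == 1        # one-hot [0, 1] means toxic
  print(int(n_real_atoms), bool(is_toxic))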
[1]: V. B. Siramshetty et al. Critical Assessment of Artificial Intelligence Methods for Prediction of hERG Channel Inhibition in the Big Data Era. JCIM, 2020. https://pubs.acs.org/doi/10.1021/acs.jcim.0c00884
[2]: K. Han et al. Reliable Graph Neural Networks for Drug Discovery Under Distributional Shift. NeurIPS DistShift Workshop 2021. https://arxiv.org/abs/2111.12951
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('cardiotox', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Data for my Yolo v3 Object Detection in Tensorflow kernel.
Contains sample images, fonts, class names and weights.
This dataset contains ILSVRC-2012 (ImageNet) validation images augmented with a new set of "Re-Assessed" (ReaL) labels from the "Are we done with ImageNet" paper, see https://arxiv.org/abs/2006.07159. These labels are collected using the enhanced protocol, resulting in multi-label and more accurate annotations.
Important note: about 3,500 examples contain no label; these should be excluded from the averaging when computing the accuracy. One possible way of doing this is with the following NumPy code:
import numpy as np

# Keep only examples whose ReaL label set is non-empty; a prediction is
# correct if it appears in the example's label set.
is_correct = [pred in real_labels[i] for i, pred in enumerate(predictions) if real_labels[i]]
real_accuracy = np.mean(is_correct)
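For context, a hedged end-to-end sketch of the same computation: it assumes the TFDS features are named 'image' and 'real_label' (the ReaL labels annotate the ILSVRC-2012 validation images, hence the 'validation' split) and a hypothetical classifier my_model_predict:
import numpy as np
import tensorflow_datasets as tfds

ds = tfds.load('imagenet2012_real', split='validation')  # split name assumed
predictions, real_labels = [], []
for ex in tfds.as_numpy(ds):
  real_labels.append(list(ex['real_label']))         # may be empty (~3,500 cases)
  predictions.append(my_model_predict(ex['image']))  # hypothetical classifier

is_correct = [p in real_labels[i] for i, p in enumerate(predictions) if real_labels[i]]
real_accuracy = np.mean(is_correct)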
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('imagenet2012_real', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/imagenet2012_real-1.0.0.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OpenSim is an open-source biomechanical package with a variety of applications. It is available to many users through bindings in MATLAB, Python, and Java via its application programming interfaces (APIs). Although the developers have documented OpenSim installation well for different operating systems (Windows, Mac, and Linux), installation is time-consuming and complex since each operating system requires a different configuration. This project aims to demystify the development of neuro-musculoskeletal modeling in OpenSim with zero installation configuration on any operating system (thus cross-platform), making it easy to share models while accessing free graphical processing units (GPUs) on the web-based Google Colab platform. To achieve this, OpenColab was developed: the OpenSim source code was used to build a Conda package that can be installed on Google Colab with a single block of code in less than 7 minutes. To use OpenColab, one only needs an internet connection and a Gmail account. Moreover, OpenColab can access the vast libraries of machine learning methods available within free Google products, e.g. TensorFlow. Next, we performed an inverse problem in biomechanics and compared OpenColab results with the OpenSim graphical user interface (GUI) for validation. The outcomes of OpenColab and the GUI matched well (r ≥ 0.82). OpenColab takes advantage of the zero configuration of cloud-based platforms, accesses GPUs, and enables users to share and reproduce modeling approaches for further validation, innovative online training, and research applications. Step-by-step installation processes and examples are available at: https://simtk.org/projects/opencolab.
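The general "conda package on Colab" pattern looks roughly like the sketch below. This is an assumption-laden illustration (the condacolab helper and the opensim-org channel name are my assumptions, not taken from the project); the project's own notebooks at https://simtk.org/projects/opencolab are authoritative:
!pip install -q condacolab          # Colab shell syntax
import condacolab
condacolab.install()                # installs conda; restarts the runtime once

!conda install -y -c opensim-org opensim   # channel/package names are assumptions

import opensim
print(opensim.GetVersion())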
TensorFlow reimplementation of the Swin Transformer model, based on the official PyTorch implementation.
![image](https://user-images.githubusercontent.com/24825165/121768619-038e6d80-cb9a-11eb-8cb7-daa827e7772b.png)
tensorflow >= 2.4.1
ImageNet-1K and ImageNet-22K Pretrained Checkpoints
| name | pretrain | resolution | acc@1 | #params | model |
| :---: | :---: | :---: | :---: | :---: | :---: |
| swin_tiny_224 | ImageNet-1K | 224x224 | 81.2 | 28M | github |
| swin_small_224 | ImageNet-1K | 224x224 | 83.2 | 50M | github |
| swin_base_224 | ImageNet-22K | 224x224 | 85.2 | 88M | github |
| swin_base_384 | ImageNet-22K | 384x384 | 86.4 | 88M | github |
| swin_large_224 | ImageNet-22K | 224x224 | 86.3 | 197M | github |
| swin_large_384 | ImageNet-22K | 384x384 | 87.3 | 197M | github |
Initializing the model:
```python
from swintransformer import SwinTransformer

model = SwinTransformer('swin_tiny_224', num_classes=1000, include_top=True, pretrained=False)
```
You can use a pretrained model like this:
```python
import tensorflow as tf
from swintransformer import SwinTransformer

# IMAGE_SIZE and NUM_CLASSES are user-defined for the target task.
model = tf.keras.Sequential([
  tf.keras.layers.Lambda(lambda data: tf.keras.applications.imagenet_utils.preprocess_input(tf.cast(data, tf.float32), mode="torch"), input_shape=[*IMAGE_SIZE, 3]),
  SwinTransformer('swin_tiny_224', include_top=False, pretrained=True),
  tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
```
If you use a pretrained model with TPU on kaggle, specify `use_tpu` option:
```python
import tensorflow as tf
from swintransformer import SwinTransformer

# Same as above, but with use_tpu=True for TPU execution on Kaggle.
model = tf.keras.Sequential([
  tf.keras.layers.Lambda(lambda data: tf.keras.applications.imagenet_utils.preprocess_input(tf.cast(data, tf.float32), mode="torch"), input_shape=[*IMAGE_SIZE, 3]),
  SwinTransformer('swin_tiny_224', include_top=False, pretrained=True, use_tpu=True),
  tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
```
Example: TPU training on Kaggle
@article{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={arXiv preprint arXiv:2103.14030},
year={2021}
}
Machine learning can be as good as maximum likelihood when reconstructing phylogenetic topologies and determining the best evolutionary model on four-taxon alignments. Phylogenetic tree reconstruction with molecular data is important in many fields of life science research. The gold standard in this discipline is the maximum likelihood tree reconstruction method. Here we show that for quartet trees, machine learning using neural networks can be as good as the maximum likelihood method at inferring the best tree topology and the best model of sequence evolution for nucleotide as well as amino acid sequences. For this purpose we simulated data sets for a wide range of branch lengths, evolutionary models, and model parameters, and compared the topologies and inferred models obtained with machine learning to those obtained with the maximum likelihood and neighbour joining methods. Our results show that neural networks are a promising avenue for determining relatedness between taxa, which is ...

This archive is part of the DeepNNPhylogeny project, for which the code of the software is available on GitHub. It contains pre-trained neural networks to predict (a) the best models of sequence evolution and (b) the best quartet tree topologies for alignments of four nucleotide or amino acid sequences. For each use case, six neural networks with different architectures have been trained and saved for further usage with the Python library TensorFlow. The neural networks have been saved with the tf.keras.Model.save function in the so-called TensorFlow SavedModel format. All neural networks have been trained with a large number of alignments simulated with the software PolyMoSim v1.1.4, which is available on GitHub. For each simulated data set, model parameters (including proportion of invariant sites, shape parameter of the gamma distribution for site heterogeneity, transition/transversion ratio - if applicable, nucleotide base frequencies - if applicable, relative substitution ...

In this project, neural networks have been trained to: predict/classify the correct topology for four nucleotide or amino acid sequences that evolved on a quartet tree, and predict the best model of sequence evolution for four nucleotide or amino acid sequences that evolved on a quartet tree. Together with the software in the DeepNNPhylogeny project, the pre-trained neural networks can be used for the model and topology classification tasks. The GitHub repository DeepNNPhylogeny contains the software with which a) the neural networks presented here have been trained and new neural networks can be trained, and b) predictions can be made using the pre-trained neural networks available in this archive. They can predict the best evolutionary model and best topology for alignments of four nucleotide or amino acid sequences with an accuracy close or identical to the maximum likelihood method. The neural networks stored in this repository...
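Since the networks are stored in the TensorFlow SavedModel format, they can be restored directly with Keras. A minimal sketch (the directory name is illustrative, not an actual file from the archive):
import tensorflow as tf

# Load one of the pre-trained networks saved in SavedModel format.
model = tf.keras.models.load_model('topology_classifier_savedmodel')  # illustrative path
model.summary()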
This is a pose estimation dataset, consisting of symmetric 3D shapes where multiple orientations are visually indistinguishable. The challenge is to predict all equivalent orientations when only one orientation is paired with each image during training (as is the scenario for most pose estimation datasets). In contrast to most pose estimation datasets, the full set of equivalent orientations is available for evaluation.
There are eight shapes total, each rendered from 50,000 viewpoints distributed uniformly at random over the full space of 3D rotations. Five of the shapes are featureless -- tetrahedron, cube, icosahedron, cone, and cylinder. Of those, the three Platonic solids (tetrahedron, cube, icosahedron) are annotated with their 12-, 24-, and 60-fold discrete symmetries, respectively. The cone and cylinder are annotated with their continuous symmetries discretized at 1 degree intervals. These symmetries are provided for evaluation; the intended supervision is only a single rotation with each image.
The remaining three shapes are marked with a distinguishing feature. There is a tetrahedron with one red-colored face, a cylinder with an off-center dot, and a sphere with an X capped by a dot. Whether or not the distinguishing feature is visible, the space of possible orientations is reduced. We do not provide the set of equivalent rotations for these shapes.
Each example contains:
a shape index, so that the dataset may be filtered by shape (the indices presumably follow the order in which the shapes are described above);
the rotation used in the rendering process, represented as a 3x3 rotation matrix;
the set of known equivalent rotations under symmetry, for evaluation (in the case of the three marked shapes, this is only the rendering rotation); an evaluation sketch using these sets follows below.
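For evaluation, a natural metric is the minimum geodesic distance between a predicted rotation and any rotation in the example's equivalence set. A minimal NumPy sketch (not part of the dataset tooling):
import numpy as np

def min_angular_error(R_pred, equivalent_rotations):
  """Smallest geodesic angle (radians) between a predicted 3x3 rotation
  matrix and any symmetry-equivalent ground-truth rotation."""
  errors = []
  for R_gt in equivalent_rotations:
    R_rel = R_pred.T @ R_gt                     # relative rotation
    cos_angle = (np.trace(R_rel) - 1.0) / 2.0   # rotation-angle identity
    errors.append(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
  return min(errors)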
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('symmetric_solids', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/symmetric_solids-1.0.0.png
We present a novel and efficient computing framework for segmenting overlapping nuclei by combining marker-controlled watershed with our proposed convolutional neural network (DIMAN). We implemented our method based on the open-source machine learning framework TensorFlow and the deep learning and reinforcement learning library TensorLayer. This repository contains all code used in our experiments, including data preparation, model construction, model training and result evaluation. For comparison with our method, we also used TensorFlow and TensorLayer to reimplement four known semantic segmentation convolutional neural networks: FCN8s, U-Net, HED and SharpMask. Besides this, we also compare our method with four published state-of-the-art methods.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the dataset that accompanies the paper titled "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network", submitted for peer review in August 2020. Please see the GitHub repository for the most up-to-date data after the revision process: https://github.com/dopplerchase/Chase_et_al_2021_NN
Authors: Randy J. Chase, Stephen W. Nesbitt and Greg M. McFarquhar. Corresponding author: Randy J. Chase (randyjc2@illinois.edu). Here we have the data used in the manuscript. Please email me if you have specific questions about units etc.
1) DDA/GMM database of scattering properties: base_df_DDA.csv. This is the combined dataset from the following papers: Leinonen & Moisseev, 2015; Leinonen & Szyrmer, 2015; Lu et al., 2016; Kuo et al., 2016; Eriksson et al., 2018. The column names are D: maximum dimension in meters; M: particle mass in grams; sigma_ku: backscatter cross-section at Ku band in m^2; sigma_ka: backscatter cross-section at Ka band in m^2; sigma_w: backscatter cross-section at W band in m^2. The first column is just an index column.
2) Synthetic data used to train and test the neural network: Unrimed_simulation_wholespecturm_train_V2.nc, Unrimed_simulation_wholespecturm_test_V2.nc. These result from randomly combining the PSDs and DDA/GMM particles to build the training and test datasets.
3) Notebook for training the network using the synthetic database and Google Colab (TensorFlow): Train_Neural_Network_Chase2020.ipynb. This is the notebook used to train the neural network.
4) Trained TensorFlow neural network: NN_6by8.h5. This is the HDF5 TensorFlow model that resulted from the training. You will need this to run the retrieval.
5) Scalers needed to apply the neural network: scaler_X_V2.pkl, scaler_y_V2.pkl. These are the sklearn scalers used in training the neural network. You will need these to scale your data if you wish to run the retrieval.
6) New in this version: an example notebook showing how to run the trained neural network on Ku- and Ka-band observations, demonstrated on the third case in the paper: Run_Chase2021_NN.ipynb
7) New in this version: APR data used to show how to run the neural network retrieval: Chase_2021_NN_APR03Dec2015.nc
The data for the analysis on the observations are not provided here because of the size of the radar data. Please see the GHRC website (https://ghrc.nsstc.nasa.gov/home/) if you wish to download the radar and in-situ data, or contact me and we can coordinate transferring the exact data files used. The GPM-DPR data are available here: http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05
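To apply the retrieval, the trained model (4) and the scalers (5) are used together. A minimal sketch, assuming inputs X arranged as the training features (the exact feature layout is not specified here; Run_Chase2021_NN.ipynb is the authoritative usage example):
import pickle
import tensorflow as tf

model = tf.keras.models.load_model('NN_6by8.h5')
with open('scaler_X_V2.pkl', 'rb') as f:
  scaler_X = pickle.load(f)
with open('scaler_y_V2.pkl', 'rb') as f:
  scaler_y = pickle.load(f)

# X: (n_samples, n_features) array of radar observables; layout is an assumption.
# preds = scaler_y.inverse_transform(model.predict(scaler_X.transform(X)))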
Introduction
This dataset contains the data described in the paper "A deep neural network approach to predicting clinical outcomes of neuroblastoma patients" by Tranchevent, Azuaje and Rajapakse. More precisely, this dataset contains the topological features extracted from graphs built from publicly available expression data (see details below). This dataset does not contain the original expression data, which are available elsewhere. We thank the scientists who generated and shared these data (please see below the relevant links and publications).
Content
File names start with the name of the publicly available dataset they are built on (among "Fischer", "Maris" and "Versteeg"). This name is followed by a tag indicating whether they contain raw data ("raw", which means, in this case, the raw topological features) or TensorFlow-formatted data ("TF"). This tag is followed by a unique identifier representing a unique configuration. The configuration file "Global_configuration.tsv" contains details about these configurations, such as which topological features are present and which clinical outcome is considered. The code associated with the same manuscript that uses these data is at https://gitlab.com/biomodlih/SingalunDeep. The procedure by which the raw data are transformed into the TensorFlow-ready data is described in the paper.
File format
All files are TSV files that correspond to matrices with samples as rows and features as columns (or clinical data as columns for clinical data files). The data files contain various sets of topological features that were extracted from the sample graphs (or Patient Similarity Networks - PSN). The clinical files contain relevant clinical outcomes. The raw data files only contain the topological data. For instance, the file "Fischer_raw_2d0000_data_tsv" contains 24 values for each sample, corresponding to the 12 centralities computed for both the microarray (Fischer-M) and RNA-seq (Fischer-R) datasets. The TensorFlow-ready files do not contain the sample identifiers in the first column; however, they contain two extra columns at the end. The first extra column is the sample weights (for the classifiers, because we very often have a dominant class). The second extra column is the class labels (binary), based on the clinical outcome of interest.
Dataset details
The Fischer dataset is used to train, evaluate and validate the models, so the dataset is split into train / eval / valid files, which contain respectively 249, 125 and 124 rows (samples) of the original 498 samples. In contrast, the other two datasets (Maris and Versteeg) are smaller and are only used for validation (and therefore have no training or evaluation file). The Fischer dataset also has more data files because various configurations were tested (see manuscript). In contrast, the validation using the Maris and Versteeg datasets is only done for a single configuration, and there are therefore fewer files. For Fischer, a few configurations are listed in the global configuration file but have no corresponding raw data; this is because these items are derived from concatenations of the original raw data (see the global configuration file and manuscript for details).
References
This dataset is associated with Tranchevent L., Azuaje F., Rajapakse J.C., A deep neural network approach to predicting clinical outcomes of neuroblastoma patients.
If you use these data in your research, please do not forget to also cite the researchers who generated the original expression datasets.
Fischer dataset: Zhang W. et al., Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biology 16(1) (2015). doi:10.1186/s13059-015-0694-1; Wang C. et al., The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat. Biotechnol. 32(9), 926-932 (2014). doi:10.1038/nbt.3001
Versteeg dataset: Molenaar J.J. et al., Sequencing of neuroblastoma identifies chromothripsis and defects in neuritogenesis genes. Nature 483(7391), 589-593 (2012). doi:10.1038/nature10910
Maris dataset: Wang Q. et al., Integrative genomics identifies distinct molecular classes of neuroblastoma and shows that multiple genes are targeted by regional alterations in DNA copy number. Cancer Res. 66(12), 6050-6062 (2006). doi:10.1158/0008-5472.CAN-05-4618
Project supported by the Fonds National de la Recherche (FNR), Luxembourg (SINGALUN project). This research was also partially supported by Tier-2 grant MOE2016-T2-1-029 from the Ministry of Education, Singapore.
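Returning to the file format described above: a minimal sketch for loading a TensorFlow-ready TSV file, where the last two columns are the sample weights and the binary class labels (the file name here is illustrative; actual names follow the dataset/tag/configuration pattern described):
import pandas as pd

df = pd.read_csv('Fischer_TF_2d0000_data.tsv', sep='\t', header=None)  # illustrative name
X = df.iloc[:, :-2].values  # topological features
w = df.iloc[:, -2].values   # per-sample weights (to handle the dominant class)
y = df.iloc[:, -1].values   # binary labels for the clinical outcome of interest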
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Forecasting the weather in an area characterized by erratic weather patterns and unpredictable climate change is a challenging endeavour. The weather is classified as a non-linear system since it is influenced by various factors that contribute to climate change, such as humidity, average temperature, sea level pressure, and rainfall. A reliable forecasting system is crucial in several industries, including transportation, agriculture, tourism, and development. This study showcases the effectiveness of data mining, meteorological analysis, and machine learning techniques such as RNN-LSTM, TensorFlow Decision Forests (TFDF), and model stacking (including ElasticNet, GradientBoost, KRR, and Lasso) in improving the precision and dependability of weather forecasting. The stacking model strategy entails aggregating multiple base models into a meta-model to address issues of overfitting and underfitting, hence improving the accuracy of the prediction model. To carry out the study, a comprehensive 60-year meteorological record from Bangladesh was gathered, encompassing data on rainfall, humidity, average temperature, and sea level pressure. The results of this study suggest that the stacking average model outperforms the TFDF and RNN-LSTM models in predicting average temperature. The stacking average model achieves an RMSLE of 1.3002, a 10.906% improvement over the TFDF model. It is worth noting that the TFDF model had previously outperformed the RNN-LSTM model. The performance of the individual stacking models is not as impressive as that of the average model, with the validation results being better for TFDF.
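The stacking strategy described above can be illustrated with scikit-learn. This is a hedged sketch of the general pattern only (base-model hyperparameters and the meta-model choice are assumptions, not the authors' exact configuration):
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, Lasso

# Base models feed their predictions to a meta-model (final_estimator).
stack = StackingRegressor(
  estimators=[
    ('enet', ElasticNet()),
    ('gboost', GradientBoostingRegressor()),
    ('krr', KernelRidge()),
    ('lasso', Lasso()),
  ],
  final_estimator=ElasticNet(),  # meta-model choice is an assumption
)
# stack.fit(X_train, y_train); y_pred = stack.predict(X_valid)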
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, the exploitation of three-dimensional (3D) data in deep learning has gained momentum despite its inherent challenges. The necessity of 3D approaches arises from the limitations of two-dimensional (2D) techniques when applied to 3D data due to the lack of global context. A critical task in medical and microscopy 3D image analysis is instance segmentation, which is inherently complex due to the need to accurately identify and segment multiple object instances in an image. Here, we introduce a 3D adaptation of the Mask R-CNN, a powerful end-to-end network designed for instance segmentation. Our implementation adapts a widely used 2D TensorFlow Mask R-CNN by developing custom TensorFlow operations for 3D Non-Max Suppression and 3D Crop And Resize, facilitating efficient training and inference on 3D data. We validate our 3D Mask R-CNN in two experiments. The first uses a controlled environment of synthetic data with instances exhibiting a wide range of anisotropy and noise; our model achieves good results while illustrating the limits of the 3D Mask R-CNN on the noisiest objects. Second, applying it to real-world data involving cell instance segmentation during the morphogenesis of the ascidian embryo Phallusia mammillata, we show that our 3D Mask R-CNN outperforms the state-of-the-art method, achieving high recall and precision scores. The model preserves cell connectivity, which is crucial for applications in quantitative studies. Our implementation is open source, ensuring reproducibility and facilitating further research in 3D deep learning.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Semi-flexible docking was performed using AutoDock Vina 1.2.2 software on the SARS-CoV-2 main protease Mpro (PDB ID: 6WQF). Two data sets are provided in the xyz format containing the AutoDock Vina docking scores. These files were used as input and/or reference in the machine learning models using TensorFlow, XGBoost, and SchNetPack to study their docking score prediction capability. The first data set originally contained 60,411 in-vivo labeled compounds selected for the training of ML models. The second data set, denoted as in-vitro-only, originally contained 175,696 compounds active or assumed to be active at 10 μM or less in a direct binding assay. These sets were downloaded on the 10th of December 2021 from the ZINC15 database. Four compounds in the in-vivo set and 12 in the in-vitro-only set were left out of consideration due to the presence of Si atoms. Compounds with no charges assigned in mol2 files were excluded as well (523 compounds in the in-vivo and 1,666 in the in-vitro-only...
Molecular docking calculations and the machine learning approaches are described in the Computational details section of [1].
Reference
[1] Lukas Bucinsky, Marián Gall, Ján Matúška, Michal Pitoňák, Marek Štekláč. Advances and critical assessment of machine learning techniques for prediction of docking scores. Int. J. Quantum Chem. (2023). DOI: 10.1002/qua.27110
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is built for time-series Sentinel-2 cloud detection and stored in the TensorFlow TFRecord format (see https://www.tensorflow.org/tutorials/load_data/tfrecord).
Each file is compressed in 7z format and can be decompressed using Bandizip or 7-Zip.
Dataset Structure:
Each filename can be split into three parts using underscores: the first part indicates whether the file is designated for training or validation ('train' or 'val'); the second part is the Sentinel-2 tile name; and the last part is the number of samples in the file.
Each sample includes:
Sample ID;
Array of time-series 4-band image patches at 10 m resolution, shaped as (n_timestamps, 4, 42, 42);
Label list indicating the cloud cover status for the center 6x6 pixels at each timestamp;
Ordinal list for each timestamp;
Sample weight list (reserved);
Here is a demonstration function for parsing the TFRecord file:
import tensorflow as tf
def parseRecordDirect(fname):
  sep = '/'
  parts = tf.strings.split(fname, sep)
  tn = tf.strings.split(parts[-1], sep='_')[-2]
  nn = tf.strings.to_number(tf.strings.split(parts[-1], sep='_')[-1], tf.dtypes.int64)
  t = tf.data.Dataset.from_tensors(tn).repeat().take(nn)
  t1 = tf.data.TFRecordDataset(fname)
  ds = tf.data.Dataset.zip((t, t1))
  return ds
keys_to_features_direct = {
  'localid': tf.io.FixedLenFeature([], tf.int64, -1),
  'image_raw_ldseries': tf.io.FixedLenFeature((), tf.string, ''),
  'labels': tf.io.FixedLenFeature((), tf.string, ''),
  'dates': tf.io.FixedLenFeature((), tf.string, ''),
  'weights': tf.io.FixedLenFeature((), tf.string, '')
}
class SeriesClassificationDirectDecorder(decoder.Decoder):
  """A tf.Example decoder for tfds classification datasets."""

  def __init__(self) -> None:
    super().__init__()

  def decode(self, tid, ds):
    parsed = tf.io.parse_single_example(ds, keys_to_features_direct)
    encoded = parsed['image_raw_ldseries']
    labels_encoded = parsed['labels']
    decoded = tf.io.decode_raw(encoded, tf.uint16)
    label = tf.io.decode_raw(labels_encoded, tf.int8)
    dates = tf.io.decode_raw(parsed['dates'], tf.int64)
    weight = tf.io.decode_raw(parsed['weights'], tf.float32)
    decoded = tf.reshape(decoded, [-1, 4, 42, 42])
    sample_dict = {
      'tid': tid,                    # tile ID
      'dates': dates,                # date list
      'localid': parsed['localid'],  # sample ID
      'imgs': decoded,               # image array
      'labels': label,               # label list
      'weights': weight
    }
    return sample_dict
def preprocessDirect(tid, record):
  parsed = tf.io.parse_single_example(record, keys_to_features_direct)
  encoded = parsed['image_raw_ldseries']
  labels_encoded = parsed['labels']
  decoded = tf.io.decode_raw(encoded, tf.uint16)
  label = tf.io.decode_raw(labels_encoded, tf.int8)
  dates = tf.io.decode_raw(parsed['dates'], tf.int64)
  weight = tf.io.decode_raw(parsed['weights'], tf.float32)
  decoded = tf.reshape(decoded, [-1, 4, 42, 42])
  return tid, dates, parsed['localid'], decoded, label, weight
t1 = parseRecordDirect('filename here')
dataset = t1.map(preprocessDirect, num_parallel_calls=tf.data.experimental.AUTOTUNE)
Class Definition:
0: clear
1: opaque cloud
2: thin cloud
3: haze
4: cloud shadow
5: snow
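For convenience, the same mapping as a Python dict (derived directly from the list above):
CLOUD_CLASSES = {
  0: 'clear',
  1: 'opaque cloud',
  2: 'thin cloud',
  3: 'haze',
  4: 'cloud shadow',
  5: 'snow'
}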
Dataset Construction:
First, we randomly generate 500 points for each tile; all points are aligned to the pixel-grid centers of the 60 m resolution subdatasets (e.g. B10) for consistency when comparing with other products, since other cloud detection methods may use the cirrus band, which has 60 m resolution, as a feature.
Then, time series image patches of two sizes are cropped with each point as the center. Patches of shape 42x42 are cropped from the 10 m resolution bands (B2, B3, B4, B8) and are used to construct this dataset. Patches of shape 348x348 are cropped from the True Colour Image (TCI; see the Sentinel-2 User Guide for details) and are used for interpreting the class labels.
Samples with a large number of timestamps can be time-consuming at the I/O stage, so the time series patches are divided into groups of at most 100 timestamps each.
This data collection was created for quick and easy application of machine learning. All images and labels are numeric arrays with the same data types and shapes. As with the original data, the collection is free for noncommercial and nongovernmental use.
1) DogBreedImages.h5 (3.77 GB). Origin: Homepage & Source code. Images (float32, 128x128 pixels, 3 color channels) and labels (int32, 120 classes): 12,000 for training & 8,580 for testing.
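These .h5 collections can be loaded with h5py; a minimal sketch, assuming the archive stores train/test arrays under names like 'train_images' (the actual key names may differ, so inspect f.keys() first):
import h5py

with h5py.File('DogBreedImages.h5', 'r') as f:
  print(list(f.keys()))            # discover the stored array names
  images = f['train_images'][:]    # hypothetical key: float32, (N, 128, 128, 3)
  labels = f['train_labels'][:]    # hypothetical key: int32, one of 120 classes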
This catalog really impressed me => TensorFlow Datasets
The goal is to discover the capabilities of algorithms in the recognition of biological objects based on identically formatted data.
This data collection was created for quick and easy application of machine learning. All images and labels are numeric arrays with the same data types and shapes. As with the original data, the collection is free for noncommercial and nongovernmental use.
1) Images of Biospecies 2
2) TfFlowerImages.h5 (688.14 MB). Origin: Homepage & Source code. Images (float32, 128x128 pixels, 3 color channels) and labels (int32, 5 classes): 3,303 for training & 367 for testing.
This catalog really impressed me => TensorFlow Datasets
The goal is to discover the capabilities of algorithms in the recognition of biological objects based on identically formatted data.