This dataset was created by Robert Sizemore
This dataset was created by kwang
The MNIST database of handwritten digits.
To use this dataset:
```python
import tensorflow_datasets as tfds

ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
    print(ex)
```
See the guide for more information on tensorflow_datasets.
![Visualization](https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
To use this dataset:
```python
import tensorflow_datasets as tfds

ds = tfds.load('cifar10', split='train')
for ex in ds.take(4):
    print(ex)
```
See the guide for more information on tensorflow_datasets.
![Visualization](https://storage.googleapis.com/tfds-data/visualization/fig/cifar10-3.0.2.png)
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is built for time-series Sentinel-2 cloud detection and is stored in TensorFlow TFRecord format (see https://www.tensorflow.org/tutorials/load_data/tfrecord).
Each file is compressed in 7z format and can be decompressed with Bandizip or 7-Zip.
Dataset Structure:
Each filename can be split into three parts using underscores. The first part indicates whether the file is designated for training or validation ('train' or 'val'); the second part is the Sentinel-2 tile name; and the last part is the number of samples in the file.
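A minimal sketch of this naming convention in plain Python (the tile name below is a made-up example, not taken from the dataset):

```python
def parse_record_filename(filename):
    """Split a dataset filename into (split, tile, n_samples).

    Follows the convention described above: three underscore-separated
    parts, e.g. 'train_T31TCJ_100'. Any file extension is dropped first.
    """
    stem = filename.rsplit('.', 1)[0]       # drop an optional extension
    split, tile, n_samples = stem.split('_')
    assert split in ('train', 'val')
    return split, tile, int(n_samples)
```

For example, `parse_record_filename('val_T31TCJ_57.tfrecord')` yields `('val', 'T31TCJ', 57)`.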
Each sample includes:
Sample ID;
Array of time-series 4-band image patches at 10 m resolution, shaped (n_timestamps, 4, 42, 42);
Label list indicating cloud cover status for the center 6×6 pixels at each timestamp;
Ordinal list for each timestamp;
Sample weight list (reserved).
Here is a demonstration function for parsing the TFRecord file:
```python
import tensorflow as tf

def parseRecordDirect(fname):
    sep = '/'
    parts = tf.strings.split(fname, sep)
    tn = tf.strings.split(parts[-1], sep='_')[-2]
    nn = tf.strings.to_number(tf.strings.split(parts[-1], sep='_')[-1], tf.dtypes.int64)
    t = tf.data.Dataset.from_tensors(tn).repeat().take(nn)
    t1 = tf.data.TFRecordDataset(fname)
    ds = tf.data.Dataset.zip((t, t1))
    return ds

keys_to_features_direct = {
    'localid': tf.io.FixedLenFeature([], tf.int64, -1),
    'image_raw_ldseries': tf.io.FixedLenFeature((), tf.string, ''),
    'labels': tf.io.FixedLenFeature((), tf.string, ''),
    'dates': tf.io.FixedLenFeature((), tf.string, ''),
    'weights': tf.io.FixedLenFeature((), tf.string, '')
}

# 'decoder' is assumed to come from tensorflow_datasets' decode module.
class SeriesClassificationDirectDecorder(decoder.Decoder):
    """A tf.Example decoder for tfds classification datasets."""

    def __init__(self) -> None:
        super().__init__()

    def decode(self, tid, ds):
        parsed = tf.io.parse_single_example(ds, keys_to_features_direct)
        encoded = parsed['image_raw_ldseries']
        labels_encoded = parsed['labels']
        decoded = tf.io.decode_raw(encoded, tf.uint16)
        label = tf.io.decode_raw(labels_encoded, tf.int8)
        dates = tf.io.decode_raw(parsed['dates'], tf.int64)
        weight = tf.io.decode_raw(parsed['weights'], tf.float32)
        decoded = tf.reshape(decoded, [-1, 4, 42, 42])
        sample_dict = {
            'tid': tid,                    # tile ID
            'dates': dates,                # date list
            'localid': parsed['localid'],  # sample ID
            'imgs': decoded,               # image array
            'labels': label,               # label list
            'weights': weight
        }
        return sample_dict

def preprocessDirect(tid, record):
    parsed = tf.io.parse_single_example(record, keys_to_features_direct)
    encoded = parsed['image_raw_ldseries']
    labels_encoded = parsed['labels']
    decoded = tf.io.decode_raw(encoded, tf.uint16)
    label = tf.io.decode_raw(labels_encoded, tf.int8)
    dates = tf.io.decode_raw(parsed['dates'], tf.int64)
    weight = tf.io.decode_raw(parsed['weights'], tf.float32)
    decoded = tf.reshape(decoded, [-1, 4, 42, 42])
    return tid, dates, parsed['localid'], decoded, label, weight

t1 = parseRecordDirect('filename here')
dataset = t1.map(preprocessDirect, num_parallel_calls=tf.data.experimental.AUTOTUNE)
```
Class Definition:
0: clear
1: opaque cloud
2: thin cloud
3: haze
4: cloud shadow
5: snow
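For reference, the int8 labels decoded from the `labels` field can be mapped back to the class names above with a small helper (ours, not part of the dataset):

```python
# Integer-label-to-class-name mapping, exactly as listed above.
CLASS_NAMES = {
    0: 'clear',
    1: 'opaque cloud',
    2: 'thin cloud',
    3: 'haze',
    4: 'cloud shadow',
    5: 'snow',
}

def decode_labels(label_ints):
    """Translate a sequence of integer labels into class-name strings."""
    return [CLASS_NAMES[int(i)] for i in label_ints]
```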
Dataset Construction:
First, we randomly generate 500 points for each tile; all of these points are aligned to the pixel-grid centers of the 60 m resolution subdatasets (e.g. B10) for consistency when comparing with other products, because other cloud detection methods may use the cirrus band, which is at 60 m resolution, as a feature.
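The alignment step can be illustrated as snapping a projected coordinate to the center of its containing 60 m pixel (a minimal sketch; the grid origin here is an assumption, since the real grid comes from each tile's geotransform):

```python
def snap_to_60m_center(x, origin=0.0, res=60.0):
    """Snap a projected coordinate to the center of its 60 m pixel.

    The pixel containing x spans [origin + i*res, origin + (i+1)*res);
    its center is offset from the left edge by half a pixel.
    """
    i = (x - origin) // res          # index of the containing pixel
    return origin + (i + 0.5) * res

# e.g. x = 130.0 falls in pixel [120, 180), whose center is 150.0
```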
Then, time-series image patches of two shapes are cropped with each point as the center. The patches of shape 42×42 are cropped from the 10 m resolution bands (B2, B3, B4, B8) and are used to construct this dataset. The patches of shape 348×348 are cropped from the True Colour Image (TCI; see the Sentinel-2 user guide for details) file and are used for interpreting class labels.
Samples with a large number of timestamps can be time-consuming at the I/O stage, so the time-series patches are divided into groups of at most 100 timestamps each.
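The grouping described above amounts to simple chunking, which can be sketched as follows (the 100-timestamp cap is from the description; the helper name is ours):

```python
def chunk_timestamps(timestamps, max_per_group=100):
    """Split a sequence of timestamps into groups of at most max_per_group."""
    return [timestamps[i:i + max_per_group]
            for i in range(0, len(timestamps), max_per_group)]
```

For example, a sample with 250 timestamps would be stored as groups of 100, 100, and 50.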
The BOREAS TF-08 team collected energy, CO2, and water vapor flux data at the BOREAS NSA-OJP site during the growing season of 1994 and most of the year for 1996.
This deposit contains the underlying data related to the conference publication "Improving engineering information retrieval by combining TF-IDF and product structure classification". Complete download (zip, 13.4 MiB).
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
To use this dataset:
```python
import tensorflow_datasets as tfds

ds = tfds.load('fashion_mnist', split='train')
for ex in ds.take(4):
    print(ex)
```
See the guide for more information on tensorflow_datasets.
![Visualization](https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Rajnish Chauhan
Released under CC0: Public Domain
Members of the BOREAS TF-02 team collected meteorological and ozone measurements from instruments mounted below a tethered balloon. These data were collected at the SSA-OA site to extend meteorological and ozone measurements made from the flux tower to heights of 300 m. The tethersonde operated during the fall of 1993 and the spring, summer, and fall of 1994.
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
To use this dataset:
```python
import tensorflow_datasets as tfds

ds = tfds.load('imdb_reviews', split='train')
for ex in ds.take(4):
    print(ex)
```
See the guide for more information on tensorflow_datasets.
The BOREAS TF-11 team collected several data sets in their efforts to fully describe the flux and site characteristics at the SSA-Fen site. This data set contains fluxes of methane and carbon dioxide at the SSA fen site measured using static chambers. The measurements were conducted as part of a 2x2 factorial experiment in which we added carbon (300 g m-2 as wheat straw) and nitrogen (6 g m-2 as urea) to four replicate locations in the vicinity of the TF-11 tower. In addition to siting and treatment variables, it reports air temperature and water table height relative to the average peat surface during each measurement. The data set covers the period from the first week of June 1994 through the second week of September, 1994.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Rajnish Chauhan
Released under CC0: Public Domain
This dataset provides information about the number of properties, residents, and average property values for T F Hicks cross streets in New Hudson, MI.
The BOREAS TF-01 team collected energy, carbon dioxide, and momentum flux data under the canopy along with meteorological and soils data at the BOREAS SSA-OA site from mid-October to mid-November of 1993 and throughout all of 1994.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TensorFlow Records for the Screw Detection module. The records are loaded by the module to train the DCNN.
These statistics come from more than three million data items reported on about 250,000 sales tax returns filed quarterly and on about 300,000 returns filed annually. The dataset categorizes quarterly sales and purchases data by industry group using the North American Industry Classification System. The status of data will change as preliminary data becomes final.
TensorFlow reimplementation of the Swin Transformer model.
Based on the official PyTorch implementation.
![image](https://user-images.githubusercontent.com/24825165/121768619-038e6d80-cb9a-11eb-8cb7-daa827e7772b.png)
tensorflow >= 2.4.1
ImageNet-1K and ImageNet-22K Pretrained Checkpoints
| name | pretrain | resolution | acc@1 | #params | model |
| :---: | :---: | :---: | :---: | :---: | :---: |
| swin_tiny_224 | ImageNet-1K | 224x224 | 81.2 | 28M | github |
| swin_small_224 | ImageNet-1K | 224x224 | 83.2 | 50M | github |
| swin_base_224 | ImageNet-22K | 224x224 | 85.2 | 88M | github |
| swin_base_384 | ImageNet-22K | 384x384 | 86.4 | 88M | github |
| swin_large_224 | ImageNet-22K | 224x224 | 86.3 | 197M | github |
| swin_large_384 | ImageNet-22K | 384x384 | 87.3 | 197M | github |
Initializing the model:

```python
from swintransformer import SwinTransformer

model = SwinTransformer('swin_tiny_224', num_classes=1000, include_top=True, pretrained=False)
```
You can use a pretrained model like this:

```python
import tensorflow as tf
from swintransformer import SwinTransformer

model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda data: tf.keras.applications.imagenet_utils.preprocess_input(tf.cast(data, tf.float32), mode="torch"), input_shape=[*IMAGE_SIZE, 3]),
    SwinTransformer('swin_tiny_224', include_top=False, pretrained=True),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
```
If you use a pretrained model with a TPU on Kaggle, specify the `use_tpu` option:

```python
import tensorflow as tf
from swintransformer import SwinTransformer

model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda data: tf.keras.applications.imagenet_utils.preprocess_input(tf.cast(data, tf.float32), mode="torch"), input_shape=[*IMAGE_SIZE, 3]),
    SwinTransformer('swin_tiny_224', include_top=False, pretrained=True, use_tpu=True),
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
])
```

Example: TPU training on Kaggle
@article{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={arXiv preprint arXiv:2103.14030},
year={2021}
}
This dataset is comprised of the final assessment rolls submitted to the New York State Department of Taxation and Finance – Office of Real Property Tax Services by 996 local governments. Together, the assessment rolls provide the details of the more than 4.7 million parcels in New York State.
The dataset includes assessment rolls for all cities and towns, except New York City. (For New York City assessment roll data, see NYC Open Data [https://opendata.cityofnewyork.us])
For each property, the dataset includes assessed value, full market value, property size, owners, exemption information, and other fields.
Tip: For a unique identifier for every property in New York State, combine the SWIS code and print key fields.
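A minimal sketch of that tip, assuming field names `swis_code` and `print_key` (the actual column names in the download may differ, and the example values are illustrative, not real parcels):

```python
def parcel_id(swis_code, print_key):
    """Combine the SWIS code and print key into a statewide-unique parcel ID.

    Neither field is unique on its own: print keys repeat across
    municipalities, so the SWIS code disambiguates them.
    """
    return f"{swis_code}-{print_key}"

# e.g. parcel_id('123456', '1.-2-3.4') -> '123456-1.-2-3.4'
```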
This dataset contains CLIP 4-TF 2 Community data from the Latnjajaure site, Sweden in 1995, 1996, 1997, 1999 & 2000. The Community Level Interaction Program (CLIP) data comprises a block of poor, dry heath. For more information, please see the readme file.
This dataset was created by Robert Sizemore