9 datasets found

T
fmb
tensorflow.org
huggingface.co
Updated May 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). fmb [Dataset]. https://www.tensorflow.org/datasets/catalog/fmb
Explore at:
Dataset updated
May 31, 2024
Description
Our dataset consists of objects in diverse appearance and geometry. It requires multi-stage and multi-modal fine motor skills to successfully assemble the pegs onto a unfixed board in a randomized scene. We collected a total of 22,550 trajectories across two different tasks on a Franka Panda arm. We record the trajectories from 2 global views and 2 wrist views. Each view contains both RGB and depth map.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('fmb', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
Z
Dataset for "Enhancing Cloud Detection in Sentinel-2 Imagery: A...
data.niaid.nih.gov
zenodo.org
Updated Feb 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yin Ranyu (2024). Dataset for "Enhancing Cloud Detection in Sentinel-2 Imagery: A Spatial-Temporal Approach and Dataset" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8419699
Explore at:
Dataset updated
Feb 4, 2024
Dataset provided by
Gong Chengjuan
Yin Ranyu
Wang Guizhou
He Guojin
Long Tengfei
Jiao Weili
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is built for time-series Sentinel-2 cloud detection and stored in Tensorflow TFRecord (refer to https://www.tensorflow.org/tutorials/load_data/tfrecord).

Each file is compressed in 7z format and can be decompressed using Bandzip or 7-zip software.

Dataset Structure:

Each filename can be split into three parts using underscores. The first part indicates whether it is designated for training or validation ('train' or 'val'); the second part indicates the Sentinel-2 tile name, and the last part indicates the number of samples in this file.

For each sample, it includes:

Sample ID;

Array of time series 4 band image patches in 10m resolution, shaped as (n_timestamps, 4, 42, 42);

Label list indicating cloud cover status for the center (6\times6) pixels of each timestamp;

Ordinal list for each timestamp;

Sample weight list (reserved);

Here is a demonstration function for parsing the TFRecord file:

import tensorflow as tf

init Tensorflow Dataset from file name

def parseRecordDirect(fname): sep = '/' parts = tf.strings.split(fname,sep) tn = tf.strings.split(parts[-1],sep='_')[-2] nn = tf.strings.to_number(tf.strings.split(parts[-1],sep='_')[-1],tf.dtypes.int64) t = tf.data.Dataset.from_tensors(tn).repeat().take(nn) t1 = tf.data.TFRecordDataset(fname) ds = tf.data.Dataset.zip((t, t1)) return ds

keys_to_features_direct = { 'localid': tf.io.FixedLenFeature([], tf.int64, -1), 'image_raw_ldseries': tf.io.FixedLenFeature((), tf.string, ''), 'labels': tf.io.FixedLenFeature((), tf.string, ''), 'dates': tf.io.FixedLenFeature((), tf.string, ''), 'weights': tf.io.FixedLenFeature((), tf.string, '') }

The Decoder (Optional)

class SeriesClassificationDirectDecorder(decoder.Decoder): """A tf.Example decoder for tfds classification datasets.""" def init(self) -> None: super()._init_()

def decode(self, tid, ds): parsed = tf.io.parse_single_example(ds, keys_to_features_direct) encoded = parsed['image_raw_ldseries'] labels_encoded = parsed['labels'] decoded = tf.io.decode_raw(encoded, tf.uint16) label = tf.io.decode_raw(labels_encoded, tf.int8) dates = tf.io.decode_raw(parsed['dates'], tf.int64) weight = tf.io.decode_raw(parsed['weights'], tf.float32) decoded = tf.reshape(decoded,[-1,4,42,42]) sample_dict = { 'tid': tid, # tile ID 'dates': dates, # Date list 'localid': parsed['localid'], # sample ID 'imgs': decoded, # image array 'labels': label, # label list 'weights': weight } return sample_dict

simple function

def preprocessDirect(tid, record): parsed = tf.io.parse_single_example(record, keys_to_features_direct) encoded = parsed['image_raw_ldseries'] labels_encoded = parsed['labels'] decoded = tf.io.decode_raw(encoded, tf.uint16) label = tf.io.decode_raw(labels_encoded, tf.int8) dates = tf.io.decode_raw(parsed['dates'], tf.int64) weight = tf.io.decode_raw(parsed['weights'], tf.float32) decoded = tf.reshape(decoded,[-1,4,42,42]) return tid, dates, parsed['localid'], decoded, label, weight

t1 = parseRecordDirect('filename here') dataset = t1.map(preprocessDirect, num_parallel_calls=tf.data.experimental.AUTOTUNE)

#

Class Definition:

0: clear

1: opaque cloud

2: thin cloud

3: haze

4: cloud shadow

5: snow

Dataset Construction:

First, we randomly generate 500 points for each tile, and all these points are aligned to the pixel grid center of the subdatasets in 60m resolution (eg. B10) for consistence when comparing with other products. It is because that other cloud detection method may use the cirrus band as features, which is in 60m resolution.

Then, the time series image patches of two shapes are cropped with each point as the center.The patches of shape (42 \times 42) are cropped from the bands in 10m resolution (B2, B3, B4, B8) and are used to construct this dataset.And the patches of shape (348 \times 348) are cropped from the True Colour Image (TCI, details see sentinel-2 user guide) file and are used to interpreting class labels.

The samples with a large number of timestamps could be time-consuming in the IO stage, thus the time series patches are divided into different groups with timestamps not exceeding 100 for every group.
T
forest_fires
tensorflow.org
Updated Nov 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). forest_fires [Dataset]. https://www.tensorflow.org/datasets/catalog/forest_fires
Explore at:
Dataset updated
Nov 23, 2022
Description
This is a regression task, where the aim is to predict the burned area of forest fires, in the northeast region of Portugal, by using meteorological and other data.

Data Set Information:

In [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function. Then, several Data Mining methods were applied. After fitting the models, the outputs were post-processed with the inverse of the ln(x+1) transform. Four different input setups were used. The experiments were conducted using a 10-fold (cross-validation) x 30 runs. Two regression metrics were measured: MAD and RMSE. A Gaussian support vector machine (SVM) fed with only 4 direct weather conditions (temp, RH, wind and rain) obtained the best MAD value: 12.71 +- 0.01 (mean and confidence interval within 95% using a t-student distribution). The best RMSE was attained by the naive mean predictor. An analysis to the regression error curve (REC) shows that the SVM model predicts more examples within a lower admitted error. In effect, the SVM model predicts better small fires, which are the majority.

Attribute Information:

For more information, read [Cortez and Morais, 2007].

X - x-axis spatial coordinate within the Montesinho park map: 1 to 9

Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9

month - month of the year: 'jan' to 'dec'

day - day of the week: 'mon' to 'sun'

FFMC - FFMC index from the FWI system: 18.7 to 96.20

DMC - DMC index from the FWI system: 1.1 to 291.3

DC - DC index from the FWI system: 7.9 to 860.6

ISI - ISI index from the FWI system: 0.0 to 56.10

temp - temperature in Celsius degrees: 2.2 to 33.30

RH - relative humidity in %: 15.0 to 100

wind - wind speed in km/h: 0.40 to 9.40

rain - outside rain in mm/m2 : 0.0 to 6.4

area - the burned area of the forest (in ha): 0.00 to 1090.84 (this output variable is very skewed towards 0.0, thus it may make sense to model with the logarithm transform).

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('forest_fires', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
h
ucmerced
huggingface.co
tensorflow.org
Updated Mar 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TorchGeo (2024). ucmerced [Dataset]. https://huggingface.co/datasets/torchgeo/ucmerced
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 30, 2024
Dataset authored and provided by
TorchGeo
Description
Redistributed from http://weegee.vision.ucmerced.edu/datasets/landuse.html without modification. See https://www.usgs.gov/faqs/what-are-terms-uselicensing-map-services-and-data-national-map for license.
h
lj_speech
huggingface.co
tensorflow.org
Updated May 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Keith Ito (2024). lj_speech [Dataset]. https://huggingface.co/datasets/keithito/lj_speech
Explore at:
Dataset updated
May 17, 2024
Authors
Keith Ito
License
https://choosealicense.com/licenses/unlicense/https://choosealicense.com/licenses/unlicense/
Description
This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books in English. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

Note that in order to limit the required storage for preparing this dataset, the audio is stored in the .wav format and is not converted to a float32 array. To convert the audio file to a float32 array, please make use of the .map() function as follows:

import soundfile as sf def map_to_array(batch): speech_array, _ = sf.read(batch["file"]) batch["speech"] = speech_array return batch dataset = dataset.map(map_to_array, remove_columns=["file"])
T
cityscapes
tensorflow.org
Updated Dec 6, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). cityscapes [Dataset]. https://www.tensorflow.org/datasets/catalog/cityscapes
Explore at:
Dataset updated
Dec 6, 2022
Description
Cityscapes is a dataset consisting of diverse urban street scenes across 50 different cities at varying times of the year as well as ground truths for several vision tasks including semantic segmentation, instance level segmentation (TODO), and stereo pair disparity inference.

For segmentation tasks (default split, accessible via 'cityscapes/semantic_segmentation'), Cityscapes provides dense pixel level annotations for 5000 images at 1024 * 2048 resolution pre-split into training (2975), validation (500) and test (1525) sets. Label annotations for segmentation tasks span across 30+ classes commonly encountered during driving scene perception. Detailed label information may be found here: https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/helpers/labels.py#L52-L99

Cityscapes also provides coarse grain segmentation annotations (accessible via 'cityscapes/semantic_segmentation_extra') for 19998 images in a 'train_extra' split which may prove useful for pretraining / data-heavy models.

Besides segmentation, cityscapes also provides stereo image pairs and ground truths for disparity inference tasks on both the normal and extra splits (accessible via 'cityscapes/stereo_disparity' and 'cityscapes/stereo_disparity_extra' respectively).

Ingored examples:

For 'cityscapes/stereo_disparity_extra':

troisdorf_000000_000073_{*} images (no disparity map present)

WARNING: this dataset requires users to setup a login and password in order to get the files.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('cityscapes', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
T
math_qa
tensorflow.org
Updated Dec 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). math_qa [Dataset]. https://www.tensorflow.org/datasets/catalog/math_qa
Explore at:
Dataset updated
Dec 14, 2022
Description
A large-scale dataset of math word problems and an interpretable neural math problem solver that learns to map problems to operation programs.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('math_qa', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
h
tiny_shakespeare
huggingface.co
tensorflow.org
Updated Mar 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrej K (2024). tiny_shakespeare [Dataset]. https://huggingface.co/datasets/karpathy/tiny_shakespeare
Explore at:
Dataset updated
Mar 27, 2024
Authors
Andrej K
Description
40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/.

To use for e.g. character modelling:

d = datasets.load_dataset(name='tiny_shakespeare')['train'] d = d.map(lambda x: datasets.Value('strings').unicode_split(x['text'], 'UTF-8')) # train split includes vocabulary for other splits vocabulary = sorted(set(next(iter(d)).numpy())) d = d.map(lambda x: {'cur_char': x[:-1], 'next_char': x[1:]}) d = d.unbatch() seq_len = 100 batch_size = 2 d = d.batch(seq_len) d = d.batch(batch_size)
A
Programs and Code for Geothermal Exploration Artificial Intelligence
data.amerigeoss.org
md, py, sh, zip
Updated Jun 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States (2021). Programs and Code for Geothermal Exploration Artificial Intelligence [Dataset]. https://data.amerigeoss.org/dataset/programs-and-code-for-geothermal-exploration-artificial-intelligence-fac4c
Explore at:
md, py, zip, shAvailable download formats
Dataset updated
Jun 9, 2021
Dataset provided by
United States
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The scripts below are used to run the Geothermal Exploration Artificial Intelligence developed within the "Detection of Potential Geothermal Exploration Sites from Hyperspectral Images via Deep Learning" project. It includes all scripts for pre-processing and processing, including: - Land Surface Temperature K-Means classifier - Labeling AI using Self Organizing Maps (SOM) - Post-processing for Permanent Scatterer InSAR (PSInSAR) analysis with SOM - Mineral marker summarizing - Artificial Intelligence (AI) Data splitting: creates data set from a single raster file - Artificial Intelligence Model: creates AI from a single data set, after splitting in Train, Validation and Test subsets - AI Mapper: creates a classification map based on a raster file
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2024). fmb [Dataset]. https://www.tensorflow.org/datasets/catalog/fmb

fmb

Explore at:

Dataset updated

May 31, 2024

Description

Our dataset consists of objects in diverse appearance and geometry. It requires multi-stage and multi-modal fine motor skills to successfully assemble the pegs onto a unfixed board in a randomized scene. We collected a total of 22,550 trajectories across two different tasks on a Franka Panda arm. We record the trajectories from 2 global views and 2 wrist views. Each view contains both RGB and depth map.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('fmb', split='train')
for ex in ds.take(4):
 print(ex)

See the guide for more informations on tensorflow_datasets.

Clear search

Close search

Google apps

Main menu

fmb

Dataset for "Enhancing Cloud Detection in Sentinel-2 Imagery: A...

init Tensorflow Dataset from file name

The Decoder (Optional)

simple function

forest_fires

ucmerced

lj_speech

cityscapes

math_qa

tiny_shakespeare

Programs and Code for Geothermal Exploration Artificial Intelligence

fmb