9 datasets found
  1. fmb

    • tensorflow.org
    • huggingface.co
    Updated May 31, 2024
    Cite
    (2024). fmb [Dataset]. https://www.tensorflow.org/datasets/catalog/fmb
    Description

    Our dataset consists of objects of diverse appearance and geometry. Successfully assembling the pegs onto an unfixed board in a randomized scene requires multi-stage, multi-modal fine motor skills. We collected a total of 22,550 trajectories across two different tasks on a Franka Panda arm. We record the trajectories from 2 global views and 2 wrist views, and each view contains both an RGB image and a depth map.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('fmb', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.
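    `print(ex)` prints every tensor in full, which is hard to read when an example carries several RGB and depth views. The helper below is a minimal sketch that reports only the shape and dtype of each leaf. `summarize_example` is a hypothetical name. It assumes the example has already been converted to NumPy values (for instance with `tfds.as_numpy`) and is a nested dict. The feature names in the demo are made up and are not the real fmb features.

```python
import numpy as np


def summarize_example(example, prefix=""):
    """Flatten a nested dict of arrays into {path: (shape, dtype)}.

    Leaves that are not NumPy arrays or scalars (e.g. bytes or nested
    datasets) are reported by their type name instead.
    """
    summary = {}
    for key, value in example.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested feature dicts, extending the path.
            summary.update(summarize_example(value, path))
        elif isinstance(value, (np.ndarray, np.generic)):
            summary[path] = (tuple(value.shape), str(value.dtype))
        else:
            summary[path] = type(value).__name__
    return summary


if __name__ == "__main__":
    # Synthetic stand-in for one example; real feature names will differ.
    fake = {
        "image": np.zeros((4, 8, 3), dtype=np.uint8),
        "meta": {"depth": np.zeros((4, 8), dtype=np.float32)},
    }
    for path, info in summarize_example(fake).items():
        print(path, info)
```

    With the dataset loaded as above, `for ex in tfds.as_numpy(ds.take(1)): print(summarize_example(ex))` prints one line per feature instead of the full tensors.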

  2. Dataset for "Enhancing Cloud Detection in Sentinel-2 Imagery: A...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 4, 2024
    Cite
    Yin Ranyu (2024). Dataset for "Enhancing Cloud Detection in Sentinel-2 Imagery: A Spatial-Temporal Approach and Dataset" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8419699
    Dataset provided by
    Gong Chengjuan
    Yin Ranyu
    Wang Guizhou
    He Guojin
    Long Tengfei
    Jiao Weili
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is built for time-series Sentinel-2 cloud detection and is stored in TensorFlow TFRecord format (see https://www.tensorflow.org/tutorials/load_data/tfrecord).

    Each file is compressed in 7z format and can be decompressed with Bandizip or 7-Zip.

    Dataset Structure:

    Each filename can be split into three parts using underscores. The first part indicates whether it is designated for training or validation ('train' or 'val'); the second part indicates the Sentinel-2 tile name, and the last part indicates the number of samples in this file.
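    The naming convention can be expressed in plain Python, as in the minimal sketch below. It assumes any archive or file extension has already been stripped from the name. The example name `train_T50SKE_120` and the helper names are hypothetical. The TensorFlow parser further down relies on the same underscore split, with the tile name at index -2 and the sample count at index -1.

```python
import os
from typing import NamedTuple


class ShardInfo(NamedTuple):
    split: str        # 'train' or 'val'
    tile: str         # Sentinel-2 tile name
    num_samples: int  # number of samples stored in the file


def parse_shard_name(path: str) -> ShardInfo:
    """Split a file name of the form <split>_<tile>_<count> into its parts."""
    name = os.path.basename(path)
    parts = name.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected file name: {name!r}")
    # Index from both ends, mirroring the demo parser (tile at -2, count at -1).
    split, tile, count = parts[0], parts[-2], parts[-1]
    if split not in ("train", "val"):
        raise ValueError(f"unknown split {split!r} in {name!r}")
    return ShardInfo(split, tile, int(count))


if __name__ == "__main__":
    # Hypothetical file name used only for illustration.
    print(parse_shard_name("data/train_T50SKE_120"))
```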

    Each sample includes:

    Sample ID;

    Array of time-series 4-band image patches at 10 m resolution, shaped (n_timestamps, 4, 42, 42);

    Label list indicating the cloud cover status of the central 6×6 pixels at each timestamp;

    Ordinal list for each timestamp;

    Sample weight list (reserved).
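    The labels refer only to the central 6×6 pixels of each 42×42 patch. Both sizes are even, so that block starts at (42 - 6) / 2 = 18 and covers rows and columns 18 to 23. The minimal NumPy sketch below extracts it from a synthetic array of the stated shape. The helper name `center_block` is hypothetical.

```python
import numpy as np


def center_block(patches: np.ndarray, size: int = 6) -> np.ndarray:
    """Return the central size x size pixels of every band and timestamp.

    `patches` is expected to have shape (n_timestamps, bands, H, W).
    """
    h, w = patches.shape[-2:]
    top = (h - size) // 2
    left = (w - size) // 2
    return patches[..., top:top + size, left:left + size]


if __name__ == "__main__":
    # Synthetic stand-in: 3 timestamps, 4 bands, 42 x 42 pixels (uint16 like the raw data).
    imgs = np.arange(3 * 4 * 42 * 42, dtype=np.uint16).reshape(3, 4, 42, 42)
    center = center_block(imgs)
    print(center.shape)  # (3, 4, 6, 6)
```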

    Here are demonstration functions for parsing the TFRecord files:

    import tensorflow as tf

    # Init a TensorFlow Dataset from a file name.
    # The tile name and sample count are read from the file name (<split>_<tile>_<count>).
    def parseRecordDirect(fname):
        sep = '/'
        parts = tf.strings.split(fname, sep)
        name_parts = tf.strings.split(parts[-1], sep='_')
        tn = name_parts[-2]  # tile name
        nn = tf.strings.to_number(name_parts[-1], tf.dtypes.int64)  # number of samples
        # Pair every record in the file with its tile name.
        t = tf.data.Dataset.from_tensors(tn).repeat().take(nn)
        t1 = tf.data.TFRecordDataset(fname)
        ds = tf.data.Dataset.zip((t, t1))
        return ds

    keys_to_features_direct = {
        'localid': tf.io.FixedLenFeature([], tf.int64, -1),
        'image_raw_ldseries': tf.io.FixedLenFeature((), tf.string, ''),
        'labels': tf.io.FixedLenFeature((), tf.string, ''),
        'dates': tf.io.FixedLenFeature((), tf.string, ''),
        'weights': tf.io.FixedLenFeature((), tf.string, ''),
    }

    # The Decoder (optional).
    # `decoder` is not defined in this snippet. It is presumably a module that provides
    # an abstract `Decoder` base class, such as the TF Model Garden's
    # official.vision.dataloaders.decoder.
    class SeriesClassificationDirectDecorder(decoder.Decoder):
        """A tf.Example decoder for tfds classification datasets."""

        def __init__(self) -> None:
            super().__init__()

        def decode(self, tid, ds):
            parsed = tf.io.parse_single_example(ds, keys_to_features_direct)
            encoded = parsed['image_raw_ldseries']
            labels_encoded = parsed['labels']
            decoded = tf.io.decode_raw(encoded, tf.uint16)
            label = tf.io.decode_raw(labels_encoded, tf.int8)
            dates = tf.io.decode_raw(parsed['dates'], tf.int64)
            weight = tf.io.decode_raw(parsed['weights'], tf.float32)
            decoded = tf.reshape(decoded, [-1, 4, 42, 42])
            sample_dict = {
                'tid': tid,                    # tile ID
                'dates': dates,                # date list
                'localid': parsed['localid'],  # sample ID
                'imgs': decoded,               # image array
                'labels': label,               # label list
                'weights': weight,             # sample weights (reserved)
            }
            return sample_dict

    # Simple function: parse one (tile name, record) pair into a tuple.
    def preprocessDirect(tid, record):
        parsed = tf.io.parse_single_example(record, keys_to_features_direct)
        encoded = parsed['image_raw_ldseries']
        labels_encoded = parsed['labels']
        decoded = tf.io.decode_raw(encoded, tf.uint16)
        label = tf.io.decode_raw(labels_encoded, tf.int8)
        dates = tf.io.decode_raw(parsed['dates'], tf.int64)
        weight = tf.io.decode_raw(parsed['weights'], tf.float32)
        decoded = tf.reshape(decoded, [-1, 4, 42, 42])
        return tid, dates, parsed['localid'], decoded, label, weight

    t1 = parseRecordDirect('filename here')
    dataset = t1.map(preprocessDirect, num_parallel_calls=tf.data.AUTOTUNE)

    Class Definition:

    0: clear

    1: opaque cloud

    2: thin cloud

    3: haze

    4: cloud shadow

    5: snow
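    The sketch below maps the integer labels decoded from the `labels` field to the class names above and counts how often each class occurs. The dictionary and function names are hypothetical, and the label values in the demo are made up.

```python
from collections import Counter
from typing import Dict, Iterable

# Class IDs as listed in the dataset description.
CLASS_NAMES: Dict[int, str] = {
    0: "clear",
    1: "opaque cloud",
    2: "thin cloud",
    3: "haze",
    4: "cloud shadow",
    5: "snow",
}


def class_histogram(labels: Iterable[int]) -> Dict[str, int]:
    """Count labels per class name; unknown IDs raise a KeyError."""
    counts = Counter(int(label) for label in labels)
    return {CLASS_NAMES[k]: counts[k] for k in sorted(counts)}


if __name__ == "__main__":
    # Made-up per-timestamp labels for one sample.
    print(class_histogram([0, 0, 1, 2, 0, 4]))
```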

    Dataset Construction:

    First, we randomly generate 500 points for each tile, and all these points are aligned to the pixel centers of the 60 m resolution bands (e.g. B10) for consistency when comparing with other products. This is because other cloud detection methods may use the cirrus band, which has 60 m resolution, as a feature.

    Then, time-series image patches of two shapes are cropped with each point as the center. The 42×42 patches are cropped from the 10 m resolution bands (B2, B3, B4, B8) and are used to construct this dataset. The 348×348 patches are cropped from the True Colour Image (TCI; see the Sentinel-2 User Guide for details) file and are used to interpret class labels.
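    The alignment and cropping steps can be sketched as coordinate arithmetic. This is a minimal illustration and not the authors' code. It assumes projected (UTM) coordinates in metres, a tile upper-left corner `(ulx, uly)` shared by the 10 m and 60 m grids, and row indices that increase southwards. All function names and coordinates are hypothetical. Under these assumptions, the centre of a 60 m pixel falls on a 10 m pixel corner, so an even-sized 42×42 window can be centred on it exactly. The window's central 6×6 block (60 m × 60 m) then coincides with that 60 m pixel.

```python
import math
from typing import Tuple


def snap_to_60m_center(x: float, y: float, ulx: float, uly: float,
                       pixel: float = 60.0) -> Tuple[float, float]:
    """Move (x, y) to the centre of the 60 m pixel that contains it."""
    col = math.floor((x - ulx) / pixel)  # columns grow eastwards
    row = math.floor((uly - y) / pixel)  # rows grow southwards
    return ulx + (col + 0.5) * pixel, uly - (row + 0.5) * pixel


def window_10m(cx: float, cy: float, ulx: float, uly: float,
               size: int = 42, res: float = 10.0) -> Tuple[int, int]:
    """Return (row, col) of the top-left 10 m pixel of a size x size window centred on (cx, cy)."""
    col_c = (cx - ulx) / res
    row_c = (uly - cy) / res
    # An even-sized window is centred exactly only if the centre lies on a pixel corner.
    if not (col_c.is_integer() and row_c.is_integer()):
        raise ValueError("centre is not on a 10 m pixel corner")
    return int(row_c) - size // 2, int(col_c) - size // 2


if __name__ == "__main__":
    # Hypothetical tile corner and random point (metres, UTM).
    ulx, uly = 300000.0, 4500000.0
    cx, cy = snap_to_60m_center(301075.0, 4498975.0, ulx, uly)
    top, left = window_10m(cx, cy, ulx, uly)
    print((cx, cy), (top, left))
    # The central 6 x 6 block starts 18 pixels in from the window corner.
    print(top + 18, left + 18)
```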

    Samples with many timestamps can be slow to read during I/O, so each time series of patches is divided into groups of at most 100 timestamps.
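    The description does not say exactly how a long series is split. The minimal sketch below assumes consecutive, order-preserving chunks of at most 100 timestamps; a balanced split would be an equally valid reading. `split_timestamps` is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_timestamps(series: Sequence[T], max_len: int = 100) -> List[Sequence[T]]:
    """Cut a time series into consecutive groups of at most `max_len` items."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [series[i:i + max_len] for i in range(0, len(series), max_len)]


if __name__ == "__main__":
    dates = list(range(250))  # stand-in for 250 acquisition dates
    print([len(g) for g in split_timestamps(dates)])  # [100, 100, 50]
```

    Because it only slices along the first axis, the same function also works on a NumPy array shaped (n_timestamps, 4, 42, 42).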

  3. forest_fires

    • tensorflow.org
    Updated Nov 23, 2022
    Cite
    (2022). forest_fires [Dataset]. https://www.tensorflow.org/datasets/catalog/forest_fires
    Description

    This is a regression task in which the aim is to predict the burned area of forest fires in the northeast region of Portugal using meteorological and other data.

    Data Set Information:

    In [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function. Then, several data mining methods were applied. After fitting the models, the outputs were post-processed with the inverse of the ln(x+1) transform. Four different input setups were used. The experiments were conducted using 10-fold cross-validation repeated over 30 runs. Two regression metrics were measured: MAD and RMSE. A Gaussian support vector machine (SVM) fed with only 4 direct weather conditions (temp, RH, wind and rain) obtained the best MAD value: 12.71 ± 0.01 (mean and 95% confidence interval using a Student's t-distribution). The best RMSE was attained by the naive mean predictor. An analysis of the regression error characteristic (REC) curve shows that the SVM model predicts more examples within a lower admitted error. In effect, the SVM model predicts small fires, which are the majority, better.
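    The target transform and the two metrics fit in a few lines of NumPy. This sketch follows the procedure as described; it is not the original experiment code. A model is fitted on ln(x+1)-transformed areas, and its outputs are mapped back with the inverse exp(y) - 1. Errors are then measured on the original scale with MAD (mean absolute deviation) and RMSE. The naive baseline is assumed here to predict the mean of the raw training areas. All numbers are made up and the function names are hypothetical.

```python
import numpy as np


def to_log_target(area: np.ndarray) -> np.ndarray:
    """ln(x + 1) transform applied to the 'area' output before fitting."""
    return np.log1p(area)


def from_log_target(y: np.ndarray) -> np.ndarray:
    """Inverse transform, exp(y) - 1, applied to model outputs."""
    return np.expm1(y)


def mad(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute deviation between targets and predictions."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    train_area = np.array([0.0, 0.0, 1.2, 5.4, 30.0])
    test_area = np.array([0.0, 2.0, 10.0])

    # A model would be fitted on to_log_target(train_area); a fake
    # log-scale prediction stands in for its output here.
    log_pred = np.array([0.1, 1.0, 2.0])
    pred = from_log_target(log_pred)

    # Naive baseline (assumption: mean of the raw training targets).
    naive = np.full_like(test_area, train_area.mean())

    print("model MAD/RMSE:", mad(test_area, pred), rmse(test_area, pred))
    print("naive MAD/RMSE:", mad(test_area, naive), rmse(test_area, naive))
```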

    Attribute Information:

    For more information, read [Cortez and Morais, 2007].

    1. X - x-axis spatial coordinate within the Montesinho park map: 1 to 9
    2. Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9
    3. month - month of the year: 'jan' to 'dec'
    4. day - day of the week: 'mon' to 'sun'
    5. FFMC - FFMC index from the FWI system: 18.7 to 96.20
    6. DMC - DMC index from the FWI system: 1.1 to 291.3
    7. DC - DC index from the FWI system: 7.9 to 860.6
    8. ISI - ISI index from the FWI system: 0.0 to 56.10
    9. temp - temperature in degrees Celsius: 2.2 to 33.30
    10. RH - relative humidity in %: 15.0 to 100
    11. wind - wind speed in km/h: 0.40 to 9.40
    12. rain - outside rain in mm/m2 : 0.0 to 6.4
    13. area - the burned area of the forest (in ha): 0.00 to 1090.84 (this output variable is very skewed towards 0.0, thus it may make sense to model with the logarithm transform).

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('forest_fires', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  4. ucmerced

    • huggingface.co
    • tensorflow.org
    Updated Mar 30, 2024
    Cite
    TorchGeo (2024). ucmerced [Dataset]. https://huggingface.co/datasets/torchgeo/ucmerced
    Dataset authored and provided by
    TorchGeo
  5. lj_speech

    • huggingface.co
    • tensorflow.org
    Updated May 17, 2024
    Cite
    Keith Ito (2024). lj_speech [Dataset]. https://huggingface.co/datasets/keithito/lj_speech
    Authors
    Keith Ito
    License

    The Unlicense: https://choosealicense.com/licenses/unlicense/

    Description

    This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books in English. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

    Note that in order to limit the required storage for preparing this dataset, the audio is stored in the .wav format and is not converted to a float32 array. To convert the audio file to a float32 array, please make use of the .map() function as follows:

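    import soundfile as sf
    
    # `dataset` is assumed to be a Hugging Face `datasets.Dataset` whose
    # "file" column holds paths to the .wav clips.
    def map_to_array(batch):
      # sf.read returns (samples, sample_rate); request float32 explicitly,
      # since its default dtype is float64.
      speech_array, _ = sf.read(batch["file"], dtype="float32")
      batch["speech"] = speech_array
      return batch
    
    dataset = dataset.map(map_to_array, remove_columns=["file"])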
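    If `soundfile` is not available, Python's standard-library `wave` module can perform the same conversion for uncompressed 16-bit PCM `.wav` files. The minimal sketch below assumes 16-bit PCM input (mono, or interleaved multichannel). `read_wav_float32`, `duration_seconds` and `map_to_array_wave` are hypothetical helpers. Unlike the `soundfile` snippet above, this version keeps the sampling rate, so clip durations (1 to 10 seconds in this dataset) can be checked.

```python
import wave

import numpy as np


def read_wav_float32(path: str):
    """Read a 16-bit PCM WAV file into a float32 array scaled to [-1, 1).

    Returns (samples, sample_rate); multichannel audio has shape (frames, channels).
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV PCM samples are little-endian signed 16-bit integers.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        samples = samples.reshape(-1, channels)
    return samples, rate


def duration_seconds(samples: np.ndarray, rate: int) -> float:
    """Clip length in seconds."""
    return samples.shape[0] / rate


def map_to_array_wave(batch):
    """Drop-in alternative to map_to_array that also stores the sampling rate."""
    speech, rate = read_wav_float32(batch["file"])
    batch["speech"] = speech
    batch["sampling_rate"] = rate
    return batch
```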
    
  6. cityscapes

    • tensorflow.org
    Updated Dec 6, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). cityscapes [Dataset]. https://www.tensorflow.org/datasets/catalog/cityscapes
    Description

    Cityscapes is a dataset consisting of diverse urban street scenes across 50 different cities at varying times of the year, as well as ground truths for several vision tasks including semantic segmentation, instance-level segmentation (marked TODO, i.e. not yet provided by this loader), and stereo pair disparity inference.

    For segmentation tasks (default config, accessible via 'cityscapes/semantic_segmentation'), Cityscapes provides dense pixel-level annotations for 5000 images at 1024 × 2048 resolution, pre-split into training (2975), validation (500) and test (1525) sets. Label annotations for segmentation tasks span 30+ classes commonly encountered in driving scene perception. Detailed label information may be found here: https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/helpers/labels.py#L52-L99

    Cityscapes also provides coarse-grained segmentation annotations (accessible via 'cityscapes/semantic_segmentation_extra') for 19998 images in a 'train_extra' split, which may prove useful for pretraining or data-heavy models.

    Besides segmentation, Cityscapes also provides stereo image pairs and ground truths for disparity inference tasks on both the normal and extra splits (accessible via 'cityscapes/stereo_disparity' and 'cityscapes/stereo_disparity_extra' respectively).

    Ignored examples:

    • For 'cityscapes/stereo_disparity_extra':
      • troisdorf_000000_000073_{*} images (no disparity map present)

    WARNING: this dataset requires users to set up a login and password in order to get the files.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('cityscapes', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  7. math_qa

    • tensorflow.org
    Updated Dec 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). math_qa [Dataset]. https://www.tensorflow.org/datasets/catalog/math_qa
    Description

    A large-scale dataset of math word problems and an interpretable neural math problem solver that learns to map problems to operation programs.
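    The tiny interpreter below illustrates what an "operation program" is. It uses a hypothetical program format: steps are separated by `|` and each has the form `op(arg,arg)`. In the arguments, `nK` refers to the K-th number in the problem text, `#K` to the result of step K, and `const_X` to the constant X. The format, the operation names and the helper names are assumptions for illustration; check the dataset's own fields before relying on them.

```python
import operator
import re
from typing import Callable, Dict, List

# Hypothetical operation set for illustration.
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

STEP = re.compile(r"^\s*(\w+)\(([^)]*)\)\s*$")


def resolve(arg: str, numbers: List[float], results: List[float]) -> float:
    """Turn an argument token into a value."""
    arg = arg.strip()
    if arg.startswith("const_"):
        return float(arg[len("const_"):])
    if arg.startswith("n"):
        return numbers[int(arg[1:])]
    if arg.startswith("#"):
        return results[int(arg[1:])]
    raise ValueError(f"unknown argument {arg!r}")


def run_program(program: str, numbers: List[float]) -> float:
    """Execute steps left to right; the last step's result is the answer."""
    results: List[float] = []
    for step in program.strip("|").split("|"):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"malformed step {step!r}")
        op, args = match.group(1), match.group(2).split(",")
        values = [resolve(a, numbers, results) for a in args]
        results.append(OPS[op](*values))
    return results[-1]


if __name__ == "__main__":
    # "A shirt costs 20 and is discounted by 15 percent; what is the new price?"
    prog = "multiply(n0,n1)|divide(#0,const_100)|subtract(n0,#1)"
    print(run_program(prog, [20.0, 15.0]))  # 17.0
```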

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('math_qa', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  8. tiny_shakespeare

    • huggingface.co
    • tensorflow.org
    Updated Mar 27, 2024
    Cite
    Andrej K (2024). tiny_shakespeare [Dataset]. https://huggingface.co/datasets/karpathy/tiny_shakespeare
    Authors
    Andrej K
    Description

    40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/.

    To use for e.g. character modelling:

    import tensorflow as tf
    import tensorflow_datasets as tfds

    d = tfds.load(name='tiny_shakespeare')['train']
    d = d.map(lambda x: tf.strings.unicode_split(x['text'], 'UTF-8'))
    # train split includes vocabulary for other splits
    vocabulary = sorted(set(next(iter(d)).numpy()))
    d = d.map(lambda x: {'cur_char': x[:-1], 'next_char': x[1:]})
    d = d.unbatch()
    seq_len = 100
    batch_size = 2
    d = d.batch(seq_len)
    d = d.batch(batch_size)
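    The pipeline above builds a vocabulary from the training text and pairs each character with the one that follows it. The minimal sketch below shows the same idea in plain Python, without TensorFlow. The function names are hypothetical, and the sample string is a stand-in for the real `train` split.

```python
from typing import Dict, List, Tuple


def build_vocab(text: str) -> Dict[str, int]:
    """Map each distinct character to an integer ID (sorted for stability)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def next_char_windows(text: str, vocab: Dict[str, int],
                      seq_len: int) -> List[Tuple[List[int], List[int]]]:
    """Split the text into (input, target) ID windows shifted by one character.

    Windows do not overlap; a trailing partial window is kept, as
    tf.data's batch() does by default.
    """
    ids = [vocab[ch] for ch in text]
    cur, nxt = ids[:-1], ids[1:]  # cur_char / next_char
    return [(cur[i:i + seq_len], nxt[i:i + seq_len])
            for i in range(0, len(cur), seq_len)]


if __name__ == "__main__":
    sample = "First Citizen:\nBefore we proceed"
    vocab = build_vocab(sample)
    windows = next_char_windows(sample, vocab, seq_len=8)
    print(len(vocab), len(windows))
```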
    
  9. Programs and Code for Geothermal Exploration Artificial Intelligence

    • data.amerigeoss.org
    md, py, sh, zip
    Updated Jun 9, 2021
    Cite
    United States (2021). Programs and Code for Geothermal Exploration Artificial Intelligence [Dataset]. https://data.amerigeoss.org/dataset/programs-and-code-for-geothermal-exploration-artificial-intelligence-fac4c
    Dataset provided by
    United States
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The scripts below are used to run the Geothermal Exploration Artificial Intelligence developed within the "Detection of Potential Geothermal Exploration Sites from Hyperspectral Images via Deep Learning" project. They cover all pre-processing and processing steps:

    • Land Surface Temperature K-Means classifier
    • Labeling AI using Self Organizing Maps (SOM)
    • Post-processing for Permanent Scatterer InSAR (PSInSAR) analysis with SOM
    • Mineral marker summarizing
    • Artificial Intelligence (AI) data splitting: creates a data set from a single raster file
    • Artificial Intelligence model: creates an AI from a single data set, after splitting it into training, validation and test subsets
    • AI Mapper: creates a classification map based on a raster file
