100+ datasets found
  1. K=3,train And Val Split Dataset

    • universe.roboflow.com
    zip
    Updated Dec 21, 2021
    Cite
    new-workspace-vursk (2021). K=3,train And Val Split Dataset [Dataset]. https://universe.roboflow.com/new-workspace-vursk/k-3-train-and-val-split
    Available download formats: zip
    Dataset updated
    Dec 21, 2021
    Dataset authored and provided by
    new-workspace-vursk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    3 Train (7 Av Express)
    Variables measured
    Saba Shekari Bounding Boxes
    Description

    K=3,train And Val Split

    ## Overview
    
    K=3,train And Val Split is a dataset for object detection tasks - it contains Saba Shekari annotations for 3,206 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  2. KITTI val split - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). KITTI val split - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/kitti-val-split
    Dataset updated
    Dec 2, 2024
    Description

    The KITTI val split is a subset of the KITTI dataset, used for validation and testing.

  3. cleaned-quora-dataset-train-test-split

    • huggingface.co
    Updated Feb 7, 2024
    Cite
    fivesixseven (2024). cleaned-quora-dataset-train-test-split [Dataset]. https://huggingface.co/datasets/567-labs/cleaned-quora-dataset-train-test-split
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Feb 7, 2024
    Dataset authored and provided by
    fivesixseven
    Description

    This is a cleaned version of the Quora dataset that's been configured with a train-test-val split.

    Train: for training the model.

    Test: for running experiments and comparing different open-source (OSS) and closed-source models.

    Val: only to be used at the end.

    Colab notebook to reproduce: https://colab.research.google.com/drive/1dGjGiqwPV1M7JOLfcPEsSh3SC37urItS?usp=sharing
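    The three roles above (train for fitting, test for comparing models, val held back until the end) come down to a single shuffled, fixed-seed partition. The following is a minimal Python sketch under assumed 80/10/10 ratios and seed 42; the function name `three_way_split` is hypothetical, and the linked Colab notebook remains the reference for how this particular split was actually made.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    items: Sequence[T],
    train_frac: float = 0.8,
    test_frac: float = 0.1,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into train / test / val.

    Hypothetical helper: the 80/10/10 default ratios and the seed are
    assumptions. Everything left after train and test goes to val, the
    split reserved for a single final evaluation.
    """
    if train_frac <= 0 or test_frac <= 0 or train_frac + test_frac >= 1:
        raise ValueError("need train_frac > 0, test_frac > 0 and their sum < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split every run
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


# Usage: split 1,000 toy items; develop on train, compare models on test,
# and look at val only once at the very end.
train, test, val = three_way_split(range(1000))
print(len(train), len(test), len(val))
```

    Fixing the seed makes the split reproducible, which matters when val is meant to stay untouched across many experiments.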

  4. Dataskripsi_split

    • kaggle.com
    Updated Sep 8, 2023
    Cite
    Dewizzz (2023). Dataskripsi_split [Dataset]. https://www.kaggle.com/datasets/dewizzz/dataskripsi-split
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dewizzz
    Description

    Dataset

    This dataset was created by Dewizzz


  5. split-dataset

    • kaggle.com
    Updated Sep 26, 2025
    Cite
    Büşra Ertekin (2025). split-dataset [Dataset]. https://www.kaggle.com/datasets/busraertekin/split-dataset
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Sep 26, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Büşra Ertekin
    Description

    Dataset

    This dataset was created by Büşra Ertekin


  6. fineweb-sample-100BT_over-2048-tokens-subset-split-processed-val

    • huggingface.co
    Updated May 28, 2025
    Cite
    Brown (2025). fineweb-sample-100BT_over-2048-tokens-subset-split-processed-val [Dataset]. https://huggingface.co/datasets/Bradley/fineweb-sample-100BT_over-2048-tokens-subset-split-processed-val
    Dataset updated
    May 28, 2025
    Authors
    Brown
    Description

    Bradley/fineweb-sample-100BT_over-2048-tokens-subset-split-processed-val dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. baseball_players

    • huggingface.co
    Updated May 31, 2024
    Cite
    big (2024). baseball_players [Dataset]. https://huggingface.co/datasets/yangcci/baseball_players
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    May 31, 2024
    Authors
    big
    License

    Unknown license: https://choosealicense.com/licenses/unknown/

    Description

    dataset_info:
      features:
      - name: image
        dtype: image
      - name: question
        dtype: string
      - name: caption
        dtype: string
      splits:
      - name: train
        num_bytes: 1572864
        num_examples: 40
      - name: test
        num_bytes: 764825.6
        num_examples: 20
      - name: val
        num_bytes: 961740.8
        num_examples: 20
    configs:
    - data_files:
      - split: train
        path: data/train
      - split: test
        path: data/test
      - split: val
        path: data/val

  8. Caltech 256 (train-val split)

    • kaggle.com
    zip
    Updated May 1, 2023
    Cite
    Ivan Marinichev (2023). Caltech 256 (train-val split) [Dataset]. https://www.kaggle.com/datasets/ivanmarinichev/caltech-256-train-val-split/discussion
    Available download formats: zip (84,904 bytes)
    Dataset updated
    May 1, 2023
    Authors
    Ivan Marinichev
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note

    This dataset doesn't contain the source data, only the train/val split description.

    Caltech 256 original description

    We introduce a challenging set of 256 object categories containing a total of 30607 images. The original Caltech-101 was collected by choosing a set of object categories, downloading examples from Google Images and then manually screening out all images that did not fit the category. Caltech-256 is collected in a similar manner with several improvements: a) the number of categories is more than doubled, b) the minimum number of images in any category is increased from 31 to 80, c) artifacts due to image rotation are avoided and d) a new and larger clutter category is introduced for testing background rejection. We suggest several testing paradigms to measure classification performance, then benchmark the dataset using two simple metrics as well as a state-of-the-art spatial pyramid matching algorithm. Finally we use the clutter category to train an interest detector which rejects uninformative background regions.

    Citation

    Griffin, G., Holub, A., & Perona, P. (2022). Caltech 256 (1.0) [Data set]. CaltechDATA.

    https://doi.org/10.22002/D1.20087

  9. lexclipr

    • huggingface.co
    Updated Feb 1, 2001
    Cite
    Rohit Upadhya (2001). lexclipr [Dataset]. https://huggingface.co/datasets/rohit-upadhya/lexclipr
    Dataset updated
    Feb 1, 2001
    Authors
    Rohit Upadhya
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Division

    Unique_query: queries that do not appear in the train, val, or test splits; used to test how well models handle queries they have not seen.

    Train_all: the unsplit train, test, and val datapoints.

    train: train split

    val: val split

    test: test split

    Original paper: LexCLiPR: Cross-Lingual Paragraph Retrieval from Legal Judgments. BibTeX: @inproceedings{upadhya-t-y-s-s-2025-lexclipr, title = "{L}ex{CL}i{PR}: Cross-Lingual Paragraph Retrieval from Legal… See the full description on the dataset page: https://huggingface.co/datasets/rohit-upadhya/lexclipr.
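    The Unique_query subset amounts to a group-wise hold-out: every datapoint whose query is held out is kept away from train, val, and test. A minimal sketch, assuming each datapoint is a (query, paragraph) pair and a 10% hold-out fraction; the pair layout and the name `hold_out_queries` are illustrative assumptions, not the dataset's actual schema or code.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (query, paragraph); hypothetical layout


def hold_out_queries(
    pairs: List[Pair], holdout_frac: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Return (unique_query, remainder) so no held-out query appears in the remainder."""
    queries = sorted({query for query, _ in pairs})  # sorted so the shuffle is reproducible
    random.Random(seed).shuffle(queries)
    n_held = max(1, int(len(queries) * holdout_frac)) if queries else 0
    held = set(queries[:n_held])
    unique_query = [p for p in pairs if p[0] in held]
    remainder = [p for p in pairs if p[0] not in held]  # later cut into train/val/test
    return unique_query, remainder


# Usage: 20 toy queries with 3 paragraphs each.
pairs = [(f"q{i}", f"p{i}_{j}") for i in range(20) for j in range(3)]
unique_query, remainder = hold_out_queries(pairs)
print(len(unique_query), len(remainder))
```

    Grouping by query rather than by pair is what makes the held-out queries genuinely unseen; a pair-level shuffle would leak the same query into training.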

  10. Split 6 Train Track Kfcv Dataset

    • universe.roboflow.com
    zip
    Updated Nov 30, 2024
    Cite
    ADMU CSCI 199 UNDERGRAD THESIS (2024). Split 6 Train Track Kfcv Dataset [Dataset]. https://universe.roboflow.com/admu-csci-199-undergrad-thesis/split-6-train-track-kfcv/dataset/1
    Available download formats: zip
    Dataset updated
    Nov 30, 2024
    Dataset authored and provided by
    ADMU CSCI 199 UNDERGRAD THESIS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    6 Train (Lexington Av Local/Pelham Express)
    Variables measured
    Train Track Damage Bounding Boxes
    Description

    Split 6 Train Track KFCV

    ## Overview
    
    Split 6 Train Track KFCV is a dataset for object detection tasks - it contains Train Track Damage annotations for 1,296 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
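    "KFCV" in the title presumably stands for k-fold cross-validation, in which the data is cut into k folds and each fold serves once as the validation set while the remaining folds are used for training. That reading is an assumption, since the description does not say how the splits were made. A minimal index-based sketch (the function name `k_fold_indices` is hypothetical):

```python
from typing import List, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, val_idx) pairs over range(n).

    The first n % k folds get one extra item, so fold sizes differ by at most one.
    No shuffling is done here.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    base, extra = divmod(n, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, val))
        start += size
    return folds


folds = k_fold_indices(1296, 10)
print([len(val) for _, val in folds])  # [130, 130, 130, 130, 130, 130, 129, 129, 129, 129]
```

    With the 1,296 images listed here and a hypothetical k = 10, this gives six folds of 130 and four of 129. Because the function does not shuffle, shuffle the image list first if its order carries structure.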
    
  11. pokemon-classification

    • huggingface.co
    Updated Sep 20, 2024
    Cite
    Victoria (2024). pokemon-classification [Dataset]. https://huggingface.co/datasets/VictoriaDerks/pokemon-classification
    Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2024
    Authors
    Victoria
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

    Generation I pokemon images dataset (35 pokemon).

      Dataset Details
    

    This dataset was curated using pokemon images found on Bulbapedia. It contains 35 randomly selected Generation I pokemon, with 7 images per pokemon.

      Dataset Description
    

    The dataset has train/test/val splits. The train split contains 4 images per pokemon; the test and val splits contain either 1 or 2 images per pokemon. The following images were selected from… See the full description on the dataset page: https://huggingface.co/datasets/VictoriaDerks/pokemon-classification.
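    The per-class allocation above (7 images per pokemon: 4 to train, the remaining 3 divided 2/1 between test and val) is a stratified split. The sketch below is a hypothetical reconstruction, not the curator's procedure: the (image_path, pokemon_name) layout is assumed, and which of test or val receives 2 images for a given pokemon is decided by a seeded coin flip, which the card does not specify.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image_path, pokemon_name); hypothetical layout


def per_class_split(
    samples: List[Sample], n_train: int = 4, seed: int = 0
) -> Dict[str, List[Sample]]:
    """Stratified split: n_train images per class to train, the rest shared by test and val."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    splits: Dict[str, List[Sample]] = {"train": [], "test": [], "val": []}
    for label in sorted(by_label):  # sorted so the result is reproducible
        items = list(by_label[label])
        rng.shuffle(items)
        splits["train"].extend(items[:n_train])
        rest = items[n_train:]
        half = (len(rest) + 1) // 2  # 3 leftovers -> 2 and 1
        # Assumption: which of test/val gets the larger share is a coin flip per class.
        bigger, smaller = ("test", "val") if rng.random() < 0.5 else ("val", "test")
        splits[bigger].extend(rest[:half])
        splits[smaller].extend(rest[half:])
    return splits


samples = [(f"img_{c}_{i}.png", f"pokemon{c}") for c in range(35) for i in range(7)]
splits = per_class_split(samples)
print({name: len(items) for name, items in splits.items()})
```

    With 35 classes of 7 images, this yields 140 training images and 105 shared between test and val, and every class is represented in all three splits.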

  12. Val_split Dataset

    • universe.roboflow.com
    zip
    Updated Mar 21, 2024
    Cite
    JGATE (2024). Val_split Dataset [Dataset]. https://universe.roboflow.com/jgate-90tes/val_split/dataset/1
    Available download formats: zip
    Dataset updated
    Mar 21, 2024
    Dataset authored and provided by
    JGATE
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Good Broke Lose Uncovered Circle 7GO9 Bounding Boxes
    Description

    Val_split

    ## Overview
    
    Val_split is a dataset for object detection tasks - it contains Good Broke Lose Uncovered Circle 7GO9 annotations for 1,489 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. Split 10 Train Track Kfcv Dataset

    • universe.roboflow.com
    zip
    Updated Nov 30, 2024
    Cite
    ADMU CSCI 199 UNDERGRAD THESIS (2024). Split 10 Train Track Kfcv Dataset [Dataset]. https://universe.roboflow.com/admu-csci-199-undergrad-thesis/split-10-train-track-kfcv/dataset/1
    Available download formats: zip
    Dataset updated
    Nov 30, 2024
    Dataset authored and provided by
    ADMU CSCI 199 UNDERGRAD THESIS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Train Track Damage Bounding Boxes
    Description

    Split 10 Train Track KFCV

    ## Overview
    
    Split 10 Train Track KFCV is a dataset for object detection tasks - it contains Train Track Damage annotations for 1,296 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. Yt Temporal 1b: Val Dataset

    • universe.roboflow.com
    zip
    Updated Nov 21, 2025
    Cite
    SA-Co VEVal Dataset (2025). Yt Temporal 1b: Val Dataset [Dataset]. https://universe.roboflow.com/sa-co-veval/yt-temporal-1b-val
    Available download formats: zip
    Dataset updated
    Nov 21, 2025
    Dataset authored and provided by
    SA-Co VEVal Dataset
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Polygons
    Description

    SA-Co/VEval – YT-Temporal-1B: Val

    SA-Co/VEval is an evaluation dataset for promptable concept segmentation (PCS) in videos, developed by Meta for the Segment Anything 3 model (SAM 3). The dataset contains videos paired with text labels (also referred to as noun phrases, or NPs), each exhaustively annotated with masks on all object instances that match the label.

    This project lets you explore YT-Temporal-1B: Val, the val split of the YT-Temporal-1B subset. The test split is available at YT-Temporal-1B: Test.

    Download Instructions

    The full SA-Co/VEval data is available in its canonical, eval-ready form below.

    Download SA-V video frames: https://sa-co.roboflow.com/veval/saco_sav.zip

    Download YT-1B video frames: https://sa-co.roboflow.com/veval/saco_yt1b.zip

    Download SmartGlasses video frames: https://sa-co.roboflow.com/veval/saco_sg.zip

    Download ground truth annotations: https://sa-co.roboflow.com/veval/gt-annotations.zip

    Download the full bundle: https://sa-co.roboflow.com/veval/all.zip
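    For scripted downloads, the archive URLs listed above can be fetched with the Python standard library. A minimal sketch: the short keys (`sav`, `yt1b`, and so on) and the `download` helper are illustrative labels, not part of the dataset release. The archives may be large, so fetching only the parts you need is usually preferable to the full bundle.

```python
import urllib.request
from pathlib import Path
from typing import Iterable, List

# Archive URLs as listed above; the short keys are illustrative labels.
SACO_VEVAL_URLS = {
    "sav": "https://sa-co.roboflow.com/veval/saco_sav.zip",
    "yt1b": "https://sa-co.roboflow.com/veval/saco_yt1b.zip",
    "smartglasses": "https://sa-co.roboflow.com/veval/saco_sg.zip",
    "annotations": "https://sa-co.roboflow.com/veval/gt-annotations.zip",
    "all": "https://sa-co.roboflow.com/veval/all.zip",
}


def download(parts: Iterable[str], dest: str = "saco_veval") -> List[Path]:
    """Fetch the selected archives into dest, skipping files that already exist."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for part in parts:
        url = SACO_VEVAL_URLS[part]  # raises KeyError for an unknown part
        target = out_dir / url.rsplit("/", 1)[-1]
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))
        paths.append(target)
    return paths


# Usage (network access required):
# download(["yt1b", "annotations"])  # YT-1B frames plus ground-truth annotations
```

    Skipping files that already exist makes the helper safe to re-run after an interrupted session, though it does not detect a partially written archive.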

    Explore all SA-Co/VEval datasets

    SA-Co/VEval covers 3 video sources: SA-V, YT-Temporal-1B, and SmartGlasses.

    Explore all: SA-Co/VEval on Roboflow Universe

    Read Meta's data license for SA-Co/VEval: License

  15. Sa V: Val Dataset

    • universe.roboflow.com
    zip
    Updated Nov 21, 2025
    Cite
    SA-Co VEVal Dataset (2025). Sa V: Val Dataset [Dataset]. https://universe.roboflow.com/sa-co-veval/sa-v-val/dataset/1
    Available download formats: zip
    Dataset updated
    Nov 21, 2025
    Dataset authored and provided by
    SA-Co VEVal Dataset
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects X4nL Polygons
    Description

    SA-Co/VEval – SA-V: Val

    SA-Co/VEval is an evaluation dataset for promptable concept segmentation (PCS) in videos, developed by Meta for the Segment Anything 3 model (SAM 3). The dataset contains videos paired with text labels (also referred to as noun phrases, or NPs), each exhaustively annotated with masks on all object instances that match the label.

    This project lets you explore SA-V: Val, the val split of the SA-V subset. The test split is available at SA-V: Test.

    Download Instructions

    The full SA-Co/VEval data is available in its canonical, eval-ready form below.

    Download SA-V video frames: https://sa-co.roboflow.com/veval/saco_sav.zip

    Download YT-1B video frames: https://sa-co.roboflow.com/veval/saco_yt1b.zip

    Download SmartGlasses video frames: https://sa-co.roboflow.com/veval/saco_sg.zip

    Download ground truth annotations: https://sa-co.roboflow.com/veval/gt-annotations.zip

    Download the full bundle: https://sa-co.roboflow.com/veval/all.zip

    Explore all SA-Co/VEval datasets

    SA-Co/VEval covers 3 video sources: SA-V, YT-Temporal-1B, and SmartGlasses.

    Explore all: SA-Co/VEval on Roboflow Universe

    Read Meta's data license for SA-Co/VEval: License

  16. train/val split

    • kaggle.com
    zip
    Updated Apr 19, 2024
    Cite
    Gabriel Baiduc (2024). train/val split [Dataset]. https://www.kaggle.com/datasets/gabrielbaiduc/trainval-split/discussion
    Available download formats: zip (9,597,685 bytes)
    Dataset updated
    Apr 19, 2024
    Authors
    Gabriel Baiduc
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Gabriel Baiduc

    Released under Apache 2.0


  17. nemotron-post-training-samples-splits

    • huggingface.co
    Updated Sep 18, 2025
    Cite
    Brandon Tong (2025). nemotron-post-training-samples-splits [Dataset]. https://huggingface.co/datasets/brandolorian/nemotron-post-training-samples-splits
    Dataset updated
    Sep 18, 2025
    Authors
    Brandon Tong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nemotron Post-Training Samples with Train/Val/Test Splits

    This dataset contains structured train/validation/test splits from the nvidia/Llama-Nemotron-Post-Training-Dataset, with both tagged and untagged versions for different training scenarios.

      Attribution
    

    This work is derived from the Llama-Nemotron-Post-Training-Dataset-v1.1 by NVIDIA Corporation, licensed under CC BY 4.0. Original Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset Original Authors: NVIDIA… See the full description on the dataset page: https://huggingface.co/datasets/brandolorian/nemotron-post-training-samples-splits.

  18. ref_coco

    • tensorflow.org
    • opendatalab.com
    Updated May 31, 2024
    Cite
    (2024). ref_coco [Dataset]. https://www.tensorflow.org/datasets/catalog/ref_coco
    Dataset updated
    May 31, 2024
    Description

    A collection of 3 referring expression datasets based on images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets were collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.

    RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance-based descriptions, which was enforced by preventing raters from using location-based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016 and has richer descriptions of objects than RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG has 8.4 words per expression while RefCoco has 3.5 words.

    Each dataset has different split allocations, all of which are typically reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people, respectively. Images are partitioned into the various splits. In the "google" split, objects, not images, are partitioned between the train and non-train splits. This means that the same image can appear in both the train and validation splits, but the objects being referred to in the image will differ between the two sets. In contrast, the "unc" and "umd" splits partition images between the train, validation, and test splits. In RefCocoG, the "google" split does not have a canonical test set, and the validation set is typically reported in papers as "val*".
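    The "google" versus "unc"/"umd" distinction above is a choice of grouping key: partition individual objects, or partition whole images. A toy Python sketch of both, assuming each referring target is an (image_id, object_id) record; the record layout and the function names are illustrative assumptions, not the TFDS `ref_coco` feature schema.

```python
import random
from typing import List, Set, Tuple

# (image_id, object_id); a hypothetical record layout for illustration only.
Record = Tuple[int, int]


def split_by_object(
    records: List[Record], train_frac: float, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Google-split style: objects are partitioned, so one image may appear in both splits."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def split_by_image(
    records: List[Record], train_frac: float, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """UNC/UMD-split style: whole images are partitioned, so no image is shared."""
    images = sorted({image_id for image_id, _ in records})
    random.Random(seed).shuffle(images)
    train_images: Set[int] = set(images[: int(len(images) * train_frac)])
    train = [r for r in records if r[0] in train_images]
    rest = [r for r in records if r[0] not in train_images]
    return train, rest


def shared_images(a: List[Record], b: List[Record]) -> Set[int]:
    """Image IDs that occur on both sides of a split."""
    return {image_id for image_id, _ in a} & {image_id for image_id, _ in b}


# Usage: 100 toy images with 3 referred objects each.
records = [(image_id, obj) for image_id in range(100) for obj in range(3)]
print(len(shared_images(*split_by_image(records, 0.8))))   # 0: no image crosses the split
print(len(shared_images(*split_by_object(records, 0.8))))  # typically > 0
```

    Partitioning by image guarantees that no image is shared between splits, while partitioning by object lets the same image appear on both sides, as the "google" split allows.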

    Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):

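    | dataset | partition | split | refs | images |
    |---|---|---|---|---|
    | refcoco | google | train | 40000 | 19213 |
    | refcoco | google | val | 5000 | 4559 |
    | refcoco | google | test | 5000 | 4527 |
    | refcoco | unc | train | 42404 | 16994 |
    | refcoco | unc | val | 3811 | 1500 |
    | refcoco | unc | testA | 1975 | 750 |
    | refcoco | unc | testB | 1810 | 750 |
    | refcoco+ | unc | train | 42278 | 16992 |
    | refcoco+ | unc | val | 3805 | 1500 |
    | refcoco+ | unc | testA | 1975 | 750 |
    | refcoco+ | unc | testB | 1798 | 750 |
    | refcocog | google | train | 44822 | 24698 |
    | refcocog | google | val | 5000 | 4650 |
    | refcocog | umd | train | 42226 | 21899 |
    | refcocog | umd | val | 2573 | 1300 |
    | refcocog | umd | test | 5023 | 2600 |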

    To use this dataset:

    import tensorflow_datasets as tfds
    
    # Load the train split of the default config and print the first 4 examples.
    ds = tfds.load('ref_coco', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/ref_coco-refcoco_unc-1.1.0.png

  19. Synthetically Spoken STAIR

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    William N. Havard; Jean-Pierre Chevrot; Laurent Besacier (2020). Synthetically Spoken STAIR [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1495069
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Université Grenoble Alpes, LIDILEM
    Université Grenoble Alpes, LIG/GETALP
    Université Grenoble Alpes, LIG/GETALP and LIDILEM
    Authors
    William N. Havard; Jean-Pierre Chevrot; Laurent Besacier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset consists of synthetically spoken captions for the STAIR dataset. Following the same methodology as Chrupała et al. (see article | dataset | code) we generated speech for each caption of the STAIR dataset using Google's Text-to-Speech API.

    This dataset was used for visually grounded speech experiments (see article accepted at ICASSP2019).

    @INPROCEEDINGS{8683069, author={W. N. {Havard} and J. {Chevrot} and L. {Besacier}}, booktitle={ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, title={Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese}, year={2019}, volume={}, number={}, pages={8618-8622}, keywords={information retrieval;natural language processing;neural nets;speech processing;word processing;artificial neural attention;human attention;monolingual models;part-of-speech tags;nouns;neural models;visually grounded speech signal;English language;Japanese language;word endings;cross-lingual speech-to-speech retrieval;grounded language learning;attention mechanism;cross-lingual speech retrieval;recurrent neural networks.}, doi={10.1109/ICASSP.2019.8683069}, ISSN={2379-190X}, month={May},}

    The dataset comprises the following files:

    mp3-stair.tar.gz : MP3 files of each caption in the STAIR dataset. Filenames follow the pattern imageID_captionID, where both imageID and captionID correspond to those provided in the original dataset (see annotation format here)

    dataset.mfcc.npy : NumPy array with the MFCC vectors for each caption. The MFCCs were extracted using python_speech_features with its default configuration. To find which caption each set of MFCC vectors belongs to, use the files dataset.words.txt and dataset.ids.txt.

    dataset.words.txt : Captions corresponding to each MFCC vector (line number = position in Numpy array, starting from 0)

    dataset.ids.txt : IDs of the captions (imageID_captionID) corresponding to each MFCC vector (line number = position in Numpy array, starting from 0)
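    Because row i of dataset.mfcc.npy corresponds to line i of both dataset.words.txt and dataset.ids.txt, a caption ID can be mapped to its MFCC row with a plain lookup table. A minimal sketch, assuming UTF-8 text files with one entry per line; the helper name `build_caption_index` is hypothetical, and loading the array itself is only shown in a commented usage line (`numpy.load` needs `allow_pickle=True` only if the array stores variable-length sequences as objects, which is an assumption here).

```python
from pathlib import Path
from typing import Dict, Tuple


def build_caption_index(ids_path: str, words_path: str) -> Dict[str, Tuple[int, str]]:
    """Map each caption ID (imageID_captionID) to (row in dataset.mfcc.npy, caption).

    Relies on the documented convention that line number == array position,
    counting from 0, in both dataset.ids.txt and dataset.words.txt.
    """
    ids = Path(ids_path).read_text(encoding="utf-8").splitlines()
    words = Path(words_path).read_text(encoding="utf-8").splitlines()
    if len(ids) != len(words):
        raise ValueError(f"{len(ids)} IDs but {len(words)} captions; files are out of sync")
    return {cap_id: (row, caption) for row, (cap_id, caption) in enumerate(zip(ids, words))}


# Usage (file names as listed above; some_caption_id is a placeholder):
# index = build_caption_index("dataset.ids.txt", "dataset.words.txt")
# row, caption = index[some_caption_id]
# mfcc = numpy.load("dataset.mfcc.npy", allow_pickle=True)[row]
```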

    Splits

    test

    test.txt : captions comprising the test split

    test_ids.txt: IDs of the captions in the test split

    test_tagged.txt : tagged version of the test split

    test-alignments.json.zip : Forced alignments of all the captions in the test split. (dictionary where the key corresponds to the caption ID in the STAIR dataset). Due to an unknown error during upload, the JSON file had to be zipped...

    train

    train.txt : captions comprising the train split

    train_ids.txt : IDs of the captions in the train split

    train_tagged.txt : tagged version of the train split

    val

    val.txt : captions comprising the val split

    val_ids.txt : IDs of the captions in the val split

    val_tagged.txt : tagged version of the val split
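    To pull out one split, the IDs in train_ids.txt, val_ids.txt, or test_ids.txt can be matched against dataset.ids.txt to recover row positions in dataset.mfcc.npy. This sketch assumes the split ID files use the same imageID_captionID format as dataset.ids.txt, one ID per line; the function name `split_rows` is hypothetical.

```python
from pathlib import Path
from typing import List


def split_rows(dataset_ids_path: str, split_ids_path: str) -> List[int]:
    """Return the dataset.mfcc.npy row of every caption listed in a *_ids.txt file."""
    # Keep every line (even blank ones) so that position == row number.
    all_lines = Path(dataset_ids_path).read_text(encoding="utf-8").splitlines()
    row_of = {line.strip(): row for row, line in enumerate(all_lines)}
    # Blank lines in the split file carry no ID and are skipped.
    split_lines = Path(split_ids_path).read_text(encoding="utf-8").splitlines()
    split_ids = [line.strip() for line in split_lines if line.strip()]
    missing = [cap_id for cap_id in split_ids if cap_id not in row_of]
    if missing:
        raise KeyError(f"{len(missing)} split IDs not in {dataset_ids_path}, e.g. {missing[0]}")
    return [row_of[cap_id] for cap_id in split_ids]


# Usage (file names as listed above):
# val_rows = split_rows("dataset.ids.txt", "val_ids.txt")
```

    If the loaded MFCC data is a NumPy array, indexing it with `val_rows` selects the val split in the order given by val_ids.txt.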

  20. transition1x splits

    • figshare.com
    bin
    Updated Dec 12, 2023
    Cite
    Junha Lee (2023). transition1x splits [Dataset]. http://doi.org/10.6084/m9.figshare.24792828.v1
    Available download formats: bin
    Dataset updated
    Dec 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Junha Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    .db versions of the train/test/val splits as described at https://gitlab.com/matschreiner/Transition1x, prepared for a Fall 2023 CS224W final project.
