7 datasets found
  1. Fill In The Blanks Dataset

    • universe.roboflow.com
    zip
    Updated Nov 23, 2022
    Cite
    shape2 (2022). Fill In The Blanks Dataset [Dataset]. https://universe.roboflow.com/shape2/fill-in-the-blanks
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 23, 2022
    Dataset authored and provided by
    shape2
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    FIB Bounding Boxes
    Description

    Fill In The Blanks

    ## Overview
    
    Fill In The Blanks is a dataset for object detection tasks - it contains FIB annotations for 1,496 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  2. Polyvore Outfit Dataset

    • kaggle.com
    zip
    Updated Mar 7, 2025
    Cite
    Enis Teper (2025). Polyvore Outfit Dataset [Dataset]. https://www.kaggle.com/datasets/enisteper1/polyvore-outfit-dataset
    Explore at:
    Available download formats: zip (2474775159 bytes)
    Dataset updated
    Mar 7, 2025
    Authors
    Enis Teper
    Description

    Introduction

    The Polyvore Outfit dataset is one of the largest datasets for fashion compatibility prediction and Fill-in-the-Blank (FITB) tasks. It provides structured information about outfits and individual fashion items, making it a valuable resource for research in outfit recommendation and fashion compatibility modeling.

    The dataset consists of two types of sets: disjoint and non-disjoint:

    • In the non-disjoint set, some items (but not complete outfits) appear in both training and test splits. This allows models to leverage previously seen items to predict outfit compatibility.
    • In the disjoint set, the training and test splits have no overlapping items. This ensures that models must generalize to unseen outfits without relying on previously encountered fashion items.
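The difference between the two split types can be checked mechanically. The sketch below is only an illustration: it assumes each split has already been loaded into a mapping from outfit id to item ids (the `Split` layout and all ids are hypothetical, not the dataset's file format) and reports which items occur in both splits.

```python
from typing import Dict, List, Set

# Hypothetical in-memory layout: outfit id -> list of item ids.
Split = Dict[str, List[str]]


def items_in(split: Split) -> Set[str]:
    """Collect every item id that appears in any outfit of a split."""
    return {item for items in split.values() for item in items}


def shared_items(train: Split, test: Split) -> Set[str]:
    """Item ids present in both splits (expected to be empty for a disjoint set)."""
    return items_in(train) & items_in(test)


# Toy data for illustration only, not taken from the dataset.
train = {"outfit_a": ["item_1", "item_2"], "outfit_b": ["item_3"]}
test_non_disjoint = {"outfit_c": ["item_2", "item_4"]}
test_disjoint = {"outfit_d": ["item_5", "item_6"]}

overlap_non_disjoint = shared_items(train, test_non_disjoint)
overlap_disjoint = shared_items(train, test_disjoint)
```

An empty result for a train/test pair is what the disjoint set is described as guaranteeing; a non-empty result is allowed only in the non-disjoint set.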

    Dataset statistics:

    • Non-disjoint set: 53,306 training, 5,000 validation and 10,000 testing outfits.
    • Disjoint set: 16,995 training outfits, 3,000 validation and 15,145 testing outfits.

    Each item has information in polyvore_item_metadata.json:

      {
       "url_name": "bean scotch plaid shirt relaxed",
       "description": "The same great tartan flannel as in our men's shirt, designed just for you. Relaxed Fit: Our most generous fit sits farthest from the body. Falls at low hip. etc.",
       "categories": ["Women's Fashion", "Clothing", "Tops", "L.L.Bean tops"],
       "title": "L.L.Bean Scotch Plaid Shirt, Relaxed",
       "related": ["Plaid shirts", "Flannel shirt", "Shirt top", "Button front shirt", "Bright shirts", "Tartan shirt"],
       "category_id": "11",
       "semantic_category": "tops"
      }

    Category ids can be matched via categories.csv:

    | category_id | sub_category | main_category |
    | --- | --- | --- |
    | 3 | dress | all-body |
    | 7 | skirt | bottoms |
    | 11 | sweater | tops |
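As a rough sketch of the category lookup, the snippet below joins one item record to its category row using only the standard library. It assumes categories.csv has a header row with the three column names shown in the table; the inline CSV text and the trimmed item record are stand-ins for the real files.

```python
import csv
import io
from typing import Dict

# Rows copied from the table above; the header row is an assumption.
CATEGORIES_CSV = """category_id,sub_category,main_category
3,dress,all-body
7,skirt,bottoms
11,sweater,tops
"""


def load_categories(text: str) -> Dict[str, Dict[str, str]]:
    """Index category rows by their category_id (kept as a string)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["category_id"]: row for row in reader}


# Trimmed copy of the example item record shown above.
item = {
    "title": "L.L.Bean Scotch Plaid Shirt, Relaxed",
    "category_id": "11",
    "semantic_category": "tops",
}

categories = load_categories(CATEGORIES_CSV)
category = categories[item["category_id"]]
```

Keeping `category_id` as a string on both sides avoids a type mismatch, since the item metadata stores it as `"11"` rather than as a number.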

    Each set has information in polyvore_outfit_titles.json:

      {
       "url_name": "parka time is now",
       "title": "Parkas"
      }

    Fill-in-the-blank Task

    The Fill-in-the-Blank (FITB) task is designed to evaluate how well a model understands fashion compatibility. Given a sequence of items in an outfit, the model must predict the missing (target) item from a set of candidate items.

      {
       question: Item sequence of a set,
       blank_position: Target item,
       answers: Multiple items including the target item
      }
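One way to score a model on this task is plain accuracy over the questions. The sketch below is a minimal illustration under several assumptions: `blank_position` is treated as an integer index, the `target` field marking the correct candidate is hypothetical (the schema above does not say how the target is identified), and `toy_score` is a stand-in for a real compatibility model.

```python
from typing import Callable, List, Sequence, TypedDict


class FitbQuestion(TypedDict):
    question: List[str]   # item ids of the incomplete outfit
    blank_position: int   # assumed: index where the missing item belongs
    answers: List[str]    # candidate item ids, including the target
    target: str           # hypothetical field: id of the correct candidate


Scorer = Callable[[Sequence[str], str], float]


def predict(q: FitbQuestion, score: Scorer) -> str:
    """Pick the candidate that the scorer rates highest for this outfit."""
    return max(q["answers"], key=lambda cand: score(q["question"], cand))


def fitb_accuracy(questions: Sequence[FitbQuestion], score: Scorer) -> float:
    """Fraction of questions where the predicted candidate is the target."""
    if not questions:
        return 0.0
    correct = sum(predict(q, score) == q["target"] for q in questions)
    return correct / len(questions)


def toy_score(outfit: Sequence[str], candidate: str) -> float:
    """Stand-in scorer: count outfit items sharing the candidate's colour prefix."""
    colour = candidate.split("_")[0]
    return float(sum(item.split("_")[0] == colour for item in outfit))


# Toy questions for illustration only.
questions: List[FitbQuestion] = [
    {"question": ["red_top", "red_shoes"], "blank_position": 2,
     "answers": ["blue_bag", "red_bag", "green_bag"], "target": "red_bag"},
    {"question": ["blue_top", "green_skirt"], "blank_position": 2,
     "answers": ["blue_hat", "black_hat"], "target": "black_hat"},
]

accuracy = fitb_accuracy(questions, toy_score)
```

The toy scorer answers the first question correctly and the second incorrectly, which shows that the metric depends only on the top-ranked candidate.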
    

    An Illustration of FITB Task

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3530207%2Fdfbcb126319732404616d40e8e4adcee%2Fpolyvore_outfit.png?generation=1741297192521567&alt=media

    Compatibility Task

    The Compatibility Task is used to assess how well a model can determine whether a given set of fashion items are compatible with each other. The model learns an embedding space where visually and semantically similar items are placed closer together, making it possible to predict outfit compatibility effectively.
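To make the embedding idea concrete, the sketch below scores an outfit by the mean pairwise Euclidean distance between its item embeddings, so a lower value suggests a more compatible outfit. This is one simple scoring rule chosen for illustration, not necessarily what the cited paper uses, and the two-dimensional vectors are toy values rather than learned embeddings.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def mean_pairwise_distance(embeddings: Sequence[Vector]) -> float:
    """Average Euclidean distance over all item pairs in one outfit."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("an outfit needs at least two items")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings for illustration only.
tight_outfit = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
loose_outfit = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]

tight_score = mean_pairwise_distance(tight_outfit)
loose_score = mean_pairwise_distance(loose_outfit)
```

A binary compatible/incompatible decision would then need a threshold on this score, which would have to be tuned on validation outfits.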

    An Illustration of Compatibility Task

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3530207%2F0773192a4e3639df5bfe3a9a583e2aad%2Fpolyvore_outfit-Page_2.png?generation=1741325987776694&alt=media

    References:

    Citation:

    @misc{vasileva2018learningtypeawareembeddingsfashion,
       title={Learning Type-Aware Embeddings for Fashion Compatibility}, 
       author={Mariya I. Vasileva and Bryan A. Plummer and Krishna Dusad and Shreya Rajpal and Ranjitha Kumar and David Forsyth},
       year={2018},
       eprint={1803.09196},
       archivePrefix={arXiv},
       primaryClass={cs.CV},
       url={https://arxiv.org/abs/1803.09196}, 
    }
    

    License:

    No source license from the original authors could be found for this dataset.

  3. All_blanks_ghw Dataset

    • universe.roboflow.com
    zip
    Updated Oct 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BAT Campaign (2025). All_blanks_ghw Dataset [Dataset]. https://universe.roboflow.com/bat-campaign/all_blanks_ghw
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 19, 2025
    Dataset authored and provided by
    BAT Campaign
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    . Bounding Boxes
    Description

    All_blanks_ghw

    ## Overview
    
    All_blanks_ghw is a dataset for object detection tasks - it contains . annotations for 781 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. Dominion Cards

    • kaggle.com
    zip
    Updated Aug 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adam Stacey (2021). Dominion Cards [Dataset]. https://www.kaggle.com/adamstacey/dominion-cards
    Explore at:
    Available download formats: zip (213219 bytes)
    Dataset updated
    Aug 13, 2021
    Authors
    Adam Stacey
    Description

    Context

    This dataset contains information about every Dominion card so far released, including promo cards (blue-backed randomizer cards are not included). I wanted to make the set pretty comprehensive so there is a lot of data about each card.

    Content

    Check out the metadata table to find out what each column contains.

    Acknowledgements

    Thanks to Rio Grande Games for such an amazing game, but also please stop. All these cards are super heavy!

    Inspiration

    Feel free to take a look at the tables. Please let me know if you find any mistakes. I tried to be careful but I did enter all this data manually. I plan to use R and/or Python to do some analysis of the data.

  5. Chemical analysis of oil samples from the Gulf of Mexico and adjoining...

    • data.griidc.org
    • search.dataone.org
    • +1more
    Updated Jul 7, 2016
    Cite
    BP Gulf Science Data (2016). Chemical analysis of oil samples from the Gulf of Mexico and adjoining states from May 2010 to March 2014 [Dataset]. http://doi.org/10.7266/N7902251
    Explore at:
    Dataset updated
    Jul 7, 2016
    Dataset provided by
    GRIIDC
    Authors
    BP Gulf Science Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains oil chemistry data from the Deepwater Horizon (DWH) accident collected cooperatively by BP and the agencies, including agencies that serve as DWH natural resource damage trustees (Trustees). This report provides additional context for the oil chemistry dataset, including information about the collection, analysis, and organization of the data.

    This data posting differs from other recently published datasets relating to the Gulf of Mexico in several respects:

    • This oil chemistry dataset focuses on information related to petroleum-related chemical constituents in oil samples and other matrices collected during studies focused primarily on the sampling of oil. Chemical compounds that are not present in Mississippi Canyon lease block 252 (MC252) oil (such as polychlorinated biphenyls, pesticides, and halogenated volatiles) are not included in this dataset.
    • This dataset includes data from independent studies performed by BP consultants. BP has been working to produce and organize these independent data, and has engaged outside consultants to perform quality assurance and quality control (QA/QC) checks. As a result, these data have not previously been publicly accessible.
    • This dataset combines results from 24 NRDA studies and 20 Response studies to create a unified data file.
    • Before posting, extensive work was done independently by BP contractors to verify some aspects of the posted data (e.g., positional coordinates and field data).

    The chemical parameters provided in this data posting include:

    • Parent and alkylated polycyclic aromatic hydrocarbons (PAHs)
    • Saturated hydrocarbons (SHC)
    • Total petroleum hydrocarbons (TPH), including parameters reported as total extractable material (TEM) and total extractable hydrocarbon (TEH)
    • Benzene, toluene, ethylbenzene, and xylenes (BTEX) and other volatile hydrocarbons classified as paraffins, isoparaffins, aromatics, naphthenes, and olefins (PIANO)
    • Geochemical biomarkers (sterane and triterpane), where available
    • Dispersant markers
    • Metals
    • Total organic carbon

    The chemical analyte lists are generally consistent between studies for the standard PAH, SHC, and BTEX compounds. However, portions of the dataset also include analysis of the extended PIANO volatile hydrocarbon list, TPH, or geochemical biomarkers. Additionally, NRDA PAH analyses include an extended list of PAHs (parent and alkylated PAHs) and non-PAH compounds (decalins, benzothiophenes, naphthobenzothiophenes, and related chemicals) that were not included in all of the Response analyses.

    This dataset includes data associated with natural field samples for oil and other sample matrices, along with the associated field-collected quality control samples, such as field replicates, equipment blanks, field blanks, and trip blanks. Laboratory duplicate sample data are provided, where available. MC252 control oils associated with chemistry analyses are part of a separate data posting (see Version History). No other laboratory quality control samples (e.g., laboratory blanks or spike samples) are included in this data posting.

    In addition to the detailed data file, a cross-tab summary version of these data is also provided (OilChemistry_O-05v01-01_xTab.zip) in Excel format. Results for each sample are presented in a single row with individual chemicals in columns. In addition, a summed concentration for total PAHs is included as “PAH50 Sum” and was calculated using 50 individual PAH results. This summary is provided to enable researchers to access this large data set in a more manageable format. A sample summary table in Excel format is provided to summarize the number of analytical results for each parameter type for each sample.
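The cross-tab layout described above (one row per sample, one column per chemical, plus a summed PAH column) can be sketched as a simple pivot. Everything named here is an assumption made for illustration: the `(sample id, analyte, result)` record shape, the analyte names, the two-entry PAH list standing in for the 50 PAHs, and the choice to treat a missing result as zero, which the description does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical long-format record: (sample id, analyte name, result).
Record = Tuple[str, str, float]


def cross_tab(
    records: Iterable[Record], pah_analytes: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Pivot long-format results into one row per sample and add a PAH total."""
    rows: Dict[str, Dict[str, float]] = defaultdict(dict)
    for sample_id, analyte, result in records:
        rows[sample_id][analyte] = result
    for row in rows.values():
        # Assumption: an analyte with no result contributes 0 to the sum.
        row["PAH Sum"] = sum(row.get(name, 0.0) for name in pah_analytes)
    return dict(rows)


# Toy values for illustration only; the real summary sums 50 PAH results.
records = [
    ("S1", "naphthalene", 1.5),
    ("S1", "phenanthrene", 0.5),
    ("S1", "toluene", 2.0),
    ("S2", "naphthalene", 0.25),
]
table = cross_tab(records, pah_analytes=["naphthalene", "phenanthrene"])
```

Non-PAH analytes such as toluene stay in the row as their own column but are excluded from the total.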

  6. Chemistry data associated with water column samples collected in the Gulf of...

    • data.griidc.org
    • search.dataone.org
    Updated Feb 7, 2019
    Cite
    BP Gulf Science Data (2019). Chemistry data associated with water column samples collected in the Gulf of Mexico from May 2010 through July 2012 [Dataset]. http://doi.org/10.7266/N747489X
    Explore at:
    Dataset updated
    Feb 7, 2019
    Dataset provided by
    GRIIDC
    Authors
    BP Gulf Science Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This report contains water chemistry data, and provides additional context for the posted dataset, including information about the collection, analysis, and organization of the data.

    This data posting differs from other recently published datasets relating to the Gulf of Mexico in several respects:

    • This Water Chemistry dataset focuses only on information related to petroleum-related chemical constituents in water samples. Chemical compounds that are not present in oil (such as polychlorinated biphenyls, pesticides, and halogenated volatiles) are not included in this dataset.
    • This dataset includes data from independent studies performed by BP consultants. BP has been working to produce and organize these independent data, and has engaged outside consultants to perform quality assurance and quality control (QA/QC) checks. These data were initially made publicly accessible in November 2013 and are updated with additional results in this publication to reflect the results of further QA/QC.
    • This dataset combines results from fifty-four NRDA studies and thirteen Response studies to create a unified data file. Certain Response data have been reprocessed with lower analytical method detection limits (MDLs) than originally reported, and other data have been adjusted for surrogate recoveries as described in the documentation.

    The focus of this data posting is chemistry data associated with water column samples collected in both federal and state jurisdictional waters in the Gulf of Mexico from May 2010 through July 2012. More than 20,000 water samples with associated chemistry analyses collected at more than 6,300 sampling stations are included in this posting. These samples were collected during 67 studies, using more than 100 sampling cruises and surveys.

    These studies can be classified into four general categories:

    • NRDA Cooperative: Studies conducted as part of the NRDA which were agreed to and executed cooperatively by the National Oceanic and Atmospheric Administration (NOAA), U.S. Department of Interior (DOI), and/or other Trustees, and BP.
    • BP NRDA Independent: Studies conducted by BP independently to develop data to support and inform the NRDA.
    • Trustee Independent: Studies conducted by NOAA, DOI, and/or other Trustees independently to develop data to support and inform the NRDA.
    • Response (non-NRDA): Studies conducted by BP and/or government representatives under the direction of the Unified Area Command and in association with activities performed in response to the DWH accident (the Response).

    The chemical parameters provided in this data posting include:

    • Parent and alkylated polycyclic aromatic hydrocarbons (PAHs)
    • Saturated hydrocarbons (SHC)
    • Total petroleum hydrocarbons (TPH), including parameters reported as total extractable material (TEM) and total extractable hydrocarbon (TEH)
    • Benzene, toluene, ethyl benzene, and xylenes (BTEX) and other volatile hydrocarbons classified as paraffins, isoparaffins, aromatics, naphthenes, and olefins (PIANO)
    • Geochemical biomarkers (sterane and triterpane), where available

    The chemical analyte lists are generally consistent between studies for the standard PAH, SHC, and BTEX compounds. However, portions of the dataset also include analysis of the extended PIANO volatile hydrocarbon list, TPH, or geochemical biomarkers. Additionally, NRDA PAH analyses include an extended list of parent and alkylated decalins, benzothiophenes, naphthobenzothiophenes, and several other PAHs and related chemicals that were not included in Response analyses.

    This dataset includes data associated with natural field samples for whole water and for Payne filter and filtrate pairs (Payne et al. 1999), along with the associated field-collected quality control samples, such as field replicates, equipment blanks, field blanks, and trip blanks. Laboratory duplicate sample data are provided, where available. Mississippi Canyon lease block 252 (MC252) control oils analyses associated with water chemistry analyses are provided separately from this data posting (see Version History). No other laboratory quality control samples (e.g., laboratory blanks and spike samples) are included in this data posting. Before posting, extensive work was done to verify some aspects of the posted information (e.g., positional coordinates and field data).

  7. 6.03 Million - Majors Questions Text Parsing And Processing Data

    • nexdata.ai
    Updated Jul 31, 2025
    Cite
    Nexdata (2025). 6.03 Million - Majors Questions Text Parsing And Processing Data [Dataset]. https://www.nexdata.ai/datasets/llm/1387
    Explore at:
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    Nexdata
    Variables measured
    Language, Data size, Data fields, Data content, Storage format, Major categories, Question type categories
    Description

    Majors questions text data: about 6.03 million majors questions, with and without explanations combined. Each question includes a question type, question, answer, and explanation; some questions may have errors in their question types. Majors include Party Building, Law, Engineering, Civil Service, Computer Science, Economics, Graduate Studies, Medicine, Language, Self-Study, Comprehensive, and Policy Essay Writing. Question types include Multiple Choice, Single Choice, True/False, Fill in the Blanks, Short Answer, and Essay. This dataset can be used for tasks such as LLM training, chatgpt
