22 datasets found
  1. h

    SEED-Data-Edit

    • huggingface.co
    Updated May 1, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TencentAILab-CVC (2024). SEED-Data-Edit [Dataset]. https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit
    Explore at:
    Dataset updated
    May 1, 2024
    Dataset authored and provided by
    TencentAILab-CVC
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    SEED-Data-Edit

    SEED-Data-Edit is a hybrid dataset for instruction-guided image editing with a total of 3.7 image editing pairs, which comprises three distinct types of data: Part-1: Large-scale high-quality editing data produced by automated pipelines (3.5M editing pairs). Part-2: Real-world scenario data collected from the internet (52K editing pairs). Part-3: High-precision multi-turn editing data annotated by humans (95K editing pairs, 21K multi-turn rounds with a maximum… See the full description on the dataset page: https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit.

  2. h

    HQ-Edit

    • huggingface.co
    Updated Aug 28, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UCSC-VLAA (2024). HQ-Edit [Dataset]. https://huggingface.co/datasets/UCSC-VLAA/HQ-Edit
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 28, 2024
    Dataset authored and provided by
    UCSC-VLAA
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for HQ-EDIT

    HQ-Edit, a high-quality instruction-based image editing dataset with total 197,350 edits. Unlike prior approaches relying on attribute guidance or human feedback on building datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. HQ-Edit’s high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing… See the full description on the dataset page: https://huggingface.co/datasets/UCSC-VLAA/HQ-Edit.

  3. h

    EditReward-Data

    • huggingface.co
    Updated Oct 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TIGER-Lab (2025). EditReward-Data [Dataset]. https://huggingface.co/datasets/TIGER-Lab/EditReward-Data
    Explore at:
    Dataset updated
    Oct 13, 2025
    Dataset authored and provided by
    TIGER-Lab
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    EditReward-Data

    This repository contains EditReward-Data, a large-scale, high-fidelity human preference dataset for instruction-guided image editing. It was introduced in the paper EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing. EditReward-Data comprises over 200K manually annotated preference pairs. These annotations were meticulously curated by trained experts following a rigorous and standardized protocol, ensuring high alignment with considered… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/EditReward-Data.

  4. h

    MagicBrush

    • huggingface.co
    • kaggle.com
    Updated Jun 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OSU NLP Group (2023). MagicBrush [Dataset]. https://huggingface.co/datasets/osunlp/MagicBrush
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 18, 2023
    Dataset authored and provided by
    OSU NLP Group
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for MagicBrush

      Dataset Summary
    

    MagicBrush is the first large-scale, manually-annotated instruction-guided image editing dataset covering diverse scenarios single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises 10K (source image, instruction, target image) triples, which is sufficient to train large-scale image editing models. Please check our website to explore more visual results.

      Dataset Structure
    

    "img_id" (str):… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/MagicBrush.

  5. h

    AI-human-text

    • huggingface.co
    Updated Feb 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dinh Ngoc An (2024). AI-human-text [Dataset]. https://huggingface.co/datasets/andythetechnerd03/AI-human-text
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 27, 2024
    Authors
    Dinh Ngoc An
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is a processed dataset of Human vs AI Text roughly 400k rows. This is taken from the Kaggle dataset https://www.kaggle.com/datasets/shanegerami/ai-vs-human-text/data then processed and split into training and test sets.

  6. h

    EditReward-Bench

    • huggingface.co
    Updated Oct 13, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TIGER-Lab (2025). EditReward-Bench [Dataset]. https://huggingface.co/datasets/TIGER-Lab/EditReward-Bench
    Explore at:
    Dataset updated
    Oct 13, 2025
    Dataset authored and provided by
    TIGER-Lab
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    EditReward-Bench: A Human-Aligned Reward Model Benchmark for Instruction-Guided Image Editing

    This repository contains EditReward-Bench, a new benchmark introduced in the paper EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing.

      Introduction
    

    Recent advances in image editing with natural language instructions have shown promising progress, particularly with closed-source models. However, open-source models often lag due to the lack of a… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/EditReward-Bench.

  7. h

    gender-bias-PE

    • huggingface.co
    Updated Oct 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FBK-MT (2024). gender-bias-PE [Dataset]. https://huggingface.co/datasets/FBK-MT/gender-bias-PE
    Explore at:
    Dataset updated
    Oct 28, 2024
    Dataset authored and provided by
    FBK-MT
    License

    Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Dataset Card for gender-bias-PE data

      Dataset Description
    

    The gender-bias-PE dataset contains the post-edits and associated behavioural data of the human-centered experiments presented in the paper: What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study accepted at EMNLP 2024. The dataset allows to study the impact of gender bias in Machine Translation (MT) via human-centered measures like post-editing effort (i.e.… See the full description on the dataset page: https://huggingface.co/datasets/FBK-MT/gender-bias-PE.

  8. h

    MPII_Human_Pose_Dataset

    • huggingface.co
    Updated May 7, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Voxel51 (2024). MPII_Human_Pose_Dataset [Dataset]. https://huggingface.co/datasets/Voxel51/MPII_Human_Pose_Dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    Voxel51
    License

    https://choosealicense.com/licenses/bsd-2-clause/https://choosealicense.com/licenses/bsd-2-clause/

    Description

    Dataset Card for MPII Human Pose

    MPII Human Pose dataset is a state of the art benchmark for evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of every day human activities. Overall the dataset covers 410 human activities and each image is provided with an activity label. Each image was extracted from a YouTube video… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/MPII_Human_Pose_Dataset.

  9. h

    Wikipedia-Corpora-Report

    • huggingface.co
    Updated Oct 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saied Alshahrani (2023). Wikipedia-Corpora-Report [Dataset]. https://huggingface.co/datasets/SaiedAlshahrani/Wikipedia-Corpora-Report
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 1, 2023
    Authors
    Saied Alshahrani
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for "Wikipedia-Corpora-Report"

    This dataset is used as a metadata database for the online WIKIPEDIA CORPORA META REPORT dashboard that illustrates how humans and bots generate or edit Wikipedia editions and provides metrics for “pages” and “edits” for all Wikipedia editions (320 languages). The “pages” metric counts articles and non-articles, while the “edits” metric tallies edits on articles and non-articles, all categorized by contributor type: humans or bots. The… See the full description on the dataset page: https://huggingface.co/datasets/SaiedAlshahrani/Wikipedia-Corpora-Report.

  10. h

    Get-Real

    • huggingface.co
    Updated Oct 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    whitebox (2025). Get-Real [Dataset]. https://huggingface.co/datasets/whitebox-lm/Get-Real
    Explore at:
    Dataset updated
    Oct 14, 2025
    Authors
    whitebox
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Get Real - A dataset from Whitebox An extremely small dataset of human-like text conversations in ShareGPT format. It contains a mix of 30% human-written conversations and 70% AI-generated responses designed to mimic human style, with additional manual editing to maintain that tone. This dataset is mostly comprised of text-like conversations, but also contains usual assistant conversations too; all assistant conversations are hand-written. The main goal of this dataset is to inject some… See the full description on the dataset page: https://huggingface.co/datasets/whitebox-lm/Get-Real.

  11. h

    qe4pe

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gabriele Sarti, qe4pe [Dataset]. https://huggingface.co/datasets/gsarti/qe4pe
    Explore at:
    Authors
    Gabriele Sarti
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Quality Estimation for Post-Editing (QE4PE)

    For more details on QE4PE, see our paper and our Github repository Gabriele Sarti • Vilém Zouhar • Grzegorz Chrupała • Ana Guerberof Arenas • Malvina Nissim • Arianna Bisazza

    Word-level quality estimation (QE) detects erroneous spans in machine translations, which can direct and facilitate human post-editing. While the accuracy of word-level QE systems has been assessed extensively, their usability and downstream influence on the… See the full description on the dataset page: https://huggingface.co/datasets/gsarti/qe4pe.

  12. h

    TDVE-DB

    • huggingface.co
    Updated May 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wang Juntong (2025). TDVE-DB [Dataset]. https://huggingface.co/datasets/Moyao001/TDVE-DB
    Explore at:
    Dataset updated
    May 11, 2025
    Authors
    Wang Juntong
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    We introduce TDVE-DB, the largest and most comprehensive benchmark dataset for text-driven video editing quality assessment, featuring 3,857 edited videos from 12 models across 8 editing categories, annotated with 173,565 human subjective ratings.

    (a) Acquisition of the source video. (b) Generation of prompt words. (c) Obtaining 170K subjective scores through subjective experiments. (d) The number of videos for different models and different editing categories. (e) Three-dimensional… See the full description on the dataset page: https://huggingface.co/datasets/Moyao001/TDVE-DB.

  13. h

    hpe2sa

    • huggingface.co
    Updated Sep 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rodrigo Schmidt Nurmberg (2025). hpe2sa [Dataset]. https://huggingface.co/datasets/rsn86/hpe2sa
    Explore at:
    Dataset updated
    Sep 6, 2025
    Authors
    Rodrigo Schmidt Nurmberg
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    hpe2sa: Human Post-Editing with Error Span Annotations 🤌🍕

    No momento apenas um placeholder enquanto o repositório privado é preparado para liberação. Github repository

  14. h

    mets

    • huggingface.co
    Updated Oct 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexander Black (2025). mets [Dataset]. https://huggingface.co/datasets/AlexBlck/mets
    Explore at:
    Dataset updated
    Oct 8, 2025
    Authors
    Alexander Black
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for METS (Multiple Edits and Textual Summaries)

      Dataset Summary
    

    METS (Multiple Edits and Textual Summaries) is a dataset of image editing sequences with human-annotated textual summaries describing the differences between original and edited images. The dataset captures cumulative changes after sequences of manipulations, providing ground truth for image difference captioning tasks. METS contains images that have undergone 5, 10, or 15 sequential edits, with… See the full description on the dataset page: https://huggingface.co/datasets/AlexBlck/mets.

  15. arena-human-preference-100k

    • huggingface.co
    Updated Feb 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    LMArena (2025). arena-human-preference-100k [Dataset]. https://huggingface.co/datasets/lmarena-ai/arena-human-preference-100k
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 26, 2025
    Dataset authored and provided by
    LMArenahttps://lmarena.ai/
    Description

    Overview

    This dataset contains leaderboard conversation data collected between June 2024 and August 2024. It includes English human preference evaluations used to develop Arena Explorer. Additionally, we provide an embedding file, which contains precomputed embeddings for the English conversations. These embeddings are used in the topic modeling pipeline to categorize and analyze these conversations. For a detailed exploration of the dataset and analysis methods, refer to the… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/arena-human-preference-100k.

  16. h

    Nano-banana-150k

    • huggingface.co
    Updated Oct 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BitMind (2025). Nano-banana-150k [Dataset]. https://huggingface.co/datasets/bitmind/Nano-banana-150k
    Explore at:
    Dataset updated
    Oct 13, 2025
    Dataset authored and provided by
    BitMind
    Description

    Nano-consistent-150k. — the first dataset constructed using Nano-Banana that exceeds 150k high-quality samples, uniquely designed to preserve consistent human identity across diverse and complex editing scenarios

  17. h

    IWSLT_2016

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    FrancophonIA, IWSLT_2016 [Dataset]. https://huggingface.co/datasets/FrancophonIA/IWSLT_2016
    Explore at:
    Dataset authored and provided by
    FrancophonIA
    Description

    [!NOTE] Dataset origin: https://live.european-language-grid.eu/catalogue/corpus/709/

      Description
    

    The human evaluation (HE) dataset created for English to German (EnDe) and English to French (EnFr) MT tasks was a subset of one of the official test sets of the IWSLT 2016 evaluation campaign. The resulting HE sets are composed of 600 segments for both EnDe and EnFr, each corresponding to around 10,000 words. Human evaluation was based on Post-Editing, i.e. the manual correction of… See the full description on the dataset page: https://huggingface.co/datasets/FrancophonIA/IWSLT_2016.

  18. h

    peer_qt21-de-en-pe

    • huggingface.co
    Updated Sep 16, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jannis Vamvas (2025). peer_qt21-de-en-pe [Dataset]. https://huggingface.co/datasets/jvamvas/peer_qt21-de-en-pe
    Explore at:
    Dataset updated
    Sep 16, 2025
    Authors
    Jannis Vamvas
    License

    https://choosealicense.com/licenses/unknown/https://choosealicense.com/licenses/unknown/

    Description

    QT21 De-En Task from the PEER Benchmark (Performance Evaluation of Edit Representations)

    Description from the benchmark paper:

    We work with the German-English portion of the QT21 dataset (Specia et al. 2017), which originally contains a total of 43,000 examples of machine translation human post-edits. The machine translation output over which post-editing is performed to create this dataset is an implementation of the attentional encoder-decoder architecture and uses byte-pair… See the full description on the dataset page: https://huggingface.co/datasets/jvamvas/peer_qt21-de-en-pe.

  19. h

    Housestyle_Training

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alex B, Housestyle_Training [Dataset]. https://huggingface.co/datasets/Reszi/Housestyle_Training
    Explore at:
    Authors
    Alex B
    Description

    Instruction: Edit this text so that it follows the house style.

    Human:

    Headline: International Swaps and Derivatives Association Announces New Guidelines for Derivatives Trading The International Swaps and Derivatives Association (ISDA) announced on 3/18/2023, that it has introduced new guidelines for derivatives trading. The ISDA has been working tirelessly to ensure that the derivatives market is transparent, fair, and effective. The International Swaps and Derivatives Association… See the full description on the dataset page: https://huggingface.co/datasets/Reszi/Housestyle_Training.

  20. h

    after_visit_summary_simulated_edits

    • huggingface.co
    Updated Jan 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sai (2025). after_visit_summary_simulated_edits [Dataset]. https://huggingface.co/datasets/PrabhakarSai/after_visit_summary_simulated_edits
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 14, 2025
    Authors
    Sai
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card: AVS edits Dataset

      Dataset Summary
    

    The AVS edits dataset is designed to support human feedback research in for clinical summarization. It contains synthetic edit feedback generated by large language models (LLMs) to improve the factual consistency and quality of summaries. The dataset includes training, evaluation, and test splits with specific fields for modeling and evaluation tasks.

      Dataset Structure
    
    
    
    
    
      Train Split
    

    Keys: article: The… See the full description on the dataset page: https://huggingface.co/datasets/PrabhakarSai/after_visit_summary_simulated_edits.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
TencentAILab-CVC (2024). SEED-Data-Edit [Dataset]. https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit

SEED-Data-Edit

AILab-CVC/SEED-Data-Edit

Explore at:
34 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 1, 2024
Dataset authored and provided by
TencentAILab-CVC
License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

SEED-Data-Edit

SEED-Data-Edit is a hybrid dataset for instruction-guided image editing with a total of 3.7 image editing pairs, which comprises three distinct types of data: Part-1: Large-scale high-quality editing data produced by automated pipelines (3.5M editing pairs). Part-2: Real-world scenario data collected from the internet (52K editing pairs). Part-3: High-precision multi-turn editing data annotated by humans (95K editing pairs, 21K multi-turn rounds with a maximum… See the full description on the dataset page: https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit.

Search
Clear search
Close search
Google apps
Main menu