20 datasets found

Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz...
repository.cam.ac.uk
zip
Updated Jul 10, 2019
Cite
Budzianowski, Paweł; Mihail, Eric; Rahul, Goel; Shachi, Paul; Sethi, Abhishek; Agarwal, Sanchit; Gao, Shuyag; Hakkani-Tur, Dilek (2019). Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling" [Dataset]. http://doi.org/10.17863/CAM.41572
Available download formats: zip (13794372 bytes)
Unique identifier
https://doi.org/10.17863/CAM.41572
Dataset updated
Jul 10, 2019
Dataset provided by
Apollo
University of Cambridge
Authors
Budzianowski, Paweł; Mihail, Eric; Rahul, Goel; Shachi, Paul; Sethi, Abhishek; Agarwal, Sanchit; Gao, Shuyag; Hakkani-Tur, Dilek
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains the following JSON files:
1. data.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn.
2. restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.
3. attraction_db.json: the Cambridge attraction database file, containing attractions in the Cambridge UK area and a set of attributes.
4. hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes.
5. train_db.json: the Cambridge train (with artificial connections) database file, containing trains in the Cambridge UK area and a set of attributes.
6. hospital_db.json: the Cambridge hospital database file, containing information about departments.
7. police_db.json: the Cambridge police station information.
8. taxi_db.json: slot-value list for the taxi domain.
9. valListFile.json: list of dialogues for validation.
10. testListFile.json: list of dialogues for testing.
11. system_acts.json: system act annotations.
12. ontology.json: data-based ontology.
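The roles of data.json, valListFile.json, and testListFile.json can be sketched as a three-way split. This is a minimal sketch, assuming the dialogues have already been loaded into a dictionary keyed by dialogue ID and that each list file yields a collection of IDs. The on-disk layout, the ID strings, and the function name `split_dialogues` are illustrative assumptions, not taken from the dataset documentation. Treating every dialogue listed in neither file as training data is also an assumption.

```python
from typing import Any, Dict, Iterable, Tuple

Dialogues = Dict[str, Any]


def split_dialogues(
    dialogues: Dialogues,
    val_ids: Iterable[str],
    test_ids: Iterable[str],
) -> Tuple[Dialogues, Dialogues, Dialogues]:
    """Partition dialogues into (train, val, test) by dialogue ID."""
    val_set, test_set = set(val_ids), set(test_ids)
    overlap = val_set & test_set
    if overlap:
        raise ValueError(f"IDs listed in both splits: {sorted(overlap)}")

    train: Dialogues = {}
    val: Dialogues = {}
    test: Dialogues = {}
    for dialogue_id, dialogue in dialogues.items():
        if dialogue_id in val_set:
            val[dialogue_id] = dialogue
        elif dialogue_id in test_set:
            test[dialogue_id] = dialogue
        else:
            # Assumption: anything not listed for validation or testing
            # is treated as training data.
            train[dialogue_id] = dialogue
    return train, val, test


# Toy stand-in for the loaded dialogue file; IDs and fields are made up.
toy_dialogues = {
    "dlg_a": {"turns": ["hello", "hi"]},
    "dlg_b": {"turns": ["book a taxi"]},
    "dlg_c": {"turns": ["find a hotel"]},
}
train, val, test = split_dialogues(toy_dialogues, ["dlg_b"], ["dlg_c"])
print(sorted(train), sorted(val), sorted(test))
```

Checking for IDs that appear in both lists up front keeps a dialogue from silently landing in only one of the two evaluation splits.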

Important note: This dataset was previously entitled 'Research data supporting "MultiWOZ 2.1 - Multi-Domain Dialogue State Corrections and State Tracking Baselines"'. The change to the current title of this dataset was made at the request of the authors in July 2019.
TOD-Multiwoz
huggingface.co
Updated Apr 19, 2025
Cite
Việt Nam Tự Cường 2 (2025). TOD-Multiwoz [Dataset]. https://huggingface.co/datasets/vntc2/TOD-Multiwoz
Dataset updated
Apr 19, 2025
Dataset authored and provided by
Việt Nam Tự Cường 2
Description
vntc2/TOD-Multiwoz dataset hosted on Hugging Face and contributed by the HF Datasets community
MultiWOZ 2.1 - Dataset - LDM
service.tib.eu
Updated Nov 25, 2024
Cite
(2024). MultiWOZ 2.1 - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/multiwoz-2-1
Dataset updated
Nov 25, 2024
Description
The original MultiWOZ 2.1 dataset is a crowdsourced multi-domain dataset for task-oriented dialogue.
hr_multiwoz_tod_sgd
huggingface.co
Cite
Weijie Xu, hr_multiwoz_tod_sgd [Dataset]. https://huggingface.co/datasets/xwjzds/hr_multiwoz_tod_sgd
Authors
Weijie Xu
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card

HR-Multiwoz is a fully labeled dataset of 550 conversations spanning 10 HR domains, intended for evaluating LLM agents. It is the first labeled open-source conversation dataset in the HR domain for NLP research. Please refer to "HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent" for details about the dataset construction.

Dataset Sources

Repository: xwjzds/extractive_qa_question_answering_hr
Paper: HR-MultiWOZ: A Task Oriented Dialogue (TOD)… See the full description on the dataset page: https://huggingface.co/datasets/xwjzds/hr_multiwoz_tod_sgd.
MULTIWOZ-ENTR - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). MULTIWOZ-ENTR - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/multiwoz-entr
Dataset updated
Dec 2, 2024
Description
A new dataset named MULTIWOZ-ENTR, specifically designed for studying lexical entrainment (LE) in conversational systems.
MultiWOZ-2.1
huggingface.co
Updated Apr 13, 2025
Cite
DeepPavlov (2025). MultiWOZ-2.1 [Dataset]. https://huggingface.co/datasets/DeepPavlov/MultiWOZ-2.1
Dataset updated
Apr 13, 2025
Dataset authored and provided by
DeepPavlov
Description
DeepPavlov/MultiWOZ-2.1 dataset hosted on Hugging Face and contributed by the HF Datasets community
MultiWOZ-coref (MultiWOZ 2.3)
opendatalab.com
zip
Updated Mar 17, 2023
Cite
Tsinghua University (2023). MultiWOZ-coref (MultiWOZ 2.3) [Dataset]. https://opendatalab.com/OpenDataLab/MultiWOZ-coref
Available download formats: zip
Dataset updated
Mar 17, 2023
Dataset provided by
University of Illinois at Chicago
Huawei
Tsinghua University
Description
MultiWOZ-coref (or MultiWOZ 2.3) is an extension of the MultiWOZ dataset that adds co-reference annotations in addition to corrections of dialogue acts and dialogue states.
MultiWOZ-reponse-choice
huggingface.co
Updated Jul 27, 2025
Cite
Daniil Larionov (2025). MultiWOZ-reponse-choice [Dataset]. https://huggingface.co/datasets/Rexhaif/MultiWOZ-reponse-choice
Dataset updated
Jul 27, 2025
Authors
Daniil Larionov
Description
Rexhaif/MultiWOZ-reponse-choice dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion...
zenodo.org
bin, json
Updated May 18, 2022
Cite
Shutong Feng; Nurul Fithria Lubis; Christian Geishauser; Hsien-Chin Lin; Michael Heck; Carel van Niekerk; Milica Gašić (2022). EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems [Dataset]. http://doi.org/10.5281/zenodo.6506504
Available download formats: json, bin
Unique identifier
https://doi.org/10.5281/zenodo.6506504
Dataset updated
May 18, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shutong Feng; Nurul Fithria Lubis; Christian Geishauser; Hsien-Chin Lin; Michael Heck; Carel van Niekerk; Milica Gašić
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This is the dataset created for the paper, "EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems" (https://arxiv.org/abs/2109.04919).

EmoWOZ is based on MultiWOZ, a multi-domain task-oriented dialogue dataset (https://github.com/budzianowski/multiwoz). It contains more than 11K task-oriented dialogues with more than 83K emotion annotations of user utterances. In addition to Wizard-of-Oz dialogues from MultiWOZ, we collect human-machine dialogues within the same set of domains to sufficiently cover the space of various emotions that can happen during the lifetime of a data-driven dialogue system. There are 7 emotion labels, which are adapted from the OCC emotion models.

For data format and label definition, please refer to README.md.
Data from: Multi Woz
kaggle.com
Updated Jul 14, 2025
Cite
Lê Văn Tuấn Anh (2025). Multi Woz [Dataset]. https://www.kaggle.com/datasets/lvntunanh/multi-woz/discussion
Dataset updated
Jul 14, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Lê Văn Tuấn Anh
Description
This dataset was created by Lê Văn Tuấn Anh.
extractive_qa_question_answering_hr
huggingface.co
Cite
Weijie Xu, extractive_qa_question_answering_hr [Dataset]. https://huggingface.co/datasets/xwjzds/extractive_qa_question_answering_hr
Authors
Weijie Xu
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card

HR-Multiwoz is a fully labeled dataset of 5980 extractive QA examples spanning 10 HR domains, intended for evaluating LLM agents. It is the first labeled open-source conversation dataset in the HR domain for NLP research. Please refer to "HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent" for details about the dataset construction.

Dataset Sources

Repository: xwjzds/extractive_qa_question_answering_hr
Paper: HR-MultiWOZ: A Task Oriented Dialogue (TOD)… See the full description on the dataset page: https://huggingface.co/datasets/xwjzds/extractive_qa_question_answering_hr.
multiwoz-chat
huggingface.co
Cite
Rajesh Manickadas, multiwoz-chat [Dataset]. https://huggingface.co/datasets/gisako/multiwoz-chat
Authors
Rajesh Manickadas
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
gisako/multiwoz-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
Research data supporting "Large-Scale Multi-Domain Belief Tracking with...
repository.cam.ac.uk
zip
Updated Aug 9, 2018
Cite
Budzianowski, PF; Ramadan, Osman; Gasic, Milica (2018). Research data supporting "Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing" [Dataset]. http://doi.org/10.17863/CAM.26059
Available download formats: zip (12639800 bytes)
Unique identifier
https://doi.org/10.17863/CAM.26059
Dataset updated
Aug 9, 2018
Dataset provided by
Apollo
University of Cambridge
Authors
Budzianowski, PF; Ramadan, Osman; Gasic, Milica
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a collection of human-human written conversations spanning multiple domains and topics. The dataset was collected in a Wizard-of-Oz experiment on Amazon MTurk. Each dialogue contains a goal label and several exchanges between a visitor and the system. Each system turn has labels from the set of slot-value pairs, giving a coarse representation of the dialogue state. There are 9855 dialogues in total.
manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x
huggingface.co
Updated Oct 3, 2024
Cite
Brendan King (he/him) (2024). manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x [Dataset]. https://huggingface.co/datasets/Brendan/manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x
Dataset updated
Oct 3, 2024
Authors
Brendan King (he/him)
Description
Self-Labelled MultiWOZ Dataset

Manifest Group: tod_zero_bqag3oyb
Number of Dialogues: 8324
Number of Turns: 56344
This dataset was created via a self-labelling process composed of multiple runs, in which un-labelled dialogue data (utterances from the user and system only) is labelled with pseudo-annotations for the belief state, system acts, and de-lexicalized system response. Here is the list of W&B runs contributing to this dataset. Each run is a self-labelling run, and should be… See the full description on the dataset page: https://huggingface.co/datasets/Brendan/manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x.
emowoz
huggingface.co
Updated Jul 20, 2023
Cite
Heinrich-Heine-Universität Düsseldorf, Dialog Systems and Machine Learning (2023). emowoz [Dataset]. https://huggingface.co/datasets/hhu-dsml/emowoz
Dataset updated
Jul 20, 2023
Dataset authored and provided by
Heinrich-Heine-Universität Düsseldorf, Dialog Systems and Machine Learning
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
EmoWOZ is a dataset for user emotion recognition in task-oriented dialogues, consisting of all dialogues from MultiWOZ and 1000 additional human-machine dialogues (DialMAGE). Each user utterance is annotated with one of the following emotions: 0: neutral, 1: fearful, 2: dissatisfied, 3: apologetic, 4: abusive, 5: excited, 6: satisfied. System utterances are annotated with -1. For detailed label design and explanation, please refer to the paper and dataset homepage.
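The label scheme in this description can be written down as a small lookup table. The following is a minimal sketch, assuming the labels for one dialogue arrive as a flat list of integers in turn order. That input shape and the helper name `user_emotion_counts` are illustrative assumptions, not the dataset's documented schema; only the integer-to-emotion mapping and the -1 system marker come from the description.

```python
from collections import Counter
from typing import Iterable

# Mapping taken from the dataset description above.
EMOTION_NAMES = {
    0: "neutral",
    1: "fearful",
    2: "dissatisfied",
    3: "apologetic",
    4: "abusive",
    5: "excited",
    6: "satisfied",
}
SYSTEM_LABEL = -1  # System utterances carry -1 instead of an emotion.


def user_emotion_counts(labels: Iterable[int]) -> Counter:
    """Count emotion names over user turns, skipping system turns."""
    return Counter(
        EMOTION_NAMES[label] for label in labels if label != SYSTEM_LABEL
    )


# Hypothetical label sequence for one dialogue (user and system alternate).
example_labels = [0, -1, 2, -1, 6, -1, 0]
counts = user_emotion_counts(example_labels)
print(counts)
```

An unknown label raises `KeyError` here rather than being silently dropped, which surfaces annotation or parsing mistakes early.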
manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX
huggingface.co
Updated Nov 23, 2024
Cite
Brendan King (he/him) (2024). manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX [Dataset]. https://huggingface.co/datasets/Brendan/manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX
Dataset updated
Nov 23, 2024
Authors
Brendan King (he/him)
Description
Self-Labelled MultiWOZ Dataset

Manifest Group: initial_labelling
Number of Dialogues: 7800
Number of Turns: 52897
This dataset was created via a self-labelling process composed of multiple runs, in which un-labelled dialogue data (utterances from the user and system only) is labelled with pseudo-annotations for the belief state, system acts, and de-lexicalized system response. Here is the list of W&B runs contributing to this dataset. Each run is a self-labelling run, and should be… See the full description on the dataset page: https://huggingface.co/datasets/Brendan/manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX.
MobileCS
opendatalab.com
zip
Updated Jan 22, 2007
Cite
Tsinghua University (2007). MobileCS [Dataset]. https://opendatalab.com/OpenDataLab/MobileCS
Available download formats: zip (58576 bytes)
Dataset updated
Jan 22, 2007
Dataset provided by
China Mobile
Tsinghua University
Description
We introduce a new shared task challenge, aiming to benchmark semi-supervised and reinforced task-oriented dialog systems built for automated customer service for mobile operators. The task consists of two tracks:
Information extraction from dialog transcripts (Track 1)
Task-oriented dialog systems (Track 2)
An important feature of this shared task is that we release around 100K dialogs (in Chinese), which come from real-world dialog transcripts between real users and customer-service staff from China Mobile, with private information anonymized. We call this dataset the MobileCS (mobile customer-service) dialog dataset; it differs significantly from existing TOD datasets in both size and nature. To the best of our knowledge, MobileCS is not only the largest publicly available multi-domain TOD dataset, but also consists of real-life human-to-human data (that is, data collected in real-world scenarios). For comparison, the widely used MultiWOZ dataset consists of 10K dialogs and is in fact simulated data (that is, data collected in a Wizard-of-Oz simulated game).
multiwoz_all_versions
huggingface.co
Updated Nov 12, 2024
Cite
Pietro Lesci (2024). multiwoz_all_versions [Dataset]. https://huggingface.co/datasets/pietrolesci/multiwoz_all_versions
Dataset updated
Nov 12, 2024
Authors
Pietro Lesci
Description
This dataset is based on the "cumulative" configuration (i.e., slots previously filled are kept in the state) of the MultiWoz 2.2 dataset available also at pfb30/multi_woz_v22. Therefore, the system and user utterances, the active intents, and the services are exactly the same. In addition to the data present in version 2.2, this dataset contains the annotations from versions 2.1, 2.3, and 2.4 for each dialogue turn. This dataset is an artefact of the paper Diable: Efficient Dialogue State… See the full description on the dataset page: https://huggingface.co/datasets/pietrolesci/multiwoz_all_versions.
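The "cumulative" configuration mentioned in this description, where slots filled in earlier turns stay in the state, can be illustrated with a short sketch. It assumes each turn contributes a dictionary of newly mentioned slot-value pairs. The slot names, the input shape, and the function name `cumulative_states` are hypothetical and are not taken from the dataset's actual schema.

```python
from typing import Dict, List

SlotValues = Dict[str, str]


def cumulative_states(turn_updates: List[SlotValues]) -> List[SlotValues]:
    """Return the dialogue state after each turn, carrying earlier slots forward."""
    state: SlotValues = {}
    states: List[SlotValues] = []
    for update in turn_updates:
        # Build a new dict each turn so earlier snapshots are not mutated;
        # a slot mentioned again overwrites its previous value.
        state = {**state, **update}
        states.append(state)
    return states


# Hypothetical per-turn updates; slot names are made up for illustration.
updates = [
    {"hotel-area": "north"},
    {"hotel-stars": "4"},
    {"hotel-area": "centre"},
]
states = cumulative_states(updates)
for turn_index, turn_state in enumerate(states):
    print(turn_index, turn_state)
```

A non-cumulative view would keep only each turn's own update, so the second turn would show the star rating without the area.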
FusedChat
opendatalab.com
zip
Updated Sep 19, 2021
Cite
Nanyang Technological University (2021). FusedChat [Dataset]. https://opendatalab.com/OpenDataLab/FusedChat
Available download formats: zip (338157797 bytes)
Dataset updated
Sep 19, 2021
Dataset provided by
Nanyang Technological University
National University of Singapore
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
FusedChat is an inter-mode dialogue dataset. It contains dialogue sessions fusing task-oriented dialogues (TOD) and open-domain dialogues (ODD). Based on MultiWOZ, FusedChat appends or prepends an ODD to every existing TOD. See more details in the paper.
SPADE-customer-service-dialogue
huggingface.co
Updated Feb 5, 2025
Cite
Angela Yuan (2025). SPADE-customer-service-dialogue [Dataset]. https://huggingface.co/datasets/AngieYYF/SPADE-customer-service-dialogue
Dataset updated
Feb 5, 2025
Authors
Angela Yuan
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
SPADE: Structured Prompting Augmentation for Dialogue Enhancement in Machine-Generated Text Detection

SPADE contains a repository of customer service line synthetic user dialogues with goals, augmented from MultiWOZ 2.1 using GPT-3.5 and Llama 70B. The datasets are intended for training and evaluating machine generated text detectors in dialogue settings. There are 15 English datasets generated using 5 different augmentation methods and 2 large language models… See the full description on the dataset page: https://huggingface.co/datasets/AngieYYF/SPADE-customer-service-dialogue.
3 scholarly articles cite the dataset Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling".
