Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the following JSON files:
1. data.json: the WOZ dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn.
2. restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.
3. attraction_db.json: the Cambridge attraction database file, containing attractions in the Cambridge UK area and a set of attributes.
4. hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes.
5. train_db.json: the Cambridge train (with artificial connections) database file, containing trains in the Cambridge UK area and a set of attributes.
6. hospital_db.json: the Cambridge hospital database file, containing information about departments.
7. police_db.json: the Cambridge police station information.
8. taxi_db.json: slot-value list for the taxi domain.
9. valListFile.json: list of dialogues for validation.
10. testListFile.json: list of dialogues for testing.
11. system_acts.json: system act annotations.
12. ontology.json: data-based ontology.
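The role of the validation and test list files can be illustrated with a minimal sketch. It assumes that data.json decodes to a dictionary keyed by dialogue id and that the two list files yield collections of those ids; neither layout is confirmed by the description above. Treating every unlisted dialogue as training data is also an assumption, and the function name `split_dialogues` and the toy ids are hypothetical.

```python
from typing import Any, Dict, Iterable


def split_dialogues(
    dialogues: Dict[str, Any],
    val_ids: Iterable[str],
    test_ids: Iterable[str],
) -> Dict[str, Dict[str, Any]]:
    """Partition dialogues into train/val/test by dialogue id."""
    val_set, test_set = set(val_ids), set(test_ids)
    splits: Dict[str, Dict[str, Any]] = {"train": {}, "val": {}, "test": {}}
    for dialogue_id, dialogue in dialogues.items():
        if dialogue_id in test_set:
            splits["test"][dialogue_id] = dialogue
        elif dialogue_id in val_set:
            splits["val"][dialogue_id] = dialogue
        else:
            # Assumption: dialogues in neither list are used for training.
            splits["train"][dialogue_id] = dialogue
    return splits


# Toy stand-in for the decoded contents of data.json (hypothetical ids and fields).
toy_dialogues = {"d1": {"log": []}, "d2": {"log": []}, "d3": {"log": []}}
toy_splits = split_dialogues(toy_dialogues, val_ids=["d2"], test_ids=["d3"])
print({name: sorted(ids) for name, ids in toy_splits.items()})
```

With the real files, the three arguments would come from decoding data.json, valListFile.json, and testListFile.json, after checking their actual layout.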
Important note: This dataset was previously entitled 'Research data supporting "MultiWOZ 2.1 - Multi-Domain Dialogue State Corrections and State Tracking Baselines"'. The change to the current title of this dataset was made at the request of the authors in July 2019.
vntc2/TOD-Multiwoz dataset hosted on Hugging Face and contributed by the HF Datasets community
The original MultiWOZ 2.1 dataset is a crowdsourced multi-domain dataset for task-oriented dialogue.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card
HR-Multiwoz is a fully labeled dataset of 550 conversations spanning 10 HR domains, intended for evaluating LLM agents. It is the first labeled open-source conversation dataset in the HR domain for NLP research. Please refer to HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent for details about the dataset construction.
Dataset Sources
Repository: xwjzds/extractive_qa_question_answering_hr Paper: HR-MultiWOZ: A Task Oriented Dialogue (TOD)… See the full description on the dataset page: https://huggingface.co/datasets/xwjzds/hr_multiwoz_tod_sgd.
MULTIWOZ-ENTR is a new dataset designed specifically for studying lexical entrainment (LE) in conversational systems.
DeepPavlov/MultiWOZ-2.1 dataset hosted on Hugging Face and contributed by the HF Datasets community
MultiWOZ-coref (or MultiWOZ 2.3) is an extension of the MultiWOZ dataset that adds co-reference annotations in addition to corrections of dialogue acts and dialogue states.
Rexhaif/MultiWOZ-reponse-choice dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is the dataset created for the paper, "EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems" (https://arxiv.org/abs/2109.04919).
EmoWOZ is based on MultiWOZ, a multi-domain task-oriented dialogue dataset (https://github.com/budzianowski/multiwoz). It contains more than 11K task-oriented dialogues with more than 83K emotion annotations of user utterances. In addition to Wizard-of-Oz dialogues from MultiWOZ, we collect human-machine dialogues within the same set of domains to sufficiently cover the space of various emotions that can happen during the lifetime of a data-driven dialogue system. There are 7 emotion labels, which are adapted from the OCC emotion models.
For data format and label definition, please refer to README.md.
This dataset was created by Lê Văn Tuấn Anh
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card
HR-Multiwoz is a fully labeled dataset of 5980 extractive QA examples spanning 10 HR domains, intended for evaluating LLM agents. It is the first labeled open-source conversation dataset in the HR domain for NLP research. Please refer to HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent for details about the dataset construction.
Dataset Sources
Repository: xwjzds/extractive_qa_question_answering_hr Paper: HR-MultiWOZ: A Task Oriented Dialogue (TOD)… See the full description on the dataset page: https://huggingface.co/datasets/xwjzds/extractive_qa_question_answering_hr.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
gisako/multiwoz-chat dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a collection of human-human written conversations spanning multiple domains and topics. The dataset was collected using the Wizard-of-Oz setup on Amazon MTurk. Each dialogue contains a goal label and several exchanges between a visitor and the system. Each system turn has labels from the set of slot-value pairs, giving a coarse representation of the dialogue state. There are 9855 dialogues in total.
Self-Labelled MultiWOZ Dataset
Manifest Group: tod_zero_bqag3oyb
Number of Dialogues: 8324
Number of Turns: 56344
This dataset was created via a self-labelling process composed of multiple runs, in which un-labelled dialogue data (utterances from the user and system only) is labelled with pseudo-annotations for the belief state, system acts, and de-lexicalized system response. Here is the list of W&B runs contributing to this dataset. Each run is a self-labelling run, and should be… See the full description on the dataset page: https://huggingface.co/datasets/Brendan/manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
EmoWOZ is a dataset for user emotion recognition in task-oriented dialogues, consisting of all dialogues from MultiWOZ and 1000 additional human-machine dialogues (DialMAGE). Each user utterance is annotated with one of the following emotions: 0: neutral, 1: fearful, 2: dissatisfied, 3: apologetic, 4: abusive, 5: excited, 6: satisfied. System utterances are annotated with -1. For detailed label design and explanation, please refer to the paper and dataset homepage.
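The label scheme described above can be written down as a small lookup, shown here as a hedged sketch. The integer-to-name mapping and the -1 marker for system utterances come from the description; the names `EMOTION_LABELS`, `decode_emotion`, and `user_emotions` are hypothetical helpers, not part of the dataset's own tooling.

```python
from typing import List, Optional

# Mapping taken from the dataset description above.
EMOTION_LABELS = {
    0: "neutral",
    1: "fearful",
    2: "dissatisfied",
    3: "apologetic",
    4: "abusive",
    5: "excited",
    6: "satisfied",
}
SYSTEM_LABEL = -1  # system utterances carry no emotion annotation


def decode_emotion(label: int) -> Optional[str]:
    """Return the emotion name, or None for a system utterance."""
    if label == SYSTEM_LABEL:
        return None
    return EMOTION_LABELS[label]


def user_emotions(labels: List[int]) -> List[str]:
    """Keep only user-turn emotions from a per-utterance label sequence."""
    return [EMOTION_LABELS[label] for label in labels if label != SYSTEM_LABEL]


# Hypothetical label sequence for alternating user and system turns.
print(user_emotions([0, -1, 2, -1, 6]))
```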
Self-Labelled MultiWOZ Dataset
Manifest Group: initial_labelling
Number of Dialogues: 7800
Number of Turns: 52897
This dataset was created via a self-labelling process composed of multiple runs, in which un-labelled dialogue data (utterances from the user and system only) is labelled with pseudo-annotations for the belief state, system acts, and de-lexicalized system response. Here is the list of W&B runs contributing to this dataset. Each run is a self-labelling run, and should be… See the full description on the dataset page: https://huggingface.co/datasets/Brendan/manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX.
We introduce a new shared-task challenge that aims to benchmark semi-supervised and reinforced task-oriented dialog systems built for automated customer service for mobile operators. The task consists of two tracks: information extraction from dialog transcripts (Track 1) and task-oriented dialog systems (Track 2). An important feature of this shared task is that we release around 100K dialogs (in Chinese), which come from real-world dialog transcripts between real users and customer-service staff from China Mobile, with private information anonymized. We call this the MobileCS (mobile customer-service) dialog dataset; it differs significantly from existing TOD datasets in both size and nature. To the best of our knowledge, MobileCS is not only the largest publicly available multi-domain TOD dataset but also consists of real-life human-to-human data (that is, data collected in real-world scenarios). For comparison, the widely used MultiWOZ dataset consists of 10K dialogs and is in fact simulated data (that is, data collected in a Wizard-of-Oz simulated game).
This dataset is based on the "cumulative" configuration (i.e., slots previously filled are kept in the state) of the MultiWOZ 2.2 dataset, also available at pfb30/multi_woz_v22. Therefore, the system and user utterances, the active intents, and the services are exactly the same. In addition to the data present in version 2.2, this dataset contains the annotations from versions 2.1, 2.3, and 2.4 for each dialogue turn. This dataset is an artefact of the paper Diable: Efficient Dialogue State… See the full description on the dataset page: https://huggingface.co/datasets/pietrolesci/multiwoz_all_versions.
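The idea of a "cumulative" state, in which previously filled slots are kept, can be illustrated with a minimal sketch. It assumes each turn contributes a dictionary of newly filled or changed slots; the function name `accumulate_states` and the slot names are hypothetical and are not taken from the dataset's actual schema.

```python
from typing import Dict, List


def accumulate_states(turn_updates: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn per-turn slot updates into cumulative dialogue states."""
    state: Dict[str, str] = {}
    cumulative: List[Dict[str, str]] = []
    for update in turn_updates:
        # Build a fresh dict each turn so earlier snapshots are not mutated;
        # later values overwrite earlier ones for the same slot.
        state = {**state, **update}
        cumulative.append(state)
    return cumulative


# Hypothetical slot names and values, for illustration only.
updates = [{"hotel-area": "north"}, {}, {"hotel-stars": "4"}]
for turn_index, turn_state in enumerate(accumulate_states(updates)):
    print(turn_index, turn_state)
```

A turn with no new slots (the empty update above) simply repeats the previous state, which is what distinguishes the cumulative view from a per-turn view.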
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
FusedChat is an inter-mode dialogue dataset. It contains dialogue sessions fusing task-oriented dialogues (TOD) and open-domain dialogues (ODD). Based on MultiWOZ, FusedChat appends or prepends an ODD to every existing TOD. See more details in the paper.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
SPADE: Structured Prompting Augmentation for Dialogue Enhancement in Machine-Generated Text Detection
SPADE contains a repository of synthetic customer-service-line user dialogues with goals, augmented from MultiWOZ 2.1 using GPT-3.5 and Llama 70B. The datasets are intended for training and evaluating machine-generated text detectors in dialogue settings. There are 15 English datasets generated using 5 different augmentation methods and 2 large language models… See the full description on the dataset page: https://huggingface.co/datasets/AngieYYF/SPADE-customer-service-dialogue.