20 datasets found
  1. Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz...

    • repository.cam.ac.uk
    zip
    Updated Jul 10, 2019
    + more versions
    Cite
    Budzianowski, Paweł; Mihail, Eric; Rahul, Goel; Shachi, Paul; Sethi, Abhishek; Agarwal, Sanchit; Gao, Shuyag; Hakkani-Tur, Dilek (2019). Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling" [Dataset]. http://doi.org/10.17863/CAM.41572
    Available download formats: zip (13794372 bytes)
    Dataset updated
    Jul 10, 2019
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Budzianowski, Paweł; Mihail, Eric; Rahul, Goel; Shachi, Paul; Sethi, Abhishek; Agarwal, Sanchit; Gao, Shuyag; Hakkani-Tur, Dilek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the following JSON files:

    1. data.json: the woz dialogue dataset, which contains the conversations between users and wizards, as well as a set of coarse labels for each user turn.
    2. restaurant_db.json: the Cambridge restaurant database file, containing restaurants in the Cambridge UK area and a set of attributes.
    3. attraction_db.json: the Cambridge attraction database file, containing attractions in the Cambridge UK area and a set of attributes.
    4. hotel_db.json: the Cambridge hotel database file, containing hotels in the Cambridge UK area and a set of attributes.
    5. train_db.json: the Cambridge train (with artificial connections) database file, containing trains in the Cambridge UK area and a set of attributes.
    6. hospital_db.json: the Cambridge hospital database file, containing information about departments.
    7. police_db.json: the Cambridge police station information.
    8. taxi_db.json: slot-value list for the taxi domain.
    9. valListFile.json: list of dialogues for validation.
    10. testListFile.json: list of dialogues for testing.
    11. system_acts.json: system act annotations.
    12. ontology.json: data-based ontology.

    Important note: This dataset was previously entitled 'Research data supporting "MultiWOZ 2.1 - Multi-Domain Dialogue State Corrections and State Tracking Baselines"'. The change to the current title of this dataset was made at the request of the authors in July 2019.
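
As a rough illustration of how the files listed in this description might be used together, the following Python sketch partitions the dialogues in data.json into training, validation, and test sets. It assumes, without confirmation from the description, that data.json is a JSON object keyed by dialogue ID and that valListFile.json and testListFile.json hold one dialogue ID per line. The function names are hypothetical and not part of the dataset.

```python
import json
from pathlib import Path


def read_id_list(path):
    """Read dialogue IDs from a list file (assumed layout: one ID per line)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def split_dialogues(dialogues, val_ids, test_ids):
    """Partition a {dialogue_id: dialogue} mapping into train/val/test.

    Any dialogue not named in the validation or test list is treated
    as training data (an assumption, since no train list is described).
    """
    splits = {"train": {}, "val": {}, "test": {}}
    for dialogue_id, dialogue in dialogues.items():
        if dialogue_id in val_ids:
            splits["val"][dialogue_id] = dialogue
        elif dialogue_id in test_ids:
            splits["test"][dialogue_id] = dialogue
        else:
            splits["train"][dialogue_id] = dialogue
    return splits


def load_splits(data_dir):
    """Load data.json and the two list files from an unpacked archive."""
    data_dir = Path(data_dir)
    with open(data_dir / "data.json", encoding="utf-8") as f:
        dialogues = json.load(f)  # assumed: dict keyed by dialogue ID
    val_ids = read_id_list(data_dir / "valListFile.json")
    test_ids = read_id_list(data_dir / "testListFile.json")
    return split_dialogues(dialogues, val_ids, test_ids)
```

If the list files turn out to be real JSON arrays instead, only `read_id_list` would need to change; the partitioning logic stays the same.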

  2. TOD-Multiwoz

    • huggingface.co
    Updated Apr 19, 2025
    Cite
    Việt Nam Tự Cường 2 (2025). TOD-Multiwoz [Dataset]. https://huggingface.co/datasets/vntc2/TOD-Multiwoz
    Dataset updated
    Apr 19, 2025
    Dataset authored and provided by
    Việt Nam Tự Cường 2
    Description

    vntc2/TOD-Multiwoz dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. MultiWOZ 2.1 - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    (2024). MultiWOZ 2.1 - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/multiwoz-2-1
    Dataset updated
    Nov 25, 2024
    Description

    The original MultiWOZ 2.1 dataset is a crowdsourced multi-domain dataset for task-oriented dialogue.

  4. hr_multiwoz_tod_sgd

    • huggingface.co
    Cite
    Weijie Xu, hr_multiwoz_tod_sgd [Dataset]. https://huggingface.co/datasets/xwjzds/hr_multiwoz_tod_sgd
    Authors
    Weijie Xu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card

    HR-Multiwoz is a fully labeled dataset of 550 conversations spanning 10 HR domains, intended for evaluating LLM agents. It is the first labeled, open-source conversation dataset in the HR domain for NLP research. Please refer to "HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent" for details about the dataset construction.

    Dataset Sources

    Repository: xwjzds/extractive_qa_question_answering_hr
    Paper: HR-MultiWOZ: A Task Oriented Dialogue (TOD)…

    See the full description on the dataset page: https://huggingface.co/datasets/xwjzds/hr_multiwoz_tod_sgd.

  5. MULTIWOZ-ENTR - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). MULTIWOZ-ENTR - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/multiwoz-entr
    Dataset updated
    Dec 2, 2024
    Description

    A new dataset named MULTIWOZ-ENTR, specifically designed for studying lexical entrainment (LE) in conversational systems.

  6. MultiWOZ-2.1

    • huggingface.co
    Updated Apr 13, 2025
    + more versions
    Cite
    DeepPavlov (2025). MultiWOZ-2.1 [Dataset]. https://huggingface.co/datasets/DeepPavlov/MultiWOZ-2.1
    Dataset updated
    Apr 13, 2025
    Dataset authored and provided by
    DeepPavlov
    Description

    DeepPavlov/MultiWOZ-2.1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. MultiWOZ-coref (MultiWOZ 2.3)

    • opendatalab.com
    zip
    Updated Mar 17, 2023
    Cite
    Tsinghua University (2023). MultiWOZ-coref (MultiWOZ 2.3) [Dataset]. https://opendatalab.com/OpenDataLab/MultiWOZ-coref
    Available download formats: zip
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    University of Illinois at Chicago
    Huawei
    Tsinghua University
    Description

    MultiWOZ-coref (or MultiWOZ 2.3) is an extension of the MultiWOZ dataset that adds co-reference annotations in addition to corrections of dialogue acts and dialogue states.

  8. MultiWOZ-reponse-choice

    • huggingface.co
    Updated Jul 27, 2025
    + more versions
    Cite
    Daniil Larionov (2025). MultiWOZ-reponse-choice [Dataset]. https://huggingface.co/datasets/Rexhaif/MultiWOZ-reponse-choice
    Dataset updated
    Jul 27, 2025
    Authors
    Daniil Larionov
    Description

    Rexhaif/MultiWOZ-reponse-choice dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. Data from: EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion...

    • zenodo.org
    bin, json
    Updated May 18, 2022
    Cite
    Shutong Feng; Nurul Fithria Lubis; Christian Geishauser; Hsien-Chin Lin; Michael Heck; Carel van Niekerk; Milica Gašić (2022). EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems [Dataset]. http://doi.org/10.5281/zenodo.6506504
    Available download formats: json, bin
    Dataset updated
    May 18, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shutong Feng; Nurul Fithria Lubis; Christian Geishauser; Hsien-Chin Lin; Michael Heck; Carel van Niekerk; Milica Gašić
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is the dataset created for the paper, "EmoWOZ: A Large-Scale Corpus and Labelling Scheme for Emotion Recognition in Task-Oriented Dialogue Systems" (https://arxiv.org/abs/2109.04919).

    EmoWOZ is based on MultiWOZ, a multi-domain task-oriented dialogue dataset (https://github.com/budzianowski/multiwoz). It contains more than 11K task-oriented dialogues with more than 83K emotion annotations of user utterances. In addition to Wizard-of-Oz dialogues from MultiWOZ, we collect human-machine dialogues within the same set of domains to sufficiently cover the space of various emotions that can happen during the lifetime of a data-driven dialogue system. There are 7 emotion labels, which are adapted from the OCC emotion models.

    For data format and label definition, please refer to README.md.

  10. Data from: Multi Woz

    • kaggle.com
    Updated Jul 14, 2025
    Cite
    Lê Văn Tuấn Anh (2025). Multi Woz [Dataset]. https://www.kaggle.com/datasets/lvntunanh/multi-woz/discussion
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Lê Văn Tuấn Anh
    Description

    Dataset

    This dataset was created by Lê Văn Tuấn Anh

  11. extractive_qa_question_answering_hr

    • huggingface.co
    Cite
    Weijie Xu, extractive_qa_question_answering_hr [Dataset]. https://huggingface.co/datasets/xwjzds/extractive_qa_question_answering_hr
    Authors
    Weijie Xu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card

    HR-Multiwoz is a fully labeled dataset of 5980 extractive QA examples spanning 10 HR domains, intended for evaluating LLM agents. It is the first labeled, open-source conversation dataset in the HR domain for NLP research. Please refer to "HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent" for details about the dataset construction.

    Dataset Sources

    Repository: xwjzds/extractive_qa_question_answering_hr
    Paper: HR-MultiWOZ: A Task Oriented Dialogue (TOD)…

    See the full description on the dataset page: https://huggingface.co/datasets/xwjzds/extractive_qa_question_answering_hr.

  12. multiwoz-chat

    • huggingface.co
    Cite
    Rajesh Manickadas, multiwoz-chat [Dataset]. https://huggingface.co/datasets/gisako/multiwoz-chat
    Authors
    Rajesh Manickadas
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    gisako/multiwoz-chat dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Research data supporting "Large-Scale Multi-Domain Belief Tracking with...

    • repository.cam.ac.uk
    zip
    Updated Aug 9, 2018
    Cite
    Budzianowski, PF; Ramadan, Osman; Gasic, Milica (2018). Research data supporting "Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing" [Dataset]. http://doi.org/10.17863/CAM.26059
    Available download formats: zip (12639800 bytes)
    Dataset updated
    Aug 9, 2018
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Budzianowski, PF; Ramadan, Osman; Gasic, Milica
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a collection of human-human written conversations spanning multiple domains and topics. The dataset was collected using the Wizard-of-Oz setup on Amazon MTurk. Each dialogue contains a goal label and several exchanges between a visitor and the system. Each system turn has labels from the set of slot-value pairs, giving a coarse representation of the dialogue state. There are 9855 dialogues in total.

  14. manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x

    • huggingface.co
    Updated Oct 3, 2024
    + more versions
    Cite
    Brendan King (he/him) (2024). manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x [Dataset]. https://huggingface.co/datasets/Brendan/manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x
    Dataset updated
    Oct 3, 2024
    Authors
    Brendan King (he/him)
    Description

    Self-Labelled MultiWOZ Dataset

    Manifest Group: tod_zero_bqag3oyb
    Number of Dialogues: 8324
    Number of Turns: 56344

    This dataset was created via a self-labelling process composed of multiple runs, in which un-labelled dialogue data (utterances from the user and system only) is labelled with pseudo-annotations for the belief state, system acts, and de-lexicalized system response. Here is the list of W&B runs contributing to this dataset. Each run is a self-labelling run, and should be…

    See the full description on the dataset page: https://huggingface.co/datasets/Brendan/manifest_self_labelled_tod_zero_bqag3oyb_8324_htb95VGQDY1x.

  15. emowoz

    • huggingface.co
    Updated Jul 20, 2023
    Cite
    Heinrich-Heine-Universität Düsseldorf, Dialog Systems and Machine Learning (2023). emowoz [Dataset]. https://huggingface.co/datasets/hhu-dsml/emowoz
    Dataset updated
    Jul 20, 2023
    Dataset authored and provided by
    Heinrich-Heine-Universität Düsseldorf, Dialog Systems and Machine Learning
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    EmoWOZ is a dataset for user emotion recognition in task-oriented dialogues, consisting of all dialogues from MultiWOZ and 1000 additional human-machine dialogues (DialMAGE). Each user utterance is annotated with one of the following emotions: 0: neutral, 1: fearful, 2: dissatisfied, 3: apologetic, 4: abusive, 5: excited, 6: satisfied. System utterances are annotated with -1. For detailed label design and explanation, please refer to the paper and dataset homepage.
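
The label scheme in this description can be sketched in Python as a small lookup table. The integer codes and emotion names come from the description itself; the constant and function names are hypothetical, and the flat list of per-utterance labels is an assumed input shape rather than the dataset's actual schema.

```python
from collections import Counter

# Emotion codes as listed in the description.
EMOTION_LABELS = {
    0: "neutral",
    1: "fearful",
    2: "dissatisfied",
    3: "apologetic",
    4: "abusive",
    5: "excited",
    6: "satisfied",
}

# System utterances carry -1 instead of an emotion code.
SYSTEM_LABEL = -1


def label_name(label):
    """Return the emotion name for a user label, or None for a system turn."""
    if label == SYSTEM_LABEL:
        return None
    return EMOTION_LABELS[label]


def count_user_emotions(labels):
    """Count emotion names over a sequence of labels, skipping system turns."""
    return Counter(EMOTION_LABELS[l] for l in labels if l != SYSTEM_LABEL)
```

For instance, `count_user_emotions([0, -1, 2, -1, 0])` tallies two neutral user turns and one dissatisfied user turn, ignoring the two system turns.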

  16. manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX

    • huggingface.co
    Updated Nov 23, 2024
    Cite
    Brendan King (he/him) (2024). manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX [Dataset]. https://huggingface.co/datasets/Brendan/manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX
    Dataset updated
    Nov 23, 2024
    Authors
    Brendan King (he/him)
    Description

    Self-Labelled MultiWOZ Dataset

    Manifest Group: initial_labelling
    Number of Dialogues: 7800
    Number of Turns: 52897

    This dataset was created via a self-labelling process composed of multiple runs, in which un-labelled dialogue data (utterances from the user and system only) is labelled with pseudo-annotations for the belief state, system acts, and de-lexicalized system response. Here is the list of W&B runs contributing to this dataset. Each run is a self-labelling run, and should be…

    See the full description on the dataset page: https://huggingface.co/datasets/Brendan/manifest_self_labelled_initial_labelling_7800_FRnXLaU7I_EX.

  17. MobileCS

    • opendatalab.com
    zip
    Updated Jan 22, 2007
    Cite
    Tsinghua University (2007). MobileCS [Dataset]. https://opendatalab.com/OpenDataLab/MobileCS
    Available download formats: zip (58576 bytes)
    Dataset updated
    Jan 22, 2007
    Dataset provided by
    China Mobile
    Tsinghua University
    Description

    We introduce a new shared-task challenge, aiming to benchmark semi-supervised and reinforced task-oriented dialog systems built for automated customer service for mobile operators. The task consists of two tracks: information extraction from dialog transcripts (Track 1) and task-oriented dialog systems (Track 2). An important feature of this shared task is that we release around 100K dialogs (in Chinese), which come from real-world dialog transcripts between real users and customer-service staff from China Mobile, with private information anonymized. We call this dataset the MobileCS (mobile customer-service) dialog dataset; it differs significantly from existing TOD datasets in both size and nature. To the best of our knowledge, MobileCS is not only the largest publicly available multi-domain TOD dataset, but also consists of real-life human-to-human data (that is, data collected in real-world scenarios). For comparison, the widely used MultiWOZ dataset consists of 10K dialogs and is in fact simulated data (collected in a Wizard-of-Oz simulated game).

  18. multiwoz_all_versions

    • huggingface.co
    Updated Nov 12, 2024
    Cite
    Pietro Lesci (2024). multiwoz_all_versions [Dataset]. https://huggingface.co/datasets/pietrolesci/multiwoz_all_versions
    Dataset updated
    Nov 12, 2024
    Authors
    Pietro Lesci
    Description

    This dataset is based on the "cumulative" configuration (i.e., slots previously filled are kept in the state) of the MultiWoz 2.2 dataset available also at pfb30/multi_woz_v22. Therefore, the system and user utterances, the active intents, and the services are exactly the same. In addition to the data present in version 2.2, this dataset contains the annotations from versions 2.1, 2.3, and 2.4 for each dialogue turn. This dataset is an artefact of the paper Diable: Efficient Dialogue State… See the full description on the dataset page: https://huggingface.co/datasets/pietrolesci/multiwoz_all_versions.
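
The "cumulative" configuration mentioned in this description, in which slots filled in earlier turns are kept in the state, can be illustrated with a minimal Python sketch. The per-turn update format and the slot names in the usage note are hypothetical and chosen only for illustration; they are not taken from the dataset's schema.

```python
def cumulative_states(turn_updates):
    """Build a cumulative dialogue state for each turn.

    turn_updates: list of dicts, one per turn, each mapping a slot name
    to the value mentioned in that turn (assumed input format).
    Returns a list of dicts where each state keeps every slot filled so
    far, with later values overriding earlier ones for the same slot.
    """
    state = {}
    history = []
    for update in turn_updates:
        state = {**state, **update}  # carry over earlier slots, apply new ones
        history.append(dict(state))  # copy so later turns do not mutate it
    return history
```

Given the updates `[{"hotel-area": "north"}, {"hotel-stars": "4"}]`, the second state still contains `hotel-area`, which is the behaviour the description calls cumulative; a non-cumulative view would list only the slots mentioned in each turn.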

  19. FusedChat

    • opendatalab.com
    zip
    Updated Sep 19, 2021
    Cite
    Nanyang Technological University (2021). FusedChat [Dataset]. https://opendatalab.com/OpenDataLab/FusedChat
    Available download formats: zip (338157797 bytes)
    Dataset updated
    Sep 19, 2021
    Dataset provided by
    Nanyang Technological University
    National University of Singapore
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    FusedChat is an inter-mode dialogue dataset. It contains dialogue sessions fusing task-oriented dialogues (TOD) and open-domain dialogues (ODD). Based on MultiWOZ, FusedChat appends or prepends an ODD to every existing TOD. See more details in the paper.

  20. SPADE-customer-service-dialogue

    • huggingface.co
    Updated Feb 5, 2025
    Cite
    Angela Yuan (2025). SPADE-customer-service-dialogue [Dataset]. https://huggingface.co/datasets/AngieYYF/SPADE-customer-service-dialogue
    Dataset updated
    Feb 5, 2025
    Authors
    Angela Yuan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    SPADE: Structured Prompting Augmentation for Dialogue Enhancement in Machine-Generated Text Detection

    SPADE contains a repository of customer service line synthetic user dialogues with goals, augmented from MultiWOZ 2.1 using GPT-3.5 and Llama 70B. The datasets are intended for training and evaluating machine generated text detectors in dialogue settings. There are 15 English datasets generated using 5 different augmentation methods and 2 large language models… See the full description on the dataset page: https://huggingface.co/datasets/AngieYYF/SPADE-customer-service-dialogue.

Research data supporting "MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling"

3 scholarly articles cite this dataset.