9 datasets found
  1. ConvAI2 Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Jun 14, 2022
    Cite
    Emily Dinan; Varvara Logacheva; Valentin Malykh; Alexander Miller; Kurt Shuster; Jack Urbanek; Douwe Kiela; Arthur Szlam; Iulian Serban; Ryan Lowe; Shrimai Prabhumoye; Alan W. Black; Alexander Rudnicky; Jason Williams; Joelle Pineau; Mikhail Burtsev; Jason Weston (2022). ConvAI2 Dataset [Dataset]. https://paperswithcode.com/dataset/convai2
    Dataset updated
    Jun 14, 2022
    Authors
    Emily Dinan; Varvara Logacheva; Valentin Malykh; Alexander Miller; Kurt Shuster; Jack Urbanek; Douwe Kiela; Arthur Szlam; Iulian Serban; Ryan Lowe; Shrimai Prabhumoye; Alan W. Black; Alexander Rudnicky; Jason Williams; Joelle Pineau; Mikhail Burtsev; Jason Weston
    Description

    The ConvAI2 NeurIPS competition aimed to find approaches for creating high-quality dialogue agents capable of meaningful open-domain conversation. The ConvAI2 dataset for training models is based on the PERSONA-CHAT dataset. Each speaker pair is assigned profiles drawn from a set of 1155 possible personas (at training time), each consisting of at least 5 profile sentences, with 100 never-before-seen personas set aside for validation. Because the original PERSONA-CHAT test set had already been released, crowdsourced workers created a new hidden test set consisting of 100 new personas and over 1,015 dialogs.

    To avoid modeling that takes advantage of trivial word overlap, additional rewritten sets of the same train and test personas were crowdsourced, with related sentences that are rephrases, generalizations or specializations, rendering the task much more challenging. For example “I just got my nails done” is revised as “I love to pamper myself on a regular basis” and “I am on a diet now” is revised as “I need to lose weight.”

    The training, validation and hidden test sets consist of 17,878, 1,000 and 1,015 dialogues, respectively.
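    As a quick sanity check on the split sizes quoted above, the short sketch below computes the total and each split's share. The dictionary keys and the function name are illustrative choices, not names used by the dataset itself.

```python
# Dialogue counts as quoted in the dataset description above.
# The key names are illustrative, not official split identifiers.
SPLITS = {"train": 17_878, "validation": 1_000, "hidden_test": 1_015}


def split_shares(splits):
    """Return each split's fraction of the total dialogue count."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


if __name__ == "__main__":
    print(f"total dialogues: {sum(SPLITS.values())}")
    for name, share in split_shares(SPLITS).items():
        print(f"{name}: {share:.1%}")
```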

  2. BPersona-chat Dataset

    • paperswithcode.com
    Updated Aug 1, 2023
    Cite
    Yunmeng Li; Jun Suzuki; Makoto Morishita; Kaori Abe; Ryoko Tokuhisa; Ana Brassard; Kentaro Inui (2023). BPersona-chat Dataset [Dataset]. https://paperswithcode.com/dataset/bpersona-chat
    Dataset updated
    Aug 1, 2023
    Authors
    Yunmeng Li; Jun Suzuki; Makoto Morishita; Kaori Abe; Ryoko Tokuhisa; Ana Brassard; Kentaro Inui
    Description

    BPersona-chat is an evaluation dataset based on the English multiturn chat corpus Persona-chat and the Japanese multiturn chat corpus JPersona-chat.

    Each chat was conducted between two crowd workers playing assigned artificial personas. The speakers discuss a given personality trait, covering topics such as self-introductions and hobbies. (Note that the English and Japanese chats are not translations of each other.)

    Each chat is translated into the other language (Japanese or English) by professional translators, by a low-quality machine translation model (A), and by a high-quality machine translation model (B).

    Crowdworkers rate each translation as either good or bad, based on its correctness and coherence.

    Each chat is included in one .xlsx file with the following structure:

    • person - the speaker of the current utterance
    • source - the utterance in the source language
    • translation - the translation in the target language
    • evaluation: is this a good translation? - the evaluation of the translation's quality
        • y - the current translation is a correct translation of the source utterance
        • n - the current translation is an erroneous translation of the source utterance
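    The per-file structure described above could be modeled roughly as follows. This is only a sketch: it assumes the rows of one chat file have already been loaded into dictionaries keyed by the column names listed in the description (loading the .xlsx itself would need a spreadsheet library, which is not shown). The exact header spelling in the real files is an assumption, and the names `Utterance`, `parse_row`, `good_translation_rate` and the sample rows are hypothetical.

```python
from dataclasses import dataclass

# Column name taken from the description above; the exact spelling in the
# real files is an assumption.
EVAL_COLUMN = "evaluation: is this a good translation?"


@dataclass
class Utterance:
    person: str
    source: str
    translation: str
    is_good: bool  # True for "y", False for "n"


def parse_row(row):
    """Convert one spreadsheet row (given as a dict) into an Utterance."""
    label = row[EVAL_COLUMN].strip().lower()
    if label not in ("y", "n"):
        raise ValueError(f"unexpected evaluation label: {label!r}")
    return Utterance(
        person=row["person"],
        source=row["source"],
        translation=row["translation"],
        is_good=(label == "y"),
    )


def good_translation_rate(rows):
    """Fraction of utterances whose translation was judged good."""
    utterances = [parse_row(r) for r in rows]
    if not utterances:
        return 0.0
    return sum(u.is_good for u in utterances) / len(utterances)


# Invented rows standing in for one chat file; not taken from the dataset.
SAMPLE_ROWS = [
    {"person": "A", "source": "Hello, how are you?",
     "translation": "こんにちは、お元気ですか", EVAL_COLUMN: "y"},
    {"person": "B", "source": "I like hiking.",
     "translation": "私は料理が好きです", EVAL_COLUMN: "n"},
]

if __name__ == "__main__":
    print(good_translation_rate(SAMPLE_ROWS))
```

    Mapping the y/n label to a boolean at parse time keeps later aggregation simple and surfaces unexpected labels early.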

  3. PMPC (Persona Match on Persona-Chat)

    • opendatalab.com
    zip
    Updated Sep 22, 2022
    Cite
    iFlytek Research (2022). PMPC (Persona Match on Persona-Chat) [Dataset]. https://opendatalab.com/OpenDataLab/PMPC
    Available download formats: zip (141185672 bytes)
    Dataset updated
    Sep 22, 2022
    Dataset provided by
    iFlytek (科大讯飞), http://www.iflytek.com/
    Microsoft Research Asia
    Queen’s University
    University of Science and Technology of China
    Description

    PMPC (Persona Match on Persona-Chat) is a dataset for Speaker Persona Detection (SPD), a task that aims to detect speaker personas from plain conversational text.

  4. persona-chat-en2bn-azure

    • huggingface.co
    Updated Mar 26, 2025
    Cite
    Intelsense AI (2025). persona-chat-en2bn-azure [Dataset]. https://huggingface.co/datasets/intelsense/persona-chat-en2bn-azure
    Dataset updated
    Mar 26, 2025
    Dataset authored and provided by
    Intelsense AI
    Description

    The intelsense/persona-chat-en2bn-azure dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  5. USR-PersonaChat - Dataset - LDM

    • service.tib.eu
    Updated Jan 2, 2025
    Cite
    (2025). USR-PersonaChat - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/usr-personachat
    Dataset updated
    Jan 2, 2025
    Description

    This dataset is used for dialogue response evaluation.

  6. persona.chat - Historical whois Lookup

    • whoisdatacenter.com
    csv
    Cite
    AllHeart Web Inc, persona.chat - Historical whois Lookup [Dataset]. https://whoisdatacenter.com/domain/persona.chat/
    Available download formats: csv
    Dataset authored and provided by
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Mar 15, 1985 - Mar 27, 2025
    Description

    Explore the historical Whois records related to persona.chat (Domain). Get insights into ownership history and changes over time.

  7. persona-chat-en2bn

    • huggingface.co
    Updated Mar 26, 2025
    Cite
    Intelsense AI (2025). persona-chat-en2bn [Dataset]. https://huggingface.co/datasets/intelsense/persona-chat-en2bn
    Dataset updated
    Mar 26, 2025
    Dataset authored and provided by
    Intelsense AI
    Description

    The intelsense/persona-chat-en2bn dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  8. PersonalDialog Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Dec 16, 2021
    Cite
    Yinhe Zheng; Guanyi Chen; Minlie Huang; Song Liu; Xuan Zhu (2021). PersonalDialog Dataset [Dataset]. https://paperswithcode.com/dataset/personaldialog
    Dataset updated
    Dec 16, 2021
    Authors
    Yinhe Zheng; Guanyi Chen; Minlie Huang; Song Liu; Xuan Zhu
    Description

    PersonalDialog is a large-scale multi-turn dialogue dataset containing various traits from a large number of speakers. The dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers. Each utterance is associated with a speaker who is marked with traits like Age, Gender, Location, Interest Tags, etc. Several anonymization schemes are designed to protect the privacy of each speaker.

  9. Ranking of messaging apps by monthly active users...

    • es.statista.com
    Updated Jul 31, 2024
    Cite
    Statista (2024). Ranking de aplicaciones de mensajería según usuarios activos mensuales mundiales 2024 [Dataset]. https://es.statista.com/estadisticas/599043/aplicaciones-de-mensajeria-mas-populares-a-nivel-mundial-de/
    Dataset updated
    Jul 31, 2024
    Dataset authored and provided by
    Statista, http://statista.com/
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    In January 2024, 2 billion users accessed WhatsApp chat each month. Use of the app is particularly strong in markets in the United States, although it is worth noting that it is one of the most popular mobile social apps worldwide. In February 2014, the social network Facebook acquired the mobile app for 19 billion US dollars.


ConvAI2 Dataset

Conversational Intelligence Challenge 2
