7 datasets found
  1. UR-FUNNY-V2

    • kaggle.com
    Updated May 17, 2025
    Cite
    RyuenK (2025). UR-FUNNY-V2 [Dataset]. https://www.kaggle.com/datasets/ryuenk/ur-funny/discussion
    Available formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    2 scholarly articles cite this dataset (View in Google Scholar)
    Dataset updated
    May 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    RyuenK
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This work was done by the Rochester Human-Computer Interaction (ROC HCI) Lab, University of Rochester, USA, in collaboration with the Language Technologies Institute, SCS, CMU, USA.

    ROC-HCI website: https://roc-hci.com/

    This repository includes the UR-FUNNY dataset: the first dataset for multimodal humor detection.

    Please read the following paper for details of the dataset and models. If you use the data or models, please consider citing the research paper:

    @inproceedings{hasan-etal-2019-ur,
      title = "{UR}-{FUNNY}: A Multimodal Language Dataset for Understanding Humor",
      author = "Hasan, Md Kamrul and
       Rahman, Wasifur and
       Bagher Zadeh, AmirAli and
       Zhong, Jianyuan and
       Tanveer, Md Iftekhar and
       Morency, Louis-Philippe and
       Hoque, Mohammed (Ehsan)",
      booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
      month = nov,
      year = "2019",
      address = "Hong Kong, China",
      publisher = "Association for Computational Linguistics",
      url = "https://www.aclweb.org/anthology/D19-1211",
      doi = "10.18653/v1/D19-1211",
      pages = "2046--2056",
      abstract = "",
    }
    

    There are six pickle files in the extracted features folder:

    1. data_folds
    2. langauge_sdk
    3. openface_features_sdk
    4. covarep_features_sdk
    5. humor_label_sdk
    6. word_embedding_list

    Data folds: data_folds.pkl contains a dictionary holding the train, dev, and test lists of humor/not-humor video segment ids. These folds are speaker-independent and homogeneous. Please use these folds for valid comparison.
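
As a minimal sketch of how the folds are meant to be used (toy ids standing in for the real contents; with the real file you would simply `pickle.load` data_folds.pkl):

```python
import io
import pickle

# Toy stand-in for data_folds.pkl: a dict of speaker-independent id lists.
# Real usage: with open("data_folds.pkl", "rb") as f: data_folds = pickle.load(f)
data_folds = {"train": ["id1", "id2"], "dev": ["id3"], "test": ["id4"]}

# Round-trip through pickle, as with the real file.
buf = io.BytesIO()
pickle.dump(data_folds, buf)
buf.seek(0)
folds = pickle.load(buf)

# Use these id lists as-is so results are comparable across papers.
assert set(folds) == {"train", "dev", "test"}
train_ids = folds["train"]
```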

    Language features: word_embedding_list.pkl contains the list of word embeddings for all unique words present in the UR-FUNNY dataset. We use word indexes into this list as the language feature; the indexes are used to retrieve the GloVe embeddings of the corresponding words. We followed this approach to save space, since the same word appears multiple times.
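
The index-to-embedding lookup can be sketched as follows (toy vectors; the real word_embedding_list.pkl holds one glove.840B.300d vector per unique word):

```python
# Toy stand-in for word_embedding_list.pkl: one 300-d vector per unique word.
dim = 300
word_embedding_list = [[float(i)] * dim for i in range(5)]  # illustrative values

# An utterance stores word indexes, not vectors; resolve them on demand.
punchline_indexes = [0, 3, 1]
punchline_embeddings = [word_embedding_list[i] for i in punchline_indexes]

assert len(punchline_embeddings) == 3          # one vector per word
assert len(punchline_embeddings[0]) == dim     # each vector is 300-d
```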

    language_sdk.pkl contains a dictionary keyed by the unique id of each humor/not-humor video utterance. The corresponding raw video utterances are also named by these unique ids.

    The structure of the dictionary:

    langauge_sdk{
      id1: {
        punchline_embedding_indexes : [ idx1,idx2,.... ]
        context_embedding_indexes : [[ idx2,idx30,.... ],[idx5,idx6......],..] 
        punchline_sentence : [....]
        context_sentences : [[sen1], [sen2],...]
        punchline_intervals : [ intervals of words in punchline ]
        context_intervals : [[ intervals of words in sen1 ], [ intervals of words in sen2 ],.......]
        }
      id2: {
        punchline_embedding_indexes : [ idx10,idx12,.... ]
        context_embedding_indexes : [[ idx21,idx4,.... ],[idx91,idx100......],..]  
        punchline_sentence : [....]
        context_sentences : [[sen1], [sen2],...]
        punchline_intervals : [ intervals of words in punchline ]
        context_intervals : [[ intervals of words in sen1 ], [ intervals of words in sen2 ],.......]         
        }
      .....
      .....
    }
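
A toy record mirroring this structure (key names as shown above; values are illustrative):

```python
# One utterance entry from language_sdk, with illustrative values.
language_sdk = {
    "id1": {
        "punchline_embedding_indexes": [1, 2],
        "context_embedding_indexes": [[2, 30], [5, 6]],
        "punchline_sentence": ["that was the joke"],
        "context_sentences": [["sen1"], ["sen2"]],
        "punchline_intervals": [(0.0, 0.4), (0.4, 0.9)],
        "context_intervals": [[(0.0, 0.3), (0.3, 0.6)], [(0.0, 0.5), (0.5, 1.1)]],
    },
}

rec = language_sdk["id1"]
# One index per punchline word; one sub-list per context sentence.
assert len(rec["punchline_embedding_indexes"]) == len(rec["punchline_intervals"])
assert len(rec["context_embedding_indexes"]) == len(rec["context_sentences"])
```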
    

    Each video segment has four kinds of features:

    1. punchline_features: It contains the list of word indexes (described above) for the punchline sentence. We use these indexes to retrieve the word embeddings (glove.840B.300d) from word_embedding_list (described above). So if the punchline has n words, the dimension will be n.

    2. context_features: It contains the list of word indexes for the sentences in the context. It is a two-dimensional list: the first dimension is the number of sentences in the context, and the second is the number of words in each sentence.

    3. punchline_sentence: It contains the punchline sentence.

    4. context_sentences: It contains the sentences used as context.
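
The two-dimensional context_features list maps to embeddings sentence by sentence; a sketch with toy values:

```python
# Toy embedding table and a two-sentence context (indexes are illustrative).
word_embedding_list = [[0.0] * 300 for _ in range(50)]
context_features = [[3, 7, 9], [1, 4]]  # sentence 1 has 3 words, sentence 2 has 2

# First dimension: sentences; second: word indexes -> one (words x 300) matrix each.
context_embeddings = [
    [word_embedding_list[i] for i in sent] for sent in context_features
]

assert [len(s) for s in context_embeddings] == [3, 2]   # words per sentence
assert len(context_embeddings[0][0]) == 300             # embedding dimension
```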

    Acoustic features: covarep_features_sdk.pkl contains a dictionary keyed by the unique id of each humor/not-humor video utterance. We used COVAREP (https://covarep.github.io/covarep/) to extract the acoustic features. See extracted_features.txt for the names of the features.

    The structure of the covarep_features_sdk:

    covarep_features_sdk{
      id1: {
        punchline_features : [ [ .... ],[ .... ], ...]
        context_features : [ [[ .... ],[......],..], [[ .... ],[......],..], ... ]
        ....
        }
    
      id2:{
        punchline_features : [ [ .... ],[ .... ], ...]
        context_features : [ [[ .... ],[......],..], [[ .... ],[......],..], ... ]
        ....
      }
      ....
      ....
    }
    

    Each humor/not-humor video segment has two kinds of features:

    1. punchline_features: It contains the averaged COVAREP features for each word in the punchline sentence. We aligned our features at the word level. The COVAREP feature dimension is 81, so if the punchline has n words, the dimension will be n * 81.

    2. context_features: It contains the averaged COVAREP features for each word in the context sentences. It is a three dimen...
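
The word-level averaging described above can be sketched like this (frame values and word boundaries are toy; the real pipeline averages the 81-d COVAREP frames that fall inside each word's interval):

```python
# Average the COVAREP frames inside each word span -> one 81-d vector per word.
covarep_dim = 81
frames = [[float(t)] * covarep_dim for t in range(10)]   # toy acoustic frames
word_frame_spans = [(0, 4), (4, 7), (7, 10)]             # toy word boundaries

def average(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

punchline_features = [average(frames[a:b]) for a, b in word_frame_spans]

assert len(punchline_features) == 3                      # n words -> n rows
assert len(punchline_features[0]) == covarep_dim         # each row is 81-d
assert punchline_features[0][0] == 1.5                   # mean of frames 0..3
```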

  2. UR-FUNNY

    • opendatalab.com
    zip
    Updated Sep 30, 2023
    Cite
    University of Rochester (2023). UR-FUNNY [Dataset]. https://opendatalab.com/OpenDataLab/UR-FUNNY
    Available download formats: zip (1,308,176,439 bytes)
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    Language Technology Institute, Carnegie Mellon University
    University of Rochester
    Description

    For understanding multimodal language used in expressing humor.

  3. Funny Call Dataset

    • universe.roboflow.com
    zip
    Updated Oct 7, 2024
    Cite
    unknown (2024). Funny Call Dataset [Dataset]. https://universe.roboflow.com/unknown-ctvi2/funny-call/model/1
    Available download formats: zip
    Dataset updated
    Oct 7, 2024
    Dataset authored and provided by
    unknown
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Enemy Bounding Boxes
    Description

    Funny Call

    ## Overview
    
    Funny Call is a dataset for object detection tasks - it contains Enemy annotations for 211 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  4. MIT Restaurant Corpus 🍔 CRF Dataset

    • kaggle.com
    Updated Feb 23, 2025
    Cite
    Sagar Maru (2025). MIT Restaurant Corpus 🍔 CRF Dataset [Dataset]. https://www.kaggle.com/datasets/marusagar/mit-restaurant-corpus-crf-dataset/discussion
    Available formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Feb 23, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sagar Maru
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    MIT Restaurant Corpus - CRFs (Conditional Random Fields) Dataset

    A Funny Dive into Restaurant Reviews 🥳🍽️

    Welcome to the MIT Restaurant Corpus - CRF Dataset! If you love food, restaurants, and all the jargon that comes with them, you're in for a treat! (Pun intended! 😉) Let's break it down in the most delicious way!

    This dataset, obtained from the MIT Restaurant Corpus (https://sls.csail.mit.edu/downloads/restaurant/), provides valuable restaurant review data for NER (Named Entity Recognition) tasks. With entities such as ratings, locations, and cuisines, it is perfect for building CRF models. 🏷️🍴 Let's dive into this rich resource and find out what it can do! 📊📍

    🍔 What's Inside This Feast?

    The MIT Restaurant Corpus is designed to help you understand the intricacies of restaurant reviews and how data about restaurants can be parsed and classified. It contains a set of files structured to give you all the ingredients required to build CRF (Conditional Random Field) models for NER (Named Entity Recognition). Here's what's being served:

    1. **'sent_train'** 📝: This file contains a collection of sentences - but not just any sentences. These are sentences taken from real-world restaurant reviews! Each sentence is separated by a new line. It is like a dish of text, one sentence at a time.

    2. **'sent_test'** 🍽️: Just like 'sent_train', this file contains sentences, but for testing purposes. Think of it as the "taste test" phase of your restaurant review journey. These sentences help you assess how well your model has learned the art of NER.

    3. **'label_train'** 🏷️: Now here's where the magic happens. This file holds the NER labels corresponding to each token in the 'sent_train' file. So for every word in a sentence there is a matching label, which tells the model what each token is - a restaurant name, a location, or a dish. These labels are like a guide to identifying the stars of the show!

    4. **'label_test'** 📋: This file is just like 'label_train', but for testing. It lets you verify whether your model's predictions match the reality of the restaurant world. Will your model guess that "Burrito Palace" is the name of a restaurant? You'll find out here!

    In short, there is a clean one-to-one mapping between the 'sent_train'/'sent_test' files and the 'label_train'/'label_test' files. Each sentence is paired with its NER tags, giving your model an ideal recipe for training and testing.
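
That one-to-one mapping can be checked in a few lines (file contents here are toy, using BIO-style tags of the kind the corpus provides):

```python
# Toy stand-ins for sent_train / label_train: one sentence (or tag row) per line.
sent_train = "2 start restaurants with inside dining\nwhat is a good place\n"
label_train = "O O O O B-Amenity I-Amenity\nO O O B-Rating O\n"

pairs = [
    (s.split(), t.split())
    for s, t in zip(sent_train.splitlines(), label_train.splitlines())
]

# Every token must line up with exactly one label.
assert all(len(tokens) == len(tags) for tokens, tags in pairs)
assert pairs[0][1][4] == "B-Amenity"
```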

    🍕 The NER Labels – What are we Tagging?

    The real stars of this dataset are the NER tags. If you're thinking, "Okay, but what exactly are we trying to identify in these restaurant reviews?" - well, here is the menu of NER labels you'll be working with:

    • Rating ⭐: The stars or ratings (literally) that a reviewer gives to the restaurant. We all know those stars matter when it comes to choosing where to eat!
    • Amenity 🛋️: Think of these as the comfortable extras that come with a restaurant, such as free Wi-Fi or a pet-friendly courtyard.
    • Location 📍: This tag marks the location of the restaurant. So when you see "on Fifth Avenue", you know it points to the place.
    • Restaurant_Name 🍴: Ah, the place's name! Is it "Burger Bonanza" or "Sushi Central"? That's what this tag recognizes.
    • Price 💰: How much are we talking here? The price might be tagged as "$$$" or "affordable". We all want to know what range we're in, right?
    • Hours ⏰: Because who wants to show up at a restaurant when it is closed? This tag marks opening and closing times.
    • Dish 🍲: Which food is being talked about? "Pad Thai" or "Cheeseburger" are your examples.
    • Cuisine 🍣: A tag for the food type - whether it's Italian, Japanese, or good ol' American comfort food.

    These NER tags help you make sense of all the data you encounter in a restaurant review. You'll be able to easily pull out names, prices, ratings, dishes, and more. Talk about a full-course data feast!

    🍤 CRF Model – The Chef’s Special!

    Now, once you get your hands on this delicious dataset, what do you do with it? Time to cook up a **CRF model**! 🍳

    A CRF (Conditional Random Field) is a great way to label sequences of data, such as sentences. Since NER is about tagging each token (word) in a sentence, CRF models are ideal: they use the context around each word to make predictions. So when the sentence "Wonderful sushi at Sushi Central!" is passed in, the model can figure out that "Sushi Central" is a Restaurant_Name and "sushi" is a Dish.

    🍜 Features – Spice It Up!

    Next, we dive into defining features for the CRF model. Features are like the secret ingredients that make your model work. You will learn how to define them in Python, so your model can recognize patterns and make accurate predictions.
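
One common way to define such features (illustrative, not the dataset's canonical feature set) is a per-token dictionary of surface clues plus a little left/right context:

```python
# Feature dictionary for token i of a sentence, as typically fed to a CRF.
def word2features(sentence, i):
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),  # capitalized? restaurant names often are
        "word.isdigit": word.isdigit(),  # digits often mark ratings or prices
        "suffix3": word[-3:],
        "prev": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }

sent = ["dinner", "at", "Sushi", "Central"]
feats = word2features(sent, 2)
assert feats["word.istitle"] and feats["prev"] == "at" and feats["next"] == "central"
```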

    ...

  5. Loai4so Dataset

    • universe.roboflow.com
    zip
    Updated Sep 19, 2024
    + more versions
    Cite
    NTC funny 1 (2024). Loai4so Dataset [Dataset]. https://universe.roboflow.com/ntc-funny-1/loai4so
    Available download formats: zip
    Dataset updated
    Sep 19, 2024
    Dataset authored and provided by
    NTC funny 1
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    So Bounding Boxes
    Description

    Loai4so

    ## Overview
    
    Loai4so is a dataset for object detection tasks - it contains So annotations for 780 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Loai5 Dataset

    • universe.roboflow.com
    zip
    Updated Dec 7, 2024
    Cite
    NTC funny 1 (2024). Loai5 Dataset [Dataset]. https://universe.roboflow.com/ntc-funny-1/loai5
    Available download formats: zip
    Dataset updated
    Dec 7, 2024
    Dataset authored and provided by
    NTC funny 1
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Text Bounding Boxes
    Description

    Loai5

    ## Overview
    
    Loai5 is a dataset for object detection tasks - it contains Text annotations for 819 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. Hghhghdfsbfdsbuaasdsa Dataset

    • universe.roboflow.com
    zip
    Updated Dec 3, 2023
    + more versions
    Cite
    NTC funny 1 (2023). Hghhghdfsbfdsbuaasdsa Dataset [Dataset]. https://universe.roboflow.com/ntc-funny-1/hghhghdfsbfdsbuaasdsa/dataset/5
    Available download formats: zip
    Dataset updated
    Dec 3, 2023
    Dataset authored and provided by
    NTC funny 1
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Dfw Bounding Boxes
    Description

    Hghhghdfsbfdsbuaasdsa

    ## Overview
    
    Hghhghdfsbfdsbuaasdsa is a dataset for object detection tasks - it contains Dfw annotations for 1,152 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
