Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This work was done by the Rochester Human-Computer Interaction (ROC HCI) Lab, University of Rochester, USA, in collaboration with the Language Technologies Institute, SCS, CMU, USA.
ROC-HCI Website: (https://roc-hci.com/)
This repository includes the UR-FUNNY dataset: the first dataset for multimodal humor detection.
Please read the following paper for the details of the dataset and models. If you use the data and models, please consider citing the research paper:
@inproceedings{hasan-etal-2019-ur,
title = "{UR}-{FUNNY}: A Multimodal Language Dataset for Understanding Humor",
author = "Hasan, Md Kamrul and
Rahman, Wasifur and
Bagher Zadeh, AmirAli and
Zhong, Jianyuan and
Tanveer, Md Iftekhar and
Morency, Louis-Philippe and
Hoque, Mohammed (Ehsan)",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1211",
doi = "10.18653/v1/D19-1211",
pages = "2046--2056",
abstract = "",
}
There are six pickle files in the extracted features folder:
1. data_folds 2. language_sdk 3. openface_features_sdk 4. covarep_features_sdk 5. humor_label_sdk 6. word_embedding_list
Data folds: data_folds.pkl contains a dictionary with the train, dev, and test lists of humor/not-humor video segment ids. These folds are speaker-independent and homogeneous. Please use these folds for a valid comparison.
Language Features: word_embedding_list.pkl has the list of word embeddings of all unique words that appear in the UR-FUNNY dataset. We use the word indexes from this list as the language feature; these indexes are used to retrieve the GloVe embeddings of the corresponding words. We follow this approach to save space, since the same word appears many times.
language_sdk.pkl contains a dictionary. The keys of the dictionary are the unique id of each humor/not-humor video utterance. The corresponding raw video utterances are also named by these unique ids.
The structure of the dictionary:
language_sdk{
id1: {
punchline_embedding_indexes : [ idx1,idx2,.... ]
context_embedding_indexes : [[ idx2,idx30,.... ],[idx5,idx6......],..]
punchline_sentence : [....]
context_sentences : [[sen1], [sen2],...]
punchline_intervals : [ intervals of words in punchline ]
context_intervals : [[ intervals of words in sen1 ], [ intervals of words in sen2 ],.......]
}
id2: {
punchline_embedding_indexes : [ idx10,idx12,.... ]
context_embedding_indexes : [[ idx21,idx4,.... ],[idx91,idx100......],..]
punchline_sentence : [....]
context_sentences : [[sen1], [sen2],...]
punchline_intervals : [ intervals of words in punchline ]
context_intervals : [[ intervals of words in sen1 ], [ intervals of words in sen2 ],.......]
}
.....
.....
}
Each video segment has four kinds of features:
1. punchline_features: the list of word indexes (described above) of the punchline sentence. These indexes are used to retrieve the word embeddings (glove.840B.300d) from word_embedding_list (described above). So if the punchline has n words, the dimension will be n.
2. context_features: the list of word indexes for the sentences in the context. It is a two-dimensional list: the first dimension is the number of sentences in the context, and the second is the number of words in each sentence.
3. punchline_sentence: the punchline sentence.
4. context_sentences: the sentences used in the context.
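As a minimal sketch of how these pieces fit together, the snippet below shows how punchline word indexes could be mapped back to their GloVe vectors. The file names come from this README, but the tiny dictionaries here are mock stand-ins so the example runs without the actual pickles on disk:

```python
import pickle

def load_pickle(path):
    """Load one of the *_sdk .pkl files described above."""
    with open(path, "rb") as f:
        return pickle.load(f)

def punchline_embeddings(entry, word_embedding_list):
    """Map punchline word indexes to their embedding vectors (n words -> n x 300)."""
    return [word_embedding_list[i] for i in entry["punchline_embedding_indexes"]]

# Mock stand-ins mirroring the structure described above (values invented).
word_embedding_list = [[0.1] * 300, [0.2] * 300, [0.3] * 300]
language_sdk = {
    "id1": {
        "punchline_embedding_indexes": [0, 2],
        "context_embedding_indexes": [[1], [0, 2]],
        "punchline_sentence": ["that is the punchline"],
        "context_sentences": [["setup one"], ["setup two"]],
    }
}

vecs = punchline_embeddings(language_sdk["id1"], word_embedding_list)
print(len(vecs), len(vecs[0]))  # prints: 2 300
```

In practice `word_embedding_list` and `language_sdk` would be loaded with `load_pickle("word_embedding_list.pkl")` and `load_pickle("language_sdk.pkl")`.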
Acoustic Features: covarep_features_sdk.pkl contains a dictionary. The keys of the dictionary are the unique id of each humor / not humor video utterance. We used COVAREP (https://covarep.github.io/covarep/) to extract acoustic features. See the extracted_features.txt for the names of the features.
The structure of the covarep_features_sdk:
covarep_features_sdk{
id1: {
punchline_features : [ [ .... ],[ .... ], ...]
context_features : [ [[ .... ],[......],..], [[ .... ],[......],..], ... ] ....
}
id2:{
punchline_features : [ [ .... ],[ .... ], ...]
context_features : [ [[ .... ],[......],..], [[ .... ],[......],..], ... ]
....
}
....
....
}
Each humor/not-humor video segment has two kinds of features:
1. punchline_features: the average COVAREP features for each word in the punchline sentence. We aligned our features at the word level. The dimension of the COVAREP features is 81, so if the punchline has n words the dimension will be n * 81.
2. context_features: the average COVAREP features for each word in the context sentences. It is a three-dimensional list: sentences, words per sentence, and the 81 COVAREP features.
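To make the shapes concrete, here is an illustrative check against a mock entry (the 81-dimension figure is stated above; the id and values are invented):

```python
N_COVAREP = 81  # COVAREP feature dimension stated in the README

# Mock entry mirroring the covarep_features_sdk structure described above.
mock_entry = {
    # punchline: one 81-dim averaged acoustic vector per word (4 words here)
    "punchline_features": [[0.0] * N_COVAREP for _ in range(4)],
    # context: one list of per-word 81-dim vectors per context sentence
    "context_features": [
        [[0.0] * N_COVAREP for _ in range(3)],  # sentence 1: 3 words
        [[0.0] * N_COVAREP for _ in range(5)],  # sentence 2: 5 words
    ],
}

n_words = len(mock_entry["punchline_features"])
assert all(len(vec) == N_COVAREP for vec in mock_entry["punchline_features"])
print(f"punchline shape: {n_words} x {N_COVAREP}")  # n * 81

for sentence in mock_entry["context_features"]:
    assert all(len(vec) == N_COVAREP for vec in sentence)
print("context words per sentence:", [len(s) for s in mock_entry["context_features"]])
```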
For understanding multimodal language used in expressing humor.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Funny Call is a dataset for object detection tasks - it contains Enemy annotations for 211 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
MIT Restaurant Corpus - CRFs (Conditional Random Fields) Dataset
A Funny Dive into Restaurant Reviews 🥳🍽️
Welcome to the MIT Restaurant Corpus - CRF Dataset! If you are someone who loves food, restaurants, and all the jargon that comes with them, then you're in for a treat! (Pun intended! 😉) Let's break it down in the most delicious way!
This dataset, obtained from the MIT Restaurant Corpus (https://sls.csail.mit.edu/downloads/restaurant/), provides valuable restaurant review data for NER (Named Entity Recognition) tasks. With entities such as ratings, locations, and cuisines, it is perfect for building CRF models. 🏷️🍴 Let's dive into this rich resource and find out what it can do! 📊📍
The MIT Restaurant Corpus is designed to help you understand the intricacies of restaurant reviews and how data about restaurants can be parsed and classified. It has a set of files structured to give you all the ingredients required to build CRF (Conditional Random Field) models for NER (Named Entity Recognition). Here is what is served:
1.**‘sent_train’** 📝: This file contains a collection of sentences. But not just any sentences. These are sentences taken from real-world restaurant reviews! Each sentence is separated by a new line. It is like a dish of text, served one sentence at a time.
2.**‘sent_test’** 🍽️: Just like the ‘sent_train’ file, this one contains sentences, but they’re for testing purposes. Think of it as the "taste test" phase of your restaurant review trip. The sentences here help you assess how well your model has learned the art of NER.
3.**‘label_train’** 🏷️: Now here's where the magic happens. This file holds the NER labels or tags corresponding to each token in the ‘sent_train’ file. So, for every word in a sentence, there is a related label that tells the model what that word is - whether it's a restaurant name, location, or dish. These labels are like a guide to identifying the stars of the show!
4.**‘label_test’** 📋: This file is just like ‘label_train’, but for testing. It lets you verify whether your model's predictions match the reality of the restaurant world. Will your model guess that "Burrito Palace" is the name of a restaurant? You will find out here!
In short, there is a clean one-to-one mapping between the ‘sent_train’/‘sent_test’ files and the ‘label_train’/‘label_test’ files. Each sentence is paired with its NER tags, which makes this dataset an ideal recipe for training and testing your model.
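The pairing can be sketched as follows. The sample lines are invented, and the exact file format (whitespace-tokenized sentences with one tag per token, BIO-style labels) is an assumption based on the description above:

```python
# Hypothetical sample lines; real data comes from the sent_train / label_train files.
sent_train = ["great burritos at burrito palace"]
label_train = ["O O O B-Restaurant_Name I-Restaurant_Name"]

# One-to-one mapping: each sentence line pairs with one label line,
# and each token pairs with one tag.
for sent, labels in zip(sent_train, label_train):
    tokens = sent.split()
    tags = labels.split()
    assert len(tokens) == len(tags)  # one tag per token
    for tok, tag in zip(tokens, tags):
        print(f"{tok}\t{tag}")
```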
The real stars of this dataset are the NER tags. If you're thinking, "Okay, but what exactly are we trying to identify in these restaurant reviews?" Well, here is the menu of NER labels you will be working with:
These NER tags help you make sense of all the data you encounter in a restaurant review. You will be able to easily pull out names, prices, ratings, dishes, and more. Talk about a full-course data feast!
Now, once you get your hands on this delicious dataset, what do you do with it? Time to cook up a **CRF model**! 🍳
A CRF (Conditional Random Field) is a great way to label sequences of data, such as sentences. Since NER is about tagging each token (word) in a sentence, CRF models are ideal: they use the context around each word to make predictions. So, when the sentence "Wonderful sushi at Sushi Central!" passes in, the model can figure out that "Sushi Central" is a Restaurant_Name and "sushi" is a Dish.
Next, we dive into defining features for the CRF model. Features are like the secret ingredients that make your model work. You will learn how to define them in Python, so your model can recognize patterns and make accurate predictions.
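As a taste of what such features look like, here is one common way to define per-token features for a CRF tagger. The feature names and the example sentence are illustrative choices, not prescribed by the dataset:

```python
def word2features(tokens, i):
    """Build a feature dict for the token at position i, using its neighbors
    for context - the kind of input a CRF sequence tagger consumes."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalized words hint at names
        "word.isdigit": word.isdigit(),   # digits hint at prices/ratings
        "suffix3": word[-3:],
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

sentence = "wonderful sushi at Sushi Central".split()
feats = [word2features(sentence, i) for i in range(len(sentence))]
print(feats[3]["word.istitle"], feats[3]["prev_word"])  # prints: True at
```

Feature dicts in this shape can then be fed to a CRF implementation of your choice for training on the sent/label pairs.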
...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Loai4so is a dataset for object detection tasks - it contains So annotations for 780 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Loai5 is a dataset for object detection tasks - it contains Text annotations for 819 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Hghhghdfsbfdsbuaasdsa is a dataset for object detection tasks - it contains Dfw annotations for 1,152 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).