70 datasets found

short_jokes
huggingface.co
Updated Feb 22, 2024
Cite
yuvraj sharma (2024). short_jokes [Dataset]. https://huggingface.co/datasets/ysharma/short_jokes
Dataset updated
Feb 22, 2024
Authors
yuvraj sharma
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Context: Generating humor is a complex task in machine learning; it requires models to understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for a number of reasons, one of which is the lack of a database offering an extensive list of jokes. To address this, a large corpus of over 0.2 million jokes has been collected by scraping several websites containing funny and short jokes. You can visit the Github… See the full description on the dataset page: https://huggingface.co/datasets/ysharma/short_jokes.
Dad-A-Base Of Jokes
kaggle.com
Updated Mar 10, 2023
Cite
Arya Shah (2023). Dad-A-Base Of Jokes [Dataset]. https://www.kaggle.com/datasets/aryashah2k/dad-a-base-of-jokes
Dataset updated
Mar 10, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arya Shah
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Web-scraped database of dad jokes in the form of signature one-liners that a dad might say and chuckle at by himself while the rest of the family facepalms!

The dataset was created by collecting one-liner dad jokes from icanhazdadjokes.

Future work includes cleaning Reddit data and extracting jokes from popular books published in this genre.
joke_explaination
huggingface.co
Updated Aug 19, 2023
Cite
joke_explaination [Dataset]. https://huggingface.co/datasets/theblackcat102/joke_explaination
Dataset updated
Aug 19, 2023
Authors
theblackcat102
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for Dataset Name

Dataset Summary

Corpus for testing whether your LLM can explain a joke well. This is a rather small dataset, however; pointers to a larger one would be very welcome.

Languages

English

Dataset Structure

Data Fields

url: link to the explanation

joke: the original joke

explaination: the explanation of the joke

Data Splits

Because the dataset is so small, there are no splits, just like gsm8k.
Specific Humor Scandal Database
rdr.kuleuven.be
docx, txt, xlsx
Updated Dec 2, 2024
Cite
Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï (2024). Specific Humor Scandal Database [Dataset]. http://doi.org/10.48804/PTPQVB
Available download formats: docx (10629), xlsx (147455), txt (12721)
Unique identifier
https://doi.org/10.48804/PTPQVB
Dataset updated
Dec 2, 2024
Dataset provided by
KU Leuven RDR
Authors
Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This dataset was developed for the CELSA research project 'Humour and Conflict in the Public Sphere: Communication styles, humour controversies and contested freedoms in contemporary Europe'. The project sets out to conduct an interdisciplinary analysis of the interrelatedness between digital humor and social conflict. The dataset contains data for 550 items of digitally mediated humor (e.g. online memes, cartoons, videos, posts) created in the context of specific cases of socio-political conflict in four European countries (Belgium, Belarus, Estonia and Poland). The dataset offers coding of linguistic markers such as genre, humor mechanisms and communication style, as well as a mapping of the discourse which the humorous items spark on social media. Here, comments made in reaction to the humor on social media platforms are coded for types of response (e.g. positive, negative, humorous, non-humorous) as well as the incidence of meta-comments (comments on comments) and other linguistic metrics for analysis (e.g. types of speech used in audience reactions). The data was coded independently in 2023-2024 by four researchers, each with a background in the respective country. This dataset can be used, for example, to analyse audience reception of digitally mediated humor, or to support (cross-national) analysis of the impact of different humoristic genres in digital public spheres.
Data from: Corpus of daily jokes from the 24ur.com portal Šale24 1.0
live.european-language-grid.eu
binary format
Updated Oct 2, 2024
Cite
(2024). Corpus of daily jokes from the 24ur.com portal Šale24 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23698
Available download formats: binary format
Dataset updated
Oct 2, 2024
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from their archive on September 18th, 2024. The initial list is lightly curated: shorter texts found in the original collection were removed from the corpus since they appear to be illustration captions without the accompanying illustrations.

Readers of the news portal vote on the jokes themselves with thumbs up and thumbs down buttons. The voting results are included as metadata with each joke. Several jokes have been published more than once. Each joke (distinguished based on exact text matches) is identified by a hash of its text and presents a list of voting results for every instance of its publication. The normalised_text field contains text with punctuation corrections. For now, this is limited to replacing '' (two consecutive apostrophes U+0027) with " (a single straight/dumb/vertical quotation mark U+0022). The former (two apostrophes) is consistently used in place of the latter in the original corpus.

Based on the name ("Šala dneva", i.e. "Joke of the day") and the observed frequency of posting during September 2024, we assume each entry corresponds to one day, counting backwards from the day of data collection. Each voting event has an associated estimated publication date calculated with this procedure.

The jokes are linguistically annotated with CLASSLA-Stanza (https://github.com/clarinsi/classla), using the models for standard Slovenian. The JSONL file contains entries representing individual jokes containing:
- a hash of the original joke text used for duplicate identification (key: hash)
- original scraped text (key: original_text)
- normalised text (key: normalised_text)
- linguistically annotated normalised text in CoNLL-U format (key: processed_text)
- a list of vote objects containing joke vote metadata (key: votes)
- votes for (key: votes.for)
- votes against (key: votes.against)
- estimated dates of joke publication and voting (key: estimated_date)
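As a rough illustration of the record layout and the punctuation normalisation described above, the following Python sketch parses JSONL lines and sums the votes over all publications of a joke. The sample record is invented, and the nesting of the vote objects (a list of objects with "for" and "against" keys) is an assumption inferred from the key names, not a confirmed schema; the helper names are hypothetical.

```python
import json


def normalise(text: str) -> str:
    """Replace two consecutive apostrophes (U+0027) with one straight quotation mark (U+0022)."""
    return text.replace("''", '"')


def read_jsonl(lines):
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def approval(record: dict) -> float:
    """Share of thumbs-up votes summed over every publication of a joke."""
    up = sum(vote["for"] for vote in record["votes"])
    down = sum(vote["against"] for vote in record["votes"])
    total = up + down
    return up / total if total else 0.0


# Invented sample record; the real file may nest or name fields differently.
sample = {
    "hash": "abc123",
    "original_text": "Rekel je: ''Zdravo!''",
    "votes": [{"for": 10, "against": 2}, {"for": 5, "against": 3}],
}
sample["normalised_text"] = normalise(sample["original_text"])

records = list(read_jsonl([json.dumps(sample, ensure_ascii=False)]))
print(records[0]["normalised_text"], approval(records[0]))
```

Summing over the whole vote list reflects the fact that a joke published more than once carries one voting result per publication.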

The corpus contains 16658 sentences, 129063 tokens, and 662 recognised named entities.
General European Public Humour Database
rdr.kuleuven.be
docx, txt, xlsx
Updated Jul 15, 2024
Cite
Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï (2024). General European Public Humour Database [Dataset]. http://doi.org/10.48804/X9YTI5
Available download formats: docx (22787), txt (8349), xlsx (278179)
Unique identifier
https://doi.org/10.48804/X9YTI5
Dataset updated
Jul 15, 2024
Dataset provided by
KU Leuven RDR
Authors
Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This dataset was developed for the research project “Humour Scandals (Hu-Sca): A cross-national analysis of humour controversies in Europe”, funded by KU Leuven, Una Europa Research Acceleration Fund. The data contains an overview of humor scandals, i.e. public controversies originating from humor and dealing with the boundaries of transgressive humor in public debate, and their reception in legacy media for eight European countries between 1990 and 2022. The data contains quantitatively coded descriptive markers of each humor scandal (e.g. nature of norm transgression, actors involved, duration, timespan) as well as qualitative analysis of the way the humor scandal was either justified or condemned in national legacy media. This data can be used to analyse the role of humor in socio-political conflict and the role of media in the creation and mediation of humor-related controversies.
Email Jokes 1998-2004
services.fsd.tuni.fi
zip
Updated Jan 9, 2025
Cite
Aro, Jari (2025). Email Jokes 1998-2004 [Dataset]. http://doi.org/10.60686/t-fsd1271
Available download formats: zip
Unique identifier
https://doi.org/10.60686/t-fsd1271
Dataset updated
Jan 9, 2025
Dataset provided by
Finnish Social Science Data Archive
Authors
Aro, Jari
Description
The archived data consist of jokes, anecdotes and other humorous texts distributed through email messages. The researcher sent the request for email humour to the staff members of the Department of Sociology and Social Psychology at the University of Tampere, Finland, in February 2003. The staff members in their turn distributed the request further. Texts were received from university staff members and students as well as from outsiders. Data collection continued till the year 2004. The total number of email messages received was 217, some of which contained more than one joke or anecdote. The jokes/anecdotes were mostly in Finnish, but approximately 20% were in English. The themes of the email messages varied greatly. Many were connected to current events, for instance, the Iraq war, September 11 terrorist attacks in the USA, and the doping scandal of Finnish skiers in 2001. Other recurring themes included sexuality, gender and ethnicity stereotypes, and professional jokes. As is typical in email humor, the original creator of the jokes/anecdotes often remained unknown. The dataset is only available in the original languages.
one-million-reddit-jokes
huggingface.co
Updated Nov 1, 2021
Cite
SocialGrep (2021). one-million-reddit-jokes [Dataset]. https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes
Dataset updated
Nov 1, 2021
Authors
SocialGrep
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for one-million-reddit-jokes

Dataset Summary

This corpus contains a million posts from /r/jokes. Posts are annotated with their score.

Languages

Mainly English.

Dataset Structure

Data Instances

A data point is a Reddit post.

Data Fields

'type': the type of the data point. Can be 'post' or 'comment'.
'id': the base-36 Reddit ID of the data point. Unique when combined with type.
'subreddit.id': the base-36 Reddit ID… See the full description on the dataset page: https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes.
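A minimal Python sketch of what the field descriptions imply: a base-36 ID can be decoded to an integer, and a data point is identified by the pair of its type and ID rather than by the ID alone. The sample points and the helper names are hypothetical; only the 'type' and 'id' fields come from the description.

```python
def reddit_id_to_int(base36_id: str) -> int:
    """Decode a base-36 ID string (digits 0-9, letters a-z) to an integer."""
    return int(base36_id, 36)


def unique_key(point: dict) -> tuple:
    """An ID is only unique together with the data point's type."""
    return (point["type"], point["id"])


# Invented data points: the same ID appears once as a post and once as a comment.
points = [
    {"type": "post", "id": "zz"},
    {"type": "comment", "id": "zz"},
]
keys = {unique_key(p) for p in points}
print(reddit_id_to_int("zz"), len(keys))
```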
Jester Collaborative Filtering Dataset
kaggle.com
Updated Jun 15, 2017
Cite
Aakaash Jois (2017). Jester Collaborative Filtering Dataset [Dataset]. https://www.kaggle.com/aakaashjois/jester-collaborative-filtering-dataset/code
Dataset updated
Jun 15, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aakaash Jois
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

The funniness of a joke is very subjective. With more than 70,000 users rating jokes, can an algorithm be written to identify the universally funny joke?

Content

The data files are in .csv format.

The complete dataset is 100 rows and 73422 columns.

The complete dataset is split into 3 .csv files.

JokeText.csv contains the Id of the joke and the complete joke string.

UserRatings1.csv contains the ratings provided by the first 36710 users.

UserRatings2.csv contains the ratings provided by the last 36711 users.

The dataset is arranged such that the initial users have rated a higher number of jokes than the later users.

The rating is a real value between -10.0 and +10.0.

The empty values indicate that the user has not provided any rating for that particular joke.
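To make the layout above concrete, here is a small Python sketch that computes the mean rating per joke while skipping empty cells. It assumes one row per joke and one column per user, with the joke Id in the first column; the miniature table and its column names are invented for illustration and may not match the real UserRatings files.

```python
import csv
import io

# Invented miniature ratings table: empty cells mean "not rated".
SAMPLE = """JokeId,User1,User2,User3
0,5.0,,-2.5
1,,,
2,10.0,-10.0,3.0
"""


def mean_ratings(csv_text: str) -> dict:
    """Return {joke_id: mean rating, or None if nobody rated the joke}."""
    means = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    for row in reader:
        joke_id, cells = row[0], row[1:]
        # Keep only the cells that actually hold a rating.
        ratings = [float(cell) for cell in cells if cell.strip()]
        for rating in ratings:
            if not -10.0 <= rating <= 10.0:
                raise ValueError(f"rating out of range: {rating}")
        means[joke_id] = sum(ratings) / len(ratings) if ratings else None
    return means


means = mean_ratings(SAMPLE)
print(means)
```

Treating empty cells as missing rather than as zero matters here, because 0.0 is a valid rating in the middle of the scale.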

Acknowledgements

The dataset is associated with the below research paper.

Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.

More information and datasets can be found at http://eigentaste.berkeley.edu/dataset/

Inspiration

Since funniness is a very subjective matter, it will be very interesting to see if data science can bring out the details on what makes something funny.
Humor Creator Distribution Data
topyappers.com
json
Updated Jul 24, 2025
Cite
TopYappers (2025). Humor Creator Distribution Data [Dataset]. https://www.topyappers.com/humor
Available download formats: json
Dataset updated
Jul 24, 2025
Dataset authored and provided by
TopYappers
Variables measured
Creator Count, Follower Range
Description
Statistical distribution of social media creators and influencers in the Humor category
Joke event ag c/o siren studios highland USA Import & Buyer Data
seair.co.in
Cite
Seair Exim, Joke event ag c/o siren studios highland USA Import & Buyer Data [Dataset]. https://www.seair.co.in
Available download formats: .bin, .xml, .csv, .xls
Dataset provided by
Seair Exim Solutions
Authors
Seair Exim
Area covered
United States
Description
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Data from: SemEval-2020 Task 7: Assessing Humor in Edited News Headlines
data.niaid.nih.gov
Updated Aug 2, 2020
Cite
Krumm, John (2020). SemEval-2020 Task 7: Assessing Humor in Edited News Headlines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3969508
Dataset updated
Aug 2, 2020
Dataset provided by
Gamon, Michael
Kautz, Henry
Krumm, John
Hossain, Nabil
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the task dataset for SemEval-2020 Task 7: Assessing Humor in Edited News Headlines.

The task’s dataset contains news headlines in which short edits were applied to make them funny, and the funniness of these edited headlines was rated using crowdsourcing. This task includes two subtasks, the first of which is to estimate the funniness of headlines on a humor scale in the interval 0-3. The second subtask is to predict, for a pair of edited versions of the same original headline, which is the funnier version.
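The relationship between the two subtasks can be sketched in Python as follows: a regression score is kept within the 0-3 interval, and a pairwise decision is derived by comparing the scores of two edited versions. The label encoding (1, 2, 0 for a tie) and the function names are assumptions made for illustration, not the official output format.

```python
def clamp_funniness(score: float) -> float:
    """Keep a predicted funniness score inside the 0-3 interval."""
    return max(0.0, min(3.0, score))


def funnier_of_two(score_a: float, score_b: float) -> int:
    """Return 1 if the first edit is funnier, 2 if the second is, 0 for a tie.

    The label values are an assumed encoding for this sketch.
    """
    if score_a > score_b:
        return 1
    if score_b > score_a:
        return 2
    return 0


# Invented scores for two edited versions of one headline.
first, second = clamp_funniness(3.7), clamp_funniness(0.4)
print(first, second, funnier_of_two(first, second))
```

A system built this way solves the second subtask as a by-product of the first; a dedicated pairwise classifier is an equally valid design.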

CodaLab page hosting the competition: https://competitions.codalab.org/competitions/20970

Starter Github code (scripts for running baseline and evaluation): https://github.com/n-hossain/semeval-2020-task-7-humicroedit

Task mailing list: https://groups.google.com/forum/#!forum/semeval-2020-task-7-all

ZIP contents:

Folders:
- subtask-1: Dataset for the funniness regression subtask.
- subtask-2: Dataset for the "Funnier of the Two" classification subtask.

Files:
- {train, dev, test}.csv: the task's dataset including labels
- train_funlines.csv: additional training data gathered from the FunLines competition (https://funlines.co)
- baseline.zip: contains a csv file which is the output of the BASELINE system. This is a template of the output format that can be submitted to CodaLab for scoring.

Reference

Please cite the task paper when using this dataset:

Nabil Hossain, John Krumm, Michael Gamon and Henry Kautz. 2020. Semeval-2020 Task 7: Assessing Humor in Edited News Headlines. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2020).

BIBTEX:
@InProceedings{hossainSemEval2020Task7,
  author = {Hossain, Nabil and Krumm, John and Gamon, Michael and Kautz, Henry},
  title = {SemEval-2020 {T}ask 7: {A}ssessing Humor in Edited News Headlines},
  booktitle = {Proceedings of the 14th International Workshop on Semantic Evaluation ({S}em{E}val-2020)},
  address = {Barcelona, Spain},
  year = {2020}
}
United States Imports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&...
ceicdata.com
Updated Feb 6, 2022
Cite
CEICdata.com (2022). United States Imports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt& Accs [Dataset]. https://www.ceicdata.com/en/united-states/imports-by-commodity-6-digit-hs-code-hs-85-to-99/imports-festve-excl-chrtmscrnivl-magic-trk-joke-artpt-accs
Dataset updated
Feb 6, 2022
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2024 - Jan 1, 2025
Area covered
United States
Description
United States Imports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt& Accs data was reported at 98.281 USD mn in Jan 2025, an increase from the previous figure of 80.876 USD mn for Dec 2024. The series is updated monthly, averaging 52.122 USD mn (median) from Jan 2002 to Jan 2025, with 277 observations. The data reached an all-time high of 418.819 USD mn in Jul 2022 and a record low of 10.093 USD mn in Mar 2002. The series remains in active status in CEIC and is reported by the U.S. Census Bureau. The data is categorized under Global Database’s United States – Table US.JA136: Imports: by Commodity: 6 Digit HS Code: HS 85 to 99.
Self-repair jokes in Plautus (quantitative data)
researchportal.amu.edu.pl
Updated Dec 1, 2023
Cite
(2023). Self-repair jokes in Plautus (quantitative data) [Dataset]. http://doi.org/10.60629/j236-3333
Unique identifier
https://doi.org/10.60629/j236-3333
Dataset updated
Dec 1, 2023
Description
Quantitative data concerning self-repair jokes in the comedies by Plautus.
United States Exports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&Accs...
ceicdata.com
Updated Feb 6, 2022
Cite
CEICdata.com (2022). United States Exports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&Accs [Dataset]. https://www.ceicdata.com/en/united-states/exports-by-commodity-6-digit-hs-code-hs-85-to-98/exports-festve-excl-chrtmscrnivl-magic-trk-joke-artptaccs
Dataset updated
Feb 6, 2022
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 1, 2024 - Jan 1, 2025
Area covered
United States
Description
United States Exports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&Accs data was reported at 8.068 USD mn in Jan 2025, an increase from the previous figure of 6.434 USD mn for Dec 2024. The series is updated monthly, averaging 8.757 USD mn (median) from Jan 2002 to Jan 2025, with 277 observations. The data reached an all-time high of 27.291 USD mn in Sep 2011 and a record low of 2.349 USD mn in Jan 2003. The series remains in active status in CEIC and is reported by the U.S. Census Bureau. The data is categorized under Global Database’s United States – Table US.JA027: Exports: by Commodity: 6 Digit HS Code: HS 85 to 98.
Data from: Humor in Parenting: Does It Have a Role?
figshare.com
pdf
Updated Aug 12, 2022
Cite
Lucy Emery; Terri Smith; Benjamin Levi (2022). Humor in Parenting: Does It Have a Role? [Dataset]. http://doi.org/10.6084/m9.figshare.20404107.v2
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.20404107.v2
Dataset updated
Aug 12, 2022
Dataset provided by
figshare
Authors
Lucy Emery; Terri Smith; Benjamin Levi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objectives: To gather pilot data on the use of humor in the raising of children.

Methodology: We developed and field-tested a 10-item survey to measure people’s experiences being raised with humor, and their views regarding humor as a parenting tool. Responses were aggregated into Disagree, Indeterminate, and Agree, and analyzed using standard statistical methods.

Results: Of the 312 respondents, most identified as male (63.6%) and white (76.6%); 11.3% reported being 18-25 years old, 49.4% 26-35 years old, and 39.4% 36-45 years old. The majority reported that: the people who raised them used humor in their parenting (55.2%); humor could be an effective parenting tool (71.8%); humor as a parenting tool has more potential benefit than harm (63.3%); they either use or plan to use humor in parenting their own children (61.8%); and they would value a course on how to utilize humor in parenting (69.7%).

Conclusions: In this pilot study, respondents of child-bearing/rearing age reported positive views about humor as a parenting tool.
Humour practices and European attitudes towards democracy - data...
zenodo.org
Updated Jul 14, 2025
Cite
Marcos Engelken-Jorge; Carmelo Moreno; Aitor Castañeda-Zumeta; Anastasiya Astapova; Andrew Bricker; Nina Cingerova; Irina Dulebova; Julia Fleischhack; Alberto Godioli; Amber Kempynck; Katarína Motyková; Anna Sámelová; Ismet Suleimanov; Jeroen Vandaele; Oğuzhan Zobar; Isaza-Ibarra Luisa Fernanda (2025). Humour practices and European attitudes towards democracy - data documentation [Dataset]. http://doi.org/10.5281/zenodo.15756409
Unique identifier
https://doi.org/10.5281/zenodo.15756409
Dataset updated
Jul 14, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Marcos Engelken-Jorge; Carmelo Moreno; Aitor Castañeda-Zumeta; Anastasiya Astapova; Andrew Bricker; Nina Cingerova; Irina Dulebova; Julia Fleischhack; Alberto Godioli; Amber Kempynck; Katarína Motyková; Anna Sámelová; Ismet Suleimanov; Jeroen Vandaele; Oğuzhan Zobar; Isaza-Ibarra Luisa Fernanda
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 30, 2025
Description
This document outlines the scope and key features of the 'Humour Practices and European Attitudes Toward Democracy' dataset, including the procedure followed for its production.
The database collects and classifies relevant studies on humour practices, attitudes toward democracy, and modes of civic engagement in the six countries of the DELIAH consortium: Belgium, Estonia, Germany, the Netherlands, Slovakia, and Spain.

The dataset pursues two goals:
1) it contributes to a subsequent meta-analysis of humour studies, democratic participation, and civic engagement in online and offline spaces across Europe, which will be carried out by the DELIAH consortium, and
2) it serves as a collective resource for additional DELIAH project tasks, including the design of focus groups and surveys.

More broadly, the dataset has also been designed to appeal to scholars outside of the DELIAH consortium who work at the intersection of humour and democracy.
BigStirlitz new Russian jokes about Stirlitz
kaggle.com
Updated Jul 28, 2022
Cite
Nikuson (2022). BigStirlitz new Russian jokes about Stirlitz [Dataset]. https://www.kaggle.com/datasets/nikuson/bigstirlitz
Dataset updated
Jul 28, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nikuson
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Area covered
Russia
Description
Dataset

This dataset was created by Nikuson

Released under ODC Attribution License (ODC-By)

Contents
humor-labeled-data
huggingface.co
Cite
Rwan Ashraf, humor-labeled-data [Dataset]. https://huggingface.co/datasets/RwanAshraf/humor-labeled-data
Authors
Rwan Ashraf
Description
RwanAshraf/humor-labeled-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: Humor Receptivity Data
figshare.com
scholarship.miami.edu
Updated Feb 21, 2025
Cite
Nick Carcioppolo (2025). Humor Receptivity Data [Dataset]. http://doi.org/10.6084/m9.figshare.28458647.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.28458647.v1
Dataset updated
Feb 21, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nick Carcioppolo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the data sets for a study on audience segmentation to predict receptivity to humorous persuasive messages
