70 datasets found
  1. short_jokes

    • huggingface.co
    Updated Feb 22, 2024
    + more versions
    yuvraj sharma (2024). short_jokes [Dataset]. https://huggingface.co/datasets/ysharma/short_jokes
    Explore at: Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.)
    Dataset updated
    Feb 22, 2024
    Authors
    yuvraj sharma
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context: Generating humor is a complex task in machine learning, and it requires models to understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for a number of reasons, one of which is the lack of a database that provides an extensive list of jokes. Thus, a large corpus of over 0.2 million jokes has been collected by scraping several websites containing funny, short jokes. You can visit the Github… See the full description on the dataset page: https://huggingface.co/datasets/ysharma/short_jokes.

  2. Dad-A-Base Of Jokes

    • kaggle.com
    Updated Mar 10, 2023
    Arya Shah (2023). Dad-A-Base Of Jokes [Dataset]. https://www.kaggle.com/datasets/aryashah2k/dad-a-base-of-jokes
    Explore at: Croissant
    Dataset updated
    Mar 10, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Arya Shah
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A web-scraped database of dad jokes in the form of signature one-liners that a dad might say and chuckle at by himself while the rest of the family facepalms!

    The dataset was created by collecting one-liner dad jokes from icanhazdadjoke.

    Future work includes cleaning Reddit data and extracting jokes from popular books published in this genre.

  3. joke_explaination

    • huggingface.co
    Updated Aug 19, 2023
    joke_explaination [Dataset]. https://huggingface.co/datasets/theblackcat102/joke_explaination
    Explore at: Croissant
    Dataset updated
    Aug 19, 2023
    Authors
    theblackcat102
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for joke_explaination

      Dataset Summary
    

    A corpus for testing whether your LLM can explain jokes well. It is a rather small dataset, so pointers to larger ones would be very welcome.

      Languages
    

    English

      Dataset Structure

      Data Fields
    

    url: link to the explanation

    joke: the original joke

    explaination: the explanation of the joke

      Data Splits
    

    Since it's so small, there are no splits, just like gsm8k.

  4. Specific Humor Scandal Database

    • rdr.kuleuven.be
    docx, txt, xlsx
    Updated Dec 2, 2024
    Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï (2024). Specific Humor Scandal Database [Dataset]. http://doi.org/10.48804/PTPQVB
    Available download formats: docx (10629), xlsx (147455), txt (12721)
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    KU Leuven RDR
    Authors
    Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset was developed for the CELSA research project 'Humour and Conflict in the Public Sphere: Communication styles, humour controversies and contested freedoms in contemporary Europe'. The project sets out to conduct an interdisciplinary analysis of the interrelation between digital humor and social conflict. The dataset contains data for 550 items of digitally mediated humor (e.g., online memes, cartoons, videos, posts) created in the context of specific cases of socio-political conflict in four European countries (i.e., Belgium, Belarus, Estonia and Poland). It codes linguistic markers such as genre, humor mechanisms and communication style, and maps the discourse that the humorous items spark on social media. Comments made in reaction to the humor on social media platforms are coded for type of response (e.g., positive, negative, humorous, non-humorous), for the incidence of meta-comments (comments on comments), and for other linguistic metrics (e.g., types of speech used in audience reactions). The data was coded independently in 2023-2024 by four researchers, each with a background in one of the respective countries. This dataset can be used, for example, to analyse audience reception of digitally mediated humor, or for (cross-national) analysis of the impact of different humoristic genres in digital public spheres.

  5. Data from: Corpus of daily jokes from the 24ur.com portal Šale24 1.0

    • live.european-language-grid.eu
    binary format
    Updated Oct 2, 2024
    (2024). Corpus of daily jokes from the 24ur.com portal Šale24 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23698
    Available download formats: binary format
    Dataset updated
    Oct 2, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from their archive on September 18th, 2024. The initial list is lightly curated: shorter texts found in the original collection were removed from the corpus since they appear to be illustration captions without the accompanying illustrations.

    Readers of the news portal vote on the jokes with thumbs-up and thumbs-down buttons, and the voting results are included as metadata with each joke. Several jokes have been published more than once. Each joke (distinguished by exact text match) is identified by a hash of its text and carries a list of voting results, one for every instance of its publication. The normalised_text field contains text with punctuation corrections. For now, this is limited to replacing '' (two consecutive apostrophes, U+0027) with " (a single straight/dumb/vertical quotation mark, U+0022); the original corpus consistently uses the former in place of the latter.
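    The normalisation and exact-text duplicate grouping described above can be sketched in Python. This is a minimal illustration, not the corpus authors' code: the hash function (SHA-256 over the UTF-8 original text) is an assumption, since the description does not say which hash is used, and the sample strings are made up.

```python
import hashlib
from collections import defaultdict


def normalise(text: str) -> str:
    """Replace '' (two U+0027 apostrophes) with one " (U+0022)."""
    return text.replace("''", '"')


def text_hash(original_text: str) -> str:
    # ASSUMPTION: SHA-256 of the UTF-8 text; the corpus does not name its hash.
    return hashlib.sha256(original_text.encode("utf-8")).hexdigest()


def group_duplicates(texts: list[str]) -> dict[str, list[int]]:
    """Map each hash to the positions of exact-text repeats of the same joke."""
    groups: dict[str, list[int]] = defaultdict(list)
    for position, text in enumerate(texts):
        groups[text_hash(text)].append(position)
    return dict(groups)


# Made-up sample: the first joke appears twice.
scraped = ["He said: ''Hi!''", "He said: ''Hi!''", "Another joke."]
print(normalise(scraped[0]))  # He said: "Hi!"
print(sorted(len(p) for p in group_duplicates(scraped).values()))  # [1, 2]
```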

    Based on the name ("Šala dneva", i.e., "Joke of the day") and the observed posting frequency during September 2024, we assume that each entry corresponds to one day, counting backwards from the day of data collection. Each voting event has an associated estimated publication date calculated this way.
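    The backwards-counting date estimate can be expressed as a small function. It assumes, as the description does, one entry per day, and additionally assumes (this is not stated in the source) that position 0 is the newest archive entry and falls on the collection day, September 18, 2024.

```python
from datetime import date, timedelta

COLLECTION_DATE = date(2024, 9, 18)  # scrape date stated in the description


def estimated_date(position: int, collected: date = COLLECTION_DATE) -> date:
    """Estimated publication date of the archive entry at `position`.

    ASSUMPTIONS: position 0 is the newest entry and falls on the collection day;
    exactly one joke was published per day before that.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return collected - timedelta(days=position)


print(estimated_date(0))   # 2024-09-18
print(estimated_date(18))  # 2024-08-31
```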

    The jokes are linguistically annotated with CLASSLA-Stanza (https://github.com/clarinsi/classla), using the models for standard Slovenian. The JSONL file contains entries representing individual jokes, each containing:
    - a hash of the original joke text used for duplicate identification (key: hash)
    - original scraped text (key: original_text)
    - normalised text (key: normalised_text)
    - linguistically annotated normalised text in CoNLL-U format (key: processed_text)
    - a list of vote objects containing joke vote metadata (key: votes)
      - votes for (key: votes.for)
      - votes against (key: votes.against)
    - estimated dates of joke publication and voting (key: estimated_date)
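    A minimal sketch of reading the JSONL file and aggregating the votes for each joke. It assumes each element of `votes` is an object holding integer `for` and `against` counts (matching the votes.for and votes.against keys above); the `summarise` helper and the sample line are hypothetical.

```python
import json


def summarise(jsonl_text: str) -> list[dict]:
    """Per joke: publication count, summed votes for/against, approval share.

    ASSUMPTION: each item in "votes" is an object with integer "for" and
    "against" counts (votes.for / votes.against in the description).
    """
    summary = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        joke = json.loads(line)
        votes_for = sum(v["for"] for v in joke["votes"])
        votes_against = sum(v["against"] for v in joke["votes"])
        total = votes_for + votes_against
        summary.append({
            "hash": joke["hash"],
            "publications": len(joke["votes"]),
            "for": votes_for,
            "against": votes_against,
            "approval": votes_for / total if total else None,  # None: no votes
        })
    return summary


# Hypothetical line for a joke published twice (other keys omitted).
line = '{"hash": "h1", "votes": [{"for": 30, "against": 10}, {"for": 10, "against": 0}]}'
print(summarise(line)[0]["approval"])  # 0.8
```

For the real corpus, the file contents could be read with `open(path, encoding="utf-8").read()` and passed to `summarise`; the CoNLL-U text in processed_text is left untouched here.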

    The corpus contains 16658 sentences, 129063 tokens, and 662 recognised named entities.

  6. General European Public Humour Database

    • rdr.kuleuven.be
    docx, txt, xlsx
    Updated Jul 15, 2024
    Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï (2024). General European Public Humour Database [Dataset]. http://doi.org/10.48804/X9YTI5
    Available download formats: docx (22787), txt (8349), xlsx (278179)
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    KU Leuven RDR
    Authors
    Władysław Chłopicki; Giselinde Kuipers; Liisi Laineste; Guillem Castañar; Anastasiya Fiadotava; Agata Hołobut; Jonas Nicolaï
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset was developed for the research project “Humour Scandals (Hu-Sca): A cross-national analysis of humour controversies in Europe”, funded by KU Leuven, Una Europa Research Acceleration Fund. The data provides an overview of humor scandals, i.e., public controversies that originate from humor and concern the boundaries of transgressive humor in public debate, and of their reception in legacy media in eight European countries between 1990 and 2022. It contains quantitatively coded descriptive markers for each humor scandal (e.g., nature of norm transgression, actors involved, duration, timespan) as well as a qualitative analysis of how each scandal was justified or condemned in national legacy media. The data can be used to analyse the role of humor in socio-political conflict and the role of media in creating and mediating humor-related controversies.

  7. Email Jokes 1998-2004

    • services.fsd.tuni.fi
    zip
    Updated Jan 9, 2025
    Aro, Jari (2025). Email Jokes 1998-2004 [Dataset]. http://doi.org/10.60686/t-fsd1271
    Available download formats: zip
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    Finnish Social Science Data Archive
    Authors
    Aro, Jari
    Description

    The archived data consist of jokes, anecdotes and other humorous texts distributed through email messages. In February 2003, the researcher sent a request for email humour to the staff members of the Department of Sociology and Social Psychology at the University of Tampere, Finland, who in turn distributed the request further. Texts were received from university staff members and students as well as from outsiders. Data collection continued until 2004. In total, 217 email messages were received, some of which contained more than one joke or anecdote. The jokes/anecdotes were mostly in Finnish, but approximately 20% were in English. The themes of the email messages varied greatly. Many were connected to current events, for instance, the Iraq war, the September 11 terrorist attacks in the USA, and the doping scandal of Finnish skiers in 2001. Other recurring themes included sexuality, gender and ethnic stereotypes, and professional jokes. As is typical of email humour, the original creator of the jokes/anecdotes often remained unknown. The dataset is only available in the original languages.

  8. one-million-reddit-jokes

    • huggingface.co
    Updated Nov 1, 2021
    SocialGrep (2021). one-million-reddit-jokes [Dataset]. https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes
    Explore at: Croissant
    Dataset updated
    Nov 1, 2021
    Authors
    SocialGrep
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for one-million-reddit-jokes

      Dataset Summary
    

    This corpus contains a million posts from /r/jokes. Posts are annotated with their score.

      Languages
    

    Mainly English.

      Dataset Structure

      Data Instances
    

    A data point is a Reddit post.

      Data Fields
    

    - 'type': the type of the data point. Can be 'post' or 'comment'.
    - 'id': the base-36 Reddit ID of the data point. Unique when combined with type.
    - 'subreddit.id': the base-36 Reddit ID… See the full description on the dataset page: https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes.
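    Per the card, 'id' is a base-36 Reddit ID that is unique only when combined with 'type'. A minimal sketch of such a composite key, plus decoding base-36 IDs with Python's built-in `int(s, 36)`; the `RedditKey` and `make_key` names and the sample records are hypothetical.

```python
from typing import NamedTuple


class RedditKey(NamedTuple):
    """Composite key: the card says 'id' is unique only together with 'type'."""
    type: str  # 'post' or 'comment'
    id: str    # base-36 Reddit ID


def base36_to_int(reddit_id: str) -> int:
    """Decode a base-36 ID (digits 0-9 and letters a-z) into an integer."""
    return int(reddit_id, 36)


def make_key(record: dict) -> RedditKey:
    """Build the (type, id) key, rejecting types the card does not list."""
    kind = record["type"]
    if kind not in ("post", "comment"):
        raise ValueError(f"unexpected type: {kind!r}")
    return RedditKey(kind, record["id"])


# Hypothetical records sharing an id but differing in type.
rows = [{"type": "post", "id": "zz"}, {"type": "comment", "id": "zz"}]
print(len({make_key(r) for r in rows}))  # 2
print(base36_to_int("zz"))               # 1295
```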

  9. Jester Collaborative Filtering Dataset

    • kaggle.com
    Updated Jun 15, 2017
    Aakaash Jois (2017). Jester Collaborative Filtering Dataset [Dataset]. https://www.kaggle.com/aakaashjois/jester-collaborative-filtering-dataset/code
    Explore at: Croissant
    Dataset updated
    Jun 15, 2017
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Aakaash Jois
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    The funniness of a joke is highly subjective. With more than 70,000 users rating jokes, can an algorithm be written to identify the universally funny joke?

    Content

    • The data files are in .csv format.
    • The complete dataset is 100 rows and 73422 columns.
    • The complete dataset is split into 3 .csv files.
    • JokeText.csv contains the Id of each joke and the complete joke string.
    • UserRatings1.csv contains the ratings provided by the first 36710 users.
    • UserRatings2.csv contains the ratings provided by the last 36711 users.
    • The dataset is arranged such that earlier users have rated more jokes than later users.
    • Each rating is a real value between -10.0 and +10.0.
    • Empty values indicate that the user has not rated that particular joke.
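    A sketch of computing each joke's mean rating while skipping empty cells. The layout is an inference rather than something the description states outright: since 36710 + 36711 = 73421 users plus one Id column gives the 73422 columns, this assumes one row per joke, the joke Id in the first column, one user's rating in each remaining column, and no header row (skip it first if the files have one). The `joke_means` helper and the sample are hypothetical.

```python
import csv
import io
from statistics import mean


def joke_means(csv_text: str) -> dict[str, float]:
    """Mean rating per joke, ignoring empty cells (= no rating from that user).

    ASSUMED layout: one row per joke, joke Id in column 0, one user's rating
    (a real value in [-10.0, +10.0]) in each remaining column, no header row.
    """
    means: dict[str, float] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        joke_id, cells = row[0], row[1:]
        ratings = [float(c) for c in cells if c.strip()]
        out_of_range = [r for r in ratings if not -10.0 <= r <= 10.0]
        if out_of_range:
            raise ValueError(f"joke {joke_id}: ratings outside [-10, 10]: {out_of_range}")
        if ratings:  # jokes that nobody rated are omitted
            means[joke_id] = mean(ratings)
    return means


# Hypothetical 3-user sample; the real files hold tens of thousands of columns.
sample = "1,7.5,,-2.5\n2,,,\n3,10,-10,0\n"
print(joke_means(sample))  # {'1': 2.5, '3': 0.0}
```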

    Acknowledgements

    The dataset is associated with the following research paper.

    Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.

    More information and datasets can be found at http://eigentaste.berkeley.edu/dataset/

    Inspiration

    Since funniness is highly subjective, it will be interesting to see whether data science can reveal what makes something funny.

  10. Humor Creator Distribution Data

    • topyappers.com
    json
    Updated Jul 24, 2025
    TopYappers (2025). Humor Creator Distribution Data [Dataset]. https://www.topyappers.com/humor
    Available download formats: json
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    TopYappers
    Variables measured
    Creator Count, Follower Range
    Description

    Statistical distribution of social media creators and influencers in the Humor category

  11. Joke event ag c/o siren studios highland USA Import & Buyer Data

    • seair.co.in
    Seair Exim, Joke event ag c/o siren studios highland USA Import & Buyer Data [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is useful for market analysis.

  12. Data from: SemEval-2020 Task 7: Assessing Humor in Edited News Headlines

    • data.niaid.nih.gov
    Updated Aug 2, 2020
    Krumm, John (2020). SemEval-2020 Task 7: Assessing Humor in Edited News Headlines [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3969508
    Dataset updated
    Aug 2, 2020
    Dataset provided by
    Gamon, Michael
    Kautz, Henry
    Krumm, John
    Hossain, Nabil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the task dataset for SemEval-2020 Task 7: Assessing Humor in Edited News Headlines.

    The task’s dataset contains news headlines in which short edits were applied to make them funny, and the funniness of these edited headlines was rated using crowdsourcing. This task includes two subtasks, the first of which is to estimate the funniness of headlines on a humor scale in the interval 0-3. The second subtask is to predict, for a pair of edited versions of the same original headline, which is the funnier version.
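    One simple way to relate the two subtasks is to derive the pairwise answer from regression scores. The sketch below clips predictions to the 0-3 interval, scores them with root-mean-square error (a natural metric for this kind of regression; check the task paper for the official evaluation), and compares two edited versions of a headline. The helper names and the 1/2/0 label encoding are hypothetical.

```python
import math


def clip_funniness(x: float) -> float:
    """Keep a regression prediction inside the task's 0-3 funniness interval."""
    return min(3.0, max(0.0, x))


def rmse(pred: list[float], gold: list[float]) -> float:
    """Root-mean-square error between predicted and gold funniness grades."""
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and equally long")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))


def funnier(score1: float, score2: float, tie_margin: float = 0.0) -> int:
    """Subtask 2 from subtask-1 scores: 1 or 2 names the funnier edit, 0 is a tie.

    The 1/2/0 label encoding is hypothetical, chosen for this sketch only.
    """
    if abs(score1 - score2) <= tie_margin:
        return 0
    return 1 if score1 > score2 else 2


preds = [clip_funniness(p) for p in (1.0, 3.4)]  # 3.4 is clipped to 3.0
print(rmse(preds, [1.0, 1.0]))  # about 1.414, i.e. sqrt((0 + 4) / 2)
print(funnier(1.8, 0.6))        # 1
```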

    CodaLab page hosting the competition: https://competitions.codalab.org/competitions/20970

    Starter Github code (scripts for running baseline and evaluation): https://github.com/n-hossain/semeval-2020-task-7-humicroedit

    Task mailing list:

    https://groups.google.com/forum/#!forum/semeval-2020-task-7-all

    ZIP contents:

    Folders:
    - subtask-1: Dataset for the funniness regression subtask.
    - subtask-2: Dataset for the "Funnier of the Two" classification subtask.

    Files:
    - {train, dev, test}.csv: the task's dataset, including labels
    - train_funlines.csv: additional training data gathered from the FunLines competition (https://funlines.co)
    - baseline.zip: contains a CSV file with the output of the BASELINE system. This is a template of the output format that can be submitted to CodaLab for scoring.

    Reference

    Please cite the task paper when using this dataset:

    Nabil Hossain, John Krumm, Michael Gamon and Henry Kautz. 2020. Semeval-2020 Task 7: Assessing Humor in Edited News Headlines. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2020).

    BIBTEX:
    @InProceedings{hossainSemEval2020Task7,
      author    = {Hossain, Nabil and Krumm, John and Gamon, Michael and Kautz, Henry},
      title     = {SemEval-2020 {T}ask 7: {A}ssessing Humor in Edited News Headlines},
      booktitle = {Proceedings of the 14th International Workshop on Semantic Evaluation ({S}em{E}val-2020)},
      address   = {Barcelona, Spain},
      year      = {2020}
    }

  13. United States Imports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&...

    • ceicdata.com
    Updated Feb 6, 2022
    CEICdata.com (2022). United States Imports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt& Accs [Dataset]. https://www.ceicdata.com/en/united-states/imports-by-commodity-6-digit-hs-code-hs-85-to-99/imports-festve-excl-chrtmscrnivl-magic-trk-joke-artpt-accs
    Dataset updated
    Feb 6, 2022
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2024 - Jan 1, 2025
    Area covered
    United States
    Description

    United States Imports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt& Accs data was reported at 98.281 USD mn in Jan 2025, an increase from the previous figure of 80.876 USD mn for Dec 2024. The series is updated monthly, with a median of 52.122 USD mn from Jan 2002 to Jan 2025 across 277 observations. It reached an all-time high of 418.819 USD mn in Jul 2022 and a record low of 10.093 USD mn in Mar 2002. The series remains active in CEIC and is reported by the U.S. Census Bureau. The data is categorized under Global Database’s United States – Table US.JA136: Imports: by Commodity: 6 Digit HS Code: HS 85 to 99.
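    The month-over-month change quoted above can be checked with simple arithmetic. A small helper with the Dec 2024 and Jan 2025 import values from the description plugged in; the function name is illustrative.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from `previous` to `current`."""
    if previous == 0:
        raise ZeroDivisionError("previous value is zero")
    return (current - previous) / previous * 100.0


# Figures quoted above, in USD mn: Dec 2024 -> Jan 2025.
dec_2024, jan_2025 = 80.876, 98.281
print(f"{jan_2025 - dec_2024:.3f} USD mn")        # 17.405 USD mn
print(f"{pct_change(dec_2024, jan_2025):.1f} %")  # 21.5 %
```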

  14. Self-repair jokes in Plautus (quantitative data)

    • researchportal.amu.edu.pl
    Updated Dec 1, 2023
    (2023). Self-repair jokes in Plautus (quantitative data) [Dataset]. http://doi.org/10.60629/j236-3333
    Dataset updated
    Dec 1, 2023
    Description

    Quantitative data concerning self-repair jokes in the comedies of Plautus.

  15. United States Exports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&Accs...

    • ceicdata.com
    Updated Feb 6, 2022
    CEICdata.com (2022). United States Exports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&Accs [Dataset]. https://www.ceicdata.com/en/united-states/exports-by-commodity-6-digit-hs-code-hs-85-to-98/exports-festve-excl-chrtmscrnivl-magic-trk-joke-artptaccs
    Dataset updated
    Feb 6, 2022
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 1, 2024 - Jan 1, 2025
    Area covered
    United States
    Description

    United States Exports: Festve, excl Chrtms;Crnivl Magic Trk Joke Art;Pt&Accs data was reported at 8.068 USD mn in Jan 2025, an increase from the previous figure of 6.434 USD mn for Dec 2024. The series is updated monthly, with a median of 8.757 USD mn from Jan 2002 to Jan 2025 across 277 observations. It reached an all-time high of 27.291 USD mn in Sep 2011 and a record low of 2.349 USD mn in Jan 2003. The series remains active in CEIC and is reported by the U.S. Census Bureau. The data is categorized under Global Database’s United States – Table US.JA027: Exports: by Commodity: 6 Digit HS Code: HS 85 to 98.

  16. Data from: Humor in Parenting: Does It Have a Role?

    • figshare.com
    pdf
    Updated Aug 12, 2022
    Lucy Emery; Terri Smith; Benjamin Levi (2022). Humor in Parenting: Does It Have a Role? [Dataset]. http://doi.org/10.6084/m9.figshare.20404107.v2
    Available download formats: pdf
    Dataset updated
    Aug 12, 2022
    Dataset provided by
    figshare
    Authors
    Lucy Emery; Terri Smith; Benjamin Levi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objectives: To gather pilot data on the use of humor in raising children.

    Methodology: We developed and field-tested a 10-item survey to measure people’s experiences of being raised with humor and their views on humor as a parenting tool. Responses were aggregated into Disagree, Indeterminate, and Agree, and analyzed using standard statistical methods.

    Results: Of the 312 respondents, most identified as male (63.6%) and white (76.6%); 11.3% reported being 18-25 years old, 49.4% 26-35 years old, and 39.4% 36-45 years old. The majority reported that: the people who raised them used humor in their parenting (55.2%); humor could be an effective parenting tool (71.8%); humor as a parenting tool has more potential benefit than harm (63.3%); they either use or plan to use humor in parenting their own children (61.8%); and they would value a course on how to use humor in parenting (69.7%).

    Conclusions: In this pilot study, respondents of child-bearing/rearing age reported positive views about humor as a parenting tool.
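    The aggregation of responses into Disagree, Indeterminate, and Agree can be illustrated with a hypothetical 5-point response scale. The study's actual response options and cut-offs are not given here, so the mapping, helper names, and sample answers below are assumptions.

```python
from collections import Counter

# HYPOTHETICAL 5-point scale: the study's actual response options are not given.
COLLAPSE = {
    1: "Disagree",       # strongly disagree
    2: "Disagree",       # disagree
    3: "Indeterminate",  # neither agree nor disagree
    4: "Agree",          # agree
    5: "Agree",          # strongly agree
}


def collapse(responses: list[int]) -> Counter:
    """Count responses per aggregated category."""
    return Counter(COLLAPSE[r] for r in responses)


def share(responses: list[int], category: str) -> float:
    """Percentage of responses that fall into `category` after collapsing."""
    if not responses:
        raise ValueError("no responses")
    return 100.0 * collapse(responses)[category] / len(responses)


answers = [5, 4, 3, 2, 1, 4]  # made-up answers to one survey item
print(share(answers, "Agree"))  # 50.0
```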

  17. Humour practices and European attitudes towards democracy - data...

    • zenodo.org
    Updated Jul 14, 2025
    Marcos Engelken-Jorge; Carmelo Moreno; Aitor Castañeda-Zumeta; Anastasiya Astapova; Andrew Bricker; Nina Cingerova; Irina Dulebova; Julia Fleischhack; Alberto Godioli; Amber Kempynck; Katarína Motyková; Anna Sámelová; Ismet Suleimanov; Jeroen Vandaele; Oğuzhan Zobar; Isaza-Ibarra Luisa Fernanda (2025). Humour practices and European attitudes towards democracy - data documentation [Dataset]. http://doi.org/10.5281/zenodo.15756409
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Marcos Engelken-Jorge; Carmelo Moreno; Aitor Castañeda-Zumeta; Anastasiya Astapova; Andrew Bricker; Nina Cingerova; Irina Dulebova; Julia Fleischhack; Alberto Godioli; Amber Kempynck; Katarína Motyková; Anna Sámelová; Ismet Suleimanov; Jeroen Vandaele; Oğuzhan Zobar; Isaza-Ibarra Luisa Fernanda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 30, 2025
    Description

    This document outlines the scope and key features of the 'Humour Practices and European Attitudes Toward Democracy' dataset, including the procedure followed for its production.
    The database collects and classifies relevant studies on humour practices, attitudes toward democracy, and modes of civic engagement in the six countries of the DELIAH consortium: Belgium, Estonia, Germany, the Netherlands, Slovakia, and Spain.

    The dataset pursues two goals:
    1) it contributes to a subsequent meta-analysis of humour studies, democratic participation, and civic engagement in online and offline spaces across Europe, which will be carried out by the DELIAH consortium, and
    2) it serves as a collective resource for additional DELIAH project tasks, including the design of focus groups and surveys.

    More broadly, the dataset has also been designed to appeal to scholars outside of the DELIAH consortium who work at the intersection of humour and democracy.

  18. BigStirlitz new Russian jokes about Stirlitz

    • kaggle.com
    Updated Jul 28, 2022
    Nikuson (2022). BigStirlitz new Russian jokes about Stirlitz [Dataset]. https://www.kaggle.com/datasets/nikuson/bigstirlitz
    Explore at: Croissant
    Dataset updated
    Jul 28, 2022
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Nikuson
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    Russia
    Description

    Dataset

    This dataset was created by Nikuson

    Released under ODC Attribution License (ODC-By)

    Contents

  19. humor-labeled-data

    • huggingface.co
    Rwan Ashraf, humor-labeled-data [Dataset]. https://huggingface.co/datasets/RwanAshraf/humor-labeled-data
    Authors
    Rwan Ashraf
    Description

    RwanAshraf/humor-labeled-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. Data from: Humor Receptivity Data

    • figshare.com
    • scholarship.miami.edu
    Updated Feb 21, 2025
    Nick Carcioppolo (2025). Humor Receptivity Data [Dataset]. http://doi.org/10.6084/m9.figshare.28458647.v1
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    figshare: http://figshare.com/
    Authors
    Nick Carcioppolo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the datasets for a study on audience segmentation to predict receptivity to humorous persuasive messages.

short_jokes (ysharma/short_jokes): 2 scholarly articles cite this dataset (Google Scholar).