100+ datasets found
  1. short-jokes

    • huggingface.co
    • kaggle.com
    Updated Mar 9, 2021
    Cite
    Fraser Greenlee (2021). short-jokes [Dataset]. https://huggingface.co/datasets/Fraser/short-jokes
    Dataset updated
    Mar 9, 2021
    Authors
    Fraser Greenlee
    Description

    A copy of the Kaggle dataset, added to Hugging Face for ease of use.

    Description from Kaggle:

    Context

    Generating humor is a complex task in machine learning: a model must understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for a number of reasons, one of which is the lack of a database that provides an extensive list of jokes. To address this, a large corpus of over 200,000 jokes was collected by scraping several websites containing funny, short jokes.

    Visit my GitHub repository for more information about how the data were collected and the scripts used.

    Content

    This dataset is a CSV file containing 231,657 jokes, each between 10 and 200 characters long. Each line in the file contains a unique ID and a joke.
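    A minimal sketch of loading the CSV described above with Python's standard `csv` module while enforcing the stated 10 to 200 character range. The header names `ID` and `Joke` and the two sample rows are assumptions for illustration; check the header of the downloaded file before relying on them.

```python
import csv
import io

# Hypothetical two-row sample in the layout described above (ID, joke).
# The header names "ID" and "Joke" are an assumption; check the real file.
SAMPLE_CSV = """ID,Joke
1,"I told my wife she was drawing her eyebrows too high. She looked surprised."
2,"Why don't skeletons fight each other? They don't have the guts."
"""


def load_jokes(text, min_len=10, max_len=200):
    """Parse ID/joke rows, keeping jokes whose length is within [min_len, max_len]."""
    jokes = {}
    for row in csv.DictReader(io.StringIO(text)):
        joke = row["Joke"].strip()
        if min_len <= len(joke) <= max_len:
            jokes[int(row["ID"])] = joke
    return jokes


jokes = load_jokes(SAMPLE_CSV)
print(len(jokes))
```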

    Disclaimer

    An attempt has been made to keep the jokes as clean as possible. However, because the data were collected by scraping websites, a few jokes may be inappropriate or offensive to some people.

  2. one-million-reddit-jokes

    • huggingface.co
    Updated Nov 1, 2021
    Cite
    SocialGrep (2021). one-million-reddit-jokes [Dataset]. https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes
    Dataset updated
    Nov 1, 2021
    Authors
    SocialGrep
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for one-million-reddit-jokes

    Dataset Summary

    This corpus contains a million posts from /r/jokes. Posts are annotated with their score.

    Languages

    Mainly English.

    Dataset Structure

    Data Instances

    A data point is a Reddit post.

    Data Fields

    • 'type': the type of the data point. Can be 'post' or 'comment'.
    • 'id': the base-36 Reddit ID of the data point. Unique when combined with type.
    • 'subreddit.id': the base-36 Reddit ID… See the full description on the dataset page: https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes.
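    The card notes that `id` alone is not unique; only the (`type`, `id`) pair is. Below is a minimal sketch of keying records on that pair and decoding a base-36 ID with Python's built-in `int(..., 36)`. The sample records, including the `score` field name, are hypothetical and not copied from the dataset.

```python
# Hypothetical records shaped like the fields listed above; values are made up.
records = [
    {"type": "post", "id": "abc12", "score": 42},
    {"type": "comment", "id": "abc12", "score": 3},  # same id, different type
    {"type": "post", "id": "zz9", "score": 7},
]


def record_key(record):
    """Build the (type, id) pair that the card describes as unique."""
    return (record["type"], record["id"])


def decode_base36(reddit_id):
    """Convert a base-36 Reddit ID (digits 0-9 then a-z) to an integer."""
    return int(reddit_id, 36)


index = {record_key(r): r for r in records}
print(len(index))             # both "abc12" records survive because their types differ
print(decode_base36("zz9"))
```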

  3. programming-jokes-dataset

    • huggingface.co
    Updated Feb 7, 2025
    + more versions
    Cite
    Rayhana Rafiai (2025). programming-jokes-dataset [Dataset]. https://huggingface.co/datasets/rayhanti/programming-jokes-dataset
    Dataset updated
    Feb 7, 2025
    Authors
    Rayhana Rafiai
    Description

    rayhanti/programming-jokes-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Joke Dataset

    • kaggle.com
    Updated Feb 10, 2018
    Cite
    Brendan Finan (2018). Joke Dataset [Dataset]. https://www.kaggle.com/datasets/bfinan/jokes-question-and-answer/discussion
    Dataset updated
    Feb 10, 2018
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Brendan Finan
    License

    GNU General Public License v2.0, http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    My goal with this dataset is to create the largest and most organized dataset of jokes.

    Tools for this dataset are on my GitHub.

    Content

    • Jokes reduced to only the Question and the Answer.
    • Duplicates NOT removed
    • Offensive jokes NOT removed

    Acknowledgements

    Question-Answer Jokes by Jiri Roznovjak

    Short Jokes by Abhinav Moudgil

    Inspiration

    Humor is one of the most difficult domains of natural language processing.

    Contribute

    If you want to help rate the jokes for funniness and/or vulgarity, download the .csv, add new column(s) with your rating(s), and email the file to bfinan@iastate.edu. I'll add your ratings to the dataset.
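    Because duplicates were not removed, a contributor may want to drop exact repeats before adding a rating column. A minimal sketch with the standard `csv` module; the `Question`/`Answer` header names and the `Rating` column name are assumptions, not taken from the actual file.

```python
import csv
import io

# Hypothetical Question/Answer sample; the column names are assumed from the
# description above, not read from the actual file.
SAMPLE = """Question,Answer
Why did the chicken cross the road?,To get to the other side.
Why did the chicken cross the road?,To get to the other side.
What do you call a fake noodle?,An impasta.
"""


def dedupe_and_add_rating(text, rating_column="Rating"):
    """Drop exact duplicate Q/A pairs and append an empty rating column."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + [rating_column])
    writer.writeheader()
    seen = set()
    for row in reader:
        key = (row["Question"].strip(), row["Answer"].strip())
        if key in seen:
            continue  # duplicates were NOT removed upstream, so skip them here
        seen.add(key)
        row[rating_column] = ""  # left blank for a contributor to fill in
        writer.writerow(row)
    return out.getvalue()


print(dedupe_and_add_rating(SAMPLE))
```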

  5. Dataset of books called Joke-tionary jokes : more than 444 jokes for kids!

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Joke-tionary jokes : more than 444 jokes for kids! [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Joke-tionary+jokes+%3A+more+than+444+jokes+for+kids%21
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row, filtered to the book "Joke-tionary jokes : more than 444 jokes for kids!", and features 7 columns, including author, publication date, language, and book publisher.
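    The citation URL for this entry encodes the title filter as query parameters. A small sketch that rebuilds such a URL with `urllib.parse.urlencode`; reading `fcol0`, `fop0` and `fval0` as filter column, operator and value is an inference from the URL rather than documented Work With Data behavior, and `book_filter_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://www.workwithdata.com/datasets/books"


def book_filter_url(title):
    """Rebuild a books URL filtered on one title, mirroring the cited query string."""
    # Parameter names copied from the citation URL; their meaning is inferred.
    params = {"f": "1", "fcol0": "book", "fop0": "=", "fval0": title}
    # urlencode uses quote_plus: spaces become "+", ":" -> %3A, "!" -> %21.
    return BASE + "?" + urlencode(params)


print(book_filter_url("Joke-tionary jokes : more than 444 jokes for kids!"))
```

    The same pattern matches the other Work With Data book URLs in this list, such as the one for "Jokes, jests and jollies", where the comma is encoded as %2C.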

  6. Email Jokes 1998-2004

    • services.fsd.tuni.fi
    zip
    Updated Jan 9, 2025
    Cite
    Aro, Jari (2025). Email Jokes 1998-2004 [Dataset]. http://doi.org/10.60686/t-fsd1271
    Available download formats: zip
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    Finnish Social Science Data Archive
    Authors
    Aro, Jari
    Description

    The archived data consist of jokes, anecdotes and other humorous texts distributed through email messages. In February 2003, the researcher sent a request for email humour to the staff members of the Department of Sociology and Social Psychology at the University of Tampere, Finland, and the staff members in turn distributed the request further. Texts were received from university staff members and students as well as from outsiders. Data collection continued until 2004. A total of 217 email messages were received, some of which contained more than one joke or anecdote. The jokes/anecdotes were mostly in Finnish, but approximately 20% were in English. The themes of the email messages varied greatly. Many were connected to current events, for instance the Iraq war, the September 11 terrorist attacks in the USA, and the doping scandal of Finnish skiers in 2001. Other recurring themes included sexuality, gender and ethnicity stereotypes, and professional jokes. As is typical of email humour, the original creator of a joke or anecdote often remained unknown. The dataset is available only in the original languages.

  7. short-jokes-punchline

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    Timxjl (2024). short-jokes-punchline [Dataset]. https://huggingface.co/datasets/Timxjl/short-jokes-punchline
    Dataset updated
    Nov 25, 2024
    Authors
    Timxjl
    License

    GNU General Public License v2.0, https://choosealicense.com/licenses/gpl-2.0/

    Description

    Short Jokes Punchline

    This dataset contains information about jokes, visitors, labels, and label segments used in a joke labeling application. The data is stored in four CSV files: joke.csv, visitor.csv, label.csv, and label_segment.csv.

    Files

    joke.csv

    This file contains 200 jokes randomly sampled from the Kaggle dataset "Short Jokes." Each row represents a joke with the following columns:

    id: The unique identifier for the joke. text: The text content of the… See the full description on the dataset page: https://huggingface.co/datasets/Timxjl/short-jokes-punchline.

  8. Russian Jokes

    • kaggle.com
    Updated Nov 6, 2021
    Cite
    Konstantin Albul (2021). Russian Jokes [Dataset]. https://www.kaggle.com/konstantinalbul/russian-jokes/discussion
    Dataset updated
    Nov 6, 2021
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Konstantin Albul
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Russia
    Description

    Context

    This dataset is a good way to practice text classification: try to predict the theme of a joke from its text, or determine which jokes are rated higher.
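    As an illustration of the suggested task, here is a minimal multinomial naive Bayes theme classifier using only the standard library. The four training examples and the theme labels are hypothetical; the real column names and themes must be taken from the dataset. Whitespace tokenization also works on Cyrillic text, although a proper Russian tokenizer or lemmatizer would likely do better.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Collect naive Bayes counts from (text, theme) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, theme in examples:
        class_counts[theme] += 1
        for word in tokenize(text):
            word_counts[theme][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the theme with the highest log-probability, using Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_theme, best_score = None, -math.inf
    for theme, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[theme].values())
        for word in tokenize(text):
            score += math.log((word_counts[theme][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_theme, best_score = theme, score
    return best_theme


# Hypothetical toy data; real themes and texts would come from the dataset.
examples = [
    ("doctor patient hospital pills", "medicine"),
    ("patient asks doctor about pills", "medicine"),
    ("teacher student exam school", "school"),
    ("student forgot homework at school", "school"),
]
model = train(examples)
print(predict(model, "the doctor gave pills"))
```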


  9. Dataset of books called Jokes, jests and jollies

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Jokes, jests and jollies [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Jokes%2C+jests+and+jollies
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row, filtered to the book "Jokes, jests and jollies", and features 7 columns, including author, publication date, language, and book publisher.

  10. Corpus of daily jokes from the 24ur.com portal Šale24 1.0 - Dataset - B2FIND...

    • b2find.eudat.eu
    Updated Jul 28, 2025
    + more versions
    Cite
    (2025). Corpus of daily jokes from the 24ur.com portal Šale24 1.0 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/c29f095d-fa29-59aa-b494-c85caa0622c4
    Dataset updated
    Jul 28, 2025
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from the portal's archive on September 18, 2024. The initial list is lightly curated: shorter texts in the original collection were removed from the corpus because they appear to be illustration captions without the accompanying illustrations.

    Readers of the news portal vote on the jokes with thumbs-up and thumbs-down buttons, and the voting results are included as metadata with each joke. Several jokes have been published more than once. Each joke (distinguished by exact text match) is identified by a hash of its text and carries a list of voting results, one for every instance of its publication.

    The normalised_text field contains text with punctuation corrections. For now, this is limited to replacing '' (two consecutive apostrophes, U+0027) with " (a single straight/dumb/vertical quotation mark, U+0022), because the former is consistently used in place of the latter in the original corpus.

    Based on the name ("Šala dneva", i.e. "Joke of the day") and the observed posting frequency during September 2024, we assume that each entry corresponds to one day, counting backwards from the day of data collection. Each voting event has an associated estimated publication date calculated this way. The jokes are linguistically annotated with CLASSLA-Stanza (https://github.com/clarinsi/classla), using the models for standard Slovenian.
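    The normalisation rule and the date-estimation assumption described above can be sketched as follows. Position 0 is assumed to be the entry for the collection day itself (September 18, 2024). The corpus description does not name its hash function, so SHA-256 here is purely an assumption, as are the helper names.

```python
import hashlib
from datetime import date, timedelta

COLLECTION_DATE = date(2024, 9, 18)  # scrape date stated in the description


def normalise(text):
    """Replace each pair of apostrophes ('') with one double quote (")."""
    return text.replace("''", '"')


def estimated_date(position):
    """Assume position 0 is the newest entry and each step back is one day earlier."""
    return COLLECTION_DATE - timedelta(days=position)


def text_hash(text):
    """Exact-text duplicate key; SHA-256 is an assumption, not the documented hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(normalise("He said: ''Hello!''"))
print(estimated_date(0), estimated_date(30))
```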
    The JSONL file contains entries representing individual jokes, with the following keys:

    • hash: a hash of the original joke text, used for duplicate identification
    • original_text: the original scraped text
    • normalised_text: the normalised text
    • processed_text: the linguistically annotated normalised text in CoNLL-U format
    • votes: a list of vote objects containing joke vote metadata
      • votes.for: votes for
      • votes.against: votes against
    • estimated_date: estimated dates of joke publication and voting
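    A minimal sketch of reading the JSONL file and totalling the votes for and against each joke across all of its publications. The sample line is made up, `processed_text` is omitted for brevity, and placing `estimated_date` inside each vote object is an assumption about the nesting.

```python
import json

# One hypothetical JSONL line using the keys listed above; values are made up.
SAMPLE_JSONL = """{"hash": "abc", "original_text": "x", "normalised_text": "x", "votes": [{"for": 10, "against": 2, "estimated_date": "2024-09-18"}, {"for": 4, "against": 1, "estimated_date": "2024-05-02"}]}
"""


def vote_totals(jsonl_text):
    """Sum votes for and against across every publication of each joke."""
    totals = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        joke = json.loads(line)
        votes_for = sum(v["for"] for v in joke["votes"])
        votes_against = sum(v["against"] for v in joke["votes"])
        totals[joke["hash"]] = (votes_for, votes_against)
    return totals


print(vote_totals(SAMPLE_JSONL))
```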

  11. Rated short jokes

    • kaggle.com
    Updated Jul 18, 2025
    Cite
    Jacopo Le Pera (2025). Rated short jokes [Dataset]. https://www.kaggle.com/datasets/jacopolepera/rated-short-jokes/code
    Dataset updated
    Jul 18, 2025
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Jacopo Le Pera
    Description

    Dataset

    This dataset was created by Jacopo Le Pera


  12. Dataset of books called Dirty jokes every man should know

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Dirty jokes every man should know [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Dirty+jokes+every+man+should+know
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row, filtered to the book "Dirty jokes every man should know", and features 7 columns, including author, publication date, language, and book publisher.

  13. Hungarian Covid jokes and memes

    • repo.researchdata.hu
    jpeg, pdf, png
    Updated May 2, 2024
    Cite
    Katalin Vargha (2024). Hungarian Covid jokes and memes [Dataset]. https://repo.researchdata.hu/dataset.xhtml?persistentId=hdl:21.15109/CONCORDA/WEF6GB
    Available download formats: jpeg, pdf, png
    Dataset updated
    May 2, 2024
    Dataset provided by
    ARP
    Authors
    Katalin Vargha
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes a collection of Hungarian Internet humour connected to the Covid-19 pandemic. The collection includes 344 items (mostly jokes and memes) that were collected online during the first wave of Covid in Hungary, between January and June 2020.

  14. Joke Dataset rating prediction

    • kaggle.com
    Updated Apr 6, 2019
    Cite
    sachin (2019). Joke Dataset rating prediction [Dataset]. https://www.kaggle.com/sachin619/jester-dataset/activity
    Dataset updated
    Apr 6, 2019
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    sachin
    Description

    Context

    This data was collected from various sources.

    Content

    One file contains ratings, each with a user ID, a joke_id, and a rating. Another file contains the jokes with their IDs.
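    A minimal sketch of joining the two files described above: average each joke's ratings and look up its text by ID. The field names `user_id` and `rating`, the sample rows and the joke texts are hypothetical; only `joke_id` appears in the description.

```python
from collections import defaultdict

# Hypothetical rows from the ratings file; column names are assumed.
ratings = [
    {"user_id": 1, "joke_id": 5, "rating": 3.0},
    {"user_id": 2, "joke_id": 5, "rating": -1.0},
    {"user_id": 1, "joke_id": 7, "rating": 8.5},
]
# Hypothetical contents of the second file: joke id -> joke text.
jokes = {5: "Joke five text", 7: "Joke seven text"}


def mean_rating_per_joke(rows):
    """Average the ratings of each joke_id."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row["joke_id"]] += row["rating"]
        counts[row["joke_id"]] += 1
    return {joke_id: sums[joke_id] / counts[joke_id] for joke_id in sums}


means = mean_rating_per_joke(ratings)
# Print jokes from highest to lowest mean rating.
for joke_id, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{mean:+.2f}  {jokes[joke_id]}")
```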

  15. jokes

    • huggingface.co
    Updated Sep 24, 2024
    Cite
    foong chee how (2024). jokes [Dataset]. https://huggingface.co/datasets/kentfoong/jokes
    Dataset updated
    Sep 24, 2024
    Authors
    foong chee how
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    kentfoong/jokes dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Offense Classification Jokes

    • kaggle.com
    Updated Oct 10, 2024
    Cite
    Avneet Singh (2024). Offense Classification Jokes [Dataset]. https://www.kaggle.com/avneets2103/offense-classification-jokes/discussion
    Dataset updated
    Oct 10, 2024
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Avneet Singh
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Avneet Singh

    Released under Apache 2.0


  17. Dataset of Russian jokes

    • kaggle.com
    Updated Feb 3, 2023
    Cite
    Kirill Ovcharenko (2023). Dataset of Russian jokes [Dataset]. https://www.kaggle.com/datasets/kovcharenko51/dataset-of-russian-jokes/discussion
    Dataset updated
    Feb 3, 2023
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Kirill Ovcharenko
    Area covered
    Russia
    Description

    Dataset

    This dataset was created by Kirill Ovcharenko


  18. Data from: Global geography of jokes

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Hervé Théry (2023). Global geography of jokes [Dataset]. http://doi.org/10.6084/m9.figshare.14307540.v1
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO journals
    Authors
    Hervé Théry
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Jokes between countries are useful for revealing the ethnotypes that exist in each of them, and representing them in cartographic form makes it possible to perceive their distribution and the spatial projection of mockery: who are we laughing at, and who are the scapegoats for the inhabitants of each country? Based on an analysis of an ad hoc database covering more than 60% of the world's countries and territories and 90% of its population, the text shows that these jokes are social constructions, have a temporality, and fall into two basic categories: top-down and bottom-up.

  19. dad jokes Price Prediction Data

    • coinbase.com
    Updated Oct 8, 2025
    Cite
    (2025). dad jokes Price Prediction Data [Dataset]. https://www.coinbase.com/en-pt/price-prediction/base-dad-jokes
    Dataset updated
    Oct 8, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains predicted prices of the asset "dad jokes" over the next 16 years. The prices are initially calculated using a default annual growth rate of 5 percent; after the page loads, a slider lets the user adjust the growth rate to their own positive or negative projection. The adjustable growth rate ranges from a minimum of -100 percent to a maximum of 100 percent.
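    The projection described above is compound growth, price_t = P_0 (1 + r)^t. A minimal sketch, assuming a hypothetical starting price P_0 and that year 1 is the first projected value; the range check mirrors the stated -100% to +100% slider limits.

```python
def project_prices(start_price, annual_rate=0.05, years=16):
    """Compound growth: start_price * (1 + annual_rate) ** t for t = 1..years."""
    if not -1.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between -1.0 (-100%) and 1.0 (+100%)")
    return [start_price * (1 + annual_rate) ** t for t in range(1, years + 1)]


prices = project_prices(1.0)  # hypothetical starting price of 1.0
print(len(prices), round(prices[0], 4), round(prices[-1], 4))
```

    At the -100% limit every projected price collapses to zero, since (1 + r) becomes 0.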

  20. Plaintext Jokes

    • marketplace.sshopencloud.eu
    Updated Sep 10, 2018
    Cite
    (2018). Plaintext Jokes [Dataset]. https://marketplace.sshopencloud.eu/dataset/nCeh4z
    Dataset updated
    Sep 10, 2018
    Description

    Approximately 208,000 jokes scraped from various websites
