100+ datasets found
  1. Short Jokes Dataset

    • kaggle.com
    zip
    Updated Dec 5, 2023
  2. short-jokes

    • huggingface.co
    • kaggle.com
    Updated Mar 9, 2021
  3. Dad Jokes

    • kaggle.com
    zip
    Updated Mar 18, 2026
  4. dadjokes

    • huggingface.co
    Updated Oct 11, 2023
  5. 1 Million Reddit Jokes (r/jokes)

    • kaggle.com
    zip
    Updated Jul 9, 2024
    + more versions
  6. short_jokes

    • huggingface.co
    Updated Feb 22, 2024
    + more versions
  7. programming-jokes-dataset

    • huggingface.co
    Updated May 4, 2024
  8. Dad Jokes Dataset (From icanhazdadjoke.com API)

    • kaggle.com
    zip
    Updated Dec 28, 2025
  9. jokes dataset

    • kaggle.com
    zip
    Updated Jan 18, 2022
  10. jokes-dataset

    • huggingface.co
    Updated Feb 7, 2025
    + more versions
  11. Email Jokes 1998-2004

    • services.fsd.tuni.fi
    • datacatalogue.cessda.eu
    zip
    Updated Mar 19, 2026
  12. Chuck Norris Jokes

    • kaggle.com
    zip
    Updated Nov 2, 2023
  13. Reddit r/Jokes Dataset

    • kaggle.com
    zip
    Updated Nov 6, 2024
  14. Jokes

    • huggingface.co
    Updated Nov 4, 2023
  15. 10k-jokes-dataset

    • huggingface.co
    Updated Feb 3, 2024
  16. Data from: What’s Brown and Sticky? Peering Into the Ineluctable Comedic...

    • osf.io
    Updated Mar 15, 2026
  17. Jokes dataset (id , joke)

    • kaggle.com
    zip
    Updated Jul 3, 2025
  18. oig-jokes

    • huggingface.co
    Updated Jul 28, 2024
  19. Joke Dataset

    • kaggle.com
    zip
    Updated Feb 10, 2018
  20. Country distribution of Jokes surname

    • surnam.es
    Updated Jul 1, 2023
Cite
Fraser Greenlee (2021). short-jokes [Dataset]. https://huggingface.co/datasets/Fraser/short-jokes

short-jokes

Fraser/short-jokes

Dataset updated: Mar 9, 2021
Authors: Fraser Greenlee
Description

A copy of the Kaggle dataset, added to Hugging Face for ease of use.

Description from Kaggle:

Context

Generating humor is a complex task in machine learning: a model must understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for a number of reasons, one of which is the lack of a database containing an extensive list of jokes. To address this, a large corpus of over 0.2 million jokes was collected by scraping several websites containing funny and short jokes.

Visit my GitHub repository for more information about how the data was collected and the scripts used.

Content

The dataset is a CSV file containing 231,657 jokes. Joke length ranges from 10 to 200 characters. Each line in the file contains a unique ID and a joke.
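
The layout described above (one unique ID and one joke per row, with joke lengths between 10 and 200 characters) can be sanity-checked with a short script. The following is a minimal sketch, not the dataset author's code: the column names `ID` and `Joke` and the file name `shortjokes.csv` are assumptions that should be checked against the actual file header.

```python
import csv


def load_jokes(csv_file, id_col="ID", joke_col="Joke"):
    """Read (id, joke) pairs from an open CSV file object.

    The column names are assumptions; adjust them to match the real header.
    """
    reader = csv.DictReader(csv_file)
    return [(row[id_col], row[joke_col]) for row in reader]


def length_outliers(jokes, min_len=10, max_len=200):
    """Return the rows whose joke length falls outside the stated range."""
    return [(jid, joke) for jid, joke in jokes
            if not (min_len <= len(joke) <= max_len)]


def duplicate_ids(jokes):
    """Return the set of IDs that appear more than once."""
    seen, dupes = set(), set()
    for jid, _ in jokes:
        if jid in seen:
            dupes.add(jid)
        seen.add(jid)
    return dupes


if __name__ == "__main__":
    # "shortjokes.csv" is a hypothetical local path to the downloaded file.
    with open("shortjokes.csv", newline="", encoding="utf-8") as f:
        jokes = load_jokes(f)
    print("rows:", len(jokes))
    print("rows outside 10-200 characters:", len(length_outliers(jokes)))
    print("duplicate IDs:", len(duplicate_ids(jokes)))
```

If the description is accurate, the script should report 231,657 rows, with no duplicate IDs and no rows outside the length range. Any other result suggests the header names or the file differ from what is assumed here.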

Disclaimer

An attempt has been made to keep the jokes as clean as possible. Because the data was collected by scraping websites, a few jokes may be inappropriate or offensive to some people.
