100+ datasets found

short-jokes
huggingface.co
kaggle.com
Updated Mar 9, 2021
Cite
Fraser Greenlee (2021). short-jokes [Dataset]. https://huggingface.co/datasets/Fraser/short-jokes
Dataset updated
Mar 9, 2021
Authors
Fraser Greenlee
Description
Copy of Kaggle dataset, adding to Huggingface for ease of use.

Description from Kaggle:

Context

Generating humor is a complex task in the domain of machine learning, and it requires the models to understand the deep semantic meaning of a joke in order to generate new ones. Such problems, however, are difficult to solve due to a number of reasons, one of which is the lack of a database that gives an elaborate list of jokes. Thus, a large corpus of over 0.2 million jokes has been collected by scraping several websites containing funny and short jokes.

Visit my Github repository for more information regarding collection of data and the scripts used.

Content

This dataset is in the form of a csv file containing 231,657 jokes. Length of jokes ranges from 10 to 200 characters. Each line in the file contains a unique ID and joke.
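As a rough illustration of the layout described above, the sketch below parses a small in-memory sample and checks the stated 10 to 200 character range. The header names `ID` and `Joke` and the sample rows are assumptions made for illustration; the actual file header may differ.

```python
import csv
import io

# Hypothetical sample mimicking the described layout: a unique ID and a joke per line.
# The header names "ID" and "Joke" are assumed, not confirmed by the description.
SAMPLE_CSV = """ID,Joke
1,Why did the chicken cross the road? To get to the other side.
2,"I told my computer a joke, but it did not get the byte."
"""

def load_jokes(handle):
    """Return a list of (id, joke) tuples read from a CSV file object."""
    reader = csv.DictReader(handle)
    return [(int(row["ID"]), row["Joke"]) for row in reader]

def length_out_of_range(jokes, low=10, high=200):
    """Return the IDs of jokes whose character count falls outside [low, high]."""
    return [joke_id for joke_id, text in jokes if not low <= len(text) <= high]

jokes = load_jokes(io.StringIO(SAMPLE_CSV))
ids = [joke_id for joke_id, _ in jokes]
```

For the real file, the same functions could be applied to an opened file handle instead of the in-memory sample.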

Disclaimer

An attempt has been made to keep the jokes as clean as possible. Since the data was collected by scraping websites, a few jokes may still be inappropriate or offensive to some people.
Dad Jokes
kaggle.com
zip
Updated Nov 8, 2025
Cite
Usama Buttar (2025). Dad Jokes [Dataset]. https://www.kaggle.com/datasets/usamabuttar/dad-jokes
Available download formats: zip (4247529 bytes)
Dataset updated
Nov 8, 2025
Authors
Usama Buttar
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Get ready to groan and giggle your way through 'Grin and Dad Joke It,' a dataset that's all about those classic, eye-roll-inducing dad jokes. This pun-tastic collection brings together a treasure trove of one-liners, puns, and witty quips that dads everywhere love to share. Whether you're a dad joke aficionado or just looking to add some humor to your day, this dataset is your go-to source for timeless, family-friendly humor. From cheesy wordplay to clever punchlines, 'Grin and Dad Joke It' has you covered, ensuring that a chuckle is just a punchline away.

And the fun never stops! With 200 new jokes added daily, 'Grin and Dad Joke It' keeps the laughter flowing and your pun tolerance growing. It's a never-ending source of dad-approved humor that's always fresh and ready to make you smile.
short_jokes
huggingface.co
Updated Feb 22, 2024
Cite
yuvraj sharma (2024). short_jokes [Dataset]. https://huggingface.co/datasets/ysharma/short_jokes
Dataset updated
Feb 22, 2024
Authors
yuvraj sharma
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Context

Generating humor is a complex task in the domain of machine learning, and it requires the models to understand the deep semantic meaning of a joke in order to generate new ones. Such problems, however, are difficult to solve due to a number of reasons, one of which is the lack of a database that gives an elaborate list of jokes. Thus, a large corpus of over 0.2 million jokes has been collected by scraping several websites containing funny and short jokes. You can visit the Github…

See the full description on the dataset page: https://huggingface.co/datasets/ysharma/short_jokes.
jokes dataset
kaggle.com
zip
Updated Jan 18, 2022
Cite
Yaroslav-siberia (2022). jokes dataset [Dataset]. https://www.kaggle.com/datasets/yaroslav62/jokes-dataset
Available download formats: zip (7775133 bytes)
Dataset updated
Jan 18, 2022
Authors
Yaroslav-siberia
Description
Dataset

This dataset was created by Yaroslav-siberia

Contents
one-million-reddit-jokes
huggingface.co
Updated Nov 1, 2021
Cite
SocialGrep (2021). one-million-reddit-jokes [Dataset]. https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes
Dataset updated
Nov 1, 2021
Authors
SocialGrep
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for one-million-reddit-jokes

Dataset Summary

This corpus contains a million posts from /r/jokes. Posts are annotated with their score.

Languages

Mainly English.

Dataset Structure

Data Instances

A data point is a Reddit post.

Data Fields

'type': the type of the data point. Can be 'post' or 'comment'.

'id': the base-36 Reddit ID of the data point. Unique when combined with type.

'subreddit.id': the base-36 Reddit ID…

See the full description on the dataset page: https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes.
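The two fully described fields can be illustrated with a short sketch. It assumes each data point is available as a Python dict keyed by the field names above; the helper names are hypothetical and not part of the dataset or any library.

```python
def reddit_id_to_int(base36_id: str) -> int:
    """Decode a base-36 ID string (digits 0-9, letters a-z) into an integer."""
    return int(base36_id, 36)

def data_point_key(record: dict) -> tuple:
    """Build a unique key: the card states 'id' is unique when combined with 'type'."""
    return (record["type"], record["id"])

# Hypothetical record containing only the two fields used here.
example = {"type": "post", "id": "abc"}
key = data_point_key(example)
```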
Joke Dataset
kaggle.com
zip
Updated Feb 10, 2018
Cite
Brendan Finan (2018). Joke Dataset [Dataset]. https://www.kaggle.com/datasets/bfinan/jokes-question-and-answer
Available download formats: zip (6121780 bytes)
Dataset updated
Feb 10, 2018
Authors
Brendan Finan
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Context

My goal with this dataset is to create the largest and most organized dataset of jokes.

Tools for this dataset are on my Github

Content

Jokes reduced to only the Question and the Answer.

Duplicates NOT removed

Offensive jokes NOT removed

Acknowledgements

Question-Answer Jokes by Jiri Roznovjak

Short Jokes by Abhinav Moudgil

Inspiration

Humor is one of the most difficult domains of natural language processing.

Contribute

If you want to help rate the jokes based on funniness and/or vulgarity, download the .csv and make new column(s) with your rating(s). Email that to bfinan@iastate.edu, and I'll add your ratings as part of the dataset.
programming-jokes-dataset
huggingface.co
Updated Aug 3, 2024
Cite
Asfandyar Azhar (2024). programming-jokes-dataset [Dataset]. https://huggingface.co/datasets/asfandyarazhar/programming-jokes-dataset
Dataset updated
Aug 3, 2024
Authors
Asfandyar Azhar
Description
Programming Jokes Dataset

Dataset Summary

This dataset contains programming-related jokes scraped from the website Punny Funny. The jokes are organized into different categories based on the structure of the original webpage. The dataset is intended for use in natural language processing tasks, such as fine-tuning language models to generate humor or analyze textual content in the programming domain. Number of Jokes: [220]

Usage

This dataset is suitable for… See the full description on the dataset page: https://huggingface.co/datasets/asfandyarazhar/programming-jokes-dataset.
jokes-dataset
huggingface.co
Updated Feb 7, 2025
Cite
Rayhana Rafiai (2025). jokes-dataset [Dataset]. https://huggingface.co/datasets/rayhanti/jokes-dataset
Dataset updated
Feb 7, 2025
Authors
Rayhana Rafiai
Description
rayhanti/jokes-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Short Jokes Dataset
kaggle.com
zip
Updated Dec 5, 2023
Cite
The Devastator (2023). Short Jokes Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/short-jokes-dataset/suggestions?status=pending&yourSuggestions=true
Available download formats: zip (9673796 bytes)
Dataset updated
Dec 5, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Short Jokes Dataset

Humorous Short Jokes

By Fraser Greenlee (From Huggingface) [source]

About this dataset

This dataset offers a valuable resource for various applications such as natural language processing, sentiment analysis, joke generation algorithms, or simply for entertainment purposes. Whether you're a data scientist looking to analyze humor patterns or an individual seeking some quick comedic relief, this dataset has got you covered.

By utilizing this dataset, researchers can explore different aspects of humor and study the linguistic features that make these short jokes amusing. Moreover, it provides an opportunity for developing computer models capable of generating similar humorous content based on learned patterns.

How to use the dataset

Understanding the Columns:

text: This column contains the text of the short joke.

**text: No information is provided about this column.

Exploring the Jokes:

Start by exploring the text column, which contains the actual jokes. You can read through them and have a good laugh!

Analyzing the Jokes:

To gain insights from this dataset, you can perform various analyses:

Sentiment Analysis: Use Natural Language Processing techniques to analyze the sentiment of each joke.

Categorization: Group jokes based on common themes or subjects, such as animals, professions, etc.

Length Distribution: Analyze and visualize the distribution of joke lengths.

Creating New Content or Applications: Since this dataset provides a large collection of short jokes, you can utilize it creatively:

Generating Random Jokes: Develop an algorithm that generates new jokes based on patterns found in this dataset.

Humor Classification: Build a model that predicts if a given piece of text is funny or not using machine learning techniques.

Sharing Your Findings: If you make interesting discoveries or create unique applications using this dataset, consider sharing them with others in the Kaggle community.

Please note that no information regarding dates is available in train.csv; therefore, any temporal analysis or date-based insights won't be feasible with this specific file.
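The length-distribution analysis suggested above could start from something like the following sketch. The sample strings stand in for values of the `text` column in train.csv, and the bucket size of 50 characters is an arbitrary choice for illustration.

```python
from collections import Counter

def length_histogram(texts, bucket_size=50):
    """Count texts per length bucket; each key is the bucket's lower bound in characters."""
    counts = Counter((len(text) // bucket_size) * bucket_size for text in texts)
    return dict(sorted(counts.items()))

# Hypothetical stand-ins for values of the `text` column.
sample_texts = ["a" * 12, "b" * 49, "c" * 50, "d" * 120]
histogram = length_histogram(sample_texts)
```

The resulting dictionary can be passed to any plotting tool to visualize the distribution.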

Research Ideas

Analyzing humor patterns: This dataset can be used to analyze different types of humor and identify patterns or common elements in jokes that make them funny. Researchers and linguists can use this dataset to gain insights into the structure, wordplay, or comedic techniques used in short jokes.

Natural language processing: With the text data available in this dataset, it can be used for training models in natural language processing (NLP) tasks such as sentiment analysis, joke generation, or understanding humor from written text. NLP researchers and developers can utilize this dataset to build and improve algorithms for detecting or generating funny content.

Social media analysis: Short jokes are popular on social media platforms like Twitter or Reddit where users frequently share humorous content. This dataset can be valuable for analyzing the reception and impact of these jokes on social media platforms. By examining trends, engagement metrics, or user reactions to specific jokes from the dataset, marketers or social media analysts can gain insights into what type of humor resonates with different online communities.

Overall, this dataset provides a rich resource for exploring various aspects related to humor analysis and NLP tasks while offering opportunities for sociocultural studies related to online comedy culture.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv

| Column name | Description |
|:------------|:----------------------------------------------|
| text | The actual content of the short jokes. (Text) |

Acknowledgements

If you use this dataset in your research, please credit Fraser Greenlee (From Huggingface).
Dataset of books called Jokes, jests and jollies
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Jokes, jests and jollies [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Jokes%2C+jests+and+jollies
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is Jokes, jests and jollies. It features 7 columns including author, publication date, language, and book publisher.
Dataset of books called Monster jokes
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Monster jokes [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Monster+jokes
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 3 rows and is filtered where the book is Monster jokes. It features 7 columns including author, publication date, language, and book publisher.
Hungarian Covid jokes and memes
repo.researchdata.hu
jpeg, pdf, png
Updated May 2, 2024
Cite
Katalin Vargha (2024). Hungarian Covid jokes and memes [Dataset]. https://repo.researchdata.hu/dataset.xhtml?persistentId=hdl:21.15109/CONCORDA/WEF6GB
Available download formats: jpeg, pdf, png
Dataset updated
May 2, 2024
Dataset provided by
ARP
Authors
Katalin Vargha
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset includes a collection of Hungarian Internet humour connected to the Covid-19 pandemic. The collection includes 344 items (mostly jokes and memes) that were collected online during the first wave of Covid in Hungary, between January and June 2020.
jokes
huggingface.co
Updated Sep 24, 2024
Cite
foong chee how (2024). jokes [Dataset]. https://huggingface.co/datasets/kentfoong/jokes
Dataset updated
Sep 24, 2024
Authors
foong chee how
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
kentfoong/jokes dataset hosted on Hugging Face and contributed by the HF Datasets community
dad jokes Price Prediction Data
coinbase.com
Updated Nov 13, 2025
Cite
(2025). dad jokes Price Prediction Data [Dataset]. https://www.coinbase.com/en/price-prediction/base-dad-jokes
Dataset updated
Nov 13, 2025
Variables measured
Growth Rate, Predicted Price
Measurement technique
User-defined projections based on compound growth. This is not a formal financial forecast.
Description
This dataset contains the predicted prices of the asset dad jokes over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
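The compound-growth projection described above can be sketched as follows. The function name, the starting price, and the choice to report one value per year are assumptions for illustration; only the 5 percent default, the 16-year horizon, and the -100 to 100 percent bounds come from the description.

```python
def project_prices(start_price, annual_rate=0.05, years=16):
    """Return projected prices for years 1..years under compound annual growth.

    annual_rate is a fraction (0.05 means 5 percent) and must lie in [-1.0, 1.0],
    matching the adjustable range stated in the description.
    """
    if not -1.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between -1.0 and 1.0")
    return [start_price * (1.0 + annual_rate) ** year for year in range(1, years + 1)]

# Hypothetical starting price of 100 units, projected two years ahead at 5 percent.
two_year = project_prices(100.0, 0.05, 2)
```

As the source notes, such a projection is a user-defined extrapolation and not a formal financial forecast.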
Data from: Corpus of daily jokes from the 24ur.com portal Šale24 1.0
live.european-language-grid.eu
binary format
Updated Oct 2, 2024
Cite
(2024). Corpus of daily jokes from the 24ur.com portal Šale24 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23698
Available download formats: binary format
Dataset updated
Oct 2, 2024
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from their archive on September 18th, 2024. The initial list is lightly curated: shorter texts found in the original collection were removed from the corpus since they appear to be illustration captions without the accompanying illustrations.

Readers of the news portal vote on the jokes themselves with thumbs up and thumbs down buttons. The voting results are included as metadata with each joke. Several jokes have been published more than once. Each joke (distinguished based on exact text matches) is identified by a hash of its text and presents a list of voting results for every instance of its publication. The normalised_text field contains text with punctuation corrections. For now, this is limited to replacing '' (two consecutive apostrophes U+0027) with " (a single straight/dumb/vertical quotation mark U+0022). The former (two apostrophes) is consistently used in place of the latter in the original corpus.
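The punctuation correction described above amounts to a single string replacement. A minimal sketch, with a hypothetical function name and sample sentence:

```python
def normalise_quotes(text: str) -> str:
    """Replace two consecutive apostrophes (U+0027 U+0027) with one quotation mark (U+0022)."""
    return text.replace("''", '"')

# Hypothetical sample text using the doubled-apostrophe convention.
sample = "''Zdravo,'' je rekel."
normalised = normalise_quotes(sample)
```

A single apostrophe is left untouched, so ordinary contractions and elisions are not affected.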

Based on the name ("Šala dneva", i.e. "Joke of the day") and the observed frequency of posting during September 2024, we assume that each entry corresponds to one day, counting backwards from the day of data collection. Each voting event has an associated estimated publication date calculated with the above algorithm.
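The date estimate can be sketched as a simple day offset. This assumes that position 0 is the most recent entry and that it maps to the collection day itself; the description does not state the exact indexing, so both points are assumptions.

```python
from datetime import date, timedelta

# Scrape date stated in the description.
COLLECTION_DATE = date(2024, 9, 18)

def estimated_publication_date(position, collection_date=COLLECTION_DATE):
    """Estimate the publication date of the entry at `position`.

    Position 0 is assumed to be the newest entry, published on the collection day;
    each further position is assumed to be one day earlier.
    """
    return collection_date - timedelta(days=position)

newest = estimated_publication_date(0)
older = estimated_publication_date(18)
```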

The jokes are linguistically annotated with CLASSLA-Stanza (https://github.com/clarinsi/classla), using the models for standard Slovenian. The JSONL file contains entries representing individual jokes containing:

- a hash of the original joke text used for duplicate identification (key: hash)
- original scraped text (key: original_text)
- normalised text (key: normalised_text)
- linguistically annotated normalised text in CoNLL-U format (key: processed_text)
- a list of vote objects containing joke vote metadata (key: votes)
- votes for (key: votes.for)
- votes against (key: votes.against)
- estimated dates of joke publication and voting (key: estimated_date)
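Reading one such entry might look like the sketch below. The sample line is invented, and the exact nesting is an assumption: it treats `for`, `against`, and `estimated_date` as keys of each object in the `votes` list and uses an ISO date string, which the description does not confirm.

```python
import json

# Hypothetical JSONL line; all values are invented and the nesting is assumed.
SAMPLE_LINE = json.dumps({
    "hash": "hypothetical-hash",
    "original_text": "''Primer'' šale.",
    "normalised_text": "\"Primer\" šale.",
    "processed_text": "# CoNLL-U annotation omitted in this sketch",
    "votes": [
        {"for": 10, "against": 2, "estimated_date": "2024-09-18"},
        {"for": 4, "against": 1, "estimated_date": "2023-01-05"},
    ],
})

def vote_totals(jsonl_line):
    """Return (hash, total votes for, total votes against) across all publications."""
    entry = json.loads(jsonl_line)
    total_for = sum(vote["for"] for vote in entry["votes"])
    total_against = sum(vote["against"] for vote in entry["votes"])
    return entry["hash"], total_for, total_against

totals = vote_totals(SAMPLE_LINE)
```

Summing over the list reflects the note that a joke published more than once carries one voting result per publication.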

The corpus contains 16658 sentences, 129063 tokens, and 662 recognised named entities.
Data from: Global geography of jokes
scielo.figshare.com
jpeg
Updated May 31, 2023
Cite
Hervé Théry (2023). Global geography of jokes [Dataset]. http://doi.org/10.6084/m9.figshare.14307540.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14307540.v1
Dataset updated
May 31, 2023
Dataset provided by
SciELO journals
Authors
Hervé Théry
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Jokes between countries are useful for revealing the ethnotypes existing in each of them; representing them in cartographic form makes it possible to perceive their distribution and the spatial projection of mockery: who are we laughing at, and who are the scapegoats for the inhabitants of each country? Based on the analysis of an ad hoc database covering more than 60% of the countries and territories of the world and 90% of its population, the text shows that these jokes are social constructions, have a temporality, and fall basically into two categories, from top to bottom and from bottom to top.
short-jokes-punchline
huggingface.co
Updated Nov 25, 2024
Cite
Timxjl (2024). short-jokes-punchline [Dataset]. https://huggingface.co/datasets/Timxjl/short-jokes-punchline
Dataset updated
Nov 25, 2024
Authors
Timxjl
License
https://choosealicense.com/licenses/gpl-2.0/
Description
Short Jokes Punchline

This dataset contains information about jokes, visitors, labels, and label segments used in a joke labeling application. The data is stored in four CSV files: joke.csv, visitor.csv, label.csv, and label_segment.csv.

Files

joke.csv

This file contains 200 jokes randomly sampled from the Kaggle dataset "Short Jokes." Each row represents a joke with the following columns:

id: The unique identifier for the joke.

text: The text content of the…

See the full description on the dataset page: https://huggingface.co/datasets/Timxjl/short-jokes-punchline.
Dataset of book subjects that contain Jokes my father never taught me :...
workwithdata.com
Updated Nov 7, 2024
Cite
Work With Data (2024). Dataset of book subjects that contain Jokes my father never taught me : life, love, and loss with Richard Pryor [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Jokes+my+father+never+taught+me+:+life%2C+love%2C+and+loss+with+Richard+Pryor&j=1&j0=books
Dataset updated
Nov 7, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about book subjects. It has 4 rows and is filtered where the book is Jokes my father never taught me : life, love, and loss with Richard Pryor. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Jester Jokes Dataset v4
kaggle.com
zip
Updated Jun 22, 2023
Cite
Mohamed Amine DHIAB (2023). Jester Jokes Dataset v4 [Dataset]. https://www.kaggle.com/datasets/mohamedaminedhiab/jester-jokes-dataset-v4
Available download formats: zip (1419440 bytes)
Dataset updated
Jun 22, 2023
Authors
Mohamed Amine DHIAB
Description
Dataset

This dataset was created by Mohamed Amine DHIAB

Contents
chinese-joke
huggingface.co
Updated Apr 29, 2023
Cite
wangxh (2023). chinese-joke [Dataset]. https://huggingface.co/datasets/notsobad9527/chinese-joke
Dataset updated
Apr 29, 2023
Authors
wangxh
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
notsobad9527/chinese-joke dataset hosted on Hugging Face and contributed by the HF Datasets community
