100+ datasets found
  1. short-jokes

    • huggingface.co
    • kaggle.com
    Updated Mar 9, 2021
    Cite
    Fraser Greenlee (2021). short-jokes [Dataset]. https://huggingface.co/datasets/Fraser/short-jokes
    Dataset updated
    Mar 9, 2021
    Authors
    Fraser Greenlee
    Description

    A copy of the Kaggle dataset, added to Hugging Face for ease of use.

    Description from Kaggle:

    Context

    Generating humor is a complex task in machine learning: it requires models to understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for several reasons, one of which is the lack of a database that provides an extensive list of jokes. To address this, a large corpus of over 200,000 jokes was collected by scraping several websites containing funny, short jokes.

    Visit my GitHub repository for more information on data collection and the scripts used.

    Content

    This dataset is a CSV file containing 231,657 jokes, ranging from 10 to 200 characters in length. Each line in the file contains a unique ID and a joke.

    Disclaimer

    Every effort has been made to keep the jokes as clean as possible. However, since the data was collected by scraping websites, a few jokes may be inappropriate or offensive to some people.
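    A minimal sketch of reading the file described above with Python's standard `csv` module and checking the stated properties (unique IDs, 10 to 200 characters per joke). The column names `ID` and `Joke` and the two sample rows are assumptions for illustration; check the header of the downloaded file before relying on them.

```python
import csv
import io

# Hypothetical sample in the assumed layout: a header row, then one ID and one joke per line.
SAMPLE = """ID,Joke
1,"I told my computer a joke, but it didn't get it."
2,Why did the scarecrow win an award? He was outstanding in his field.
"""


def load_jokes(text: str) -> list[tuple[int, str]]:
    """Parse (id, joke) pairs from CSV text with assumed 'ID' and 'Joke' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["ID"]), row["Joke"]) for row in reader]


def check_jokes(jokes: list[tuple[int, str]], min_len: int = 10, max_len: int = 200) -> list[int]:
    """Return IDs of jokes whose length is outside [min_len, max_len]; raise on duplicate IDs."""
    ids = [joke_id for joke_id, _ in jokes]
    if len(ids) != len(set(ids)):
        raise ValueError("IDs are expected to be unique")
    return [joke_id for joke_id, joke in jokes if not min_len <= len(joke) <= max_len]


jokes = load_jokes(SAMPLE)
print(len(jokes), check_jokes(jokes))  # 2 []
```

    For the real file, replace `SAMPLE` with the file's contents (for example, read with `open(path, newline="", encoding="utf-8")`).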

  2. Dad Jokes

    • kaggle.com
    zip
    Updated Nov 8, 2025
    Cite
    Usama Buttar (2025). Dad Jokes [Dataset]. https://www.kaggle.com/datasets/usamabuttar/dad-jokes
    Available download formats: zip (4,247,529 bytes)
    Dataset updated
    Nov 8, 2025
    Authors
    Usama Buttar
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Get ready to groan and giggle your way through 'Grin and Dad Joke It,' a dataset that's all about those classic, eye-roll-inducing dad jokes. This pun-tastic collection brings together a treasure trove of one-liners, puns, and witty quips that dads everywhere love to share. Whether you're a dad joke aficionado or just looking to add some humor to your day, this dataset is your go-to source for timeless, family-friendly humor. From cheesy wordplay to clever punchlines, 'Grin and Dad Joke It' has you covered, ensuring that a chuckle is just a punchline away.

    And the fun never stops! With 200 new jokes added daily, 'Grin and Dad Joke It' keeps the laughter flowing and your pun tolerance growing. It's a never-ending source of dad-approved humor that's always fresh and ready to make you smile.

  3. short_jokes

    • huggingface.co
    Updated Feb 22, 2024
    Cite
    yuvraj sharma (2024). short_jokes [Dataset]. https://huggingface.co/datasets/ysharma/short_jokes
    Available formats: Croissant
    Dataset updated
    Feb 22, 2024
    Authors
    yuvraj sharma
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Context: Generating humor is a complex task in machine learning: it requires models to understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for several reasons, one of which is the lack of a database that provides an extensive list of jokes. To address this, a large corpus of over 200,000 jokes was collected by scraping several websites containing funny, short jokes. You can visit the GitHub… See the full description on the dataset page: https://huggingface.co/datasets/ysharma/short_jokes.

  4. jokes dataset

    • kaggle.com
    zip
    Updated Jan 18, 2022
    Cite
    Yaroslav-siberia (2022). jokes dataset [Dataset]. https://www.kaggle.com/datasets/yaroslav62/jokes-dataset
    Available download formats: zip (7,775,133 bytes)
    Dataset updated
    Jan 18, 2022
    Authors
    Yaroslav-siberia
    Description

    Dataset

    This dataset was created by Yaroslav-siberia

  5. one-million-reddit-jokes

    • huggingface.co
    Updated Nov 1, 2021
    Cite
    SocialGrep (2021). one-million-reddit-jokes [Dataset]. https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes
    Available formats: Croissant
    Dataset updated
    Nov 1, 2021
    Authors
    SocialGrep
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for one-million-reddit-jokes

    Dataset Summary

    This corpus contains one million posts from /r/jokes. Posts are annotated with their score.

    Languages

    Mainly English.

    Dataset Structure

    Data Instances

    A data point is a Reddit post.

    Data Fields

    • 'type': the type of the data point; either 'post' or 'comment'.
    • 'id': the base-36 Reddit ID of the data point. Unique when combined with type.
    • 'subreddit.id': the base-36 Reddit ID… See the full description on the dataset page: https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes.
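    Because `id` is a base-36 Reddit ID that is unique only together with `type`, a `(type, id)` pair is the natural deduplication key. A minimal sketch; the sample records and helper names are hypothetical, and only the two fields named above are assumed to exist.

```python
# Hypothetical records using the 'type' and 'id' fields described in the dataset card.
records = [
    {"type": "post", "id": "abc12"},
    {"type": "comment", "id": "abc12"},  # same id, different type: a distinct record
    {"type": "post", "id": "abc12"},     # exact duplicate of the first record
]


def record_key(record: dict) -> tuple[str, str]:
    """Composite key: 'id' is only unique when combined with 'type'."""
    return (record["type"], record["id"])


def decode_id(base36_id: str) -> int:
    """Convert a base-36 Reddit ID to an integer (int() accepts bases 2 to 36)."""
    return int(base36_id, 36)


# Later duplicates overwrite earlier ones, leaving one record per composite key.
unique = {record_key(r): r for r in records}
print(len(unique))     # 2
print(decode_id("z"))  # 35
```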

  6. Joke Dataset

    • kaggle.com
    zip
    Updated Feb 10, 2018
    Cite
    Brendan Finan (2018). Joke Dataset [Dataset]. https://www.kaggle.com/datasets/bfinan/jokes-question-and-answer
    Available download formats: zip (6,121,780 bytes)
    Dataset updated
    Feb 10, 2018
    Authors
    Brendan Finan
    License

    GNU General Public License v2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)

    Description

    Context

    My goal with this dataset is to create the largest and most organized dataset of jokes.

    Tools for this dataset are on my GitHub.

    Content

    • Jokes reduced to only the Question and the Answer.
    • Duplicates NOT removed
    • Offensive jokes NOT removed

    Acknowledgements

    Question-Answer Jokes by Jiri Roznovjak

    Short Jokes by Abhinav Moudgil

    Inspiration

    Humor is one of the most difficult domains of natural language processing.

    Contribute

    If you want to help rate the jokes for funniness and/or vulgarity, download the .csv and add new column(s) with your rating(s). Email the file to bfinan@iastate.edu, and I'll add your ratings to the dataset.
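    Since duplicates were not removed, users may want to deduplicate before training, and the contribution step above amounts to appending a rating column. A minimal sketch of both with the standard `csv` module; the `Question`/`Answer` column names, the hypothetical `funniness` column, and the sample rows are assumptions, not the file's documented schema.

```python
import csv
import io

# Hypothetical rows in an assumed Question/Answer layout, including one exact duplicate.
SAMPLE = """Question,Answer
Why did the chicken cross the road?,To get to the other side.
Why did the chicken cross the road?,To get to the other side.
What do you call a fake noodle?,An impasta.
"""


def dedupe_and_rate(text: str, ratings: dict) -> str:
    """Drop exact duplicate (question, answer) pairs and append a 'funniness' column."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["Question", "Answer", "funniness"], lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in reader:
        key = (row["Question"], row["Answer"])
        if key in seen:
            continue  # keep only the first occurrence of an exact duplicate
        seen.add(key)
        # Jokes without a rating get an empty cell.
        writer.writerow({**row, "funniness": ratings.get(key, "")})
    return out.getvalue()


print(dedupe_and_rate(SAMPLE, {("What do you call a fake noodle?", "An impasta."): 4}))
```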

  7. programming-jokes-dataset

    • huggingface.co
    Updated Aug 3, 2024
    Cite
    Asfandyar Azhar (2024). programming-jokes-dataset [Dataset]. https://huggingface.co/datasets/asfandyarazhar/programming-jokes-dataset
    Available formats: Croissant
    Dataset updated
    Aug 3, 2024
    Authors
    Asfandyar Azhar
    Description

    Programming Jokes Dataset

    Dataset Summary

    This dataset contains programming-related jokes scraped from the website Punny Funny. The jokes are organized into categories based on the structure of the original webpage. The dataset is intended for natural language processing tasks, such as fine-tuning language models to generate humor or analyzing textual content in the programming domain. Number of jokes: 220.

    Usage

    This dataset is suitable for… See the full description on the dataset page: https://huggingface.co/datasets/asfandyarazhar/programming-jokes-dataset.

  8. jokes-dataset

    • huggingface.co
    Updated Feb 7, 2025
    Cite
    Rayhana Rafiai (2025). jokes-dataset [Dataset]. https://huggingface.co/datasets/rayhanti/jokes-dataset
    Available formats: Croissant
    Dataset updated
    Feb 7, 2025
    Authors
    Rayhana Rafiai
    Description

    rayhanti/jokes-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. Short Jokes Dataset

    • kaggle.com
    zip
    Updated Dec 5, 2023
    Cite
    The Devastator (2023). Short Jokes Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/short-jokes-dataset
    Available download formats: zip (9,673,796 bytes)
    Dataset updated
    Dec 5, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Short Jokes Dataset

    Humorous Short Jokes

    By Fraser Greenlee (From Huggingface) [source]

    About this dataset

    This dataset offers a valuable resource for various applications such as natural language processing, sentiment analysis, joke generation algorithms, or simply for entertainment purposes. Whether you're a data scientist looking to analyze humor patterns or an individual seeking some quick comedic relief, this dataset has got you covered.

    By utilizing this dataset, researchers can explore different aspects of humor and study the linguistic features that make these short jokes amusing. Moreover, it provides an opportunity for developing computer models capable of generating similar humorous content based on learned patterns.

    How to use the dataset

    • Understanding the Columns:

      • text: This column contains the text of the short joke.
    • Exploring the Jokes:

      • Start by exploring the text column, which contains the actual jokes. You can read through them and have a good laugh!
    • Analyzing the Jokes:

      • To gain insights from this dataset, you can perform various analyses:
        • Sentiment Analysis: Use Natural Language Processing techniques to analyze the sentiment of each joke.
        • Categorization: Group jokes based on common themes or subjects, such as animals, professions, etc.
        • Length Distribution: Analyze and visualize the distribution of joke lengths.
    • Creating New Content or Applications: Since this dataset provides a large collection of short jokes, you can utilize it creatively:

      • Generating Random Jokes: Develop an algorithm that generates new jokes based on patterns found in this dataset.
      • Humor Classification: Build a model that predicts if a given piece of text is funny or not using machine learning techniques.
    • Sharing Your Findings: If you make interesting discoveries or build unique applications with this dataset, consider sharing them with the Kaggle community.

    Please note that no information regarding dates is available in train.csv; therefore, any temporal analysis or date-based insights won't be feasible with this specific file.
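    The length-distribution analysis suggested above can be sketched with the standard library alone, bucketing the `text` column by character count. The bin width of 50 and the two sample rows are illustrative assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the documented train.csv layout (a single 'text' column).
SAMPLE = """text
Short one.
"A slightly longer joke, with a comma, so that it needs quoting in CSV."
"""


def length_histogram(text: str, bin_width: int = 50) -> dict[int, int]:
    """Count jokes per length bin; key b covers lengths in [b, b + bin_width)."""
    reader = csv.DictReader(io.StringIO(text))
    counts = Counter((len(row["text"]) // bin_width) * bin_width for row in reader)
    return dict(sorted(counts.items()))


print(length_histogram(SAMPLE))
```

    A fixed bin width keeps the output easy to plot as a bar chart; for a file of this size, reading it row by row as above avoids loading everything into a dataframe.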

    Research Ideas

    • Analyzing humor patterns: This dataset can be used to analyze different types of humor and identify patterns or common elements in jokes that make them funny. Researchers and linguists can use this dataset to gain insights into the structure, wordplay, or comedic techniques used in short jokes.
    • Natural language processing: With the text data available in this dataset, it can be used for training models in natural language processing (NLP) tasks such as sentiment analysis, joke generation, or understanding humor from written text. NLP researchers and developers can utilize this dataset to build and improve algorithms for detecting or generating funny content.
    • Social media analysis: Short jokes are popular on social media platforms like Twitter or Reddit, where users frequently share humorous content. This dataset can be valuable for analyzing the reception and impact of these jokes on social media. By examining trends, engagement metrics, or user reactions to specific jokes from the dataset, marketers or social media analysts can gain insights into what type of humor resonates with different online communities.

    Overall, this dataset provides a rich resource for exploring humor analysis and NLP tasks, and offers opportunities for sociocultural studies of online comedy culture.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Fraser Greenlee (from Hugging Face).

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description                                   |
    |:------------|:----------------------------------------------|
    | text        | The actual content of the short jokes. (Text) |

  10. Dataset of books called Jokes, jests and jollies

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Jokes, jests and jollies [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Jokes%2C+jests+and+jollies
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row, filtered to the book "Jokes, jests and jollies". It features 7 columns, including author, publication date, language, and book publisher.

  11. Dataset of books called Monster jokes

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Monster jokes [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Monster+jokes
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 3 rows, filtered to the book "Monster jokes". It features 7 columns, including author, publication date, language, and book publisher.

  12. Hungarian Covid jokes and memes

    • repo.researchdata.hu
    jpeg, pdf, png
    Updated May 2, 2024
    Cite
    Katalin Vargha; Katalin Vargha (2024). Hungarian Covid jokes and memes [Dataset]. https://repo.researchdata.hu/dataset.xhtml?persistentId=hdl:21.15109/CONCORDA/WEF6GB
    Dataset updated
    May 2, 2024
    Dataset provided by
    ARP
    Authors
    Katalin Vargha; Katalin Vargha
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset includes a collection of Hungarian Internet humour connected to the Covid-19 pandemic. The collection includes 344 items (mostly jokes and memes) that were collected online during the first wave of Covid in Hungary, between January and June 2020.

  13. jokes

    • huggingface.co
    Updated Sep 24, 2024
    Cite
    foong chee how (2024). jokes [Dataset]. https://huggingface.co/datasets/kentfoong/jokes
    Available formats: Croissant
    Dataset updated
    Sep 24, 2024
    Authors
    foong chee how
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    kentfoong/jokes dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. dad jokes Price Prediction Data

    • coinbase.com
    Updated Nov 13, 2025
    Cite
    (2025). dad jokes Price Prediction Data [Dataset]. https://www.coinbase.com/en/price-prediction/base-dad-jokes
    Dataset updated
    Nov 13, 2025
    Variables measured
    Growth Rate, Predicted Price
    Measurement technique
    User-defined projections based on compound growth. This is not a formal financial forecast.
    Description

    This dataset contains predicted prices of the asset "dad jokes" over the next 16 years. The prices are initially calculated with a default annual growth rate of 5 percent; after the page loads, a slider lets the user adjust the growth rate to their own positive or negative projection. The adjustable growth rate ranges from -100 percent to 100 percent.
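    The projection described above is compound growth. A minimal sketch, assuming annual compounding $P_t = P_0 (1 + r)^t$ for $t = 0, \dots, 16$, with $r$ clamped to the stated range of -100% to 100%; the starting price of 1.0 is hypothetical, and this is not the site's actual implementation.

```python
def project_prices(p0: float, rate: float = 0.05, years: int = 16) -> list[float]:
    """Compound-growth projection P_t = p0 * (1 + rate) ** t for t = 0..years.

    The rate is clamped to [-1.0, 1.0], mirroring the -100% to 100% slider range.
    """
    rate = max(-1.0, min(1.0, rate))
    return [p0 * (1 + rate) ** t for t in range(years + 1)]


default = project_prices(1.0)              # default 5% annual growth from a hypothetical price of 1.0
print(round(default[16], 4))               # projected value after 16 years
print(project_prices(1.0, rate=-1.0)[1])   # -100%: the price drops to 0.0 after one year
```

    Note that a rate of -100 percent sends the price to zero after the first year, and a rate of 100 percent doubles it every year.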

  15. Data from: Corpus of daily jokes from the 24ur.com portal Šale24 1.0

    • live.european-language-grid.eu
    binary format
    Updated Oct 2, 2024
    Cite
    (2024). Corpus of daily jokes from the 24ur.com portal Šale24 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/23698
    Dataset updated
    Oct 2, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from their archive on September 18th, 2024. The initial list is lightly curated: shorter texts found in the original collection were removed from the corpus since they appear to be illustration captions without the accompanying illustrations.

    Readers of the news portal vote on the jokes with thumbs-up and thumbs-down buttons, and the voting results are included as metadata with each joke. Several jokes have been published more than once. Each joke (distinguished by exact text match) is identified by a hash of its text and includes a list of voting results for every instance of its publication. The normalised_text field contains text with punctuation corrections. For now, this is limited to replacing '' (two consecutive apostrophes, U+0027) with " (a single straight quotation mark, U+0022); the original corpus consistently uses the former in place of the latter.
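    The normalisation rule is a single string substitution, and duplicate detection by exact text match reduces to hashing the original text. A minimal sketch; the description does not say which hash function the corpus uses, so SHA-256 here is an assumption, and the sample sentence is hypothetical.

```python
import hashlib


def normalise(text: str) -> str:
    """Replace two consecutive apostrophes (U+0027 U+0027) with one double quote (U+0022)."""
    return text.replace("\u0027\u0027", "\u0022")


def text_hash(original_text: str) -> str:
    """Identify a joke by a hash of its exact original text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(original_text.encode("utf-8")).hexdigest()


original = "He said: ''Hi!''"
print(normalise(original))  # He said: "Hi!"
# Re-published jokes with identical text map to the same hash.
print(text_hash(original) == text_hash("He said: ''Hi!''"))  # True
```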

    Based on the name ("Šala dneva", i.e. "Joke of the day") and the observed posting frequency during September 2024, we assume that each entry corresponds to one day, counting backwards from the day of data collection. Each voting event has an associated estimated publication date calculated with this algorithm.
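    The date-estimation rule amounts to one subtraction per entry. A minimal sketch, assuming entries are indexed from 0 in newest-first order and that index 0 falls on September 18, 2024, the scrape date given above; the function name is hypothetical.

```python
from datetime import date, timedelta

COLLECTION_DATE = date(2024, 9, 18)  # scrape date stated in the corpus description


def estimated_date(index: int, collected: date = COLLECTION_DATE) -> date:
    """Entry 0 is assumed to be the collection day; each later entry is one day earlier."""
    return collected - timedelta(days=index)


print(estimated_date(0))   # 2024-09-18
print(estimated_date(18))  # 2024-08-31
```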

    The jokes are linguistically annotated with CLASSLA-Stanza (https://github.com/clarinsi/classla), using the models for standard Slovenian. The JSONL file contains entries representing individual jokes, with the following keys:

    • hash: a hash of the original joke text, used for duplicate identification
    • original_text: the original scraped text
    • normalised_text: the normalised text
    • processed_text: the linguistically annotated normalised text in CoNLL-U format
    • votes: a list of vote objects containing joke vote metadata
    • votes.for: votes for
    • votes.against: votes against
    • estimated_date: estimated dates of joke publication and voting
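    A minimal sketch of reading the JSONL file with the standard `json` module and totalling each joke's votes across its publications. The key names come from the list above; the sample line, and the assumption that `for`, `against`, and `estimated_date` sit inside each vote object, are illustrative.

```python
import json

# Hypothetical JSONL content: one JSON object per line, using the documented keys.
SAMPLE = json.dumps({
    "hash": "h1",
    "original_text": "He said: ''Hi!''",
    "normalised_text": 'He said: "Hi!"',
    "processed_text": "",  # CoNLL-U annotation omitted in this sample
    "votes": [
        {"for": 10, "against": 2, "estimated_date": "2024-09-18"},
        {"for": 5, "against": 3, "estimated_date": "2023-01-02"},
    ],
}) + "\n"


def vote_totals(jsonl: str) -> dict[str, tuple[int, int]]:
    """Map each joke hash to (total votes for, total votes against) over all publications."""
    totals = {}
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        joke = json.loads(line)
        votes_for = sum(v["for"] for v in joke["votes"])
        votes_against = sum(v["against"] for v in joke["votes"])
        totals[joke["hash"]] = (votes_for, votes_against)
    return totals


print(vote_totals(SAMPLE))  # {'h1': (15, 5)}
```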

    The corpus contains 16658 sentences, 129063 tokens, and 662 recognised named entities.
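    Counts like these can be recomputed from the `processed_text` field. A minimal sketch, assuming standard CoNLL-U layout: comment lines start with `#`, sentences are separated by blank lines, and multiword-token ranges (`1-2`) and empty nodes (`1.1`) are not counted as tokens. Whether the published figures follow exactly this counting convention is an assumption.

```python
def count_conllu(conllu: str) -> tuple[int, int]:
    """Return (sentences, tokens) for a CoNLL-U string."""
    sentences = tokens = 0
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as '# text = ...'
        if not in_sentence:
            sentences += 1
            in_sentence = True
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node
        tokens += 1
    return sentences, tokens


def row(token_id: str, form: str) -> str:
    """Build a 10-column CoNLL-U line with '_' for the unused fields (hypothetical helper)."""
    return "\t".join([token_id, form] + ["_"] * 8)


SAMPLE = "\n".join([
    "# text = Hi there.", row("1", "Hi"), row("2", "there"), row("3", "."), "",
    "# text = Ok.", row("1", "Ok"), row("2", "."), "",
])
print(count_conllu(SAMPLE))  # (2, 5)
```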

  16. Data from: Global geography of jokes

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Hervé Théry (2023). Global geography of jokes [Dataset]. http://doi.org/10.6084/m9.figshare.14307540.v1
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO journals
    Authors
    Hervé Théry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: Jokes between countries are useful for revealing the ethnotypes that exist in each of them, and representing them in cartographic form makes it possible to perceive their distribution and the spatial projection of mockery: who are we laughing at, and who are the scapegoats for the inhabitants of each country? Based on the analysis of an ad hoc database covering more than 60% of the world's countries and territories and 90% of its population, the text shows that these jokes are social constructions, have a temporality, and fall basically into two categories: from top to bottom and from bottom to top.

  17. short-jokes-punchline

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    Timxjl (2024). short-jokes-punchline [Dataset]. https://huggingface.co/datasets/Timxjl/short-jokes-punchline
    Available formats: Croissant
    Dataset updated
    Nov 25, 2024
    Authors
    Timxjl
    License

    GPL-2.0 (https://choosealicense.com/licenses/gpl-2.0/)

    Description

    Short Jokes Punchline

    This dataset contains information about jokes, visitors, labels, and label segments used in a joke labeling application. The data is stored in four CSV files: joke.csv, visitor.csv, label.csv, and label_segment.csv.

    Files

    joke.csv

    This file contains 200 jokes randomly sampled from the Kaggle dataset "Short Jokes." Each row represents a joke with the following columns:

    id: The unique identifier for the joke. text: The text content of the… See the full description on the dataset page: https://huggingface.co/datasets/Timxjl/short-jokes-punchline.

  18. Dataset of book subjects that contain Jokes my father never taught me :...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Jokes my father never taught me : life, love, and loss with Richard Pryor [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Jokes+my+father+never+taught+me+:+life%2C+love%2C+and+loss+with+Richard+Pryor&j=1&j0=books
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 4 rows, filtered to the book "Jokes my father never taught me : life, love, and loss with Richard Pryor". It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  19. Jester Jokes Dataset v4

    • kaggle.com
    zip
    Updated Jun 22, 2023
    Cite
    Mohamed Amine DHIAB (2023). Jester Jokes Dataset v4 [Dataset]. https://www.kaggle.com/datasets/mohamedaminedhiab/jester-jokes-dataset-v4
    Available download formats: zip (1,419,440 bytes)
    Dataset updated
    Jun 22, 2023
    Authors
    Mohamed Amine DHIAB
    Description

    Dataset

    This dataset was created by Mohamed Amine DHIAB

  20. chinese-joke

    • huggingface.co
    Updated Apr 29, 2023
    Cite
    wangxh (2023). chinese-joke [Dataset]. https://huggingface.co/datasets/notsobad9527/chinese-joke
    Available formats: Croissant
    Dataset updated
    Apr 29, 2023
    Authors
    wangxh
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    notsobad9527/chinese-joke dataset hosted on Hugging Face and contributed by the HF Datasets community
