rayhanti/programming-jokes-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Context: Generating humor is a complex task in machine learning, because a model must understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for a number of reasons, one of which is the lack of a database offering an extensive list of jokes. Thus, a large corpus of over 0.2 million jokes has been collected by scraping several websites containing funny and short jokes. You can visit the Github… See the full description on the dataset page: https://huggingface.co/datasets/ysharma/short_jokes.
6.5 million anonymous ratings of jokes by users of the Jester Joke Recommender System.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is a corpus of 1915 "jokes of the day" ("šala dneva") published by the Slovenian news portal 24ur.com. The jokes were scraped from the portal's archive on September 18th, 2024. The initial list is lightly curated: shorter texts found in the original collection were removed from the corpus, since they appear to be illustration captions without the accompanying illustrations. Readers of the news portal vote on the jokes themselves with thumbs-up and thumbs-down buttons. The voting results are included as metadata with each joke. Several jokes have been published more than once. Each joke (distinguished by exact text matches) is identified by a hash of its text and carries a list of voting results, one for every instance of its publication. The normalised_text field contains text with punctuation corrections. For now, this is limited to replacing '' (two consecutive apostrophes, U+0027) with " (a single straight/dumb/vertical quotation mark, U+0022). The former (two apostrophes) is consistently used in place of the latter in the original corpus. Based on the name ("Šala dneva", i.e. "Joke of the day") and the observed frequency of posting during September 2024, we assume each entry corresponds to a day, starting from the day of data collection and counting backwards. Each voting event has an associated estimated publication date calculated with the above algorithm. The jokes are linguistically annotated with CLASSLA-Stanza (https://github.com/clarinsi/classla), using the models for standard Slovenian.
The JSONL file contains entries representing individual jokes containing:
- a hash of the original joke text used for duplicate identification (key: hash)
- original scraped text (key: original_text)
- normalised text (key: normalised_text)
- linguistically annotated normalised text in CoNLL-U format (key: processed_text)
- a list of vote objects containing joke vote metadata (key: votes)
- votes for (key: votes.for)
- votes against (key: votes.against)
- estimated dates of joke publication and voting (key: estimated_date)
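The normalisation rule, the backwards date estimate, and the entry layout described above can be sketched in Python. This is only an illustration under stated assumptions, not the corpus authors' code: the helper names are hypothetical, the sample record is invented, and the nesting of each vote object (keys for, against, and estimated_date inside the votes list) is a guess based on the key names votes.for and votes.against.

```python
from __future__ import annotations

import json
from datetime import date, timedelta


def normalise_text(text: str) -> str:
    """Replace two consecutive apostrophes (U+0027) with one quotation mark (U+0022)."""
    return text.replace("''", '"')


def estimate_dates(collection_day: date, n_entries: int) -> list[date]:
    """Assume one joke per day, starting at the collection day and counting backwards."""
    return [collection_day - timedelta(days=i) for i in range(n_entries)]


def vote_totals(entry: dict) -> tuple[int, int]:
    """Sum votes for and against over every publication of a joke.

    Assumes each vote object looks like {"for": int, "against": int, ...}.
    """
    votes = entry.get("votes", [])
    return (sum(v["for"] for v in votes), sum(v["against"] for v in votes))


# One invented JSONL line; the hash and vote counts are placeholders, not real data.
sample_line = json.dumps({
    "hash": "placeholder-hash",
    "original_text": "''Zakaj?'' je vprašal.",
    "votes": [
        {"for": 10, "against": 2, "estimated_date": "2024-09-18"},
        {"for": 5, "against": 1, "estimated_date": "2024-09-10"},
    ],
})

entry = json.loads(sample_line)
entry["normalised_text"] = normalise_text(entry["original_text"])
total_for, total_against = vote_totals(entry)
```

A real file would be read one line at a time, calling json.loads on each line, since every line of a JSONL file holds one complete JSON object.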
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books and is filtered where the book is Best teenage jokes, featuring 7 columns including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Avneet Singh
Released under Apache 2.0
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
These statistics illustrate the consumption, production, prices, and trade of festive, carnival, or other entertainment articles, including conjuring tricks and novelty jokes, in Aruba from 2007 to 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects and is filtered where the book is Space facts & jokes book, featuring 10 columns including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).
Ayush-Singh/jokes-pizza dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
A corpus for testing whether your LLM can explain a joke well. This is a rather small dataset, however; pointers to larger ones would be very welcome.
Languages
English
Dataset Structure
Data Fields
url : link to the explanation
joke : the original joke
explaination : the explanation of the joke
Data Splits
Since it's so small, there are no splits… See the full description on the dataset page: https://huggingface.co/datasets/theblackcat102/joke_explaination.
https://whoisdatacenter.com/index.php/terms-of-use/
Explore the historical Whois records related to adult-jokes.net (Domain). Get insights into ownership history and changes over time.
Ayush-Singh/jokes-new dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
These statistics illustrate the consumption, production, prices, and trade of festive, carnival, or other entertainment articles, including conjuring tricks and novelty jokes, in Martinique from 2007 to 2024.
http://n2t.net/ark:/87925/h1cc0xm5
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Story collected by Brigid Browne, a student at Dromina, Ráth Luirc school (Dromina, Co. Cork) from informant John Browne.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books and is filtered where the book is Really, really gross jokes, riddles, and tongue twisters, featuring 7 columns including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
http://n2t.net/ark:/87925/h1cc0xm5
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Story collected by Betty Feely, a student at An Clochar, Cara Droma Ruisc school (Carrick-on-Shannon, Co. Leitrim) (no informant identified).
http://n2t.net/ark:/87925/h1cc0xm5
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Story collected by a student at Lismacaffry school (Lismacaffry, Co. Westmeath) from informant John Leslie.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Bulgarian Jokes Dataset
Overview
The Bulgarian Jokes Dataset is a collection of Bulgarian-language jokes gathered and prepared for use in training and fine-tuning natural language processing (NLP) models. This dataset is designed to help researchers and developers build models capable of understanding and generating humorous content in Bulgarian.
Dataset Structure
The dataset is structured in a format suitable for NLP training and fine-tuning tasks… See the full description on the dataset page: https://huggingface.co/datasets/vislupus/alpaca-bulgarian-jokes.
https://whoisdatacenter.com/terms-of-use/
Explore the historical Whois records related to jokes-plus.com (Domain). Get insights into ownership history and changes over time.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Silly jokes is a book. It was written by Claire Fletcher and published by Helen Exley Gift books in 2010.