Dataset Card for poetry
Dataset Summary
It contains poems on three subjects (Love, Nature, and Mythology & Folklore) that belong to two periods: Renaissance and Modern.
Supported Tasks and Leaderboards
[Needs More Information]
Languages
[Needs More Information]
Dataset Structure
Data Instances
[Needs More Information]
Data Fields
It has 5 columns: Content, Author, Poem name, Age, Type… See the full description on the dataset page: https://huggingface.co/datasets/merve/poetry.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Gutenberg Poem Dataset
Dataset Summary
Poem Sentiment is a sentiment dataset of poem verses from Project Gutenberg. This dataset can be used for tasks such as sentiment classification or style transfer for poems.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
The text in the dataset is in English (en).
Dataset Structure
Data Instances
Example of one instance in the… See the full description on the dataset page: https://huggingface.co/datasets/google-research-datasets/poem_sentiment.
A blackout poetry dataset constructed from publicly available short stories and long poems. The dataset comes in two variants, with 8K and 16K examples respectively. Each example consists of a passage, a poem generated from that passage, and the indices of the passage words that were selected to form the poem. The dataset also contains a perplexity score for each poem, indicating its language quality.
Because the dataset was constructed synthetically, it contains many poor-quality poems and frequent grammatical errors. Nevertheless, it is a good starting point for applying machine learning to blackout poetry generation.
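To make the passage/poem/indices structure concrete, here is a minimal Python sketch that rebuilds a poem from a passage and its selected word indices, together with the standard definition of perplexity as the exponential of the mean negative log-likelihood. It assumes whitespace tokenization and 0-based word indices; the function names `blackout_poem` and `perplexity` are hypothetical, and the card does not say which language model or tokenizer produced its perplexity scores.

```python
import math


def blackout_poem(passage: str, indices: list[int]) -> str:
    """Rebuild a blackout poem by keeping only the selected passage words.

    Assumption: words are whitespace-separated and indices are 0-based.
    """
    words = passage.split()
    # A blackout poem keeps words in passage order, so sort the indices first.
    return " ".join(words[i] for i in sorted(indices))


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity as exp(mean negative log-likelihood), using natural logs."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy passage, made up for illustration (not taken from the dataset).
passage = "the old moon rose slowly over the silent harbor"
print(blackout_poem(passage, [2, 3, 8, 7]))  # moon rose silent harbor
print(perplexity([math.log(0.5)] * 4))       # about 2.0
```

Lower perplexity means the scoring model found the poem more predictable, which is the usual reading of a perplexity-based "language quality" score.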
https://choosealicense.com/licenses/cc0-1.0/
Allison Parrish's Gutenberg Poetry Corpus
This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
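Because every line carries the ID of the book it came from, reassembling per-book text is a simple grouping step. The Python sketch below assumes each record is a dict with the keys `line` and `gutenberg_id`; those key names are an assumption, so check them against the dataset page before relying on them.

```python
from collections import defaultdict
from collections.abc import Iterable


def group_by_book(records: Iterable[dict]) -> dict[str, list[str]]:
    """Collect lines of poetry per Project Gutenberg book, keeping their order.

    Assumption: each record looks like {"line": ..., "gutenberg_id": ...}.
    """
    books: defaultdict[str, list[str]] = defaultdict(list)
    for record in records:
        # Normalize IDs to strings so 123 and "123" land in the same bucket.
        books[str(record["gutenberg_id"])].append(record["line"])
    return dict(books)


# Made-up records for illustration.
records = [
    {"line": "first line of book A", "gutenberg_id": 1},
    {"line": "only line of book B", "gutenberg_id": 2},
    {"line": "second line of book A", "gutenberg_id": 1},
]
print(group_by_book(records))
```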
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a table of word counts for a collection of 75,297 English-language poems.
This dataset was created by Jamie Wang
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mirza Ahmad Awais
Released under CC0: Public Domain
iamketan25/poem-instructions-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Although lexicons exist for emotion analysis, identifying emotion in poems has had to rely on limited emotion lexicons, which were not created for poems and do not focus on poetic features. This paper presents PERC (Poem Emotion Recognition Corpus), a text corpus comprising a set of poems and features for emotion recognition from poems. Emotion classification is based on 'Navarasa,' described in the 'Natyasastra.' Navarasa consists of nine primary emotions: Love, Sad, Anger, Hate, Fear, Surprise, Courage, Joy, and Peace. Although there are many text corpora for emotion recognition, we do not know of a poem corpus based on these nine emotions. The corpus was built from an exhaustive collection of poems by Indian poets from the period 1850-2016. The novelty of this work is the creation of a corpus from poems mined from the web and evaluated by human experts.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The largest Arabic poetry dataset, containing more than 2.09 million verses. For each verse, it also records associated information such as the poet's name, the poem's title, era, meter, and sub-meter.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Pham Tuyet
Released under MIT
A selection of poetic texts (71,490 words) from the Old English Section of the Helsinki Corpus of English Texts, syntactically and morphologically annotated.
Introduction
CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.
The main task of this dataset is: given a description in modern Chinese, the model must select, from four candidate lines of Chinese classical poetry, the one that best matches the description semantically.
Size
It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.
Format
Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
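The instance format maps naturally onto a small typed record. The Python sketch below checks the stated invariants (four candidates, an answer index from 0 to 3) and scores predictions by accuracy; treating accuracy as the evaluation metric is an assumption here, and the names `CCPMInstance`, `validate`, and `accuracy` are hypothetical.

```python
from typing import TypedDict


class CCPMInstance(TypedDict):
    translation: str   # description in modern Chinese
    choice: list[str]  # four candidate lines of classical poetry
    answer: int        # index (0-3) of the line that matches the description


def validate(instance: CCPMInstance) -> None:
    """Raise ValueError if an instance breaks the documented format."""
    if len(instance["choice"]) != 4:
        raise ValueError("expected exactly four candidate lines")
    if not 0 <= instance["answer"] <= 3:
        raise ValueError("answer must be an integer between 0 and 3")


def accuracy(instances: list[CCPMInstance], predictions: list[int]) -> float:
    """Fraction of instances whose predicted index equals the gold answer."""
    if not instances or len(instances) != len(predictions):
        raise ValueError("need one prediction per instance")
    correct = sum(inst["answer"] == pred for inst, pred in zip(instances, predictions))
    return correct / len(instances)


# Placeholder instances; real ones hold Chinese text.
toy = [
    CCPMInstance(translation="desc 1", choice=["A", "B", "C", "D"], answer=2),
    CCPMInstance(translation="desc 2", choice=["E", "F", "G", "H"], answer=0),
]
for inst in toy:
    validate(inst)
print(accuracy(toy, [2, 1]))  # 0.5
```

Since every instance has exactly four candidates, a uniformly random guesser would score about 0.25, which is a useful sanity baseline.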
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The childPoeDE corpus is a collection of 1082 German poems for children created within the CHYLSA project. The poems were taken from anthologies published between 1991 and 2019. This publication includes the poem-level metadata for each poem with information about the author, the poem's length, data on case, punctuation, layout, rhyme, type-token ratio (TTR and MATTR) and lexical density. It also includes token-level metadata, namely word length and position, POS tags in different levels of granularity as well as data on onomatopoeia and sonority. Furthermore, this publication provides a word frequency table and a Python script which was used to extract some of the metadata from the texts (poemtool.py). The childPoeDE corpus does not contain all poems from the anthologies. A list of the poems that have been omitted for different reasons (length, language, typography, ...) can be accessed as well.
Read more about the childPoeDE corpus in our data paper published in the Journal of Open Humanities Data: The ChildPoeDE Corpus: 1082 German Children’s Poems for Computational and Experimental Studies on Poetry Reception.
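TTR is the number of distinct word forms divided by the number of tokens; MATTR (moving-average TTR) averages the TTR over every window of a fixed size, slid one token at a time, which makes it less sensitive to text length. Below is a minimal Python sketch of both. It assumes the poem is already tokenized (and lowercased, if desired), and the default window of 50 tokens is an assumption, not necessarily the setting used for childPoeDE or in poemtool.py.

```python
from collections import Counter


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: distinct word forms divided by total tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-average TTR over every window of `window` consecutive tokens."""
    if window < 1:
        raise ValueError("window must be positive")
    if len(tokens) <= window:
        return ttr(tokens)  # too short to slide a window; fall back to plain TTR
    counts = Counter(tokens[:window])
    ratios = [len(counts) / window]
    for i in range(window, len(tokens)):
        # Slide the window one token to the right: drop the oldest, add the newest.
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[tokens[i]] += 1
        ratios.append(len(counts) / window)
    return sum(ratios) / len(ratios)


tokens = ["a", "a", "b", "b"]
print(ttr(tokens))              # 0.5
print(mattr(tokens, window=2))  # windows score 0.5, 1.0, 0.5, so about 0.667
```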
DFG Priority Programme SPP 2207 “Computational Literary Studies”
Subproject: “CHYLSA (Children’s and Youth Literature Sentiment Analysis)”
https://choosealicense.com/licenses/cc0-1.0/
matthh/gutenberg-poetry-corpus dataset hosted on Hugging Face and contributed by the HF Datasets community
The ultimate goal of the Program for the Exploration of the Eastern Mediterranean (POEM) is to reach a comprehensive knowledge of the physical, chemical, and biological oceanography of the Eastern Mediterranean. Such knowledge is an essential basis for environmental management, resource exploration, and marine operations. The overall scientific objectives are to: (1) describe the physical phenomena and quantify their kinematics; (2) define basic dynamical processes; and (3) construct physical models suitable for general ocean scientific studies and applications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books and is filtered to the book Poetry: reading, reacting, writing. It has 7 columns, including book, author, ISBN, BNB id, and language. The data is ordered by publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about stocks and is filtered to the company Poem. It has 8 columns, including stock, stock name, exchange, exchange symbol, and timezone.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books and is filtered to books published by The Hedgehog Poetry Press. It has 7 columns, including book, author, ISBN, BNB id, and language. The data is ordered by publication date.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.
It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.
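The generation recipe described above can be sketched as producing bounding-box annotations whose class is the font. This Python sketch is not Roboflow's actual pipeline: it only samples 1 to 5 strings, a font and a color for each, and a position that keeps each box inside the 512x512 canvas. The fixed per-character width and line height are hypothetical stand-ins for real text measurement, the font names in the usage lines are made up, and overlapping boxes are not prevented.

```python
import random
from dataclasses import dataclass

CANVAS = 512  # canvas side length in pixels, as stated in the description


@dataclass
class TextBox:
    text: str
    font: str  # class label: the font used to render the text
    color: tuple[int, int, int]
    x: int
    y: int
    width: int
    height: int


def place_strings(lines: list[str], fonts: list[str], rng: random.Random,
                  char_w: int = 12, line_h: int = 24) -> list[TextBox]:
    """Sample 1-5 strings and random in-canvas positions for them.

    char_w and line_h are hypothetical fixed glyph sizes; a real renderer
    would measure each string in its chosen font instead.
    """
    if not lines or not fonts:
        raise ValueError("need at least one text line and one font")
    count = rng.randint(1, min(5, len(lines)))
    boxes = []
    for text in rng.sample(lines, count):
        # Clip overly long strings to the canvas width.
        width = min(char_w * len(text), CANVAS)
        color = (rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255))
        boxes.append(TextBox(
            text=text,
            font=rng.choice(fonts),
            color=color,
            # Choose the top-left corner so the whole box stays on the canvas.
            x=rng.randint(0, CANVAS - width),
            y=rng.randint(0, CANVAS - line_h),
            width=width,
            height=line_h,
        ))
    return boxes


rng = random.Random(0)
for box in place_strings(["toy line one", "toy line two", "toy line three"], ["serif", "mono"], rng):
    print(box.font, box.x, box.y, box.width)
```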
Example Image: https://i.imgur.com/sZT516a.png
A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.
Alternatively, you could try your hand at using this as a neural font identification dataset. Nvidia, amongst others, has had success with this task.
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
According to Roboflow, developers write 50% less code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.