Dataset Card for poetry
Dataset Summary
It contains poems on the subjects Love, Nature, and Mythology & Folklore, from two periods: Renaissance and Modern.
Supported Tasks and Leaderboards
[Needs More Information]
Languages
[Needs More Information]
Dataset Structure
Data Instances
[Needs More Information]
Data Fields
Has 5 columns: Content, Author, Poem name, Age, Type… See the full description on the dataset page: https://huggingface.co/datasets/merve/poetry.
Capturing emotion from reviews and tweets is a well-studied task. Reviews and tweets are not rich in emotion, whereas poetry is, so capturing emotions from poetry is an interesting task. To that end, we collected poems from Poemhunter.com (we thank the website owners), created a dataset, and manually annotated the poems with 5 emotions: Fear, Sad, Surprise, Happy and Angry. The dataset comprises 3 files:
1. ABIEMO: poems by American, British and Indian poets
2. CAPEMO: augmented poems, created with the NLPAUG library to resolve the class imbalance problem (we thank the library developers)
3. BAPEMO: extended augmented poems to resolve the class imbalance problem
Along with the emotion, each poem is also assigned a country. The dataset can be used for poet style analysis, emotion analysis, country-wise differences in poetry, etc.
This file contains nearly all poems from the Poetry Foundation Website.
Content
All poems have a title and author. Most poems are also labeled with the tags as available from the Poetry Foundation Website. The word cloud above shows the most used tags.
Inspiration
This dataset can be used for a variety of tasks related to poetry writing.
https://creativecommons.org/publicdomain/zero/1.0/
A blackout poetry dataset constructed from publicly available short stories and large poems. The dataset consists of two variants: 8K and 16K examples of passages along with a poem generated from the passage and the indices of the words in the passage from which words in the poem have been selected. The dataset also contains perplexity scores for each of the poems indicating the language quality of the poems.
The dataset was constructed synthetically, and hence contains multiple poor poems and frequent grammatical errors. However, it is a great starting point for the task of applying machine learning to blackout poetry generation.
The dataset was first introduced in MAPLE – MAsking words to generate blackout Poetry using sequence-to-sequence LEarning.
The dataset has two variants:
- 8K (sampled poems from the 16K dataset with the lowest perplexity scores)
- 16K
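The relationship between the two variants can be sketched as a lowest-perplexity selection. This is a minimal illustration, not the authors' selection script: the record layout (dicts with a `perplexity` key), the function name `lowest_perplexity_subset`, and the toy scores are all assumptions made for the example.

```python
import heapq


def lowest_perplexity_subset(examples, k):
    """Return the k examples with the lowest perplexity, lowest first.

    `examples` is assumed to be a list of dicts with a numeric
    "perplexity" key; this layout is hypothetical.
    """
    return heapq.nsmallest(k, examples, key=lambda ex: ex["perplexity"])


# Toy records with made-up scores, used only to show the call.
toy = [
    {"poem": "a", "perplexity": 310.0},
    {"poem": "b", "perplexity": 95.5},
    {"poem": "c", "perplexity": 120.0},
    {"poem": "d", "perplexity": 870.2},
]
subset = lowest_perplexity_subset(toy, 2)
print([ex["poem"] for ex in subset])
```

Lower perplexity is treated here as a proxy for better language quality, which matches how the description uses the score.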
Both variants contain data in the following format:
| passage | poem | indices |
|---|---|---|
| Did the CIA tell the FBI that it knows the wor... | cia fbi the biggest weapon | [2, 5, 9, 24, 25] |
| A vigilante lacking of heroic qualities that ... | lacking qualities that damn criminals | [2, 5, 6, 11, 12] |
The passage is the text from which the poem is generated. The poem is the generated poem. The indices are the indices of the words in the text that are chosen for the poem.
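To make the role of the indices concrete, here is a small sketch that rebuilds poem words from a passage. It assumes zero-based indices, whitespace tokenization, and lowercased output; these assumptions agree with the visible prefixes of the sample rows but are not stated by the dataset description. The function name `poem_from_indices` is hypothetical.

```python
def poem_from_indices(passage, indices):
    """Pick the words at the given positions and join them into a poem.

    Assumes zero-based positions into the whitespace-split passage and
    lowercased poem words (inferred from the sample rows, not documented).
    """
    words = passage.split()
    return " ".join(words[i].lower() for i in indices)


# Only the visible prefix of the first sample passage is used, so only
# the first three indices of that row can be resolved here.
prefix = "Did the CIA tell the FBI that it knows the"
print(poem_from_indices(prefix, [2, 5, 9]))  # cia fbi the
```

The remaining indices of that row (24 and 25) point into the truncated part of the passage and cannot be checked from the excerpt.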
This dataset was generated synthetically using Liza Daly's pattern matching based blackout poetry generation.
https://choosealicense.com/licenses/cc0-1.0/
Allison Parrish's Gutenberg Poetry Corpus
This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
The largest Arabic poetry dataset, containing more than 2.09 million verses. The dataset is comprehensive and includes additional information for each verse, such as the poet's name, poem title, era, meter, sub-meter, etc.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Pham Tuyet
Released under MIT
A selection of poetic texts (71,490 words) from the Old English Section of the Helsinki Corpus of English Texts, syntactically and morphologically annotated.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Pham Tuyet
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Mirza Ahmad Awais
Released under CC0: Public Domain
Introduction
CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.
The main task of this dataset is: given a description in modern Chinese, the model must select, from four candidates, the line of Chinese classical poetry that best matches the description semantically.
Size
It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.
Format
Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
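The instance format and the matching task can be sketched as below. The field names `translation`, `choice`, and `answer` come from the description above; the `Instance` class, the `accuracy` helper, and the placeholder strings are assumptions for illustration and are not taken from the dataset or its tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    translation: str   # description in modern Chinese
    choice: List[str]  # four candidate lines of classical poetry
    answer: int        # index of the correct line, 0 to 3

    def __post_init__(self):
        if len(self.choice) != 4:
            raise ValueError("expected four candidate lines")
        if not 0 <= self.answer <= 3:
            raise ValueError("answer must be an index between 0 and 3")


def accuracy(instances, predictions):
    """Fraction of instances whose predicted index equals `answer`."""
    if len(instances) != len(predictions):
        raise ValueError("one prediction per instance is required")
    if not instances:
        return 0.0
    correct = sum(inst.answer == pred for inst, pred in zip(instances, predictions))
    return correct / len(instances)


# Placeholder instances; the strings are stand-ins, not real dataset rows.
candidates = ["line 0", "line 1", "line 2", "line 3"]
toy = [
    Instance("description A", candidates, 2),
    Instance("description B", candidates, 0),
]
print(accuracy(toy, [2, 1]))  # 0.5
```

Accuracy over the chosen index is a natural metric for a four-way selection task, though the description itself does not name an official metric.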
VidaEdco/prompt-poem-dataset-20240921_004141 dataset hosted on Hugging Face and contributed by the HF Datasets community
The Chinese Poetry dataset is a dataset of Chinese poems used for language modeling.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.
It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.
Example Image: https://i.imgur.com/sZT516a.png
A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.
Alternatively, you could try your hand using this as a neural font identification dataset. Nvidia, amongst others, have had success with this task.
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a table of word counts for a collection of 75,297 English-language poems.
The ultimate goal of the Program for the Exploration of the Eastern Mediterranean (POEM) is to reach a comprehensive knowledge of the physical, chemical, and biological oceanography of the Eastern Mediterranean. Such knowledge is an essential basis for environmental management, resource exploration, and marine operations. The overall scientific objectives are to: (1) describe the physical phenomena and quantify their kinematics; (2) define basic dynamical processes; and (3) construct physical models suitable for general ocean scientific studies and applications.
https://choosealicense.com/licenses/agpl-3.0/
From: https://www.kaggle.com/datasets/tgdivy/poetry-foundation-poems
Poetry Foundation Poems Dataset
Overview
This dataset contains a collection of 13.9k poems sourced from the Poetry Foundation website. Each poem entry includes its title, author, and associated tags (if available). The dataset provides a robust resource for exploring poetry, analyzing thematic trends, or creating applications such as poem generators.
Dataset Structure
The dataset consists of the following columns: 1. Title:… See the full description on the dataset page: https://huggingface.co/datasets/suayptalha/Poetry-Foundation-Poems.
This dataset was created by mira318
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books, filtered to the book Antebellum American women's poetry : a rhetoric of sentiment. It features 7 columns, including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This report evaluates the impact of Cancer Alley, an immersive poetry film created by poet Lucy English and filmmakers Pamela Faulkenberg and Jack Cochran and screened at the Watershed as part of the Lyra Bristol Poetry Festival 2024. It examines data collected before, during, and after the screening from a group of poets and filmmakers as well as members of the audience, through in-depth qualitative analysis of questionnaires, interviews, a focus group, and journals.