100+ datasets found

poetry
huggingface.co
Updated Nov 3, 2021
Cite
merve (2021). poetry [Dataset]. https://huggingface.co/datasets/merve/poetry
Dataset updated
Nov 3, 2021
Authors
merve
Description
Dataset Card for poetry

Dataset Summary

The dataset contains poems on three subjects (Love, Nature, and Mythology & Folklore) from two periods: Renaissance and Modern.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

[Needs More Information]

Dataset Structure

Data Instances

[Needs More Information]

Data Fields

Has 5 columns: Content, Author, Poem name, Age, Type
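As a rough illustration of working with the five columns listed above, the sketch below tallies poems by period and subject. It assumes that `Age` holds the period (Renaissance or Modern) and `Type` holds the subject, which the card does not state explicitly. The sample rows are placeholders, not records from the dataset.

```python
from collections import Counter

# Column names as listed on the dataset card.
COLUMNS = ["Content", "Author", "Poem name", "Age", "Type"]


def count_by_age_and_type(rows):
    """Count poems per (Age, Type) pair.

    Assumption: 'Age' is the period and 'Type' is the subject.
    """
    return Counter((row["Age"], row["Type"]) for row in rows)


# Placeholder rows for illustration only (not taken from the dataset).
sample_rows = [
    {"Content": "...", "Author": "A", "Poem name": "P1", "Age": "Renaissance", "Type": "Love"},
    {"Content": "...", "Author": "B", "Poem name": "P2", "Age": "Modern", "Type": "Nature"},
    {"Content": "...", "Author": "C", "Poem name": "P3", "Age": "Modern", "Type": "Nature"},
]

counts = count_by_age_and_type(sample_rows)
```

A tally like this is a quick way to check how evenly the subjects and periods are represented before training a classifier.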

Data Splits

Only training… See the full description on the dataset page: https://huggingface.co/datasets/merve/poetry.
American,British,Indian Emotion poetry dataset
kaggle.com
zip
Updated Apr 6, 2024
Cite
pkkazipeta143 (2024). American,British,Indian Emotion poetry dataset [Dataset]. https://www.kaggle.com/datasets/pkkazipeta143/americanbritishindian-emotion-poetry-dataset
Explore at:
Available download formats: zip (4324938 bytes)
Dataset updated
Apr 6, 2024
Authors
pkkazipeta143
Area covered
United Kingdom, United States
Description
Capturing emotion from reviews and tweets is a well-studied task. However, reviews and tweets are not rich in emotion, whereas poetry is, so capturing emotions from poetry is an interesting task. To that end, we collected poems from Poemhunter.com (we thank the website owners), created a dataset, and manually annotated the poems with 5 emotions: Fear, Sad, Surprise, Happy, and Angry. The dataset comprises 3 files:

1. ABIEMO: poems by American, British, and Indian poets
2. CAPEMO: augmented poems, created with the NLPAUG library to resolve the class imbalance problem (we thank the library developers)
3. BAPEMO: extended augmented poems to resolve the class imbalance problem

Along with the emotion, the country of each poem is also assigned. The dataset can be used for poet style analysis, emotion analysis, country-wise differences in poetry, and so on.
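The class imbalance mentioned above can be quantified before deciding whether the augmented files are needed. The following is a minimal sketch that works on a plain list of emotion labels; how the labels are stored in the actual files is not described here, so the input format and the sample labels are assumptions.

```python
from collections import Counter

# The five emotion labels named in the description.
EMOTIONS = ["Fear", "Sad", "Surprise", "Happy", "Angry"]


def class_distribution(labels):
    """Return the share of each emotion among the given labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {emotion: counts.get(emotion, 0) / total for emotion in EMOTIONS}


def imbalance_ratio(labels):
    """Ratio of the most frequent to the least frequent class that occurs."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    present = [counts[e] for e in EMOTIONS if counts[e] > 0]
    return max(present) / min(present)


# Placeholder labels for illustration only.
sample_labels = ["Sad", "Sad", "Sad", "Happy", "Fear", "Fear"]
distribution = class_distribution(sample_labels)
ratio = imbalance_ratio(sample_labels)
```

A ratio close to 1 indicates balanced classes; larger values suggest that resampling or augmentation may be worth considering.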
Poetry reading by young people in the United Kingdom (UK) 2017-2025
statista.com
Cite
Statista, Poetry reading by young people in the United Kingdom (UK) 2017-2025 [Dataset]. https://www.statista.com/statistics/299083/poem-reading-by-young-people-in-the-uk/
Explore at:
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United Kingdom
Description
In 2025, just under 18 percent of school children aged eight to 18 years old reported reading poems in print form in their free time, down slightly from previous years. Song lyrics were the most popular option for kids and young people wanting to read outside of school, with more than 60 percent engaging with song lyrics on-screen.
Poetry Analysis Data
kaggle.com
zip
Updated Jul 8, 2017
Cite
JPSS (2017). Poetry Analysis Data [Dataset]. https://www.kaggle.com/jatindersehdev/poetry-analysis-data
Explore at:
Available download formats: zip (242789 bytes)
Dataset updated
Jul 8, 2017
Authors
JPSS
Description
Context

This dataset contains a set of poems for poem analysis.

Content

A selection of poems from poetryfoundation.com.

Acknowledgements

Poems from Poetryfoundation.com

Inspiration

This data is interesting to use purely for research on the capability of AI and ML to classify poems.
Readership of poetry in the U.S. 2012-2017, by gender
statista.com
Cite
Statista, Readership of poetry in the U.S. 2012-2017, by gender [Dataset]. https://www.statista.com/statistics/971677/poetry-reading-by-gender/
Explore at:
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
This statistic shows the share of adults reading poetry in the United States in 2012 and 2017, broken down by gender. The data reveals that the share of surveyed women in the U.S. reading poetry increased significantly in five years, growing from ***** percent in 2012 to **** percent in 2017.
Data from: Armenian poems dataset
kaggle.com
zip
Updated Apr 13, 2025
Cite
edwuloderdo (2025). Armenian poems dataset [Dataset]. https://www.kaggle.com/datasets/eduardarzumanyan/armenian-poems-dataset/discussion
Explore at:
Available download formats: zip (70195 bytes)
Dataset updated
Apr 13, 2025
Authors
edwuloderdo
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
📜 Dataset Description: Armenian Poems Collection

This dataset contains a curated collection of Armenian poems written in the beautiful and expressive Armenian language. The goal of this dataset is to support research and creative applications in natural language processing (NLP), poetry generation, language modeling, and cultural preservation.

📝 Contents

Each entry in the dataset includes:

- Վերնագիր (Title): The name or theme of the poem
- Տեքստ (Text): The full content of the poem

The poems vary in length, style, and subject matter, reflecting the richness of Armenian literary tradition and contemporary expression.

🌍 Use Cases

- Language modeling for low-resource languages
- Poetry or text generation models
- Sentiment or stylistic analysis
- Educational or linguistic research
- Cultural and literary exploration

⚖️ License

This dataset is licensed under CC BY 4.0, which allows for use, sharing, and modification with proper attribution.

🙏 Credits

This dataset was compiled and shared to promote the beauty and uniqueness of Armenian poetry in the machine learning and NLP communities.
Adults reading poetry in the U.S. 2002-2017
statista.com
Updated Sep 15, 2018
Cite
Statista (2018). Adults reading poetry in the U.S. 2002-2017 [Dataset]. https://www.statista.com/statistics/971672/reading-of-poetry-us/
Explore at:
Dataset updated
Sep 15, 2018
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
This statistic shows the share of adults reading poetry in the United States from 2002 to 2017. The data shows that 11.7 percent of surveyed U.S. adults were reading poetry in 2017, up from 6.7 percent five years previously.
Dataset: What the Eyes Reveal about (Reading) Poetry
datasetcatalog.nlm.nih.gov
figshare.com
Updated Dec 16, 2020
Cite
Wallot, Sebastian; Menninghaus, Winfried (2020). Dataset: What the Eyes Reveal about (Reading) Poetry [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000555746
Explore at:
Dataset updated
Dec 16, 2020
Authors
Wallot, Sebastian; Menninghaus, Winfried
Description
dataPOEM.csv

The dataPOEM.csv data set contains data on the level of each poem.

- scoresAes = factor scores of moving, beauty, and melodious ratings
- participant = participant number
- poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
- poemIdentity = poem number
- avgWFreq = average word frequency of poem
- totalGazeSlopeLineLength
- totalGazeWordMeanNAByWordLen
- totalGazeWordMeanNADiff
- order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
- firstFixDurMS_MINFIX_AVG = first fixation duration
- totalGazeMS_MINFIX_AVG = total gaze durations
- fixDurMS_MINFIX_NUM = number of fixations
- sacLenMS_MINFIX_AVG = average saccade length
- percRegMS_MINFIX_AVG = percentage of regressive eye movements
- pupilDial_AVG = average pupil dilation
- blink_NUM_TotalRT = number of blinks relative to total reading time
- totalReadingTime = total reading time of the poem
- areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
- dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
- moving = rating of how moving the poem was
- beauty = rating of how beautiful the poem was
- melodious = rating of how melodious the poem was

dataROI.csv

The dataROI.csv data set contains data on the level of each line within a poem.

- order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
- participant = participant number
- poemIdentity = poem number
- lineNr = line number within poem
- poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
- verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
- BeginCloseRhyme = whether a particular line’s final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
- lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
- totalGazeByWordNA = total gaze duration of final word of a line relative to word length
- gazeByLineLengthNA = total gaze duration of a line relative to line length
- dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
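Because dataROI.csv is line-level and dataPOEM.csv is poem-level, one common step is to aggregate line measurements up to the poem level. The sketch below averages `gazeByLineLengthNA` per participant, poem, and version. It assumes a comma-delimited file with a header row and that missing values appear as `NA` or an empty field; neither is confirmed by the description, and the sample values are placeholders.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_gaze_per_poem(csv_file):
    """Average gazeByLineLengthNA over the lines of each poem reading.

    Rows are grouped by (participant, poemIdentity, poemVersion).
    Assumption: missing measurements are written as 'NA' or left empty.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        value = row["gazeByLineLengthNA"]
        if value in ("", "NA"):
            continue  # skip lines without a valid measurement
        key = (row["participant"], row["poemIdentity"], row["poemVersion"])
        groups[key].append(float(value))
    return {key: mean(values) for key, values in groups.items()}


# Placeholder data for illustration only (not taken from dataROI.csv).
sample = io.StringIO(
    "participant,poemIdentity,poemVersion,lineNr,gazeByLineLengthNA\n"
    "1,3,A,1,10.0\n"
    "1,3,A,2,20.0\n"
    "1,3,A,3,NA\n"
    "2,3,D,1,30.0\n"
)
result = mean_gaze_per_poem(sample)
```

With the real file, the same function could be called on an opened file handle, for example `open("dataROI.csv", newline="")`.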
Persian Poetry Corpus from Ganjoor
kaggle.com
zip
Updated Nov 24, 2025
Cite
amirreza jamadi (2025). Persian Poetry Corpus from Ganjoor [Dataset]. https://www.kaggle.com/datasets/amirrezajamadi/persian-poetry-corpus-from-ganjoor
Explore at:
Available download formats: zip (181241098 bytes)
Dataset updated
Nov 24, 2025
Authors
amirreza jamadi
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Summary

This dataset offers a clean, structured, and ready-to-use collection of classical and modern Persian poetry, sourced from the well-known Ganjoor digital library. It is built for those who want fast access to high-quality Persian text for NLP, data analysis, or digital humanities projects.

The dataset includes:

- ALL_POEMS_MERGED: a single master file with all poems
- POETS_SUMMARY: a summary file containing biographies and key metadata about poets
- Individual files for over 200 poets

Context

Ganjoor is the largest open digital archive of Persian poetry. Its mission is to preserve and freely share the literary heritage of the Persian language and culture.

While Ganjoor provides its database publicly, it is not formatted for immediate use in data science workflows. This dataset provides a cleaned, analysis-ready version of their poetry corpus—making it easier for anyone working in NLP or computational literature to engage with this rich source of text.

1. The Master Corpus File (ALL_POETS_MERGED.csv):

This is the primary file for most analysis, containing every processed verse from all poets in a single, convenient table.

id: The unique identifier for the poem.

avatar: A URL to an image of the poet.

poet: The poet's name in Persian (Farsi).

collection: The name of the collection or book the poem belongs to.

poem_title: The title of the poem.

vorder: The sequential order of the verse within the poem.

verse: The full, combined text of the verse.

url: The original Ganjoor URL for the poem.
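Since the master file stores one verse per row, a typical first step is to reassemble whole poems. The sketch below groups rows by `id` and sorts them by `vorder`. It assumes that `vorder` is an integer and that all verses of a poem share the same `id`, as the field descriptions suggest; the sample rows are placeholders rather than real verses.

```python
import csv
import io
from collections import defaultdict


def rebuild_poems(csv_file):
    """Join the verses of each poem in vorder sequence.

    Returns a dict mapping poem id to the full poem text,
    with one verse per line.
    """
    verses = defaultdict(list)
    for row in csv.DictReader(csv_file):
        verses[row["id"]].append((int(row["vorder"]), row["verse"]))
    return {
        poem_id: "\n".join(text for _, text in sorted(items))
        for poem_id, items in verses.items()
    }


# Placeholder rows for illustration only; real files have more columns.
sample = io.StringIO(
    "id,vorder,verse\n"
    "1,2,line two\n"
    "1,1,line one\n"
    "2,1,single verse\n"
)
poems = rebuild_poems(sample)
```

For the real corpus, opening the file with `encoding="utf-8"` is likely needed because the text is Persian.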

2. The Poet Summary File (POETS_SUMMARY_WITH_DESCRIPTIONS.csv)

This file provides high-level information and metadata for each poet included in the dataset.

avatar_url: A URL to an image of the poet from Ganjoor.

persian_name: The poet's name in Persian.

english_name: The poet's English name, based on the URL.

collections: The total number of unique collections by the poet in this dataset.

poems: The total number of unique poems by the poet in this dataset.

url: The URL to the poet's main page on Ganjoor.

description: A short biography of the poet from Ganjoor.

3. Individual Poet Files (e.g., hafez.csv, saadi.csv, ...)

For convenience, a dedicated CSV file is provided for each poet. These files follow the same structure as the ALL_POETS_MERGED.csv file.

This dataset would not be possible without the monumental work of the Ganjoor team. All credit for the underlying poetry, biographies, and metadata belongs to Ganjoor.net.
Arabic Poem Comprehensive Dataset (APCD)
kaggle.com
zip
Updated Nov 14, 2022
Cite
Mohamed Khaled Elsafty (2022). Arabic Poem Comprehensive Dataset (APCD) [Dataset]. https://www.kaggle.com/datasets/mohamedkhaledelsafty/best-arabic-poem-comprehensive-dataset
Explore at:
Available download formats: zip (183473042 bytes)
Dataset updated
Nov 14, 2022
Authors
Mohamed Khaled Elsafty
Description
Poem Comprehensive Dataset (PCD)

Arabic PCD (APCD)

This data was obtained from an external source (linked on the original dataset page).

Description

The Arabic dataset is scraped mainly from الموسوعة الشعرية and الديوان. After merging both, the total number of verses is 1,831,770. Each verse is labeled with its meter, the poet who wrote it, and the age in which it was written. There are 22 meters, 3701 poets and 11 ages: Pre-Islamic, Islamic, Umayyad, Mamluk, Abbasid, Ayyubid, Ottoman, Andalusian, the era between Umayyad and Abbasid, Fatimid, and finally the modern age. We are only interested in the 16 classic meters attributed to Al-Farahidi, which comprise the majority of the dataset, with around 1.7M verses in total. It is important to note that the diacritic states of the verses are not consistent: a verse can carry full diacritics, partial diacritics, or none at all.
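One way to cope with the inconsistent diacritics is to normalize verses by stripping the marks, or to measure how heavily a verse is marked. The sketch below is an illustration only, not the dataset authors' preprocessing. It treats the Unicode range U+064B to U+0652 (the common Arabic tashkeel marks) as diacritics; the ratio function is a rough heuristic introduced here for illustration.

```python
import re

# Common Arabic diacritics (tashkeel): fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(verse):
    """Remove tashkeel marks so differently marked verses compare equal."""
    return DIACRITICS.sub("", verse)


def diacritic_ratio(verse):
    """Number of diacritic marks per non-space base character.

    Heuristic: 0.0 means no marks; values near 1.0 suggest full marking.
    """
    base_chars = len(strip_diacritics(verse).replace(" ", ""))
    marks = len(DIACRITICS.findall(verse))
    return marks / base_chars if base_chars else 0.0


# The word "kataba" written with a fatha on each of its three letters.
marked = "\u0643\u064e\u062a\u064e\u0628\u064e"
plain = strip_diacritics(marked)
```

Stripping marks loses information that may matter for meter classification, so whether to normalize depends on the task.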
Readership of poetry in the U.S. 2012-2017, by ethnicity
statista.com
Updated Nov 27, 2025
Cite
Statista (2025). Readership of poetry in the U.S. 2012-2017, by ethnicity [Dataset]. https://www.statista.com/statistics/971673/ethnic-groups-poetry-reading/
Explore at:
Dataset updated
Nov 27, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
This statistic shows the share of adults reading poetry in the United States in 2012 and 2017, broken down by ethnicity. The data reveals that the share of surveyed Asian Americans in the U.S. reading poetry more than doubled in five years, increasing from *** percent in 2012 to **** percent in 2017. In fact, there was a significant increase in poetry readership among all surveyed ethnic groups.
Persian Poem Dataset
kaggle.com
zip
Updated Jul 3, 2025
Cite
Mohammadreza Mohammadi (2025). Persian Poem Dataset [Dataset]. https://www.kaggle.com/datasets/jigsaw13/persian-poem-dataset/data
Explore at:
Available download formats: zip (59067418 bytes)
Dataset updated
Jul 3, 2025
Authors
Mohammadreza Mohammadi
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset is a comprehensive and invaluable collection of over 88,000 Persian poems, representing a rich tapestry of classic and contemporary literature. It serves as an essential corpus for anyone interested in Natural Language Processing, cultural analytics, or the timeless beauty of Persian verse.

The collection features works from 23 of Iran's most celebrated poets, spanning centuries of literary tradition. From the epic narratives of Ferdowsi to the mystical ghazals of Rumi and the profound wisdom of Saadi, this dataset provides a unique opportunity to explore the evolution of the Persian language and its poetic forms.

Key Features:

- Vast Collection: With a total of 88,000 poems, this dataset offers a substantial body of text for training robust language models and performing in-depth textual analysis.
- Legendary Poets: The dataset is anchored by the complete works of literary giants, including:
  - Ferdowsi: Author of the epic Shahnameh.
  - Saadi: Master of wisdom and lyrical prose.
  - Rumi (Molavi): The world-renowned mystic poet.
  - Hafez: The master of the Persian ghazal.
  - Nizami: Known for his romantic and epic quintet.
  - Khayyam: The celebrated poet, mathematician, and astronomer.
- Diverse Representation: It includes a wide range of poets from different eras and styles, such as Attar, Sanai, and Vahshi Bafqi, providing a diverse sample of Persian literary history.

Potential Use Cases:

- Language Model Training: An ideal resource for training and fine-tuning large language models (LLMs) on the nuances of poetic and classical Persian.
- Text Generation: Can be used to build models capable of generating new poems in the style of classic masters.
- Computational Stylistics: Researchers can analyze and compare the unique stylistic fingerprints of different poets.
- Sentiment Analysis: Explore the emotional arcs and themes prevalent in different poetic traditions.
- Historical and Cultural Studies: Uncover cultural trends and philosophical shifts as reflected in the literature of different periods.

This dataset is a tribute to the enduring legacy of Persian poetry and an open invitation for researchers, data scientists, and enthusiasts to derive new insights from this magnificent literary heritage.
Collection of poems by Abai Kunanbayev
kaggle.com
zip
Updated Oct 20, 2024
Cite
Arman Zhalgasbayev (2024). Collection of poems by Abai Kunanbayev [Dataset]. https://www.kaggle.com/datasets/armanzhalgasbayev/collection-of-poems-by-abay-kunanbayev
Explore at:
Available download formats: zip (110022 bytes)
Dataset updated
Oct 20, 2024
Authors
Arman Zhalgasbayev
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
1) Main info about Abai retrieved from: Abai's wikipedia page

2) 45 Kara Soz (aka "Book of Words") is a seminal work in Kazakh literature, composed of 45 brief parables and philosophical treatises written by Abai in the late 19th century. It reflects Abai's deep engagement with the social and cultural issues of his time. Data for 45 Kara Soz retrieved from: https://abai.kz/post/6

3) Abai's 100 poems collection retrieved from: https://ommli4.edu.kz/korkem-adebiet-qazaqsha/3753/

All data is well formatted in JSON and easy to use 👍

Format: JSON

Language: Kazakh

Data points: 161
Data from: Nobody Reads You: An Anonymous Poem by Augusto de Campos
scielo.figshare.com
jpeg
Updated May 30, 2023
Cite
Thiago Castañon (2023). Nobody Reads You: An Anonymous Poem by Augusto de Campos [Dataset]. http://doi.org/10.6084/m9.figshare.14306844.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14306844.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELOhttp://www.scielo.org/
Authors
Thiago Castañon
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Aiming to remove from ostracism a poem that has not yet been read as a poem by anyone in Brazilian criticism, this essay seeks to provide some reflections around oútis, the opening poem in Augusto de Campos’ book Não. Starting from the only exception of Gonzalo Aguilar, who dedicates two valuable paragraphs to this poem, this paper puts forward the hypothesis that the excessively long date range appearing under the poem’s title leads us to read it as a synthesis of the author’s poetics. In a preliminary study of the relationship between poetry and photography, in dialogue with formulations by Benjamin, Eisenstein and Fenollosa on the cinematographic montage and the ideogram technique, we highlight the complexity involved in this “one-word poem” superimposed on the image of shadows on the grass, whose syntax assumes an intricate network of connections between the visible and the readable, Portuguese and Greek, poetry, visual arts and other poems throughout history, with which it enters into a synchronic constellation.
Results of the statistical analysis.
figshare.com
xls
Updated Jun 1, 2023
Cite
Xin Gao; Jeroen Dera; Annabel D. Nijhof; Roel M. Willems (2023). Results of the statistical analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0225757.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0225757.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Xin Gao; Jeroen Dera; Annabel D. Nijhof; Roel M. Willems
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ANCOVAs were done for each of the four dependent variables: General Liking, Perceived Flow, Perceived Topic Clarity, and Perceived Structure Clarity. The ANCOVAs had three factors: Poem Difficulty (Low, High), Font Readability (Low, High) and Experiment (Exp. 1, Exp. 2, Exp. 3). Age and the answer to the question ‘Do you consider yourself to be a poetry lover?’ were added as covariates to explain additional variance. Note that a statistically significant 3-way interaction was not observed for any of the dependent variables. We therefore did not analyze the data per experiment. We did observe a Poem Difficulty x Font Readability interaction for three of the four dependent variables, indicating that Font Readability has a differential effect on poems that have low or high difficulty.
Grant Giving Statistics for Salvation Poem Foundation
instrumentl.com
Updated Nov 8, 2022
Cite
(2022). Grant Giving Statistics for Salvation Poem Foundation [Dataset]. https://www.instrumentl.com/990-report/salvation-poem-foundation-inc
Explore at:
Dataset updated
Nov 8, 2022
Variables measured
Total Assets, Total Giving, Average Grant Amount
Description
Financial overview and grant giving statistics of Salvation Poem Foundation
Grant Giving Statistics for Public Poetry
instrumentl.com
Updated Jul 1, 2021
Cite
(2021). Grant Giving Statistics for Public Poetry [Dataset]. https://www.instrumentl.com/990-report/public-poetry
Explore at:
Dataset updated
Jul 1, 2021
Variables measured
Total Assets, Total Giving
Description
Financial overview and grant giving statistics of Public Poetry
ChinesePoetryDataset
kaggle.com
zip
Updated Jul 8, 2022
Cite
qianboao (2022). ChinesePoetryDataset [Dataset]. https://www.kaggle.com/datasets/qianboao/chinesepoetrydataset/code
Explore at:
Available download formats: zip (32326383 bytes)
Dataset updated
Jul 8, 2022
Authors
qianboao
License
https://cdla.io/permissive-1-0/
Description
Chinese Poetry Dataset

If you are interested in Chinese poetry, this is a Chinese NLP dataset you should not miss.

Why do this

Chinese poetry is the treasure of the Chinese nation and the world, and we should pass it on.

About this

The dataset covers the first 304,386 poems of nearly 14,000 ancient poets of the Tang and Song dynasties, 11,037 of which are tagged with poetic style.

The dataset is extremely easy to use.

The chinese_poems.txt file covers 304,386 poems, one line per poem.

The poem data with tags is saved as JSON, so you can load it directly with json.load(). The 'tags' field describes the poetic style of the poem. The 'lines' field holds the content of the poem.
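A minimal sketch of loading the tagged data with `json.load()` and selecting poems by tag is shown below. The top-level layout is an assumption: the code treats the file as a JSON array of objects whose `tags` and `lines` values are lists of strings, which the description does not confirm. The sample content is a placeholder.

```python
import io
import json


def poems_with_tag(json_file, tag):
    """Return the text of every poem that carries the given tag.

    Assumption: the file is a JSON array of objects, each with
    a 'tags' list and a 'lines' list of strings.
    """
    poems = json.load(json_file)
    return [
        "\n".join(poem["lines"])
        for poem in poems
        if tag in poem.get("tags", [])
    ]


# Placeholder content for illustration only (not taken from the dataset).
sample = io.StringIO(json.dumps([
    {"tags": ["style-a"], "lines": ["first line", "second line"]},
    {"tags": ["style-b"], "lines": ["another poem"]},
    {"lines": ["untagged poem"]},
]))
matches = poems_with_tag(sample, "style-a")
```

For the real file, pass an opened handle such as `open(path, encoding="utf-8")`, where `path` is wherever the tagged JSON file was extracted.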
Grant Giving Statistics for Whatcom Poetry Series
instrumentl.com
Updated Sep 17, 2021
Cite
(2021). Grant Giving Statistics for Whatcom Poetry Series [Dataset]. https://www.instrumentl.com/990-report/whatcom-poetry-series
Explore at:
Dataset updated
Sep 17, 2021
Area covered
Whatcom County
Description
Financial overview and grant giving statistics of Whatcom Poetry Series
childPoeDE: A corpus of German Children's Poems for Computational and...
data-staging.niaid.nih.gov
zenodo.org
Updated Jul 12, 2024
Cite
Lehmann, Marina; Heumann, Anne; Kuijpers, Moniek M.; Lauer, Gerhard; Lüdtke, Jana (2024). childPoeDE: A corpus of German Children's Poems for Computational and Experimental Studies - Metadata [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7684911
Explore at:
Dataset updated
Jul 12, 2024
Dataset provided by
Freie Universität Berlin
Universität Basel
Johannes Gutenberg-Universität Mainz
Authors
Lehmann, Marina; Heumann, Anne; Kuijpers, Moniek M.; Lauer, Gerhard; Lüdtke, Jana
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The childPoeDE corpus is a collection of 1082 German poems for children created within the CHYLSA project. The poems were taken from anthologies published between 1991 and 2019. This publication includes the poem-level metadata for each poem with information about the author, the poem's length, data on case, punctuation, layout, rhyme, type-token ratio (TTR and MATTR) and lexical density. It also includes token-level metadata, namely word length and position, POS tags in different levels of granularity as well as data on onomatopoeia and sonority. Furthermore, this publication provides a word frequency table and a Python script which was used to extract some of the metadata from the texts (poemtool.py). The childPoeDE corpus does not contain all poems from the anthologies. A list of the poems that have been omitted for different reasons (length, language, typography, ...) can be accessed as well.
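The two lexical diversity measures named above can be sketched as follows. TTR is the number of distinct tokens divided by the total number of tokens, and MATTR averages the TTR over every sliding window of a fixed length. This is a generic illustration, not the corpus's poemtool.py; the tokenization and the window size used for the published metadata are not stated here, so the inputs below are assumptions.

```python
def ttr(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window):
    """Moving-average TTR: mean TTR over all windows of the given length.

    Falls back to plain TTR when the text is no longer than one window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [
        ttr(tokens[start:start + window])
        for start in range(len(tokens) - window + 1)
    ]
    return sum(scores) / len(scores)


# Placeholder tokens for illustration only.
example_ttr = ttr(["a", "b", "a", "c"])
example_mattr = mattr(["a", "a", "b"], window=2)
```

MATTR is less sensitive to text length than plain TTR, which matters when comparing poems of very different lengths.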

Read more about the childPoeDE corpus in our data paper published in the Journal of Open Humanities Data: The ChildPoeDE Corpus: 1082 German Children’s Poems for Computational and Experimental Studies on Poetry Reception.

DFG Schwerpunktprogramm SPP 2207 “Computational Literary Studies“ Online:

https://gepris.dfg.de/gepris/projekt/402743989

https://dfg-spp-cls.github.io/

Subproject: „CHYLSA (Children’s and Youth Literature Sentiment Analysis)“

Online:

https://gepris.dfg.de/gepris/projekt/424250469

https://dfg-spp-cls.github.io/projects_en/2020/01/24/TP-CHYLSA/

merve/poetry: 8 scholarly articles cite this dataset.