Dataset Card for poetry
Dataset Summary
It contains poems on three subjects (Love, Nature, and Mythology & Folklore) from two periods: Renaissance and Modern.
Supported Tasks and Leaderboards
[Needs More Information]
Languages
[Needs More Information]
Dataset Structure
Data Instances
[Needs More Information]
Data Fields
Has 5 columns:
Content, Author, Poem name, Age, Type
Data Splits
Only training… See the full description on the dataset page: https://huggingface.co/datasets/merve/poetry.
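As a minimal sketch of working with the five columns listed under Data Fields, the snippet below filters rows by period and subject. The sample rows are invented placeholders, and the lowercase header spellings and the value spellings ("Renaissance", "Love") are assumptions based on the card's summary; check them against the actual file on the dataset page.

```python
import csv
import io

# Hypothetical sample in the five-column layout described by the card.
# Header spellings and row contents are placeholders, not real data.
SAMPLE_CSV = """content,author,poem name,age,type
"First placeholder poem text","Poet A","Poem One","Renaissance","Love"
"Second placeholder poem text","Poet B","Poem Two","Modern","Nature"
"Third placeholder poem text","Poet C","Poem Three","Renaissance","Mythology & Folklore"
"""


def filter_poems(rows, age=None, poem_type=None):
    """Return rows matching the given period and/or subject (None = no filter)."""
    selected = []
    for row in rows:
        if age is not None and row["age"] != age:
            continue
        if poem_type is not None and row["type"] != poem_type:
            continue
        selected.append(row)
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
renaissance = filter_poems(rows, age="Renaissance")
renaissance_love = filter_poems(rows, age="Renaissance", poem_type="Love")
print(len(renaissance), len(renaissance_love))
```

With a real export, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle; the filtering logic stays the same.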
Capturing emotion from reviews and tweets is a well-studied task. Reviews and tweets are not rich in emotion, whereas poetry is, so capturing emotions from poetry is an interesting task. To this end, we collected poems from Poemhunter.com (we thank the website owners), created a dataset, and manually annotated the poems with 5 emotions: Fear, Sad, Surprise, Happy, and Angry. This dataset comprises 3 files:
1. ABIEMO: poems by American, British, and Indian poets
2. CAPEMO: augmented poems that address the class imbalance problem, created with the NLPAUG library (we thank the library developers)
3. BAPEMO: extended augmented poems that address the class imbalance problem
Along with the emotion, the country of each poem is also assigned. This dataset can be used for poet style analysis, emotion analysis, analysis of country-wise differences in poetry, etc.
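The class imbalance that the augmented files are meant to address can be checked with a simple label count. The sketch below is illustrative only: the label list is invented, and the assumption that each poem carries exactly one of the five emotion labels as a string is not confirmed by the description.

```python
from collections import Counter

# The five emotion labels named in the dataset description.
EMOTIONS = ["Fear", "Sad", "Surprise", "Happy", "Angry"]

# Hypothetical labels standing in for one annotated file (not real data).
labels = ["Happy", "Sad", "Happy", "Fear", "Happy",
          "Sad", "Angry", "Happy", "Surprise", "Sad"]


def class_distribution(labels):
    """Share of each emotion among all labels (0.0 for absent classes)."""
    counts = Counter(labels)
    total = len(labels)
    return {emotion: counts.get(emotion, 0) / total for emotion in EMOTIONS}


def imbalance_ratio(labels):
    """Largest class count divided by the smallest non-empty class count."""
    counts = Counter(labels)
    present = [counts[e] for e in EMOTIONS if counts[e] > 0]
    return max(present) / min(present)


distribution = class_distribution(labels)
ratio = imbalance_ratio(labels)
print(distribution, ratio)
```

Running the same two functions on the original and the augmented files would show how far augmentation moves the ratio toward 1.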
In 2025, just under 18 percent of school children aged eight to 18 years old reported reading poems in print form in their free time, down slightly from previous years. Song lyrics were the most popular option for kids and young people wanting to read outside of school, with more than 60 percent engaging with song lyrics on-screen.
This dataset contains a selection of poems from Poetryfoundation.com, gathered for poem analysis.
Poems from Poetryfoundation.com
It would be interesting to use this data purely for research on the capability of AI & ML to classify poems.
This statistic shows the share of adults reading poetry in the United States in 2012 and 2017, broken down by gender. The data reveals that the share of surveyed women in the U.S. reading poetry increased significantly in five years, growing from ***** percent in 2012 to **** percent in 2017.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
📜 Dataset Description: Armenian Poems Collection
This dataset contains a curated collection of Armenian poems written in the beautiful and expressive Armenian language. The goal of this dataset is to support research and creative applications in natural language processing (NLP), poetry generation, language modeling, and cultural preservation.
📝 Contents
Each entry in the dataset includes:
Վերնագիր (Title): The name or theme of the poem
Տեքստ (Text): The full content of the poem
The poems vary in length, style, and subject matter, reflecting the richness of Armenian literary tradition and contemporary expression.
🌍 Use Cases
Language modeling for low-resource languages
Poetry or text generation models
Sentiment or stylistic analysis
Educational or linguistic research
Cultural and literary exploration
⚖️ License
This dataset is licensed under CC BY 4.0, which allows for use, sharing, and modification with proper attribution.
🙏 Credits
This dataset was compiled and shared to promote the beauty and uniqueness of Armenian poetry in the machine learning and NLP communities.
This statistic shows the share of adults reading poetry in the United States from 2002 to 2017. The data shows that 11.7 percent of surveyed U.S. adults were reading poetry in 2017, up from 6.7 percent five years previously.
dataPOEM.csv
The dataPOEM.csv data set contains data on the level of each poem.
scoresAes = factor scores of moving, beauty, and melodious ratings
participant = participant number
poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
poemIdentity = poem number
avgWFreq = average word frequency of poem
totalGazeSlopeLineLength
totalGazeWordMeanNAByWordLen
totalGazeWordMeanNADiff
order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
firstFixDurMS_MINFIX_AVG = first fixation duration
totalGazeMS_MINFIX_AVG = total gaze durations
fixDurMS_MINFIX_NUM = number of fixations
sacLenMS_MINFIX_AVG = average saccade length
percRegMS_MINFIX_AVG = percentage of regressive eye movements
pupilDial_AVG = average pupil dilation
blink_NUM_TotalRT = number of blinks relative to total reading time
totalReadingTime = total reading time of the poem
areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
moving = rating of how moving the poem was
beauty = rating of how beautiful the poem was
melodious = rating of how melodious the poem was
dataROI.csv
The dataROI.csv data set contains data on the level of each line within a poem.
order = order of presentation (1 = from A to D, 2 = from D to A; between-participant factor)
participant = participant number
poemIdentity = poem number
lineNr = line number within poem
poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
BeginCloseRhyme = whether a particular line’s final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
totalGazeByWordNA = total gaze duration of final word of a line relative to word length
gazeByLineLengthNA = total gaze duration of a line relative to line length
dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
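A typical first look at dataPOEM.csv is to compare ratings across the four poem versions. The sketch below groups by poemVersion and averages one rating column. It uses the column names given in the variable list, but the comma delimiter, the numeric rating values, and the sample rows themselves are assumptions for illustration, not the published data.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows using column names from the variable list.
# Values and the comma delimiter are placeholders, not the published data.
SAMPLE_CSV = """participant,poemVersion,poemIdentity,moving,beauty,melodious
1,A,1,5,6,6
1,D,2,3,3,2
2,A,2,4,5,6
2,D,1,2,3,3
"""


def mean_rating_by_version(rows, rating):
    """Average one rating column (e.g. 'melodious') per poemVersion."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["poemVersion"]].append(float(row[rating]))
    return {version: mean(values) for version, values in sorted(grouped.items())}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
melodious_by_version = mean_rating_by_version(rows, "melodious")
moving_by_version = mean_rating_by_version(rows, "moving")
print(melodious_by_version, moving_by_version)
```

Because order is a between-participant factor and each participant rates several poems, a plain mean like this is only descriptive; inferential analysis would need to account for the repeated measures.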
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset offers a clean, structured, and ready-to-use collection of classical and modern Persian poetry, sourced from the well-known Ganjoor digital library. It is built for those who want fast access to high-quality Persian text for NLP, data analysis, or digital humanities projects.
The dataset includes:
- ALL_POEMS_MERGED: a single master file with all poems
- POETS_SUMMARY: a summary file containing biographies and key metadata about poets
- Individual files for over 200 poets
Ganjoor is the largest open digital archive of Persian poetry. Its mission is to preserve and freely share the literary heritage of the Persian language and culture.
While Ganjoor provides its database publicly, it is not formatted for immediate use in data science workflows. This dataset provides a cleaned, analysis-ready version of their poetry corpus, making it easier for anyone working in NLP or computational literature to engage with this rich source of text.
The merged file (ALL_POEMS_MERGED) is the primary file for most analysis, containing every processed verse from all poets in a single, convenient table.
The summary file (POETS_SUMMARY) provides high-level information and metadata for each poet included in the dataset.
For convenience, a dedicated CSV file is provided for each poet. These files follow the same structure as the ALL_POETS_MERGED.csv file.
This dataset would not be possible without the monumental work of the Ganjoor team. All credit for the underlying poetry, biographies, and metadata belongs to Ganjoor.net.
I obtained this data from here.
The Arabic dataset is scraped mainly from الموسوعة الشعرية and الديوان. After merging both, the total number of verses is 1,831,770. Each verse is labeled with its meter, the poet who wrote it, and the age in which it was written. There are 22 meters, 3701 poets, and 11 ages: Pre-Islamic, Islamic, Umayyad, Mamluk, Abbasid, Ayyubid, Ottoman, Andalusian, the era between Umayyad and Abbasid, Fatimid, and finally the modern age. We are only interested in the 16 classic meters attributed to Al-Farahidi, which comprise the majority of the dataset with around 1.7M verses in total. It is important to note that the diacritization of the verses is not consistent: a verse can carry full diacritics, partial diacritics, or none at all.
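One common way to cope with inconsistent diacritization is to strip the diacritic marks before comparing or tokenizing verses. The sketch below is not taken from the dataset authors: it assumes the marks of interest are the Arabic harakat in the Unicode range U+064B to U+0652 plus the superscript alef U+0670, and it uses a single voweled word as a stand-in for a verse.

```python
import re

# Arabic diacritic marks: fathatan..sukun (U+064B..U+0652) and the
# superscript alef (U+0670). Which marks to strip is an assumption here.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(verse):
    """Remove diacritic marks so voweled and bare spellings compare equal."""
    return DIACRITICS.sub("", verse)


def diacritic_count(verse):
    """Number of diacritic marks in the verse (0 means fully bare text)."""
    return len(DIACRITICS.findall(verse))


# The word k-t-b with a fatha on each letter, and the same word bare.
voweled = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = "\u0643\u062A\u0628"

normalized = strip_diacritics(voweled)
print(normalized == bare, diacritic_count(voweled), diacritic_count(bare))
```

Stripping discards information that matters for meter classification, so whether to normalize or to keep the marks depends on the task.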
This statistic shows the share of adults reading poetry in the United States in 2012 and 2017, broken down by ethnicity. The data reveals that the share of surveyed Asian Americans in the U.S. reading poetry more than doubled in five years, increasing from *** percent in 2012 to **** percent in 2017. In fact, there was a significant increase in poetry readership among all surveyed ethnic groups.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset is a comprehensive and invaluable collection of over 88,000 Persian poems, representing a rich tapestry of classic and contemporary literature. It serves as an essential corpus for anyone interested in Natural Language Processing, cultural analytics, or the timeless beauty of Persian verse.
The collection features works from 23 of Iran's most celebrated poets, spanning centuries of literary tradition. From the epic narratives of Ferdowsi to the mystical ghazals of Rumi and the profound wisdom of Saadi, this dataset provides a unique opportunity to explore the evolution of the Persian language and its poetic forms.
Key Features:
- Vast Collection: With a total of 88,000 poems, this dataset offers a substantial body of text for training robust language models and performing in-depth textual analysis.
- Legendary Poets: The dataset is anchored by the complete works of literary giants, including:
  - Ferdowsi: Author of the epic Shahnameh.
  - Saadi: Master of wisdom and lyrical prose.
  - Rumi (Molavi): The world-renowned mystic poet.
  - Hafez: The master of the Persian ghazal.
  - Nizami: Known for his romantic and epic quintet.
  - Khayyam: The celebrated poet, mathematician, and astronomer.
- Diverse Representation: It includes a wide range of poets from different eras and styles, such as Attar, Sanai, and Vahshi Bafqi, providing a diverse sample of Persian literary history.
Potential Use Cases:
- Language Model Training: An ideal resource for training and fine-tuning large language models (LLMs) on the nuances of poetic and classical Persian.
- Text Generation: Can be used to build models capable of generating new poems in the style of classic masters.
- Computational Stylistics: Researchers can analyze and compare the unique stylistic fingerprints of different poets.
- Sentiment Analysis: Explore the emotional arcs and themes prevalent in different poetic traditions.
- Historical and Cultural Studies: Uncover cultural trends and philosophical shifts as reflected in the literature of different periods.
This dataset is a tribute to the enduring legacy of Persian poetry and an open invitation for researchers, data scientists, and enthusiasts to derive new insights from this magnificent literary heritage.
https://creativecommons.org/publicdomain/zero/1.0/
1) Main info about Abai retrieved from: Abai's wikipedia page
2) 45 Kara Soz (aka "Book of Words") is a seminal work in Kazakh literature, composed of 45 brief parables and philosophical treatises written by Abai in the late 19th century. It reflects Abai's deep engagement with the social and cultural issues of his time. Data for 45 Kara Soz retrieved from: https://abai.kz/post/6
3) Abai's 100 poems collection retrieved from: https://ommli4.edu.kz/korkem-adebiet-qazaqsha/3753/
All data are well-formatted in JSON and easy to use 👍
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: Aiming to remove from ostracism a poem that has not yet been read as a poem by anyone in Brazilian criticism, this essay seeks to provide some reflections around oútis, the opening poem in Augusto de Campos’ book Não. Starting from the only exception of Gonzalo Aguilar, who dedicates two valuable paragraphs to this poem, this paper puts forward the hypothesis that the excessively long date range appearing under the poem’s title leads us to read it as a synthesis of the author’s poetics. In a preliminary study of the relationship between poetry and photography, in dialogue with formulations by Benjamin, Eisenstein and Fenollosa on the cinematographic montage and the ideogram technique, we highlight the complexity involved in this “one-word poem” superimposed on the image of shadows on the grass, whose syntax assumes an intricate network of connections between the visible and the readable, Portuguese and Greek, poetry, visual arts and other poems throughout history, with which it enters in a synchronic constellation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ANCOVAs were conducted for each of the four dependent variables: General Liking, Perceived Flow, Perceived Topic Clarity, and Perceived Structure Clarity. The ANCOVAs had three factors: Poem Difficulty (Low, High), Font Readability (Low, High), and Experiment (Exp. 1, Exp. 2, Exp. 3). Age and the answer to the question ‘Do you consider yourself to be a poetry lover?’ were added as covariates to explain additional variance. Note that a statistically significant 3-way interaction was not observed for any of the dependent variables. We therefore did not analyze the data per experiment. We did observe a Poem Difficulty x Font Readability interaction for three of the four dependent variables, indicating that Font Readability has a differential effect on poems of low and high difficulty.
Financial overview and grant giving statistics of Salvation Poem Foundation
Financial overview and grant giving statistics of Public Poetry
https://cdla.io/permissive-1-0/
If you are interested in Chinese poetry, this is a Chinese NLP dataset you should not miss.
Chinese poetry is the treasure of the Chinese nation and the world, and we should pass it on.
The dataset covers the first 304,386 poems of nearly 14,000 ancient poets of the Tang and Song dynasties, 11,037 of which are tagged with poetic style.
The dataset is extremely easy to use.
The chinese_poems.txt file covers 304,386 poems, one line per poem.
The tagged poem data is saved as JSON, so you can load it directly with json.load(). The 'tags' field describes the poetic style of the poem, and the 'lines' field holds the content of the poem.
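A minimal sketch of reading the tagged data with json.load() follows. Only the 'tags' and 'lines' keys come from the description; the top-level list layout, the use of lists for both fields, and the sample entries are assumptions for illustration and should be checked against the actual file.

```python
import io
import json
from collections import Counter

# Hypothetical file content. Only the 'tags' and 'lines' keys come from the
# dataset description; the list layout and the values are assumed.
SAMPLE_JSON = """[
  {"tags": ["style-a"], "lines": ["placeholder line 1", "placeholder line 2"]},
  {"tags": ["style-a", "style-b"], "lines": ["placeholder line 3"]}
]"""

# With the real data, pass an open file handle instead of StringIO.
poems = json.load(io.StringIO(SAMPLE_JSON))

# Count how often each poetic-style tag occurs across poems.
tag_counts = Counter(tag for poem in poems for tag in poem["tags"])

# Join each poem's lines into a single text block.
texts = ["\n".join(poem["lines"]) for poem in poems]

print(tag_counts.most_common(), len(texts))
```

If 'tags' or 'lines' turns out to be a plain string rather than a list, the two comprehensions need adjusting accordingly.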
Financial overview and grant giving statistics of Whatcom Poetry Series
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The childPoeDE corpus is a collection of 1082 German poems for children created within the CHYLSA project. The poems were taken from anthologies published between 1991 and 2019. This publication includes the poem-level metadata for each poem with information about the author, the poem's length, data on case, punctuation, layout, rhyme, type-token ratio (TTR and MATTR) and lexical density. It also includes token-level metadata, namely word length and position, POS tags in different levels of granularity as well as data on onomatopoeia and sonority. Furthermore, this publication provides a word frequency table and a Python script which was used to extract some of the metadata from the texts (poemtool.py). The childPoeDE corpus does not contain all poems from the anthologies. A list of the poems that have been omitted for different reasons (length, language, typography, ...) can be accessed as well.
Read more about the childPoeDE corpus in our data paper published in the Journal of Open Humanities Data: The ChildPoeDE Corpus: 1082 German Children’s Poems for Computational and Experimental Studies on Poetry Reception.
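For readers unfamiliar with the two lexical diversity measures in the poem-level metadata, here is a minimal sketch of how TTR and MATTR are commonly computed. It is not the authors' poemtool.py; the whitespace-free token list, the window size of 4, and the placeholder tokens are assumptions chosen to keep the example small.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window):
    """Moving-average TTR: mean TTR over every window of `window` tokens.

    Falls back to plain TTR when the text is not longer than the window.
    """
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [
        ttr(tokens[start:start + window])
        for start in range(len(tokens) - window + 1)
    ]
    return sum(scores) / len(scores)


# Placeholder tokens standing in for a tokenized poem (not corpus data).
tokens = ["a", "b", "a", "c", "a", "b", "d", "a"]

poem_ttr = ttr(tokens)
poem_mattr = mattr(tokens, window=4)
print(poem_ttr, poem_mattr)
```

MATTR is less sensitive to text length than plain TTR, which matters when comparing short and long poems; the window size used for the corpus metadata is not stated here and would need to be taken from the data paper.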
DFG Priority Programme (Schwerpunktprogramm) SPP 2207 "Computational Literary Studies"
Online:
https://gepris.dfg.de/gepris/projekt/402743989
https://dfg-spp-cls.github.io/
Subproject: "CHYLSA (Children’s and Youth Literature Sentiment Analysis)"
Online:
https://gepris.dfg.de/gepris/projekt/424250469
https://dfg-spp-cls.github.io/projects_en/2020/01/24/TP-CHYLSA/