100+ datasets found

poetry
huggingface.co
Updated Nov 3, 2021
Cite
merve (2021). poetry [Dataset]. https://huggingface.co/datasets/merve/poetry
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
merve
Description
Dataset Card for poetry

Dataset Summary

It contains poems on three subjects (Love, Nature, and Mythology & Folklore) that belong to two periods: Renaissance and Modern.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

[Needs More Information]

Dataset Structure

Data Instances

[Needs More Information]

Data Fields

Has 5 columns:

Content, Author, Poem name, Age, Type

Data Splits

Only training… See the full description on the dataset page: https://huggingface.co/datasets/merve/poetry.
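As a rough illustration of how the five columns listed above might be used, the sketch below filters rows by the Age and Type columns with plain Python dictionaries. The sample rows are made-up placeholders, not records from the dataset, and the exact column spellings and label values ("Renaissance", "Modern", "Love", "Nature") are assumed from the card text.

```python
def select_poems(rows, age=None, poem_type=None):
    """Return rows matching the given Age and/or Type (None means "any")."""
    selected = []
    for row in rows:
        if age is not None and row["Age"] != age:
            continue
        if poem_type is not None and row["Type"] != poem_type:
            continue
        selected.append(row)
    return selected


# Placeholder rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"Content": "placeholder text", "Author": "Author A",
     "Poem name": "Poem 1", "Age": "Renaissance", "Type": "Love"},
    {"Content": "placeholder text", "Author": "Author B",
     "Poem name": "Poem 2", "Age": "Modern", "Type": "Nature"},
    {"Content": "placeholder text", "Author": "Author C",
     "Poem name": "Poem 3", "Age": "Modern", "Type": "Love"},
]

modern_love = select_poems(sample_rows, age="Modern", poem_type="Love")
print([row["Poem name"] for row in modern_love])  # ['Poem 3']
```

If the real files spell the labels differently, only the arguments passed to the hypothetical select_poems helper need to change.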
gutenberg-poetry-corpus
huggingface.co
Updated Oct 15, 2022
Cite
BigLAM: BigScience Libraries, Archives and Museums (2022). gutenberg-poetry-corpus [Dataset]. https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus
Dataset authored and provided by
BigLAM: BigScience Libraries, Archives and Museums
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Allison Parrish's Gutenberg Poetry Corpus

This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
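Because each line carries a gutenberg_id, one natural first step is to regroup lines by the book they came from. The following is a minimal sketch under stated assumptions: the key "gutenberg_id" comes from the description above, while the key "line" for the text field and the helper name group_lines_by_book are assumptions made for illustration. The records are placeholders, not corpus data.

```python
from collections import defaultdict


def group_lines_by_book(records, text_key="line", id_key="gutenberg_id"):
    """Collect poetry lines per Project Gutenberg book, keeping input order."""
    books = defaultdict(list)
    for record in records:
        books[record[id_key]].append(record[text_key])
    return dict(books)


# Placeholder records for illustration only; the "line" key is an assumption.
sample_records = [
    {"line": "placeholder line one", "gutenberg_id": 1},
    {"line": "placeholder line two", "gutenberg_id": 1},
    {"line": "placeholder line three", "gutenberg_id": 2},
]

grouped = group_lines_by_book(sample_records)
print(len(grouped))  # 2
```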
poem-dataset-3
kaggle.com
Updated Jan 3, 2024
Cite
Pham Tuyet (2024). poem-dataset-3 [Dataset]. https://www.kaggle.com/datasets/phamtuyet/poem-dataset-3/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Pham Tuyet
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset

This dataset was created by Pham Tuyet

Released under MIT

Contents
Emotion-Categorized Poetry Dataset
kaggle.com
Updated Nov 25, 2024
Cite
rohithkumarsaravanan (2024). Emotion-Categorized Poetry Dataset [Dataset]. https://www.kaggle.com/datasets/rohithkumarsaravanan/emotion-categorized-poetry-dataset/discussion?sort=undefined
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
rohithkumarsaravanan
Description
Dataset Description

This dataset is a curated collection of poems, each categorized by a specific emotion: Anger, Courage, Fear, Joy, Love, Peace, Sad, and Surprise. Each line of poetry captures the depth and essence of human emotions, making this dataset valuable for:

Emotion Analysis in Text: Useful for sentiment analysis and natural language processing (NLP).

Creative Writing and Inspiration: A resource for writers, poets, and artists seeking emotional depth.

Machine Learning and AI Training: Ideal for training models in text classification, emotion detection, and sentiment-based applications.

Dataset Highlights

- Structure: Two columns, "Poem" (a single poetic line) and "Emotion" (the associated emotional category).
- Versatility: Combines artistic creativity with analytical rigor, suitable for academic, creative, and technical applications.
- Volume: A comprehensive and growing repository of poetry that bridges art and machine learning.

This dataset is designed to inspire both humans and machines to understand, generate, and respond to the spectrum of human emotions in literature.
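A quick sanity check for a two-column table like this is to confirm that every label is one of the eight listed emotions before counting them. The sketch below assumes the column names "Poem" and "Emotion" exactly as quoted in the description; the helper name emotion_counts and the sample rows are hypothetical placeholders.

```python
from collections import Counter

# The eight categories named in the dataset description.
EMOTIONS = {"Anger", "Courage", "Fear", "Joy", "Love", "Peace", "Sad", "Surprise"}


def emotion_counts(rows):
    """Count rows per emotion, rejecting labels outside the documented set."""
    counts = Counter()
    for row in rows:
        label = row["Emotion"].strip()
        if label not in EMOTIONS:
            raise ValueError(f"unexpected emotion label: {label!r}")
        counts[label] += 1
    return counts


# Placeholder rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"Poem": "placeholder line", "Emotion": "Joy"},
    {"Poem": "placeholder line", "Emotion": "Fear"},
    {"Poem": "placeholder line", "Emotion": "Joy "},  # stray whitespace is tolerated
]

counts = emotion_counts(sample_rows)
print(counts["Joy"], counts["Fear"])  # 2 1
```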
PoetryFoundationData
huggingface.co
Updated Mar 13, 2023
Cite
Shahul Es (2023). PoetryFoundationData [Dataset]. https://huggingface.co/datasets/shahules786/PoetryFoundationData
Authors
Shahul Es
Description
This file contains nearly all poems from the Poetry Foundation Website.

Content

All poems have a title and author. Most poems are also labeled with the tags as available from the Poetry Foundation Website. The word cloud above shows the most used tags!

Inspiration

This dataset can be used for a variety of tasks related to poetry writing.
Automatic Analysis of Rhythmic Poetry - Dataset - LDM
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). Automatic Analysis of Rhythmic Poetry - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/automatic-analysis-of-rhythmic-poetry
Description
Automatic analysis of rhythmic poetry with applications to generation and translation.
Dataset: What the Eyes Reveal about (Reading) Poetry
datasetcatalog.nlm.nih.gov
figshare.com
Updated Dec 16, 2020
Cite
Wallot, Sebastian; Menninghaus, Winfried (2020). Dataset: What the Eyes Reveal about (Reading) Poetry [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000555746
Authors
Wallot, Sebastian; Menninghaus, Winfried
Description
dataPOEM.csv

The dataPOEM.csv data set contains data on the level of each poem.

scoresAes = factor scores of moving, beauty, and melodious ratings
participant = participant number
poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
poemIdentity = poem number
avgWFreq = average word frequency of poem
totalGazeSlopeLineLength
totalGazeWordMeanNAByWordLen
totalGazeWordMeanNADiff
order = order of presentation (1 = from A to D, 2 = from D to A; between participant factor)
firstFixDurMS_MINFIX_AVG = first fixation duration
totalGazeMS_MINFIX_AVG = total gaze durations
fixDurMS_MINFIX_NUM = number of fixations
sacLenMS_MINFIX_AVG = average saccade length
percRegMS_MINFIX_AVG = percentage of regressive eye movements
pupilDial_AVG = average pupil dilation
blink_NUM_TotalRT = number of blinks relative to total reading time
totalReadingTime = total reading time of the poem
areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
moving = rating of how moving the poem was
beauty = rating of how beautiful the poem was
melodious = rating of how melodious the poem was

dataROI.csv

The dataROI.csv data set contains data on the level of each line within a poem.

order = order of presentation (1 = from A to D, 2 = from D to A; between participant factor)
participant = participant number
poemIdentity = poem number
lineNr = line number within poem
poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
BeginCloseRhyme = whether a particular line's final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
totalGazeByWordNA = total gaze duration of final word of a line relative to word length
gazeByLineLengthNA = total gaze duration of a line relative to line length
dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
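The poemVersion coding (A to D) amounts to a 2x2 design over rhyme and meter, which can be made explicit when analysing either file. The sketch below is only an illustration of that coding as described above; the names POEM_VERSIONS and describe_version are hypothetical and not part of the published data set.

```python
# poemVersion code -> (has_rhyme, has_meter), following the variable description.
POEM_VERSIONS = {
    "A": (True, True),    # original poem with rhyme and meter
    "B": (True, False),   # variant with only rhyme
    "C": (False, True),   # variant with only meter
    "D": (False, False),  # variant without rhyme and meter
}


def describe_version(code):
    """Expand a poemVersion code into explicit rhyme/meter flags."""
    try:
        has_rhyme, has_meter = POEM_VERSIONS[code]
    except KeyError:
        raise ValueError(f"unknown poemVersion code: {code!r}") from None
    return {"rhyme": has_rhyme, "meter": has_meter}


print(describe_version("C"))  # {'rhyme': False, 'meter': True}
```

Splitting the single code into two boolean factors makes it straightforward to model rhyme and meter as separate predictors.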
Dataset of book subjects that contain Poetry : an introduction
workwithdata.com
Updated Nov 7, 2024
Cite
Work With Data (2024). Dataset of book subjects that contain Poetry : an introduction [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Poetry+%3A+an+introduction&j=1&j0=books
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is about book subjects. It has 3 rows and is filtered where the book is Poetry : an introduction. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Data from: Hindi Poem Dataset
kaggle.com
Updated Aug 14, 2021
Cite
Tushar Singh (2021). Hindi Poem Dataset [Dataset]. https://www.kaggle.com/tusharsingh1999/hindi-poem-dataset/discussion
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Tushar Singh
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
Context

Since I could not find a good dataset online for Hindi poems, I decided to scrape public sites to find some beautiful poems. This dataset is the result of that scraping process, carried out using the scrapy module in Python.

Content

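The dataset can be loaded as a Python list of dictionaries by reading the JSON file line by line and converting each line with the json module.

Example:

    import json

    data = []
    # Each line of the file is one JSON object; UTF-8 is assumed for the Hindi text.
    with open("scraped_all.json", "r", encoding="utf-8") as f:
        for line in f:
            data.append(json.loads(line))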

Acknowledgements

Dataset is scraped from: https://www.amarujala.com/kavya/kavita.
Dataset of imagery and sentiment in frontier poetry throughout history
scidb.cn
Updated May 23, 2025
Cite
Jiang Zudong; Li Lin; Jia Zeyu; Li Chengcheng (2025). Dataset of imagery and sentiment in frontier poetry throughout history [Dataset]. http://doi.org/10.57760/sciencedb.25440
Unique identifier
https://doi.org/10.57760/sciencedb.25440
Dataset provided by
Science Data Bank
Authors
Jiang Zudong; Li Lin; Jia Zeyu; Li Chengcheng
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Frontier poetry is one of the most important themes in classical Chinese poetry, focusing on life and scenery in border regions. Imagery is a semantic composite of subjective and objective interactions, representing the objective objects of the poet's subjective emotions. The imagery system of frontier poetry exhibits significant regional convergence and cultural symbolism. This paper constructs a dataset of imagery sentiment in frontier poetry, which includes 40,000 frontier poems from the pre-Qin period to the present. It uses a combination of textual criticism and computational linguistics theories and methods to annotate and proofread the imagery and sentiments expressed in frontier poetry. This dataset not only provides rich research data for the study of frontier poetry, but also provides a macro perspective for in-depth exploration of the evolution of imagery sentiment in poetry.

This dataset crawled 42,836 frontier poems from the Internet, covering war poems from the Book of Songs in the pre-Qin period to contemporary new poems, spanning the pre-Qin to modern and contemporary periods, striving to be complete, accurate, and reliable. The crawled data was cleaned and standardized, non-text symbols and redundant format tags were removed, a table of variant characters was established, and ancient texts were used to restore garbled characters through exegesis. Incorrectly identified poems were deleted, and finally, sentence segmentation and error correction were performed, with each sentence separated by commas and periods. In the end, a total of 42,807 high-quality frontier poems were obtained.

Based on the collected poem texts, we constructed a data annotation system containing the encoding, author, name, imagery, and sentiment information of the poems. Each poem has a unique number, with the first two digits representing the dynasty number, such as "01" for the pre-Qin period, the middle four digits representing the author number, with poets sorted by their birth and death years, and the last two digits representing the serial number of the work, sorted by the first letter of the title. The imagery data of the poems and lyrics is annotated using a pre-trained model and manual review, while the sentiment is annotated manually.

The final dataset consists of 11 CSV tables, with one table for each dynasty, and the files are named after the dynasty. Each data point consists of six parts: code, author, name, text, imagery, and sentiment.
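The numbering scheme described above (two digits for the dynasty, four for the author, two for the work) can be unpacked with simple string slicing. This is a minimal sketch that assumes the code is stored as an 8-character digit string; the names PoemCode and parse_poem_code are hypothetical, and the example code value is invented for illustration apart from the "01" pre-Qin prefix mentioned in the description.

```python
from typing import NamedTuple


class PoemCode(NamedTuple):
    dynasty: str  # first two digits, e.g. "01" for the pre-Qin period
    author: str   # middle four digits
    serial: str   # last two digits, serial number of the work


def parse_poem_code(code: str) -> PoemCode:
    """Split an 8-digit poem code into its dynasty, author, and serial parts."""
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return PoemCode(dynasty=code[:2], author=code[2:6], serial=code[6:])


# Invented example value; only the "01" prefix is documented in the description.
print(parse_poem_code("01000102"))
# PoemCode(dynasty='01', author='0001', serial='02')
```

Reading the code column as text rather than as an integer matters here, since an integer conversion would drop the leading zero of dynasty numbers such as "01".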
Open Poetry Vision Dataset
universe.roboflow.com
zip
Updated Apr 7, 2022
+ more versions
Cite
Roboflow (2022). Open Poetry Vision Dataset [Dataset]. https://universe.roboflow.com/roboflow-gw7yv/open-poetry-vision/model/2
Available download formats: zip
Dataset authored and provided by
Roboflow (https://roboflow.com/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Text Bounding Boxes
Description
Overview

The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.

It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.

Example Image: https://i.imgur.com/sZT516a.png

Use Cases

A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.

Alternatively, you could try your hand using this as a neural font identification dataset. Nvidia, amongst others, have had success with this task.

Using this Dataset

Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.

Version 5 of this dataset (classes_all_text-raw-images) has all classes remapped to be labeled as "text." This was accomplished by using Modify Classes as a preprocessing step.

Version 6 of this dataset (classes_all_text-augmented-FAST) has all classes remapped to be labeled as "text" and was trained with Roboflow's Fast Model.

Version 7 of this dataset (classes_all_text-augmented-ACCURATE) has all classes remapped to be labeled as "text" and was trained with Roboflow's Accurate Model.

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
Poetry Assessment EEG Dataset 2
openneuro.org
Updated Sep 11, 2025
Cite
Soma Chaudhuri; Joydeep Bhattacharya (2025). Poetry Assessment EEG Dataset 2 [Dataset]. http://doi.org/10.18112/openneuro.ds006647.v1.0.1
Unique identifier
https://doi.org/10.18112/openneuro.ds006647.v1.0.1
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Soma Chaudhuri; Joydeep Bhattacharya
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Understanding how the brain engages with poetic language is key to advancing empirical research on aesthetic and creative cognition. This experiment involved 64-channel EEG recordings and behavioural ratings from 51 participants who read and evaluated 210 short English-language texts — 70 Haiku (nature-themed), 70 Senryu (emotion-themed), and 70 non-poetic Control texts. Each poem/text was rated on five subjective dimensions: Aesthetic Appeal, Vivid Imagery, Being Moved, Originality, and Creativity — using a 7-point scale.

The full study involved 51 participants, and the data were divided into two BIDS-compliant datasets to ensure technical validation and facilitate upload to OpenNeuro.

Poetry Assessment EEG Dataset 1 contains data from 47 participants whose continuous EEG recordings passed technical validation and were used in the primary analyses.

Poetry Assessment EEG Dataset 2 (this dataset) includes the remaining 4 participants (P105, P141, P142, P146), whose EEG recordings were acquired in segments due to session interruptions and later concatenated during preprocessing. These participants were excluded from the PSD analysis to avoid potential artifacts but are included here for completeness and transparency. In this dataset, the participants.tsv file maps anonymized BIDS IDs (sub-001 to sub-004) to the original participant codes used during data collection (P105–P146), as follows:

sub-001 → P105
sub-002 → P141
sub-003 → P142
sub-004 → P146
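The ID mapping listed above is small enough to express directly as a lookup table, which helps when moving between BIDS subject folders and files named after the original participant codes. The sketch below copies the four pairs from the list; the helper name ratings_filename is hypothetical, and the "<code>.csv" naming follows the P105.csv example given for the Behavioural_Ratings folder in this description.

```python
# BIDS subject ID -> original participant code, as listed in the description.
BIDS_TO_PARTICIPANT = {
    "sub-001": "P105",
    "sub-002": "P141",
    "sub-003": "P142",
    "sub-004": "P146",
}

# Reverse lookup: original participant code -> BIDS subject ID.
PARTICIPANT_TO_BIDS = {code: bids for bids, code in BIDS_TO_PARTICIPANT.items()}


def ratings_filename(bids_id):
    """Return the behavioural-ratings file name for a BIDS subject ID."""
    return f"{BIDS_TO_PARTICIPANT[bids_id]}.csv"


print(ratings_filename("sub-001"))   # P105.csv
print(PARTICIPANT_TO_BIDS["P146"])   # sub-004
```

In practice the same table could be read from participants.tsv instead of being hard-coded; the column names in that file are not stated here, so they are left out of the sketch.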

Dataset Structure and Navigation: Each subject folder contains four core EEG files:

channels.tsv – EEG channel metadata
eeg.json – EEG recording metadata
eeg.set – Raw EEG data (EEGLAB format)
events.tsv – Event markers aligned with poem presentation

The /code/ directory includes:

Preprocessing.m – MATLAB preprocessing script
BioSemi64.loc – 64-channel coordinate file

The /derivatives/ directory contains:

Behavioural_Ratings/ – One .csv file per participant (e.g., P105.csv), including trial-by-trial ratings across five dimensions: Aesthetic Appeal, Vivid Imagery, Emotional Impact (labeled as 'being moved'), Originality, and Creativity.

Psychometric_Responses/ – A single .csv file with demographic and trait-level questionnaire responses per participant, including: PANAS (mood), Openness, Curiosity, VVIQ (visual imagery), AVIQ (auditory imagery), MAAS (mindfulness), and AReA (aesthetic responsiveness).

Also includes questionnaires.pdf with full questionnaire texts and scoring keys

The /stimuli/ directory includes:

All 210 texts used in the experiment: 70 Haiku (nature-themed poetry), 70 Senryu (emotion-themed poetry), 70 Control (non-poetic matched prose).

Block-wise trial assignments for all seven blocks

Resting-state EEG was recorded at the beginning and end of each session. These segments are embedded within the raw EEG files and can be identified using the following trigger codes in events.tsv:

65285, 65286 → Resting state (before experiment); 65287, 65288 → Resting state (after experiment)

Interested users are encouraged to consult Poetry Assessment EEG Dataset 1 to gain a complete understanding of the full experiment and its validated main dataset. All preprocessing steps, event markers, and metadata structures were applied identically across both datasets (Poetry Assessment EEG Dataset 1 and Poetry Assessment EEG Dataset 2), ensuring consistency. This enables users to apply their own quality control pipelines and include these data if desired.

Of note, the anonymized participant IDs (e.g., PXXX) are used consistently across all data modalities, enabling reliable cross-referencing between EEG data, behavioural ratings, and psychometric responses. Data collection took place at the Department of Psychology at Goldsmiths, University of London, UK. The project was approved by the Local Ethics Committee at the Department of Psychology, Goldsmiths University of London. The experiment was conducted in accordance with the Declaration of Helsinki.

All EEG, behavioural, and psychometric data were anonymized. Participant identifiers were coded (P101–P151), and no names, dates of birth, or other direct identifiers are included.
Shah Abdul Latif Bhittai Poetry Dataset
ieee-dataport.org
Updated Aug 29, 2025
Cite
Abdul Majid Bhurgri Institute of Language Engineering Hyderabad (2025). Shah Abdul Latif Bhittai Poetry Dataset [Dataset]. https://ieee-dataport.org/documents/shah-abdul-latif-bhittai-poetry-dataset
Authors
Abdul Majid Bhurgri Institute of Language Engineering Hyderabad
Description
Tourism
Dataset of books called Some of me poetry
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Some of me poetry [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Some+of+me+poetry
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is about books. It has 2 rows and is filtered where the book is Some of me poetry. It features 7 columns including author, publication date, language, and book publisher.
Poetry Assessment EEG Dataset 1
openneuro.org
Updated Sep 11, 2025
Cite
Soma Chaudhuri; Joydeep Bhattacharya (2025). Poetry Assessment EEG Dataset 1 [Dataset]. http://doi.org/10.18112/openneuro.ds006648.v1.0.0
Unique identifier
https://doi.org/10.18112/openneuro.ds006648.v1.0.0
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Soma Chaudhuri; Joydeep Bhattacharya
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Understanding how the brain engages with poetic language is key to advancing empirical research on aesthetic and creative cognition. This experiment involved 64-channel EEG recordings and behavioural ratings from 51 participants who read and evaluated 210 short English-language texts — 70 Haiku (nature-themed), 70 Senryu (emotion-themed), and 70 non-poetic Control texts. Each poem/text was rated on five subjective dimensions: Aesthetic Appeal, Vivid Imagery, Being Moved, Originality, and Creativity — using a 7-point scale.

The full study involved 51 participants, and the data were divided into two BIDS-compliant datasets to ensure technical validation and facilitate upload to OpenNeuro.

Poetry Assessment EEG Dataset 1 (this dataset) contains data from 47 participants whose continuous EEG recordings passed technical validation and were used in the primary analyses. In this dataset, the participants.tsv file maps anonymized BIDS IDs (sub-001 to sub-047) to the original participant codes used during data collection (P101–P151).

Poetry Assessment EEG Dataset 2 includes the remaining 4 participants (P105, P141, P142, P146), whose EEG recordings were acquired in segments due to session interruptions and later concatenated during preprocessing. These participants were excluded from the PSD analysis to avoid potential artifacts but are included here for completeness and transparency.

Dataset Structure and Navigation: Each subject folder contains four core EEG files:

channels.tsv – EEG channel metadata
eeg.json – EEG recording metadata
eeg.set – Raw EEG data (EEGLAB format)
events.tsv – Event markers aligned with poem presentation

The /code/ directory includes:

Preprocessing.m – MATLAB preprocessing script
BioSemi64.loc – 64-channel coordinate file

The /derivatives/ directory contains:

Behavioural_Ratings/ – One .csv file per participant (e.g., P101.csv), including trial-by-trial ratings across five dimensions: Aesthetic Appeal, Vivid Imagery, Emotional Impact (labeled as 'being moved'), Originality, and Creativity.

Psychometric_Responses/ – A single .csv file with demographic and trait-level questionnaire responses per participant, including: PANAS (mood), Openness, Curiosity, VVIQ (visual imagery), AVIQ (auditory imagery), MAAS (mindfulness), and AReA (aesthetic responsiveness).

Also includes questionnaires.pdf with full questionnaire texts and scoring keys

The /stimuli/ directory includes:

All 210 texts used in the experiment: 70 Haiku (nature-themed poetry), 70 Senryu (emotion-themed poetry), 70 Control (non-poetic matched prose).

Block-wise trial assignments for all seven blocks

Resting-state EEG was recorded at the beginning and end of each session. These segments are embedded within the raw EEG files and can be identified using the following trigger codes in events.tsv:

65285, 65286 → Resting state (before experiment); 65287, 65288 → Resting state (after experiment)
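To locate the resting-state segments, the four trigger codes above can be matched against the rows of events.tsv. The following is a sketch under stated assumptions: it uses the standard BIDS "onset" column, but the name of the column holding the trigger code ("value" here) is an assumption and may differ in the actual files. The sample TSV content is a made-up placeholder, and the helper name resting_state_events is hypothetical.

```python
import csv
import io

# Trigger codes from the dataset description.
RESTING_BEFORE = {65285, 65286}
RESTING_AFTER = {65287, 65288}


def resting_state_events(tsv_file, code_column="value"):
    """Return (onset, phase) pairs for rows whose trigger code marks resting state."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    if code_column not in (reader.fieldnames or []):
        raise KeyError(f"column {code_column!r} not found in events file")
    found = []
    for row in reader:
        try:
            code = int(row[code_column])
        except (TypeError, ValueError):
            continue  # skip rows without a numeric trigger code, e.g. "n/a"
        if code in RESTING_BEFORE:
            found.append((float(row["onset"]), "before"))
        elif code in RESTING_AFTER:
            found.append((float(row["onset"]), "after"))
    return found


# Placeholder events table for illustration only; onsets and the code 101 are invented.
sample_tsv = (
    "onset\tduration\tvalue\n"
    "0.0\t0\t65285\n"
    "120.0\t0\t65286\n"
    "130.5\t0\t101\n"
    "140.0\t0\tn/a\n"
    "900.0\t0\t65287\n"
)

events = resting_state_events(io.StringIO(sample_tsv))
print(events)  # [(0.0, 'before'), (120.0, 'before'), (900.0, 'after')]
```

The description does not say whether the two codes in each pair mark the start and end of a segment, so the sketch only labels each marker by phase and leaves segment boundaries to the reader.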

Interested users may also consult Poetry Assessment EEG Dataset 2 to access recordings from the remaining 4 participants excluded from the main analyses. All preprocessing steps, event markers, and metadata structures were applied identically across both datasets (Poetry Assessment EEG Dataset 1 and Poetry Assessment EEG Dataset 2), ensuring consistency. This enables users to apply their own quality control pipelines and include these data if desired.

Of note, the anonymized participant IDs (e.g., PXXX) are used consistently across all data modalities, enabling reliable cross-referencing between EEG data, behavioural ratings, and psychometric responses. Data collection took place at the Department of Psychology at Goldsmiths, University of London, UK. The project was approved by the Local Ethics Committee at the Department of Psychology, Goldsmiths University of London. The experiment was conducted in accordance with the Declaration of Helsinki.

All EEG, behavioural, and psychometric data were anonymized. Participant identifiers were coded (P101–P151), and no names, dates of birth, or other direct identifiers are included.
Data from: Hindi poem Dataset
kaggle.com
Updated Oct 7, 2024
Cite
Dnyanesh Walwadkar (2024). Hindi poem Dataset [Dataset]. https://www.kaggle.com/datasets/dnyaneshwalwadkar/hindi-poem-dataset/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dnyanesh Walwadkar
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Dnyanesh Walwadkar

Released under Apache 2.0

Contents
Chinese Poetry - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). Chinese Poetry - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/chinese-poetry
Area covered
China
Description
The Chinese Poetry dataset is a dataset of Chinese poems used for language modeling.
20C Poetry
dataverse.harvard.edu
search.dataone.org
Updated Aug 7, 2018
Cite
Andrew Piper (2018). 20C Poetry [Dataset]. http://doi.org/10.7910/DVN/YVN6IW
Unique identifier
https://doi.org/10.7910/DVN/YVN6IW
Dataset provided by
Harvard Dataverse
Authors
Andrew Piper
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This is a table of word counts for a collection of 75,297 English-language poems.
Dataset of books called The complete poetry of Catullus
workwithdata.com
Updated Apr 17, 2025
+ more versions
Cite
Work With Data (2025). Dataset of books called The complete poetry of Catullus [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+complete+poetry+of+Catullus
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is The complete poetry of Catullus. It features 7 columns including author, publication date, language, and book publisher.
prompt-poem-dataset-20240921_004141
huggingface.co
Updated Sep 21, 2024
+ more versions
Cite
Vida Tayebati (2024). prompt-poem-dataset-20240921_004141 [Dataset]. https://huggingface.co/datasets/VidaEdco/prompt-poem-dataset-20240921_004141
Authors
Vida Tayebati
Description
VidaEdco/prompt-poem-dataset-20240921_004141 dataset hosted on Hugging Face and contributed by the HF Datasets community


6 scholarly articles cite the merve/poetry dataset (View in Google Scholar)
