100+ datasets found
  1. poetry

    • huggingface.co
    Updated Nov 3, 2021
    Cite
    merve (2021). poetry [Dataset]. https://huggingface.co/datasets/merve/poetry
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 3, 2021
    Authors
    merve
    Description

    Dataset Card for poetry

      Dataset Summary
    

    It contains poems from subjects: Love, Nature and Mythology & Folklore that belong to two periods namely Renaissance and Modern

      Supported Tasks and Leaderboards
    

    [Needs More Information]

      Languages
    

    [Needs More Information]

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    [Needs More Information]

      Data Fields
    

    Has 5 columns:

    Content, Author, Poem name, Age, Type

      Data Splits
    

    Only training… See the full description on the dataset page: https://huggingface.co/datasets/merve/poetry.

  2. gutenberg-poetry-corpus

    • huggingface.co
    Updated Oct 15, 2022
    Cite
    BigLAM: BigScience Libraries, Archives and Museums (2022). gutenberg-poetry-corpus [Dataset]. https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus
    Explore at:
    Dataset updated
    Oct 15, 2022
    Dataset authored and provided by
    BigLAM: BigScience Libraries, Archives and Museums
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Allison Parrish's Gutenberg Poetry Corpus

    This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
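    As a rough illustration of working with line-level records like these, the sketch below groups poetry lines by the book they came from. Only gutenberg_id is named in the description above; the field name "line", the sample text, and the ID values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; real rows come from the dataset itself.
records = [
    {"line": "First made-up line of verse", "gutenberg_id": 1},
    {"line": "Second made-up line of verse", "gutenberg_id": 1},
    {"line": "A line from another book", "gutenberg_id": 2},
]


def group_lines_by_book(rows):
    """Collect poetry lines under the Project Gutenberg book they came from."""
    books = defaultdict(list)
    for row in rows:
        books[row["gutenberg_id"]].append(row["line"])
    return dict(books)


lines_by_book = group_lines_by_book(records)
# Report how many lines each book contributed.
print({book_id: len(lines) for book_id, lines in lines_by_book.items()})
```

    Grouping by book ID first makes it easy to sample whole books rather than isolated lines, which may matter when splitting data for training and evaluation.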

  3. poem-dataset-3

    • kaggle.com
    Updated Jan 3, 2024
    Cite
    Pham Tuyet (2024). poem-dataset-3 [Dataset]. https://www.kaggle.com/datasets/phamtuyet/poem-dataset-3/code
    Explore at:
    Dataset updated
    Jan 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Pham Tuyet
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Pham Tuyet

    Released under MIT

    Contents

  4. Emotion-Categorized Poetry Dataset

    • kaggle.com
    Updated Nov 25, 2024
    Cite
    rohithkumarsaravanan (2024). Emotion-Categorized Poetry Dataset [Dataset]. https://www.kaggle.com/datasets/rohithkumarsaravanan/emotion-categorized-poetry-dataset/discussion?sort=undefined
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    rohithkumarsaravanan
    Description

    Dataset Description

    This dataset is a curated collection of poems, each categorized by a specific emotion: Anger, Courage, Fear, Joy, Love, Peace, Sad, and Surprise. Each line of poetry captures the depth and essence of human emotions, making this dataset valuable for:

    • Emotion Analysis in Text: Useful for sentiment analysis and natural language processing (NLP).
    • Creative Writing and Inspiration: A resource for writers, poets, and artists seeking emotional depth.
    • Machine Learning and AI Training: Ideal for training models in text classification, emotion detection, and sentiment-based applications.

    Dataset Highlights

    • Structure: Two columns, "Poem" (a single poetic line) and "Emotion" (the associated emotional category).
    • Versatility: Combines artistic creativity with analytical rigor, suitable for academic, creative, and technical applications.
    • Volume: A comprehensive and growing repository of poetry that bridges art and machine learning.

    This dataset is designed to inspire both humans and machines to understand, generate, and respond to the spectrum of human emotions in literature.
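    A minimal sketch of inspecting a two-column file of this shape, assuming it is distributed as a CSV with the headers "Poem" and "Emotion" described above. The sample rows are invented, and the real file name and encoding are not given here.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the described layout: one poetic line plus its emotion label.
SAMPLE_CSV = """Poem,Emotion
A made-up line about sunrise,Joy
A made-up line about thunder,Fear
Another made-up line about laughter,Joy
"""


def count_emotions(csv_text):
    """Count how many poem lines carry each emotion label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Emotion"] for row in reader)


emotion_counts = count_emotions(SAMPLE_CSV)
print(emotion_counts.most_common())
```

    Checking the label distribution first is a cheap way to spot class imbalance before training a classifier on the eight emotion categories.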

  5. PoetryFoundationData

    • huggingface.co
    Updated Mar 13, 2023
    Cite
    Shahul Es (2023). PoetryFoundationData [Dataset]. https://huggingface.co/datasets/shahules786/PoetryFoundationData
    Explore at:
    Dataset updated
    Mar 13, 2023
    Authors
    Shahul Es
    Description

    This file contains nearly all poems from the Poetry Foundation Website.

    Content

    All poems have a title and author. Most poems are also labeled with the tags as available from the Poetry Foundation Website. The word cloud above shows the most used tags!

    Inspiration

    This dataset can be used for a variety of tasks related to poetry writing.

  6. Automatic Analysis of Rhythmic Poetry - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Automatic Analysis of Rhythmic Poetry - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/automatic-analysis-of-rhythmic-poetry
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    Automatic analysis of rhythmic poetry with applications to generation and translation.

  7. Dataset: What the Eyes Reveal about (Reading) Poetry

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Dec 16, 2020
    Cite
    Wallot, Sebastian; Menninghaus, Winfried (2020). Dataset: What the Eyes Reveal about (Reading) Poetry [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000555746
    Explore at:
    Dataset updated
    Dec 16, 2020
    Authors
    Wallot, Sebastian; Menninghaus, Winfried
    Description

    dataPOEM.csv

    The dataPOEM.csv data set contains data on the level of each poem.

    scoresAes = factor scores of moving, beauty, and melodious ratings
    participant = participant number
    poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
    poemIdentity = poem number
    avgWFreq = average word frequency of poem
    totalGazeSlopeLineLength
    totalGazeWordMeanNAByWordLen
    totalGazeWordMeanNADiff
    order = order of presentation (1 = from A to D, 2 = from D to A; between participant factor)
    firstFixDurMS_MINFIX_AVG = first fixation duration
    totalGazeMS_MINFIX_AVG = total gaze durations
    fixDurMS_MINFIX_NUM = number of fixations
    sacLenMS_MINFIX_AVG = average saccade length
    percRegMS_MINFIX_AVG = percentage of regressive eye movements
    pupilDial_AVG = average pupil dilation
    blink_NUM_TotalRT = number of blinks relative to total reading time
    totalReadingTime = total reading time of the poem
    areaTT = total score of the Aesthetic Responsiveness Assessment questionnaire
    dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
    moving = rating of how moving the poem was
    beauty = rating of how beautiful the poem was
    melodious = rating of how melodious the poem was

    dataROI.csv

    The dataROI.csv data set contains data on the level of each line within a poem.

    order = order of presentation (1 = from A to D, 2 = from D to A; between participant factor)
    participant = participant number
    poemIdentity = poem number
    lineNr = line number within poem
    poemVersion = version of poem presented (A = original poem with rhyme and meter, B = poem variant with only rhyme, C = poem variant with only meter, D = poem variant without rhyme and meter)
    verseEnd = whether a particular word/line was the last line of a stanza (0 = word/line within a stanza, 1 = last word/line of a stanza)
    BeginCloseRhyme = whether a particular line’s final word marked the opening or closing of a rhyme pair (1 = opening of rhyme, 2 = closing of rhyme)
    lastFix = whether a particular line or word was the last one of the poem (0 = word/line within a poem, 1 = last word/line of poem)
    totalGazeByWordNA = total gaze duration of final word of a line relative to word length
    gazeByLineLengthNA = total gaze duration of a line relative to line length
    dataIntegrity = percentage of valid position measurements by eye tracker during reading of a poem
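    To show how the poem-level variables might be used, here is a small sketch that averages totalReadingTime per poemVersion. The column names come from the description above; the rows and numbers are invented, and the unit of totalReadingTime is not stated in the source, so no unit is assumed.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows; only the column names are taken from the dataset description.
SAMPLE = """participant,poemVersion,totalReadingTime
1,A,30.0
1,D,40.0
2,A,34.0
2,D,44.0
"""


def mean_reading_time_by_version(csv_text):
    """Average totalReadingTime separately for each poem version (A to D)."""
    times = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        times[row["poemVersion"]].append(float(row["totalReadingTime"]))
    return {version: mean(values) for version, values in times.items()}


means = mean_reading_time_by_version(SAMPLE)
print(means)
```

    With the real dataPOEM.csv, the same function could be fed the file contents, though filtering on dataIntegrity beforehand may be advisable.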

  8. Dataset of book subjects that contain Poetry : an introduction

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Poetry : an introduction [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Poetry+%3A+an+introduction&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 3 rows and is filtered where the book is Poetry : an introduction. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  9. Data from: Hindi Poem Dataset

    • kaggle.com
    Updated Aug 14, 2021
    Cite
    Tushar Singh (2021). Hindi Poem Dataset [Dataset]. https://www.kaggle.com/tusharsingh1999/hindi-poem-dataset/discussion
    Explore at:
    Dataset updated
    Aug 14, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tushar Singh
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Context

    Since I could not find a good dataset online for Hindi poems, I decided to scrape public sites to find some beautiful poems. This dataset is the result of that scraping process, carried out using the scrapy module in Python.

    Content

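    The dataset can be loaded as a Python list of dictionaries by reading the JSON file line by line and converting each line with the json module.

    Example:
    import json

    data = []  # one dictionary per scraped poem
    # The file holds one JSON object per line; UTF-8 is assumed for the Hindi text.
    with open("scraped_all.json", "r", encoding="utf-8") as f:
        for line in f:
            data.append(json.loads(line))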
    

    Acknowledgements

    Dataset is scraped from: https://www.amarujala.com/kavya/kavita.

  10. Dataset of imagery and sentiment in frontier poetry throughout history

    • scidb.cn
    Updated May 23, 2025
    Cite
    Jiang Zudong; Li Lin; Jia Zeyu; Li Chengcheng (2025). Dataset of imagery and sentiment in frontier poetry throughout history [Dataset]. http://doi.org/10.57760/sciencedb.25440
    Explore at:
    Dataset updated
    May 23, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Jiang Zudong; Li Lin; Jia Zeyu; Li Chengcheng
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Frontier poetry is one of the most important themes in classical Chinese poetry, focusing on life and scenery in border regions. Imagery is a semantic composite of subjective and objective interactions, representing the objective objects of the poet's subjective emotions. The imagery system of frontier poetry exhibits significant regional convergence and cultural symbolism. This paper constructs a dataset of imagery sentiment in frontier poetry, which includes 40,000 frontier poems from the pre-Qin period to the present. It uses a combination of textual criticism and computational linguistics theories and methods to annotate and proofread the imagery and sentiments expressed in frontier poetry. This dataset not only provides rich research data for the study of frontier poetry, but also provides a macro perspective for in-depth exploration of the evolution of imagery sentiment in poetry.

    This dataset crawled 42,836 frontier poems from the Internet, covering war poems from the Book of Songs in the pre-Qin period to contemporary new poems, spanning the pre-Qin to modern and contemporary periods, striving to be complete, accurate, and reliable. The crawled data was cleaned and standardized, non-text symbols and redundant format tags were removed, a table of variant characters was established, and ancient texts were used to restore garbled characters through exegesis. Incorrectly identified poems were deleted, and finally, sentence segmentation and error correction were performed, with each sentence separated by commas and periods. In the end, a total of 42,807 high-quality frontier poems were obtained.

    Based on the collected poem texts, we constructed a data annotation system containing the encoding, author, name, imagery, and sentiment information of the poems. Each poem has a unique number, with the first two digits representing the dynasty number, such as “01” for the pre-Qin period, the middle four digits representing the author number, with poets sorted by their birth and death years, and the last two digits representing the serial number of the work, sorted by the first letter of the title. The imagery data of the poems and lyrics is annotated using a pre-trained model and manual review, while the sentiment is annotated manually.

    The final dataset consists of 11 CSV tables, with one table for each dynasty, and the files are named after the dynasty. Each data point consists of six parts: code, author, name, text, imagery, and sentiment.
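    The numbering scheme described above (two digits for the dynasty, four for the author, two for the work) can be sketched as a small parser. This assumes the code is stored as an 8-character digit string with leading zeros preserved; the function name, the class name, and the example code "01000302" are hypothetical and not taken from the dataset.

```python
from typing import NamedTuple


class PoemCode(NamedTuple):
    dynasty: str  # first two digits, e.g. "01" for the pre-Qin period
    author: str   # middle four digits
    serial: str   # last two digits, the serial number of the work


def parse_poem_code(code: str) -> PoemCode:
    """Split an 8-digit poem code into its dynasty, author, and serial parts."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return PoemCode(dynasty=code[:2], author=code[2:6], serial=code[6:])


parsed = parse_poem_code("01000302")
print(parsed)
```

    Keeping the parts as strings rather than integers preserves the leading zeros, which matters if the CSV files are read by a tool that would otherwise convert the code column to a number.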

  11. Open Poetry Vision Dataset

    • universe.roboflow.com
    zip
    Updated Apr 7, 2022
    + more versions
    Cite
    Roboflow (2022). Open Poetry Vision Dataset [Dataset]. https://universe.roboflow.com/roboflow-gw7yv/open-poetry-vision/model/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 7, 2022
    Dataset authored and provided by
    Roboflow (https://roboflow.com/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Text Bounding Boxes
    Description

    Overview

    The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.

    It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.

    Example Image: https://i.imgur.com/sZT516a.png

    Use Cases

    A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.

    Alternatively, you could try your hand using this as a neural font identification dataset. Nvidia, amongst others, have had success with this task.

    Using this Dataset

    Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.

    Version 5 of this dataset (classes_all_text-raw-images) has all classes remapped to be labeled as "text." This was accomplished by using Modify Classes as a preprocessing step.

    Version 6 of this dataset (classes_all_text-augmented-FAST) has all classes remapped to be labeled as "text." and was trained with Roboflow's Fast Model.

    Version 7 of this dataset (classes_all_text-augmented-ACCURATE) has all classes remapped to be labeled as "text." and was trained with Roboflow's Accurate Model.

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.


  12. Poetry Assessment EEG Dataset 2

    • openneuro.org
    Updated Sep 11, 2025
    Cite
    Soma Chaudhuri; Joydeep Bhattacharya (2025). Poetry Assessment EEG Dataset 2 [Dataset]. http://doi.org/10.18112/openneuro.ds006647.v1.0.1
    Explore at:
    Dataset updated
    Sep 11, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Soma Chaudhuri; Joydeep Bhattacharya
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Understanding how the brain engages with poetic language is key to advancing empirical research on aesthetic and creative cognition. This experiment involved 64-channel EEG recordings and behavioural ratings from 51 participants who read and evaluated 210 short English-language texts — 70 Haiku (nature-themed), 70 Senryu (emotion-themed), and 70 non-poetic Control texts. Each poem/text was rated on five subjective dimensions: Aesthetic Appeal, Vivid Imagery, Being Moved, Originality, and Creativity — using a 7-point scale.

    The full study involved 51 participants, and the data were divided into two BIDS-compliant datasets to ensure technical validation and facilitate upload to OpenNeuro.

    Poetry Assessment EEG Dataset 1 contains data from 47 participants whose continuous EEG recordings passed technical validation and were used in the primary analyses.

    Poetry Assessment EEG Dataset 2 (this dataset) includes the remaining 4 participants (P105, P141, P142, P146), whose EEG recordings were acquired in segments due to session interruptions and later concatenated during preprocessing. These participants were excluded from the PSD analysis to avoid potential artifacts but are included here for completeness and transparency. In this dataset, the participants.tsv file maps anonymized BIDS IDs (sub-001 to sub-004) to the original participant codes used during data collection (P105–P146), as follows:

    sub-001 → P105
    sub-002 → P141
    sub-003 → P142
    sub-004 → P146

    Dataset Structure and Navigation: Each subject folder contains four core EEG files:

    channels.tsv – EEG channel metadata
    eeg.json – EEG recording metadata
    eeg.set – Raw EEG data (EEGLAB format)
    events.tsv – Event markers aligned with poem presentation

    The /code/ directory includes:

    Preprocessing.m – MATLAB preprocessing script
    BioSemi64.loc – 64-channel coordinate file

    The /derivatives/ directory contains:

    Behavioural_Ratings/ – One .csv file per participant (e.g., P105.csv), including trial-by-trial ratings across five dimensions: Aesthetic Appeal, Vivid Imagery, Emotional Impact (labeled as 'being moved'), Originality, and Creativity.

    Psychometric_Responses/ – A single .csv file with demographic and trait-level questionnaire responses per participant, including: PANAS (mood), Openness, Curiosity, VVIQ (visual imagery), AVIQ (auditory imagery), MAAS (mindfulness), and AReA (aesthetic responsiveness).

    Also includes questionnaires.pdf with full questionnaire texts and scoring keys

    The /stimuli/ directory includes:

    All 210 texts used in the experiment: 70 Haiku (nature-themed poetry), 70 Senryu (emotion-themed poetry), 70 Control (non-poetic matched prose).

    Block-wise trial assignments for all seven blocks

    Resting-state EEG was recorded at the beginning and end of each session. These segments are embedded within the raw EEG files and can be identified using the following trigger codes in events.tsv:

    65285, 65286 → Resting state (before experiment)
    65287, 65288 → Resting state (after experiment)
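    One way to pick out these resting-state markers is sketched below. The four trigger codes are the ones listed above; the events.tsv excerpt is invented, and the column name "value" is an assumption, so check the actual header of the file before relying on it.

```python
import csv
import io

# Trigger codes quoted in the dataset description.
RESTING_STATE_CODES = {
    65285: "before experiment",
    65286: "before experiment",
    65287: "after experiment",
    65288: "after experiment",
}

# Hypothetical events.tsv excerpt; the "value" column and the onsets are assumptions.
SAMPLE_EVENTS = "onset\tduration\tvalue\n0.0\t0\t65285\n120.0\t0\t65286\n130.5\t0\t11\n"


def resting_state_events(tsv_text, code_column="value"):
    """Return (onset, label) pairs for rows whose trigger code marks resting state."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (float(row["onset"]), RESTING_STATE_CODES[int(row[code_column])])
        for row in rows
        if int(row[code_column]) in RESTING_STATE_CODES
    ]


resting = resting_state_events(SAMPLE_EVENTS)
print(resting)
```

    The description does not say what distinguishes the two codes in each pair, so the sketch only labels them as before or after the experiment.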

    Interested users are encouraged to consult Poetry Assessment EEG Dataset 1 to gain a complete understanding of the full experiment and its validated main dataset. All preprocessing steps, event markers, and metadata structures were applied identically across both datasets (Poetry Assessment EEG Dataset 1 and Poetry Assessment EEG Dataset 2), ensuring consistency. This enables users to apply their own quality control pipelines and include these data if desired.

    Of note, the anonymized participant IDs (e.g., PXXX) are used consistently across all data modalities, enabling reliable cross-referencing between EEG data, behavioural ratings, and psychometric responses. Data collection took place at the Department of Psychology at Goldsmiths, University of London, UK. The project was approved by the Local Ethics Committee at the Department of Psychology, Goldsmiths University of London. The experiment was conducted in accordance with the Declaration of Helsinki.

    All EEG, behavioural, and psychometric data were anonymized. Participant identifiers were coded (P101–P151), and no names, dates of birth, or other direct identifiers are included.

  13. Shah Abdul Latif Bhittai Poetry Dataset

    • ieee-dataport.org
    Updated Aug 29, 2025
    Cite
    Abdul Majid Bhurgri Institute of Language Engineering Hyderabad (2025). Shah Abdul Latif Bhittai Poetry Dataset [Dataset]. https://ieee-dataport.org/documents/shah-abdul-latif-bhittai-poetry-dataset
    Explore at:
    Dataset updated
    Aug 29, 2025
    Authors
    Abdul Majid Bhurgri Institute of Language Engineering Hyderabad
    Description

    Tourism

  14. Dataset of books called Some of me poetry

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Some of me poetry [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Some+of+me+poetry
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is about books. It has 2 rows and is filtered where the book is Some of me poetry. It features 7 columns including author, publication date, language, and book publisher.

  15. Poetry Assessment EEG Dataset 1

    • openneuro.org
    Updated Sep 11, 2025
    Cite
    Soma Chaudhuri; Joydeep Bhattacharya (2025). Poetry Assessment EEG Dataset 1 [Dataset]. http://doi.org/10.18112/openneuro.ds006648.v1.0.0
    Explore at:
    Dataset updated
    Sep 11, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Soma Chaudhuri; Joydeep Bhattacharya
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Understanding how the brain engages with poetic language is key to advancing empirical research on aesthetic and creative cognition. This experiment involved 64-channel EEG recordings and behavioural ratings from 51 participants who read and evaluated 210 short English-language texts — 70 Haiku (nature-themed), 70 Senryu (emotion-themed), and 70 non-poetic Control texts. Each poem/text was rated on five subjective dimensions: Aesthetic Appeal, Vivid Imagery, Being Moved, Originality, and Creativity — using a 7-point scale.

    The full study involved 51 participants, and the data were divided into two BIDS-compliant datasets to ensure technical validation and facilitate upload to OpenNeuro.

    Poetry Assessment EEG Dataset 1 (this dataset) contains data from 47 participants whose continuous EEG recordings passed technical validation and were used in the primary analyses. In this dataset, the participants.tsv file maps anonymized BIDS IDs (sub-001 to sub-047) to the original participant codes used during data collection (P101–P151).

    Poetry Assessment EEG Dataset 2 includes the remaining 4 participants (P105, P141, P142, P146), whose EEG recordings were acquired in segments due to session interruptions and later concatenated during preprocessing. These participants were excluded from the PSD analysis to avoid potential artifacts but are included here for completeness and transparency.

    Dataset Structure and Navigation: Each subject folder contains four core EEG files:

    channels.tsv – EEG channel metadata
    eeg.json – EEG recording metadata
    eeg.set – Raw EEG data (EEGLAB format)
    events.tsv – Event markers aligned with poem presentation

    The /code/ directory includes:

    Preprocessing.m – MATLAB preprocessing script
    BioSemi64.loc – 64-channel coordinate file

    The /derivatives/ directory contains:

    Behavioural_Ratings/ – One .csv file per participant (e.g., P101.csv), including trial-by-trial ratings across five dimensions: Aesthetic Appeal, Vivid Imagery, Emotional Impact (labeled as 'being moved'), Originality, and Creativity.

    Psychometric_Responses/ – A single .csv file with demographic and trait-level questionnaire responses per participant, including: PANAS (mood), Openness, Curiosity, VVIQ (visual imagery), AVIQ (auditory imagery), MAAS (mindfulness), and AReA (aesthetic responsiveness).

    Also includes questionnaires.pdf with full questionnaire texts and scoring keys

    The /stimuli/ directory includes:

    All 210 texts used in the experiment: 70 Haiku (nature-themed poetry), 70 Senryu (emotion-themed poetry), 70 Control (non-poetic matched prose).

    Block-wise trial assignments for all seven blocks

    Resting-state EEG was recorded at the beginning and end of each session. These segments are embedded within the raw EEG files and can be identified using the following trigger codes in events.tsv:

    65285, 65286 → Resting state (before experiment)
    65287, 65288 → Resting state (after experiment)

    Interested users may also consult Poetry Assessment EEG Dataset 2 to access recordings from the remaining 4 participants excluded from the main analyses. All preprocessing steps, event markers, and metadata structures were applied identically across both datasets (Poetry Assessment EEG Dataset 1 and Poetry Assessment EEG Dataset 2), ensuring consistency. This enables users to apply their own quality control pipelines and include these data if desired.

    Of note, the anonymized participant IDs (e.g., PXXX) are used consistently across all data modalities, enabling reliable cross-referencing between EEG data, behavioural ratings, and psychometric responses. Data collection took place at the Department of Psychology at Goldsmiths, University of London, UK. The project was approved by the Local Ethics Committee at the Department of Psychology, Goldsmiths University of London. The experiment was conducted in accordance with the Declaration of Helsinki.

    All EEG, behavioural, and psychometric data were anonymized. Participant identifiers were coded (P101–P151), and no names, dates of birth, or other direct identifiers are included.

  16. Data from: Hindi poem Dataset

    • kaggle.com
    Updated Oct 7, 2024
    Cite
    Dnyanesh Walwadkar (2024). Hindi poem Dataset [Dataset]. https://www.kaggle.com/datasets/dnyaneshwalwadkar/hindi-poem-dataset/code
    Explore at:
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dnyanesh Walwadkar
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Dnyanesh Walwadkar

    Released under Apache 2.0

    Contents

  17. Chinese Poetry - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). Chinese Poetry - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/chinese-poetry
    Explore at:
    Dataset updated
    Dec 2, 2024
    Area covered
    China
    Description

    The Chinese Poetry dataset is a dataset of Chinese poems used for language modeling.

  18. 20C Poetry

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 7, 2018
    Cite
    Andrew Piper (2018). 20C Poetry [Dataset]. http://doi.org/10.7910/DVN/YVN6IW
    Explore at:
    Dataset updated
    Aug 7, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Andrew Piper
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This is a table of word counts for a collection of 75,297 English-language poems.

  19. Dataset of books called The complete poetry of Catullus

    • workwithdata.com
    Updated Apr 17, 2025
    + more versions
    Cite
    Work With Data (2025). Dataset of books called The complete poetry of Catullus [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+complete+poetry+of+Catullus
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is The complete poetry of Catullus. It features 7 columns including author, publication date, language, and book publisher.

  20. prompt-poem-dataset-20240921_004141

    • huggingface.co
    Updated Sep 21, 2024
    + more versions
    Cite
    Vida Tayebati (2024). prompt-poem-dataset-20240921_004141 [Dataset]. https://huggingface.co/datasets/VidaEdco/prompt-poem-dataset-20240921_004141
    Explore at:
    Dataset updated
    Sep 21, 2024
    Authors
    Vida Tayebati
    Description

    VidaEdco/prompt-poem-dataset-20240921_004141 dataset hosted on Hugging Face and contributed by the HF Datasets community

Cite
merve (2021). poetry [Dataset]. https://huggingface.co/datasets/merve/poetry

poetry

merve/poetry

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
