100+ datasets found
  1. poetry

    • huggingface.co
    Updated Oct 14, 2024
    Cite
    Merve Noyan (2024). poetry [Dataset]. https://huggingface.co/datasets/merve/poetry
    Dataset updated
    Oct 14, 2024
    Authors
    Merve Noyan
    Description

    Dataset Card for poetry

      Dataset Summary

    It contains poems on three subjects (Love, Nature, and Mythology & Folklore) from two periods: Renaissance and Modern.

      Supported Tasks and Leaderboards

    [Needs More Information]

      Languages

    [Needs More Information]

      Dataset Structure

      Data Instances

    [Needs More Information]

      Data Fields

    The dataset has 5 columns:

    Content, Author, Poem name, Age, Type… See the full description on the dataset page: https://huggingface.co/datasets/merve/poetry.
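
    The summary implies a 2 × 3 grid of periods and subjects. The sketch below tallies poems per cell, including empty cells. It assumes each row is a dict with lowercase `age` and `type` keys; those key names are a guess based on the "Age" and "Type" columns, since the card's field list is truncated. The sample rows are hypothetical.

```python
from collections import Counter

# Periods and subjects as listed in the dataset summary.
PERIODS = ("Renaissance", "Modern")
SUBJECTS = ("Love", "Nature", "Mythology & Folklore")

def crosstab(rows, period_key="age", subject_key="type"):
    """Count poems for every (period, subject) pair, including empty ones.

    The default column keys are assumptions, not confirmed field names.
    """
    counts = Counter((row[period_key], row[subject_key]) for row in rows)
    # Counter returns 0 for missing keys, so every cell is filled.
    return {(p, s): counts[(p, s)] for p in PERIODS for s in SUBJECTS}

sample = [  # hypothetical rows, not taken from the dataset
    {"age": "Modern", "type": "Love"},
    {"age": "Modern", "type": "Love"},
    {"age": "Renaissance", "type": "Nature"},
]
table = crosstab(sample)
print(table[("Modern", "Love")])  # -> 2
```

    Returning all six cells, even when a count is zero, makes gaps in coverage visible instead of silently omitting them.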

  2. American,British,Indian Emotion poetry dataset

    • kaggle.com
    Updated Apr 6, 2024
    Cite
    pkkazipeta143 (2024). American,British,Indian Emotion poetry dataset [Dataset]. https://www.kaggle.com/datasets/pkkazipeta143/americanbritishindian-emotion-poetry-dataset
    Dataset updated
    Apr 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    pkkazipeta143
    Area covered
    United Kingdom, United States
    Description

    Capturing emotion from reviews and tweets is a well-studied task. Reviews and tweets are not rich in emotion, whereas poetry is, so capturing emotions from poetry is an interesting task. To this end, we collected poems from Poemhunter.com (we thank the website owners), created a dataset, and manually annotated the poems with 5 emotions: Fear, Sad, Surprise, Happy, and Angry. The dataset comprises 3 files:

      1. ABIEMO: poems by American, British, and Indian poets
      2. CAPEMO: augmented poems, created with the NLPAUG library to resolve the class imbalance problem (we thank the library developers)
      3. BAPEMO: extended augmented poems to resolve the class imbalance problem

    Along with the emotion, each poem is also assigned its country. The dataset can be used for poet style analysis, emotion analysis, country-wise differences in poetry, etc.
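
    The authors addressed class imbalance with NLPAUG text augmentation. The sketch below swaps in a different and simpler technique, random oversampling, which duplicates minority-class poems until each emotion matches the largest class. It does not reproduce CAPEMO or BAPEMO. The record layout (dicts with an `emotion` key) is hypothetical, not the files' actual schema.

```python
import random
from collections import Counter

def oversample(records, label_key="emotion", seed=0):
    """Randomly duplicate minority-class records until every label
    has as many records as the largest class.

    `label_key` is a hypothetical column name, not taken from ABIEMO.
    """
    rng = random.Random(seed)
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies (with replacement) to reach the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

poems = [  # hypothetical records
    {"text": "a", "emotion": "Happy"},
    {"text": "b", "emotion": "Happy"},
    {"text": "c", "emotion": "Happy"},
    {"text": "d", "emotion": "Sad"},
]
balanced = oversample(poems)
print(Counter(r["emotion"] for r in balanced))
```

    Exact duplicates can encourage overfitting. Text augmentation, as the authors used, produces varied copies instead.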

  3. PoetryFoundationData

    • huggingface.co
    Updated Mar 13, 2023
    Cite
    Shahul Es (2023). PoetryFoundationData [Dataset]. https://huggingface.co/datasets/shahules786/PoetryFoundationData
    Dataset updated
    Mar 13, 2023
    Authors
    Shahul Es
    Description

    This file contains nearly all poems from the Poetry Foundation website.

    Content

    All poems have a title and author. Most poems are also labeled with tags as available on the Poetry Foundation website.

    Inspiration

    This dataset can be used for a variety of tasks related to poetry writing.

  4. Blackout Poetry Dataset

    • kaggle.com
    Updated Jan 9, 2022
    Cite
    Aditeya Baral (2022). Blackout Poetry Dataset [Dataset]. https://www.kaggle.com/aditeyabaral/blackout-poetry-dataset/code
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aditeya Baral
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Blackout Poetry Dataset

    A blackout poetry dataset constructed from publicly available short stories and large poems. The dataset consists of two variants: 8K and 16K examples of passages along with a poem generated from the passage and the indices of the words in the passage from which words in the poem have been selected. The dataset also contains perplexity scores for each of the poems indicating the language quality of the poems.

    The dataset was constructed synthetically and therefore contains many poor-quality poems and frequent grammatical errors. However, it is a great starting point for applying machine learning to blackout poetry generation.

    The dataset was first introduced in MAPLE – MAsking words to generate blackout Poetry using sequence-to-sequence LEarning.

    Content

    The dataset has two variants:
    - 8K: poems sampled from the 16K dataset with the lowest perplexity scores
    - 16K
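
    Perplexity is the exponential of the mean negative log-probability a language model assigns to each token, so lower values indicate more fluent text. The description does not say which model produced the scores or exactly how the 8K subset was drawn. This sketch computes perplexity from per-token log-probabilities and keeps the k lowest-scoring records, assuming a simple top-k cut. The `perplexity` field name is hypothetical.

```python
import math

def perplexity(token_logprobs):
    """exp(-(1/N) * sum(log p(w_i | w_<i))) over N natural-log probabilities."""
    n = len(token_logprobs)
    if n == 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / n)

def lowest_perplexity(records, k):
    """Return the k records with the smallest (hypothetical) 'perplexity' value."""
    return sorted(records, key=lambda r: r["perplexity"])[:k]

# A model assigning probability 1/4 to every token has perplexity of about 4.
print(perplexity([math.log(0.25)] * 3))
```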

    Both variants contain data in the following format:

    passage                                            poem                                   indices
    Did the CIA tell the FBI that it knows the wor...  cia fbi the biggest weapon             [2, 5, 9, 24, 25]
    A vigilante lacking of heroic qualities that...    lacking qualities that damn criminals  [2, 5, 6, 11, 12]

    The passage is the text from which the poem is generated. The poem is the generated poem. The indices are the indices of the words in the text that are chosen for the poem.
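
    The sample rows are consistent with 0-based word positions over a whitespace split of the passage, with the selected words lowercased. For example, "CIA" is word 2 and "FBI" is word 5 of the first passage. Under that assumption, a poem can be rebuilt from its passage and indices. The dataset's actual tokenizer, including its handling of punctuation, is not documented here. Only the visible, untruncated start of each sample passage is used.

```python
def blackout(passage, indices):
    """Keep only the words of `passage` at the given positions.

    Assumes 0-based indices over passage.split() and lowercased output,
    inferred from the two sample rows rather than from documentation.
    """
    words = passage.split()
    return " ".join(words[i].lower() for i in indices)

# Visible prefix of the first sample passage.
prefix = "Did the CIA tell the FBI that it knows the"
print(blackout(prefix, [2, 5, 9]))  # -> cia fbi the
```

    Storing indices rather than only the poem keeps repeated words unambiguous: "the" appears at positions 1, 4, and 9, and only position 9 is selected.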

    Acknowledgements

    This dataset was generated synthetically using Liza Daly's pattern-matching-based blackout poetry generation.

  5. gutenberg-poetry-corpus

    • huggingface.co
    Updated Oct 15, 2022
    Cite
    BigLAM: BigScience Libraries, Archives and Museums (2022). gutenberg-poetry-corpus [Dataset]. https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus
    Dataset updated
    Oct 15, 2022
    Dataset authored and provided by
    BigLAM: BigScience Libraries, Archives and Museums
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Allison Parrish's Gutenberg Poetry Corpus

    This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration as well as more information on how the data was mined, how to create your own version of the corpus, and examples of projects using it. This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191… See the full description on the dataset page: https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus.
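
    Each line carries a `gutenberg_id`, so whole books can be reassembled by grouping lines under that ID. This sketch assumes each row is a dict with `gutenberg_id` and a `line` text field. The `line` name is an assumption, because the description is truncated before listing the other fields. The sample rows are placeholders.

```python
from collections import defaultdict

def group_by_book(rows):
    """Collect poetry lines under their Project Gutenberg book ID,
    preserving the order in which lines appear."""
    books = defaultdict(list)
    for row in rows:
        # 'line' is an assumed field name; 'gutenberg_id' is from the card.
        books[row["gutenberg_id"]].append(row["line"])
    return dict(books)

rows = [  # placeholder rows with hypothetical IDs
    {"line": "first", "gutenberg_id": 1},
    {"line": "second", "gutenberg_id": 2},
    {"line": "third", "gutenberg_id": 1},
]
print(group_by_book(rows))
```

    With the Hugging Face `datasets` library installed, rows could come from `load_dataset("biglam/gutenberg-poetry-corpus", split="train")`, assuming the corpus ships a single `train` split.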

  6. AraPoems: An Extensive Dataset of Arabic Poetry Associated with Verses,...

    • dataone.org
    • dataverse.harvard.edu
    Updated Sep 25, 2024
    Cite
    Qarah, Faisal (2024). AraPoems: An Extensive Dataset of Arabic Poetry Associated with Verses, Rhymes, Meters, and More [Dataset]. http://doi.org/10.7910/DVN/PJPWOY
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Qarah, Faisal
    Description

    The largest Arabic poetry dataset, containing more than 2.09 million verses. The dataset is comprehensive and includes additional information for each verse, such as the poet's name, the poem's title, era, meter, sub-meter, etc.

  7. poem-dataset-3

    • kaggle.com
    zip
    Updated Jan 3, 2024
    Cite
    Pham Tuyet (2024). poem-dataset-3 [Dataset]. https://www.kaggle.com/datasets/phamtuyet/poem-dataset-3/suggestions
    Available download formats
    zip (2,602,855 bytes)
    Dataset updated
    Jan 3, 2024
    Authors
    Pham Tuyet
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Pham Tuyet

    Released under MIT

    Contents

  8. York-Helsinki Parsed Corpus of Old English Poetry - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 19, 2023
    Cite
    (2023). York-Helsinki Parsed Corpus of Old English Poetry - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/5d44ff9c-6334-57e5-96a8-6b2348e75754
    Dataset updated
    Oct 19, 2023
    Area covered
    Helsinki
    Description

    A selection of poetic texts (71,490 words) from the Old English Section of the Helsinki Corpus of English Texts, syntactically and morphologically annotated.

  9. poem-dataset

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Cite
    Pham Tuyet (2023). poem-dataset [Dataset]. https://www.kaggle.com/datasets/phamtuyet/poem-dataset/suggestions
    Available download formats
    zip (43,521,091 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    Pham Tuyet
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Pham Tuyet

    Released under Apache 2.0

    Contents

  10. poetry dataset

    • kaggle.com
    zip
    Updated Sep 10, 2024
    Cite
    Mirza Ahmad Awais (2024). poetry dataset [Dataset]. https://www.kaggle.com/datasets/mirzaahmadawais/poetry-dataset/suggestions?status=pending&yourSuggestions=true
    Available download formats
    zip (21,904 bytes)
    Dataset updated
    Sep 10, 2024
    Authors
    Mirza Ahmad Awais
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Mirza Ahmad Awais

    Released under CC0: Public Domain

    Contents

  11. CCPM Dataset

    • paperswithcode.com
    Updated Dec 31, 2024
    Cite
    Wenhao Li; Fanchao Qi; Maosong Sun; Xiaoyuan Yi; Jiarui Zhang (2024). CCPM Dataset [Dataset]. https://paperswithcode.com/dataset/ccpm
    Dataset updated
    Dec 31, 2024
    Authors
    Wenhao Li; Fanchao Qi; Maosong Sun; Xiaoyuan Yi; Jiarui Zhang
    Description

    Introduction

    CCPM is a large Chinese classical poetry matching dataset that can be used for poetry matching, understanding and translation.

    The main task of this dataset is as follows: given a description in modern Chinese, the model must select, from four candidates, the line of Chinese classical poetry that best matches the description semantically.

    Size

    It contains 27,218 instances in total, which are split into training (21,778), validation (2,720) and test (2,720) sets.

    Format

    Each instance is composed of translation (the description in modern Chinese, a string), choice (four candidate lines of Chinese classical poetry, a list) and answer (the index of the correct line, an integer between 0 and 3).
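
    The instance format maps naturally onto a small record type with the stated constraints: exactly four candidates and an answer between 0 and 3. The description does not name an evaluation metric. Accuracy, the share of instances where the predicted index equals `answer`, is an assumed but natural choice for this four-way multiple-choice task. The last line checks that the published split sizes add up to the stated total.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class CCPMInstance:
    translation: str   # description in modern Chinese
    choice: List[str]  # four candidate lines of classical poetry
    answer: int        # index of the correct line, 0..3

    def __post_init__(self):
        if len(self.choice) != 4:
            raise ValueError("expected exactly four candidates")
        if not 0 <= self.answer <= 3:
            raise ValueError("answer must be between 0 and 3")

def accuracy(instances: Sequence[CCPMInstance], predictions: Sequence[int]) -> float:
    """Fraction of instances whose predicted index equals `answer` (assumed metric)."""
    if not instances or len(instances) != len(predictions):
        raise ValueError("need one prediction per instance, and at least one instance")
    correct = sum(inst.answer == pred for inst, pred in zip(instances, predictions))
    return correct / len(instances)

# Training + validation + test sizes match the stated total.
assert 21_778 + 2_720 + 2_720 == 27_218
```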

  12. prompt-poem-dataset-20240921_004141

    • huggingface.co
    Updated Sep 21, 2024
    Cite
    Vida Tayebati (2024). prompt-poem-dataset-20240921_004141 [Dataset]. https://huggingface.co/datasets/VidaEdco/prompt-poem-dataset-20240921_004141
    Dataset updated
    Sep 21, 2024
    Authors
    Vida Tayebati
    Description

    VidaEdco/prompt-poem-dataset-20240921_004141 dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Chinese Poetry - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). Chinese Poetry - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/chinese-poetry
    Dataset updated
    Dec 2, 2024
    Area covered
    China
    Description

    The Chinese Poetry dataset is a dataset of Chinese poems used for language modeling.

  14. Open Poetry Vision Dataset

    • universe.roboflow.com
    zip
    Updated Apr 7, 2022
    Cite
    Roboflow (2022). Open Poetry Vision Dataset [Dataset]. https://universe.roboflow.com/roboflow-gw7yv/open-poetry-vision/model/2
    Available download formats
    zip
    Dataset updated
    Apr 7, 2022
    Dataset authored and provided by
    Roboflow
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Text Bounding Boxes
    Description

    Overview

    The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.

    It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.

    Example image: https://i.imgur.com/sZT516a.png

    Use Cases

    A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.

    Alternatively, you could try your hand at using this as a neural font identification dataset. Nvidia, amongst others, has had success with this task.

    Using this Dataset

    Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.

    Version 5 of this dataset (classes_all_text-raw-images) has all classes remapped to be labeled as "text." This was accomplished by using Modify Classes as a preprocessing step.

    Version 6 of this dataset (classes_all_text-augmented-FAST) has all classes remapped to be labeled as "text" and was trained with Roboflow's Fast Model.

    Version 7 of this dataset (classes_all_text-augmented-ACCURATE) has all classes remapped to be labeled as "text" and was trained with Roboflow's Accurate Model.
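
    Roboflow performs the class remapping through its Modify Classes preprocessing step in its own tooling. The plain-Python sketch below illustrates the same idea and does not use Roboflow's API. It replaces every font class with "text" while leaving the boxes untouched, and it checks that a pixel box lies on the 512x512 canvas. The annotation layout (a `label` plus an `(x_min, y_min, x_max, y_max)` `box`) and the font names are hypothetical.

```python
CANVAS_SIZE = 512  # canvas width and height stated in the description

def remap_to_single_class(annotations, new_label="text"):
    """Return copies of the annotations with every class (font) replaced
    by `new_label`; bounding boxes are left unchanged."""
    return [{**ann, "label": new_label} for ann in annotations]

def inside_canvas(box, size=CANVAS_SIZE):
    """True if an (x_min, y_min, x_max, y_max) pixel box lies on the canvas."""
    x_min, y_min, x_max, y_max = box
    return 0 <= x_min < x_max <= size and 0 <= y_min < y_max <= size

annotations = [  # hypothetical font labels and boxes
    {"label": "font_a", "box": (10, 20, 110, 60)},
    {"label": "font_b", "box": (0, 0, 512, 40)},
]
print(remap_to_single_class(annotations))
```

    Building new dicts rather than mutating the originals keeps the font labels available, so the same annotations can serve both the font-identification and the single-class text-detection setups.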

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.


  15. 20C Poetry

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 7, 2018
    Cite
    20C Poetry [Dataset]. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YVN6IW
    Dataset updated
    Aug 7, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Andrew Piper
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This is a table of word counts for a collection of 75,297 English-language poems.
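
    A word-count table of this kind maps each poem to the number of times each word occurs in it. The description does not document how the table was tokenized. The sketch below assumes lowercasing and a letters-and-apostrophes regular expression, and uses hypothetical poem IDs and text, to show how such a table can be built.

```python
import re
from collections import Counter

def word_count_table(poems):
    """Map each poem ID to a Counter of lowercase word tokens.

    The regex tokenizer is an assumption, not the dataset's method.
    """
    table = {}
    for poem_id, text in poems.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        table[poem_id] = Counter(tokens)
    return table

counts = word_count_table({"p1": "The sea, the sea!", "p2": "Night falls"})
print(counts["p1"]["sea"])  # -> 2
```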

  16. PANGAEA - Data from Physical Oceanography of the Eastern Mediterranean...

    • b2find.dkrz.de
    Updated May 4, 2023
    Cite
    (2023). PANGAEA - Data from Physical Oceanography of the Eastern Mediterranean (POEM) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/73d284ca-13a9-5b5e-83f9-0729611c9c45
    Dataset updated
    May 4, 2023
    Area covered
    Mediterranean Sea
    Description

    The ultimate goal of the Program for the Exploration of the Eastern Mediterranean (POEM) is to reach a comprehensive knowledge of the physical, chemical, and biological oceanography of the Eastern Mediterranean. Such knowledge is an essential basis for environmental management, resource exploration, and marine operations. The overall scientific objectives are to: (1) describe the physical phenomena and quantify their kinematics; (2) define basic dynamical processes; and (3) construct physical models suitable for general ocean scientific studies and applications.

  17. Poetry-Foundation-Poems

    • huggingface.co
    Updated Feb 21, 2025
    Cite
    Şuayp Talha Kocabay (2025). Poetry-Foundation-Poems [Dataset]. https://huggingface.co/datasets/suayptalha/Poetry-Foundation-Poems
    Dataset updated
    Feb 21, 2025
    Authors
    Şuayp Talha Kocabay
    License

    https://choosealicense.com/licenses/agpl-3.0/

    Description

    From: https://www.kaggle.com/datasets/tgdivy/poetry-foundation-poems

    Poetry Foundation Poems Dataset

    Overview

    This dataset contains a collection of 13.9k poems sourced from the Poetry Foundation website. Each poem entry includes its title, author, and associated tags (if available). The dataset provides a robust resource for exploring poetry, analyzing thematic trends, or creating applications such as poem generators.

    Dataset Structure

    The dataset consists of the following columns: 1. Title:… See the full description on the dataset page: https://huggingface.co/datasets/suayptalha/Poetry-Foundation-Poems.

  18. russian-poetry-dataset

    • kaggle.com
    zip
    Updated May 22, 2023
    Cite
    mira318 (2023). russian-poetry-dataset [Dataset]. https://www.kaggle.com/datasets/mira318/russian-poetry-dataset
    Available download formats
    zip (5,273,183 bytes)
    Dataset updated
    May 22, 2023
    Authors
    mira318
    Description

    Dataset

    This dataset was created by mira318

    Contents

  19. Books called Antebellum American women's poetry : a rhetoric of sentiment

    • workwithdata.com
    Updated Nov 9, 2024
    Cite
    Work With Data (2024). Books called Antebellum American women's poetry : a rhetoric of sentiment [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Antebellum+American+women%27s+poetry+%3A+a+rhetoric+of+sentiment
    Dataset updated
    Nov 9, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books and is filtered where the book is Antebellum American women's poetry : a rhetoric of sentiment, featuring 7 columns including author, BNB id, book, book publisher, and ISBN. The preview is ordered by publication date (descending).

  20. Evaluating Immersive Experiences - Cancer Alley: Evaluation Report

    • data.bathspa.ac.uk
    pdf
    Updated Feb 6, 2025
    Cite
    Dion Dobrzynski (2025). Evaluating Immersive Experiences - Cancer Alley: Evaluation Report [Dataset]. http://doi.org/10.17870/bathspa.27377532.v2
    Available download formats
    pdf
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    BathSPAdata
    Authors
    Dion Dobrzynski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This report evaluates the impact of Cancer Alley, an immersive poetry film created by poet Lucy English and filmmakers Pamela Faulkenberg and Jack Cochran and screened at the Watershed as part of the Lyra Bristol Poetry Festival 2024. The report examines data collected before, during, and after the screening from a group of poets and filmmakers as well as members of the audience, through in-depth qualitative analysis of questionnaires, interviews, a focus group, and journals.

Selected result: merve/poetry. 3 scholarly articles cite this dataset (source: Google Scholar).