44 datasets found
  1. booksum

    • huggingface.co
    Updated Dec 24, 2021
    Cite
    Karim Foda (2021). booksum [Dataset]. https://huggingface.co/datasets/kmfoda/booksum
    Explore at:
    Dataset updated
    Dec 24, 2021
    Authors
    Karim Foda
    License

    https://choosealicense.com/licenses/bsd-3-clause/

    Description

    BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization

    Authors: Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, Dragomir Radev

      Introduction
    

    The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text… See the full description on the dataset page: https://huggingface.co/datasets/kmfoda/booksum.

  2. booksum

    • tensorflow.org
    • opendatalab.com
    Updated Dec 6, 2022
    Cite
    (2022). booksum [Dataset]. https://www.tensorflow.org/datasets/catalog/booksum
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    BookSum: A Collection of Datasets for Long-form Narrative Summarization

    This implementation currently only supports book and chapter summaries.

    GitHub: https://github.com/salesforce/booksum

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of the BookSum dataset.
    ds = tfds.load('booksum', split='train')

    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  3. Financial Narrative Summaristion 2022

    • kaggle.com
    zip
    Updated Mar 5, 2024
    Cite
    AKashyap (2024). Financial Narrative Summaristion 2022 [Dataset]. https://www.kaggle.com/datasets/aryankashyapnaveen/financial-narrative-summaristion-2022
    Explore at:
    Available download formats: zip (200328901 bytes)
    Dataset updated
    Mar 5, 2024
    Authors
    AKashyap
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    FNS 2022

    Financial Narrative summarisation challenge dataset.

    Contains the first gold summary from the original dataset.

    Useful for financial report summarisation.

  4. Video Storytelling Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Junnan Li; Yongkang Wong; Qi Zhao; Mohan S. Kankanhalli (2020). Video Storytelling Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2383738
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Mohan S. Kankanhalli
    National University of Singapore
    University of Minnesota
    Authors
    Junnan Li; Yongkang Wong; Qi Zhao; Mohan S. Kankanhalli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Video Storytelling is a dataset for generating text story/summarization for videos containing social events. It consists of 105 videos from four categories: birthday, camping, Christmas and wedding. For each video, we provide at least 5 human-written stories.

    Videos are contained in the .tar file with their corresponding category name.

    Text stories are contained in Text.tar.

    In each txt file, the first line is the video id. The start and end time (in seconds) of each sentence is also given.

    test_id.txt provides the id for videos in the test set

    Please cite the following paper if you use the Video Storytelling dataset in your work (papers, articles, reports, books, software, etc):

    Video Storytelling: Textual Summaries for Events. J. Li, Y. Wong, Q. Zhao, M. Kankanhalli. IEEE Transactions on Multimedia.

  5. Adolescent discourse summarization (Lundine et al., 2018)

    • asha.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Jennifer P. Lundine; Stacy M. Harnish; Rebecca J. McCauley; Deena Schwen Blackett; Alexandra Zezinka; Wei Chen; Robert A. Fox (2023). Adolescent discourse summarization (Lundine et al., 2018) [Dataset]. http://doi.org/10.23641/asha.6167879.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    American Speech–Language–Hearing Association: https://www.asha.org/
    Authors
    Jennifer P. Lundine; Stacy M. Harnish; Rebecca J. McCauley; Deena Schwen Blackett; Alexandra Zezinka; Wei Chen; Robert A. Fox
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: Summarizing expository passages is a critical academic skill that is understudied in language research. The purpose of this study was to compare the quality of verbal summaries produced by adolescents for 3 different discourse types and to determine whether a composite measure of cognitive skill or a test of expressive syntax predicted their performance.

    Method: Fifty adolescents listened to, and then verbally summarized, 1 narrative and 2 expository lectures (compare–contrast and cause–effect). They also participated in testing that targeted expressive syntax and 5 cognitive subdomains.

    Results: Summary quality scores were significantly different across discourse types, with a medium effect size. Analyses revealed significantly higher summary quality scores for cause–effect than compare–contrast summaries. Although the composite cognitive measure contributed significantly to the prediction of quality scores for both types of expository summaries, the expressive syntax score only contributed significantly to the quality scores for narrative summaries.

    Conclusions: These results support previous research indicating that type of expository discourse may impact student performance. These results also show, for the first time, that cognition may play a predictive role in determining summary quality for expository but not narrative passages in this population. In addition, despite the more complex syntax commonly associated with exposition versus narratives, an expressive syntax score was only predictive of performance on narrative summaries. These findings provide new information, questions, and directions for future research for those who study academic discourse and for professionals who must identify and manage the problems of students struggling with different types of academic discourse.

    Supplemental Material S1. Descriptive block-level U.S. Census values for participants and rotated structure matrix for principal component analysis with Varimax rotation of socioeconomic status (SES) variables.
    Supplemental Material S2. Descriptions of compare–contrast, cause–effect, and narrative lectures.
    Supplemental Material S3. Tests used from the National Institutes of Health Toolbox Cognition Battery.
    Supplemental Material S4. Pearson correlations for Expressive Syntax score, MLCU, and SI for compare–contrast, cause–effect, and narrative summaries (N = 50).
    Supplemental Material S5. Pearson correlations for total summarization quality scores for compare–contrast, cause–effect, and narrative lectures, age, socioeconomic status (SES) factors, cognitive composite score, and expressive syntax score (N = 48).

    Lundine, J. P., Harnish, S. M., McCauley, R. J., Blackett, D. S., Zezinka, A., Chen, W., & Fox, R. A. (2018). Adolescent summaries of narrative and expository discourse: Differences and predictors. Language, Speech, and Hearing Services in Schools, 49, 551–568. https://doi.org/10.1044/2018_LSHSS-17-0105

  6. Summary of narrative review.

    • datasetcatalog.nlm.nih.gov
    Updated Nov 8, 2023
    + more versions
    Cite
    Cherla, Avi; Srivastava, Divya; Delgrange, Marine; Van Kessel, Robin; Mossialos, Elias; Sood, Harpreet (2023). Summary of narrative review. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000968064
    Explore at:
    Dataset updated
    Nov 8, 2023
    Authors
    Cherla, Avi; Srivastava, Divya; Delgrange, Marine; Van Kessel, Robin; Mossialos, Elias; Sood, Harpreet
    Description

    Digital health technologies used in primary care, referred to as virtual primary care, allow patients to interact with primary healthcare professionals remotely, though the current iteration of virtual primary care may also come with several unintended consequences, such as accessibility barriers and cream skimming. The World Health Organization (WHO) has a well-established framework to understand the functional components of health systems. However, the existing building blocks framework does not sufficiently account for the disruptive and multi-modal impact of digital transformations. In this review, we aimed to develop the first iteration of this updated framework by reviewing the deployment of virtual primary care systems in five leading countries: Canada, Finland, Germany, Sweden, and the United Kingdom (England). We found that all five countries have taken different approaches to the deployment of virtual primary care, yet seven common themes were highlighted across countries: (1) stated policy objectives, (2) regulation and governance, (3) financing and reimbursement, (4) delivery and integration, (5) workforce training and support, (6) IT systems and data sharing, and (7) the extent of patient involvement in the virtual primary care system. The conceptual framework that was derived from these findings offers a set of guiding principles that can facilitate the assessment of virtual primary care in health system settings.

  7. Syn-D-CNN Dataset

    • figshare.com
    bin
    Updated Feb 20, 2025
    Cite
    Akib Sadmanee (2025). Syn-D-CNN Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27367917.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Akib Sadmanee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Text summarization condenses extensive content into concise summaries; however, current approaches often rely on large language models (LLMs), which can lack interpretability and are susceptible to generating hallucinated content. To address these issues, we propose Docusage, an interpretable framework that replicates human summaries through a hierarchical clustering approach combined with extractive summarization, augmented by selective, LLM-based abstraction. Docusage minimizes the risk of hallucinations, ensures contextual relevance, and mitigates the computational costs inherent in leveraging an LLM.

    Our results show that Docusage aligns closely with journalist-generated summaries, outperforming foundational and specialized models. Additionally, Docusage offers an interpretable framework that is not constrained by context size, ensures transparency regarding the role of extracted sentences within the narrative, and adapts to the style of the training data.

  8. PlotSnap

    • india-data.org
    Updated Jan 2, 2025
    Cite
    IIIT Hyderabad, IHUB (2025). PlotSnap [Dataset]. https://india-data.org/googleSEO-list-dataset-search
    Explore at:
    Available download formats: structured data, video frame
    Dataset updated
    Jan 2, 2025
    Dataset authored and provided by
    IIIT Hyderabad, IHUB
    License

    https://india-data.org/terms-conditions

    Area covered
    India
    Description

    This is a dataset of two popular crime thriller TV shows, "24" and "Prison Break", crafted for the story-summarization task. In terms of inputs, it consists of per-episode frame embeddings generated from the CLIP vision encoder, MViT, and DenseNet, as well as utterance embeddings generated from a fine-tuned RoBERTa encoder. For outputs, recap signals are used to form story-summary labels, cached as per-shot and per-utterance scores for each episode. The dataset contains 205 episodes in total.

  9. Expository and narrative discourse summary statistics and demographic...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Mar 2, 2021
    Cite
    Gavin Collins; Jennifer Lundine; Eloise Kaizar (2021). Expository and narrative discourse summary statistics and demographic information for adolescents with and without traumatic brain injury [Dataset]. http://doi.org/10.5061/dryad.v15dv41v8
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 2, 2021
    Dataset provided by
    Dryad
    Authors
    Gavin Collins; Jennifer Lundine; Eloise Kaizar
    Time period covered
    Feb 24, 2021
    Description

    The dataset is a .csv file consisting of 10 columns: "subject" is the number assigned to each of the adolescents in the study, 1,...,55; "lecture_type" is either cc (compare-contrast), ce (cause-effect), or n (narrative), and each of the 55 subjects have a row for each lecture type; "development_type" is collected at the subject level, and is either TD (typically developing) or TBI (traumatic brain injury); "sex," (Male/Female) "age," (13-19) and "ses" (a summary of socioeconomic status; a standardized "z-value") are also collected at the subject level; "U" (>=1) is the total number of utterances in the discourse; "C" (>=U) is the total number of clauses in the discourse; "W" (>=C) is the total number of words in the discourse; and "D" (<=W) is the total number of distinct words in the discourse.
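The column constraints listed above (U >= 1, C >= U, W >= C, D <= W, plus the allowed values for "lecture_type" and "development_type") can be checked mechanically. The following is a minimal sketch, assuming the CSV header uses exactly the column names quoted in the description; the function names `check_row` and `check_csv` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Allowed category values, as listed in the dataset description.
LECTURE_TYPES = {"cc", "ce", "n"}
DEVELOPMENT_TYPES = {"TD", "TBI"}


def check_row(row):
    """Return a list of constraint violations for one CSV row (empty if none)."""
    problems = []
    u, c, w, d = (int(row[key]) for key in ("U", "C", "W", "D"))
    if row["lecture_type"] not in LECTURE_TYPES:
        problems.append("unknown lecture_type")
    if row["development_type"] not in DEVELOPMENT_TYPES:
        problems.append("unknown development_type")
    if u < 1:
        problems.append("U < 1")
    if c < u:
        problems.append("C < U")
    if w < c:
        problems.append("W < C")
    if d > w:
        problems.append("D > W")
    return problems


def check_csv(text):
    """Map each 1-based data row number to its list of violations."""
    reader = csv.DictReader(io.StringIO(text))
    return {i: check_row(row) for i, row in enumerate(reader, start=1)}


# Made-up rows for illustration only; not taken from the dataset.
SAMPLE = """subject,lecture_type,development_type,sex,age,ses,U,C,W,D
1,cc,TD,Female,15,0.2,10,14,120,70
1,n,TD,Female,15,0.2,8,6,90,95
"""

report = check_csv(SAMPLE)
for row_number, problems in report.items():
    print(row_number, problems or "ok")
```

To run the same check on the real file, the text passed to `check_csv` could be read from the downloaded .csv instead of the inline sample.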

  10. TREC 2023 CrisisFACTS Track Dataset

    • catalog.data.gov
    • data.nist.gov
    Updated Sep 11, 2024
    + more versions
    Cite
    National Institute of Standards and Technology (2024). TREC 2023 CrisisFACTS Track Dataset [Dataset]. https://catalog.data.gov/dataset/trec-2023-crisisfacts-track-dataset
    Explore at:
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    National Institute of Standards and Technology: http://www.nist.gov/
    Description

    The CrisisFACTS track focuses on temporal summarization for first responders in emergency situations. These summaries differ from traditional summarization in that they order information by time and produce a series of short updates instead of a longer narrative.

  11. Summary Narrative for creative development

    • figshare.com
    docx
    Updated Jan 23, 2024
    Cite
    Hilary Engward (2024). Summary Narrative for creative development [Dataset]. http://doi.org/10.6084/m9.figshare.24033138.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Jan 23, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Hilary Engward
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Main points narrative for creative dissemination development

  12. hbasa2_NAD83_Teale

    • data.wu.ac.at
    Updated Apr 25, 2015
    Cite
    State of California (2015). hbasa2_NAD83_Teale [Dataset]. https://data.wu.ac.at/schema/data_gov/NTMzZTA4ZTctODg2OC00YTQwLWFlYzktOTYyM2I3Nzc4ZDlk
    Explore at:
    Dataset updated
    Apr 25, 2015
    Dataset provided by
    State of California
    Area covered
    7c0de00b6bff527e4e9b25ebfb3a285a827b962c
    Description

    REQUIRED: A brief narrative summary of the data set.

  13. NewTransects

    • datasets.ai
    55
    Updated Jun 1, 2023
    Cite
    Department of the Interior (2023). NewTransects [Dataset]. https://datasets.ai/datasets/newtransects-f2037
    Explore at:
    Available download formats: 55
    Dataset updated
    Jun 1, 2023
    Dataset authored and provided by
    Department of the Interior
    Description

    REQUIRED: A brief narrative summary of the data set.

  14. Fictional Characters Dataset

    • kaggle.com
    zip
    Updated Jun 15, 2025
    Cite
    Pratyush Puri (2025). Fictional Characters Dataset [Dataset]. https://www.kaggle.com/datasets/pratyushpuri/synthetic-fictional-characters-dataset/discussion
    Explore at:
    Available download formats: zip (447343 bytes)
    Dataset updated
    Jun 15, 2025
    Authors
    Pratyush Puri
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains 1,500 profiles of fictional characters, each described across 15 diverse and creative columns. The dataset offers a rich variety of character attributes and narrative elements, designed to support a wide range of Natural Language Processing (NLP), generative AI, and storytelling applications.

    Columns include:

    • Character Name

    • Media Type (e.g., Novel, Movie, Webcomic, TV Show, Video Game)

    • Media Source (fictional title/source)

    • Genre (e.g., Fantasy, Sci-Fi, Mystery, Romance, Horror, Thriller)

    • Role (e.g., Protagonist, Antagonist, Sidekick, Mentor, Villain, Hero)

    • Personality Traits (comma-separated adjectives)

    • Backstory (short narrative)

    • Skills/Abilities (comma-separated)

    • Appearance Description (physical summary)

    • Alignment (Hero, Villain, Neutral)

    • Interests/Hobbies

    • Relationships (summary of key connections)

    • Significance/Impact (their importance in the story)

    • Description (detailed narrative)

    • Scenario/Dialogue Example (sample interaction or scenario)

    Potential Uses:

    • Training and benchmarking NLP models (classification, entity recognition, summarization)
    • Building and testing generative AI and chatbots
    • Storytelling, creative writing, and game design prototyping
    • Character analysis and clustering

    Key Features:

    • 1,500 unique, non-repeating character entries
    • All data is synthetic and safe for public use
    • Rich narrative and categorical diversity

    Data Fields

    Column Name | Description
    Character Name | Full name of the fictional character
    Media Type | Origin medium (Novel, Movie, etc.)
    Media Source | Fictional title or work
    Genre | Genre classification
    Role | Narrative role (Protagonist, Antagonist, etc.)
    Personality Traits | Key adjectives describing personality
    Backstory | Brief background story
    Skills/Abilities | Notable skills or powers
    Appearance Description | Physical or visual description
    Alignment | Moral alignment
    Interests/Hobbies | Activities or interests
    Relationships | Key relationships or connections
    Significance/Impact | Importance or influence in their story
    Description | Detailed narrative description
    Scenario/Dialogue Example | Example scenario or dialogue for context

    Inspiration

    Fictional character datasets are valuable for advancing research in text generation, character modeling, and creative AI. This dataset is ideal for anyone looking to experiment with synthetic narrative data, prototype new storytelling tools, or benchmark NLP models on character-driven content.

    Acknowledgements

    Data generated using Python Faker and randomization.

    No real persons or copyrighted works are included.

    Licensing

    CC0: Public Domain. This dataset is fully free to use for any purpose.


  15. Public School

    • data.chattlibrary.org
    • chattadata.org
    Updated Aug 23, 2021
    Cite
    (2021). Public School [Dataset]. https://data.chattlibrary.org/w/7c8u-4twi/default?cur=V5Yhl09xK8n&from=IcI8E_sepdJ
    Explore at:
    Available download formats: kmz, xml, xlsx, kml, csv, application/geo+json
    Dataset updated
    Aug 23, 2021
    Description

    REQUIRED: A brief narrative summary of the data set.

  16. Big Muddy National Fish and Wildlife Refuge: Narrative Summary for Fiscal...

    • datadiscoverystudio.org
    Updated May 19, 2018
    + more versions
    Cite
    (2018). Big Muddy National Fish and Wildlife Refuge: Narrative Summary for Fiscal Years 1994-1997. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/e7292e107ea24b0380a59b7a559b718f/html
    Explore at:
    Dataset updated
    May 19, 2018
    Description

    This annual narrative report for Big Muddy National Fish and Wildlife Refuge summarizes refuge activities during the fiscal years 1994-1997. The report begins with an introduction to the refuge and a summary of the year's highlights and climatic conditions. Information about monitoring and studies, including fishery surveys, amphibian monitoring, and seasonal flooding, is provided next. Habitat and wildlife management were not discussed because the refuge was in its early stages of development. Coordination activities, such as private land activities and cooperative organizations, are outlined. The resource protection section provides information about law enforcement, water rights, and land acquisition. Information about public education and recreation is given, including visitor services and refuge visitation. Finally, refuge planning and administration are discussed.

  17. Parks and Refuges

    • genesee2050.com
    Updated Jan 27, 2022
    Cite
    plcscheiner (2022). Parks and Refuges [Dataset]. https://www.genesee2050.com/datasets/a087b1cdc8b34684b11d687f183a11d4
    Explore at:
    Dataset updated
    Jan 27, 2022
    Dataset authored and provided by
    plcscheiner
    Area covered
    Description

    REQUIRED: A brief narrative summary of the data set.

  18. Long-Data-Collections-booksum-binidx

    • huggingface.co
    Updated Aug 14, 2023
    Cite
    ZINWIN(Zuojun-Ye) (2023). Long-Data-Collections-booksum-binidx [Dataset]. https://huggingface.co/datasets/win10/Long-Data-Collections-booksum-binidx
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 14, 2023
    Authors
    ZINWIN(Zuojun-Ye)
    License

    https://choosealicense.com/licenses/other/

    Description

    Fine-tune Data BookSum: BookSum is a dataset for long context summarization. It includes a vast collection of books from various genres, and the task is to generate a coherent and concise summary given a long context from the book. This dataset is designed to test and train models on their ability to understand and summarize long, complex narratives. to convert to binidx format.

  19. Towards story-based classification of movie scenes

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Chang Liu; Armin Shmilovici; Mark Last (2023). Towards story-based classification of movie scenes [Dataset]. http://doi.org/10.1371/journal.pone.0228579
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Chang Liu; Armin Shmilovici; Mark Last
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Humans are entertained and emotionally captivated by a good story. Artworks, such as operas, theatre plays, movies, TV series, cartoons, etc., contain implicit stories, which are conveyed visually (e.g., through scenes) and audially (e.g., via music and speech). Story theorists have explored the structure of various artworks and identified forms and paradigms that are common to most well-written stories. Further, typical story structures have been formalized in different ways and used by professional screenwriters as guidelines. Currently, computers cannot yet identify such a latent narrative structure of a movie story. Therefore, in this work, we raise the novel challenge of understanding and formulating the movie story structure and introduce the first ever story-based labeled dataset—the Flintstones Scene Dataset (FSD). The dataset consists of 1,569 scenes taken from a manual annotation of 60 episodes of a famous cartoon series, The Flintstones, by 105 distinct annotators. The various labels assigned to each scene by different annotators are summarized by a probability vector over 10 possible story elements representing the function of each scene in the advancement of the story, such as the Climax of Act One or the Midpoint. These elements are learned from guidelines for professional script-writing. The annotated dataset is used to investigate the effectiveness of various story-related features and multi-label classification algorithms for the task of predicting the probability distribution of scene labels. We use cosine similarity and KL divergence to measure the quality of predicted distributions. The best approaches demonstrated 0.81 average similarity and 0.67 KL divergence between the predicted label vectors and the ground truth vectors based on the manual annotations. These results demonstrate the ability of machine learning approaches to detect the narrative structure in movies, which could lead to the development of story-related video analytics tools, such as automatic video summarization and recommendation systems.
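The two measures named in the description, cosine similarity and KL divergence between a predicted label vector and a ground-truth vector, can be illustrated with a short sketch. This is not the authors' evaluation code: the description does not state the direction of the KL divergence or any smoothing, so KL(truth || predicted) in nats and the `eps` floor are assumptions, and the three-element vectors are made up (the dataset uses 10 story elements).

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats.

    Terms with p_i == 0 contribute nothing. The eps floor on q_i is an
    assumed safeguard against division by zero, not part of the source.
    """
    return sum(a * math.log(a / max(b, eps)) for a, b in zip(p, q) if a > 0)


# Made-up probability vectors for illustration only.
truth = [0.5, 0.5, 0.0]
predicted = [0.5, 0.25, 0.25]

similarity = cosine_similarity(truth, predicted)
divergence = kl_divergence(truth, predicted)
print(f"cosine similarity: {similarity:.4f}")
print(f"KL divergence:     {divergence:.4f}")
```

Higher cosine similarity and lower KL divergence both indicate a predicted distribution closer to the annotated one; identical vectors give a similarity of 1 and a divergence of 0.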

  20. filtered_convos_research_llm_summaries

    • huggingface.co
    Updated Mar 14, 2025
    + more versions
    Cite
    Marc Christopher Grau (2025). filtered_convos_research_llm_summaries [Dataset]. https://huggingface.co/datasets/marccgrau/filtered_convos_research_llm_summaries
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 14, 2025
    Authors
    Marc Christopher Grau
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Synthetic Call Center Summaries Dataset

      Overview
    

    This dataset contains synthetic summaries of call center conversations generated by different prompt configurations. Each record (in JSON Lines format) includes:

    • The original dialogue metadata.
    • A generated summary tailored to provide quick insights for call center service agents.
    • Evaluation metrics.

      Prompts for summarization
    

    Narrative: A narrative summary of the conversation. Bullet Points: A summary of the… See the full description on the dataset page: https://huggingface.co/datasets/marccgrau/filtered_convos_research_llm_summaries.
