10 datasets found
  1. Replication Data for: Quality of Legislation and Compliance: A Natural...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 25, 2024
    Cite
    Osnabrügge, Moritz; Vannoni, Matia (2024). Replication Data for: Quality of Legislation and Compliance: A Natural Language Processing Approach [Dataset]. http://doi.org/10.7910/DVN/Z8LCHG
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Osnabrügge, Moritz; Vannoni, Matia
    Description

    Several disciplines, such as economics, law and political science, emphasize the importance of legislative quality, namely well-written legislation. Low-quality legislation cannot be easily implemented because the texts create interpretation problems. To measure the quality of legal texts, we use information from the syntactic and lexical features of their language and apply these measures to a dataset of European Union legislation that contains detailed information on its transposition and decision-making process. We find that syntactic complexity and vagueness are negatively related to member states' compliance with legislation. The finding on vagueness is robust to controlling for member states' preferences, administrative resources, discretion and the length of texts. However, the results for syntactic complexity are less robust.

  2. Replication Data for: Matching with Text Data: An Experimental Evaluation of...

    • dataverse.harvard.edu
    Updated Dec 24, 2019
    Cite
    Reagan Mozer (2019). Replication Data for: Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality [Dataset]. http://doi.org/10.7910/DVN/K8IL3V
    Dataset updated
    Dec 24, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Reagan Mozer
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This repository contains the materials needed to replicate the results presented in Mozer et al. (2019), "Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality", forthcoming in Political Analysis.

  3. T$^3$Bench Dataset

    • paperswithcode.com
    Cite
    Yuze He; Yushi Bai; Matthieu Lin; Wang Zhao; Yubin Hu; Jenny Sheng; Ran Yi; Juanzi Li; Yong-Jin Liu, T$^3$Bench Dataset [Dataset]. https://paperswithcode.com/dataset/t-3-bench
    Authors
    Yuze He; Yushi Bai; Matthieu Lin; Wang Zhao; Yubin Hu; Jenny Sheng; Ran Yi; Juanzi Li; Yong-Jin Liu
    Description

    T$^3$Bench is the first comprehensive text-to-3D benchmark containing diverse text prompts of three increasing complexity levels that are specially designed for 3D generation (300 prompts in total). To assess both the subjective quality and the text alignment, we propose two automatic metrics based on multi-view images produced by the 3D contents. The quality metric combines multi-view text-image scores and regional convolution to detect quality and view inconsistency. The alignment metric uses multi-view captioning and Large Language Model (LLM) evaluation to measure text-3D consistency.

  4. GenEval Dataset

    • paperswithcode.com
    Updated Mar 11, 2025
    Cite
    Dhruba Ghosh; Hanna Hajishirzi; Ludwig Schmidt (2025). GenEval Dataset [Dataset]. https://paperswithcode.com/dataset/geneval
    Dataset updated
    Mar 11, 2025
    Authors
    Dhruba Ghosh; Hanna Hajishirzi; Ludwig Schmidt
    Description

    Recent breakthroughs in diffusion models, multimodal pretraining, and efficient finetuning have led to an explosion of text-to-image generative models. Given human evaluation is expensive and difficult to scale, automated methods are critical for evaluating the increasingly large number of new models. However, most current automated evaluation metrics like FID or CLIPScore only offer a holistic measure of image quality or image-text alignment, and are unsuited for fine-grained or instance-level analysis. In this paper, we introduce GenEval, an object-focused framework to evaluate compositional image properties such as object co-occurrence, position, count, and color. We show that current object detection models can be leveraged to evaluate text-to-image models on a variety of generation tasks with strong human agreement, and that other discriminative vision models can be linked to this pipeline to further verify properties like object color. We then evaluate several open-source text-to-image models and analyze their relative generative capabilities on our benchmark. We find that recent models demonstrate significant improvement on these tasks, though they are still lacking in complex capabilities such as spatial relations and attribute binding. Finally, we demonstrate how GenEval might be used to help discover existing failure modes, in order to inform development of the next generation of text-to-image models. Our code to run the GenEval framework is publicly available at this https URL.

  5. Sample text message content after the translation of standard and...

    • plos.figshare.com
    xls
    Updated Jul 25, 2024
    Cite
    Catherine A. Staton; Deepti Agnihotri; Ashley J. Phillips; Kennedy Ngowi; Lily Huo; Judith Boshe; Francis Sakita; Anna Tupetz; Brian Suffoletto; Blandina T. Mmbaga; Joao Ricardo Nickenig Vissoci (2024). Sample text message content after the translation of standard and personalized components. [Dataset]. http://doi.org/10.1371/journal.pgph.0002717.t001
    Available download formats: xls
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Catherine A. Staton; Deepti Agnihotri; Ashley J. Phillips; Kennedy Ngowi; Lily Huo; Judith Boshe; Francis Sakita; Anna Tupetz; Brian Suffoletto; Blandina T. Mmbaga; Joao Ricardo Nickenig Vissoci
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Sample text message content after the translation of standard and personalized components.

  6. SGDD-TST

    • huggingface.co
    Updated Dec 14, 2023
    Cite
    Nikolay Babakov (2023). SGDD-TST [Dataset]. https://huggingface.co/datasets/NiGuLa/SGDD-TST
    Dataset updated
    Dec 14, 2023
    Authors
    Nikolay Babakov
    License

    https://choosealicense.com/licenses/cc/

    Description

    Overview

    SGDD-TST (Schema-Guided Dialogue Dataset for Text Style Transfer) is a dataset for evaluating the quality of content similarity measures for text style transfer in the domain of personal plans. The original texts were obtained from the Schema-Guided Dialogue Dataset and were paraphrased by a T5-based model trained on the GYAFC formality dataset. The results were annotated by crowdsourced workers using Yandex.Toloka.

      File description
    

    The file consists of the… See the full description on the dataset page: https://huggingface.co/datasets/NiGuLa/SGDD-TST.

  7. Data Sheet 1_Weakly supervised text classification on free-text comments in...

    • frontiersin.figshare.com
    pdf
    Updated Apr 30, 2025
    Cite
    Anna-Grace Linton; Vania Gatseva Dimitrova; Amy Downing; Richard Wagland; Adam W. Glaser (2025). Data Sheet 1_Weakly supervised text classification on free-text comments in patient-reported outcome measures.pdf [Dataset]. http://doi.org/10.3389/fdgth.2025.1345360.s001
    Available download formats: pdf
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    Frontiers
    Authors
    Anna-Grace Linton; Vania Gatseva Dimitrova; Amy Downing; Richard Wagland; Adam W. Glaser
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: Free-text comments in patient-reported outcome measures (PROMs) data provide insights into health-related quality of life (HRQoL). However, these comments are typically analysed using manual methods, such as content analysis, which is labour-intensive and time-consuming. Machine learning analysis methods are largely unsupervised, necessitating post-analysis interpretation. Weakly supervised text classification (WSTC) can be a valuable analytical method for classifying domain-specific text data, especially when limited labelled data are available. In this paper, we applied five WSTC techniques to PROMs comment data to explore the extent to which they can be used to identify HRQoL themes reported by patients with prostate and colorectal cancer.

    Methods: The main HRQoL themes and associated keywords were identified from a scoping review. They were used to classify PROMs comments with these themes from two national PROMs datasets: colorectal cancer (n = 5,634) and prostate cancer (n = 59,768). Classification was done using five keyword-based WSTC methods (anchored CorEx, BERTopic, Guided LDA, WeSTClass, and X-Class). To evaluate these methods, we assessed their performance overall and by theme. Domain experts reviewed the interpretability of the methods using the keywords extracted from the methods during training.

    Results: Based on the 12 papers identified in the scoping review, we determined six main themes and corresponding keywords to label PROMs comments using WSTC methods. These themes were: Comorbidities, Daily Life, Health Pathways and Services, Physical Function, Psychological and Emotional Function, and Social Function. The performance of the methods varied across themes and between the datasets. While the best-performing model for both datasets, CorEx, attained weighted F1 scores of 0.57 (colorectal cancer) and 0.61 (prostate cancer), methods achieved an F1 score of up to 0.92 (Social Function) on individual themes. By evaluating the keywords extracted from the trained models, we saw that the methods that can utilise expert-driven seed terms and extrapolate based on limited data performed the best.

    Conclusions: Overall, evaluating these WSTC methods provided insight into their applicability for analysing PROMs comments. Evaluating the classification performance illustrated the potential and limitations of keyword-based WSTC in labelling PROMs comments when labelled data are limited.

  8. First round of scale surveys.

    • figshare.com
    xls
    Updated Aug 28, 2024
    Cite
    Yamei Liu (2024). First round of scale surveys. [Dataset]. http://doi.org/10.1371/journal.pone.0308475.t001
    Available download formats: xls
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yamei Liu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: The increase in mental health problems among college students has become a global challenge, with anxiety and depression in particular becoming increasingly prevalent. Positive psychology has gained attention as an important psychological intervention that emphasizes improving mental health by promoting positive emotions and mindfulness. However, given the diversity of reading styles, there is a lack of systematic research on their effects. Therefore, this study aims to explore the specific effects of different reading styles on college students’ mental health and quality of life based on positive psychology, in order to provide more effective interventions and recommendations for improving college students’ mental health.

    Methods: This study used a two-round questionnaire to select students with mental health problems and divided them into four experimental groups with a control group. The study was conducted by distributing questionnaires and experimental interventions, and a total of 2860 valid questionnaires were collected. The study used the Self-Assessment Scale for Anxiety (SAS) and the Self-Depression Scale (SDS) to assess the participants’ anxiety and depression levels. In addition, the study used the Physical Composite Score (PCS) and the Mental Composite Score (MCS) to assess the participants’ quality of life. SPSS 26.0 was used for statistical analysis, with repeated measures ANOVA.

    Results: Paper text reading and audio reading methods were effective in reducing anxiety levels and improving sleep quality. However, the electronic text reading approach was less effective compared to paper text reading and audio reading, and the video reading approach was not effective in improving depression. In addition, the positive psychology literature reading intervention showed significant improvements in college students’ quality of life scores.

    Conclusion: The results of this study suggest that paper text reading and audio reading modalities have a positive impact on the mental health and quality of life of college students, while e-text reading and video reading modalities are less effective. These findings provide suggestions for college students to choose appropriate reading styles and further demonstrate the effectiveness of positive psychology reading on mental health. These results have important academic and practical implications for promoting mental health and improving quality of life among college students.

  9. Levels of evidence for the quality of the measurement property.[19, 30].

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert (2023). Levels of evidence for the quality of the measurement property.[19, 30]. [Dataset]. http://doi.org/10.1371/journal.pone.0179733.t002
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Levels of evidence for the quality of the measurement property.[19, 30].

  10. Quality criteria for measurement properties.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert (2023). Quality criteria for measurement properties. [Dataset]. http://doi.org/10.1371/journal.pone.0179733.t001
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Quality criteria for measurement properties.

