10 datasets found

Replication Data for: Quality of Legislation and Compliance: A Natural...
search.dataone.org
dataverse.harvard.edu
Updated Sep 25, 2024
Cite
Osnabrügge, Moritz; Vannoni, Matia (2024). Replication Data for: Quality of Legislation and Compliance: A Natural Language Processing Approach [Dataset]. http://doi.org/10.7910/DVN/Z8LCHG
Unique identifier
https://doi.org/10.7910/DVN/Z8LCHG
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Osnabrügge, Moritz; Vannoni, Matia
Description
Several disciplines, such as economics, law and political science, emphasize the importance of legislative quality, namely well-written legislation. Low-quality legislation cannot be easily implemented because the texts create interpretation problems. To measure the quality of legal texts, we use information from the syntactic and lexical features of their language and apply these measures to a dataset of European Union legislation that contains detailed information on its transposition and decision-making process. We find that syntactic complexity and vagueness are negatively related to member states' compliance with legislation. The finding on vagueness is robust to controlling for member states' preferences, administrative resources, discretion and the length of texts. However, the results for syntactic complexity are less robust.
Replication Data for: Matching with Text Data: An Experimental Evaluation of...
dataverse.harvard.edu
Updated Dec 24, 2019
Cite
Reagan Mozer (2019). Replication Data for: Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality [Dataset]. http://doi.org/10.7910/DVN/K8IL3V
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/K8IL3V
Dataset updated
Dec 24, 2019
Dataset provided by
Harvard Dataverse
Authors
Reagan Mozer
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This repository contains the materials needed to replicate the results presented in Mozer et al. (2019), "Matching with Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality", forthcoming in Political Analysis.
T$^3$Bench Dataset
paperswithcode.com
Cite
Yuze He; Yushi Bai; Matthieu Lin; Wang Zhao; Yubin Hu; Jenny Sheng; Ran Yi; Juanzi Li; Yong-Jin Liu, T$^3$Bench Dataset [Dataset]. https://paperswithcode.com/dataset/t-3-bench
Authors
Yuze He; Yushi Bai; Matthieu Lin; Wang Zhao; Yubin Hu; Jenny Sheng; Ran Yi; Juanzi Li; Yong-Jin Liu
Description
T$^3$Bench is the first comprehensive text-to-3D benchmark, containing diverse text prompts at three levels of increasing complexity that are specially designed for 3D generation (300 prompts in total). To assess both the subjective quality and the text alignment, we propose two automatic metrics based on multi-view images rendered from the generated 3D content. The quality metric combines multi-view text-image scores and regional convolution to detect quality and view inconsistency. The alignment metric uses multi-view captioning and Large Language Model (LLM) evaluation to measure text-3D consistency.
GenEval Dataset
paperswithcode.com
Updated Mar 11, 2025
Cite
Dhruba Ghosh; Hanna Hajishirzi; Ludwig Schmidt (2025). GenEval Dataset [Dataset]. https://paperswithcode.com/dataset/geneval
Dataset updated
Mar 11, 2025
Authors
Dhruba Ghosh; Hanna Hajishirzi; Ludwig Schmidt
Description
Recent breakthroughs in diffusion models, multimodal pretraining, and efficient finetuning have led to an explosion of text-to-image generative models. Given that human evaluation is expensive and difficult to scale, automated methods are critical for evaluating the increasingly large number of new models. However, most current automated evaluation metrics like FID or CLIPScore only offer a holistic measure of image quality or image-text alignment, and are unsuited for fine-grained or instance-level analysis. In this paper, we introduce GenEval, an object-focused framework to evaluate compositional image properties such as object co-occurrence, position, count, and color. We show that current object detection models can be leveraged to evaluate text-to-image models on a variety of generation tasks with strong human agreement, and that other discriminative vision models can be linked to this pipeline to further verify properties like object color. We then evaluate several open-source text-to-image models and analyze their relative generative capabilities on our benchmark. We find that recent models demonstrate significant improvement on these tasks, though they still lack complex capabilities such as spatial relations and attribute binding. Finally, we demonstrate how GenEval might be used to help discover existing failure modes, in order to inform development of the next generation of text-to-image models. Our code to run the GenEval framework is publicly available at this https URL.
Sample text message content after the translation of standard and...
plos.figshare.com
xls
Updated Jul 25, 2024
Cite
Catherine A. Staton; Deepti Agnihotri; Ashley J. Phillips; Kennedy Ngowi; Lily Huo; Judith Boshe; Francis Sakita; Anna Tupetz; Brian Suffoletto; Blandina T. Mmbaga; Joao Ricardo Nickenig Vissoci (2024). Sample text message content after the translation of standard and personalized components. [Dataset]. http://doi.org/10.1371/journal.pgph.0002717.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pgph.0002717.t001
Dataset updated
Jul 25, 2024
Dataset provided by
PLOS Global Public Health
Authors
Catherine A. Staton; Deepti Agnihotri; Ashley J. Phillips; Kennedy Ngowi; Lily Huo; Judith Boshe; Francis Sakita; Anna Tupetz; Brian Suffoletto; Blandina T. Mmbaga; Joao Ricardo Nickenig Vissoci
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Sample text message content after the translation of standard and personalized components.
SGDD-TST
huggingface.co
Updated Dec 14, 2023
Cite
Nikolay Babakov (2023). SGDD-TST [Dataset]. https://huggingface.co/datasets/NiGuLa/SGDD-TST
Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 14, 2023
Authors
Nikolay Babakov
License
https://choosealicense.com/licenses/cc/
Description
Overview

SGDD-TST (Schema-Guided Dialogue Dataset for Text Style Transfer) is a dataset for evaluating the quality of content similarity measures for text style transfer in the domain of personal plans. The original texts were obtained from The Schema-Guided Dialogue Dataset and were paraphrased by a T5-based model trained on the GYAFC formality dataset. The results were annotated by crowdsourced workers using Yandex.Toloka.

File description

The file consists of the… See the full description on the dataset page: https://huggingface.co/datasets/NiGuLa/SGDD-TST.
Data Sheet 1_Weakly supervised text classification on free-text comments in...
frontiersin.figshare.com
pdf
Updated Apr 30, 2025
Cite
Anna-Grace Linton; Vania Gatseva Dimitrova; Amy Downing; Richard Wagland; Adam W. Glaser (2025). Data Sheet 1_Weakly supervised text classification on free-text comments in patient-reported outcome measures.pdf [Dataset]. http://doi.org/10.3389/fdgth.2025.1345360.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fdgth.2025.1345360.s001
Dataset updated
Apr 30, 2025
Dataset provided by
Frontiers
Authors
Anna-Grace Linton; Vania Gatseva Dimitrova; Amy Downing; Richard Wagland; Adam W. Glaser
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Background: Free-text comments in patient-reported outcome measures (PROMs) data provide insights into health-related quality of life (HRQoL). However, these comments are typically analysed using manual methods, such as content analysis, which are labour-intensive and time-consuming. Machine learning analysis methods are largely unsupervised, necessitating post-analysis interpretation. Weakly supervised text classification (WSTC) can be a valuable method for classifying domain-specific text data, especially when limited labelled data are available. In this paper, we applied five WSTC techniques to PROMs comment data to explore the extent to which they can be used to identify HRQoL themes reported by patients with prostate and colorectal cancer.
Methods: The main HRQoL themes and associated keywords were identified from a scoping review. These were used to label comments from two national PROMs datasets with these themes: colorectal cancer (n = 5,634) and prostate cancer (n = 59,768). Classification was done using five keyword-based WSTC methods (anchored CorEx, BERTopic, Guided LDA, WeSTClass, and X-Class). To evaluate these methods, we assessed their performance overall and by theme. Domain experts reviewed the interpretability of the methods using the keywords the methods extracted during training.
Results: Based on the 12 papers identified in the scoping review, we determined six main themes and corresponding keywords to label PROMs comments using WSTC methods. These themes were: Comorbidities, Daily Life, Health Pathways and Services, Physical Function, Psychological and Emotional Function, and Social Function. The performance of the methods varied across themes and between the datasets. While the best-performing model for both datasets, CorEx, attained weighted F1 scores of 0.57 (colorectal cancer) and 0.61 (prostate cancer), methods achieved an F1 score of up to 0.92 (Social Function) on individual themes. By evaluating the keywords extracted from the trained models, we saw that the methods that can utilise expert-driven seed terms and extrapolate based on limited data performed the best.
Conclusions: Overall, evaluating these WSTC methods provided insight into their applicability for analysing PROMs comments. Evaluating the classification performance illustrated the potential and limitations of keyword-based WSTC in labelling PROMs comments when labelled data are limited.
First round of scale surveys.
figshare.com
xls
Updated Aug 28, 2024
Cite
Yamei Liu (2024). First round of scale surveys. [Dataset]. http://doi.org/10.1371/journal.pone.0308475.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0308475.t001
Dataset updated
Aug 28, 2024
Dataset provided by
PLOS ONE
Authors
Yamei Liu
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Background: The increase in mental health problems among college students has become a global challenge, with anxiety and depression in particular becoming increasingly prevalent. Positive psychology has gained attention as an important psychological intervention that emphasizes improving mental health by promoting positive emotions and mindfulness. However, given the diversity of reading styles, there is a lack of systematic research on their effects. Therefore, this study aims to explore the specific effects of different reading styles on college students’ mental health and quality of life based on positive psychology, with the aim of providing more effective interventions and recommendations for improving college students’ mental health.
Methods: This study used a two-round questionnaire to select students with mental health problems and divided them into four experimental groups and a control group. The study was conducted by distributing questionnaires and applying experimental interventions, and a total of 2860 valid questionnaires were collected. The Self-Rating Anxiety Scale (SAS) and the Self-Rating Depression Scale (SDS) were used to assess participants’ anxiety and depression levels, and the Physical Composite Score (PCS) and the Mental Composite Score (MCS) were used to assess their quality of life. Data were analysed with repeated-measures ANOVA in SPSS 26.0.
Results: Paper text reading and audio reading methods were effective in reducing anxiety levels and improving sleep quality. However, the electronic text reading approach was less effective than paper text reading and audio reading, and the video reading approach was not effective in improving depression. In addition, the positive psychology literature reading intervention showed significant improvements in college students’ quality of life scores.
Conclusion: The results of this study suggest that paper text reading and audio reading modalities have a positive impact on the mental health and quality of life of college students, while e-text reading and video reading modalities are less effective. These findings can help college students choose appropriate reading styles and further demonstrate the effectiveness of positive psychology reading on mental health. These results have important academic and practical implications for promoting mental health and improving quality of life among college students.
Levels of evidence for the quality of the measurement property.[19, 30].
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert (2023). Levels of evidence for the quality of the measurement property.[19, 30]. [Dataset]. http://doi.org/10.1371/journal.pone.0179733.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0179733.t002
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Levels of evidence for the quality of the measurement property.[19, 30].
Quality criteria for measurement properties.
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert (2023). Quality criteria for measurement properties. [Dataset]. http://doi.org/10.1371/journal.pone.0179733.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0179733.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Olalekan Lee Aiyegbusi; Derek Kyte; Paul Cockwell; Tom Marshall; Adrian Gheorghe; Thomas Keeley; Anita Slade; Melanie Calvert
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Quality criteria for measurement properties.