Introduction
AGIEval is a human-centric benchmark specifically designed to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving. This benchmark is derived from 20 official, public, and high-standard admission and qualification exams intended for general human test-takers, such as general college admission tests (e.g., Chinese College Entrance Exam (Gaokao) and American SAT), law school admission tests, math competitions… See the full description on the dataset page: https://huggingface.co/datasets/lighteval/agi_eval_en.
Q-Eval-100K Dataset (CVPR 2025 Oral)
📝 Introduction
The Q-Eval-100K dataset encompasses both text-to-image and text-to-video models, with 960K human annotations specifically focused on visual quality and alignment for 100K instances (60K images and 40K videos). We utilize multiple popular text-to-image and text-to-video models to ensure diversity, which include FLUX, Lumina-T2X, PixArt, Stable Diffusion 3, Stable Diffusion XL, DALL·E 3, Wanx, Midjourney, Hunyuan-DiT… See the full description on the dataset page: https://huggingface.co/datasets/AGI-Eval-Official/Q-Eval-100K.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Test013
Released under MIT
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
OIBench Dataset
Dataset Overview
OIBench is a high-quality, private, and challenging olympiad-level informatics benchmark consisting of 250 carefully curated original problems. The OIBench Dataset's HuggingFace repo contains algorithm problem statements, solutions, and associated metadata such as test cases, pseudo code, and difficulty levels. The dataset has been processed and stored in Parquet format for efficient access and analysis. We provide complete information… See the full description on the dataset page: https://huggingface.co/datasets/AGI-Eval/OIBench.
orion-research/agi-eval dataset hosted on Hugging Face and contributed by the HF Datasets community
freddie/agi-eval-sat-math-judgments-no-multiple-choice dataset hosted on Hugging Face and contributed by the HF Datasets community
The Adolescent Girls Initiative (AGI) pilot was implemented by the Government of Rwanda as part of an eight-country initiative led by the World Bank aimed at promoting the economic empowerment of adolescent girls. The development objective of the Rwanda AGI was to improve employment, incomes and empowerment of disadvantaged adolescent girls and young women (aged 16-24), and to test two integrated models for promoting these goals.
The Rwanda AGI had three components: Component I, Skills Development and Entrepreneurship Support; Component II, Scholarships to Resume Formal Education; and Component III, Project Implementation Support.
This evaluation focused exclusively on Component I, which was carried out by the Workforce Development Authority (WDA), under the supervision of the Ministry of Gender and Family Promotion (MIGEPROF). It was delivered sequentially to roughly 2,000 vulnerable girls and young women in three equal-sized cohorts between 2012 and 2014. The project was targeted geographically in four districts (Gasabo, Kicukiro, Gicumbi, and Rulindo), where nine vocational training centers (VTCs) provided the training.
The three objectives of the evaluation were:
- To examine how well the AGI project delivered the planned activities
- To assess the usefulness of the training provided
- To measure the change in beneficiary outcomes before and after the AGI project
The evaluation was conducted on the second cohort of beneficiaries, from which 160 girls were randomly selected to participate in baseline and endline surveys.
The project was targeted geographically to four districts that already had training centers: Gasabo, Kicukiro, Gicumbi and Rulindo.
Sample survey data [ssd]
After the initial pre-screening for eligibility, the sample was stratified by the sector of participants' residence and selected through a public lottery conducted by the Workforce Development Authority and the Ministry of Gender and Family Promotion in each of the 11 recruitment sectors. The girls were invited to attend, and directly after the lottery, Laterite Limited, an independently contracted research firm, conducted uniform random sampling (in Excel) to select a subset of admitted applicants for the baseline survey. However, the baseline survey was administered only to those who were physically present at the lottery. In 6 of the 11 sectors of recruitment, girls who did not appear for the lottery were excluded from the project, so the evaluation sample reflects the project sample. In the other 5 sectors, absent applicants who were randomly selected for project admission were still allowed to join, but they were excluded from the baseline survey. Specifically, cohort 2 had 1,364 applicants who passed the screening committee, of whom 712 were randomly selected for project admission. Further, unsuccessful but eligible applicants were allowed to enter the lottery for the third cohort, which started just one month after the second cohort. Hence, there was no feasible way to use the rejected applicants as a control group for an evaluation.
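The two-stage selection described above (a lottery within each recruitment sector, then a uniform random draw of admitted applicants for the baseline survey, restricted to those present) can be sketched as follows. This is only an illustration under stated assumptions: the applicant IDs, the equal sector sizes, the fixed admission fraction per sector, the 90% attendance rate, and the draw size of 200 are all hypothetical, and the function names are invented for this sketch. The actual lottery was public and the survey draw was done in Excel.

```python
import random
from collections import defaultdict


def stratified_lottery(applicants, admit_fraction, rng):
    """Admit a fixed fraction of applicants within each sector (stratum).

    Assumption: the same admission fraction applies in every sector.
    """
    by_sector = defaultdict(list)
    for applicant_id, sector in applicants:
        by_sector[sector].append(applicant_id)

    admitted = []
    for sector in sorted(by_sector):
        pool = by_sector[sector]
        k = round(len(pool) * admit_fraction)
        admitted.extend(rng.sample(pool, k))  # lottery without replacement
    return admitted


def baseline_sample(admitted, present, sample_size, rng):
    """Uniformly draw admitted applicants, then keep only those present."""
    drawn = rng.sample(admitted, min(sample_size, len(admitted)))
    return [a for a in drawn if a in present]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical applicants: 11 sectors x 124 applicants = 1,364 in total.
applicants = [(f"s{s:02d}-{i:03d}", s) for s in range(11) for i in range(124)]

# Admission fraction taken from the reported totals (712 of 1,364).
admitted = stratified_lottery(applicants, 712 / 1364, rng)

# Assumed attendance: 90% of admitted applicants were present at the lottery.
present = set(rng.sample(admitted, int(0.9 * len(admitted))))

surveyed = baseline_sample(admitted, present, 200, rng)
```

Because admission is rounded within each sector, the sketch admits 65 applicants per sector rather than reproducing the reported total of 712 exactly.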
A follow-up survey was administered to 160 of the 182 randomly sampled beneficiaries who responded to the baseline survey. Though special effort was made to follow up with the 43 individuals from the baseline survey who did not complete the project, the team was only able to interview 21 of them.
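The response rates implied by these counts can be worked out directly. The short sketch below uses only the figures reported above; the variable names are illustrative, and the last two lines assume that the 160 endline interviews include the 21 non-completers who were reached.

```python
baseline_respondents = 182   # responded to the baseline survey
endline_respondents = 160    # interviewed again at endline
non_completers = 43          # baseline respondents who did not complete the project
non_completers_reached = 21  # of those, interviewed at endline

# Overall follow-up rate and follow-up rate among non-completers.
follow_up_rate = endline_respondents / baseline_respondents
non_completer_rate = non_completers_reached / non_completers

# Assumption: the 160 endline interviews include the 21 non-completers.
completers = baseline_respondents - non_completers
completers_reached = endline_respondents - non_completers_reached

print(f"Overall follow-up rate: {follow_up_rate:.1%}")
print(f"Non-completer follow-up rate: {non_completer_rate:.1%}")
print(f"Completers reached: {completers_reached} of {completers}")
```

Under that assumption, attrition between baseline and endline comes entirely from participants who left the project, which matters when interpreting before-and-after comparisons.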
Computer Assisted Personal Interview [capi]
After the collection of survey data, Laterite Limited prepared the data for analysis by correcting duplicate identification numbers; renaming endline variables to match baseline variable names; dropping confidential personal identification variables (e.g. name, mobile phone number), GPS coordinates, and device numbers; codifying variables stored as names of income-generating activities (IGAs); and merging the baseline and endline datasets.
A number of additional changes to the data were made during the quantitative analysis:
- Values of specific variables (e.g. business type, first or second income-generating activity) recorded as "other" that fit existing answer options were recoded.
- To address inconsistencies between different sections of the survey, values entered for the IGA screening sections (whether the respondent was engaged in any household agricultural activities, wage employment, non-farm business or internship) were corrected based on information provided in subsequent, more detailed questions on the two main income-generating activities and/or business. No changes were made in the absence of supporting information. Where both wage employment and non-farm business were indicated for the same IGA, answers to screening questions were reconciled based on whether the respondent reported working for herself (business) or for a non-relative (paid job).
- Because 86 out of 160 values for age at baseline were missing in the merged dataset provided by Laterite Limited, data on age were extracted from the baseline dataset.
- Outliers: 3 income values (extra 0 at the end, or amount entered as in-kind daily payment instead of monthly income) and 4 in-kind amount values (divided by 10 to fit in ranges of reported in-kind amounts for the same occupation) were considered typos; for the remaining outliers, values above the 99th percentile were dropped from the estimations.
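The final trimming step, dropping values above the 99th percentile, can be sketched in a few lines. This is a minimal illustration, not the evaluators' procedure: the income values are made up, the function name is hypothetical, and the choice of the "inclusive" interpolation method is an assumption, since the source does not say how the percentile was computed.

```python
import statistics


def drop_above_percentile(values, pct=99):
    """Return the values at or below the given percentile, plus the threshold.

    Assumption: percentiles are computed with linear interpolation,
    treating the minimum and maximum as the 0th and 100th percentiles.
    """
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    threshold = cut_points[pct - 1]  # cut_points[98] is the 99th percentile
    kept = [v for v in values if v <= threshold]
    return kept, threshold


# Hypothetical monthly incomes: 100 plausible values and one extreme outlier.
incomes = list(range(1, 101)) + [10_000]
trimmed, threshold = drop_above_percentile(incomes)
```

With a sample as small as the 160 endline respondents, a 99th-percentile cut removes only one or two observations per variable, so it guards against extreme values without changing the bulk of the distribution.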
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for "agieval-logiqa-en"
Dataset taken from https://github.com/microsoft/AGIEval and processed as in that repo. Raw dataset: https://github.com/lgw863/LogiQA-dataset Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) @misc{zhong2023agieval, title={AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models}, author={Wanjun Zhong and Ruixiang Cui and Yiduo Guo and Yaobo Liang and Shuai Lu and Yanlin Wang and Amin Saied and Weizhu… See the full description on the dataset page: https://huggingface.co/datasets/dmayhem93/agieval-logiqa-en.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for AGIEval
Dataset Summary
AGIEval is a human-centric benchmark specifically designed to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving. This benchmark is derived from 20 official, public, and high-standard admission and qualification exams intended for general human test-takers, such as general college admission tests (e.g., Chinese College Entrance Exam (Gaokao) and American SAT), law school… See the full description on the dataset page: https://huggingface.co/datasets/baber/agieval.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Adolescent Girls Initiative (AGI) pilot was implemented by the Government of Rwanda as part of an eight-country initiative led by the World Bank aimed at promoting the economic empowerment of adolescent girls. The development objective of the Rwanda AGI was to improve employment, incomes and empowerment of disadvantaged adolescent girls and young women (aged 16-24), and to test two integrated models for promoting these goals. The Rwanda AGI had three components: Component I, Skills Development and Entrepreneurship Support; Component II, Scholarships to Resume Formal Education; and Component III, Project Implementation Support. This evaluation focused exclusively on Component I, which was carried out by the Workforce Development Authority (WDA), under the supervision of the Ministry of Gender and Family Promotion (MIGEPROF). It was delivered sequentially to roughly 2,000 vulnerable girls and young women in three equal-sized cohorts between 2012 and 2014. The project was targeted geographically in four districts (Gasabo, Kicukiro, Gicumbi, and Rulindo), where nine vocational training centers (VTCs) provided the training. The three objectives of the evaluation were to examine how well the AGI project delivered the planned activities, to assess the usefulness of the training provided, and to measure the change in beneficiary outcomes before and after the AGI project. The evaluation was conducted on the second cohort of beneficiaries, from which 160 girls were randomly selected to participate in baseline and endline surveys.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Face evaluation and first impression generation can be affected by multiple face elements such as invariant facial features, gaze direction and environmental context; however, the composite modulation of eye gaze and illumination on faces of different genders and ages has not been previously investigated. We aimed to test how these different facial and contextual features affect ratings of social attributes. Thus, we created and validated the Bi-AGI Database, a freely available new set of male and female face stimuli varying in age across the lifespan from 18 to 87 years, gaze direction and illumination conditions. Judgments on attractiveness, femininity-masculinity, dominance and trustworthiness were collected for each stimulus. Results show that the different variables interact in modulating social trait attribution; in particular, illumination affects ratings differently across age, gaze and gender, with less impact on older adults and a greater effect on young faces.
freddie/agi-eval-sat-math-judgments dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "agieval-sat-math"
Dataset taken from https://github.com/microsoft/AGIEval and processed as in that repo. MIT License Copyright (c) Microsoft Corporation. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of… See the full description on the dataset page: https://huggingface.co/datasets/dmayhem93/agieval-sat-math.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "agieval-lsat-lr"
Dataset taken from https://github.com/microsoft/AGIEval and processed as in that repo. Raw dataset: https://github.com/zhongwanjun/AR-LSAT MIT License Copyright (c) 2022 Wanjun Zhong Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish… See the full description on the dataset page: https://huggingface.co/datasets/dmayhem93/agieval-lsat-lr.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantitative predictor and outcome variables used to analyze the indirect costs of acute gastrointestinal illness (AGI) in Rigolet, Nunatsiavut, Canada.
AI Generated Content (AIGC) refers to any form of content, such as text, images, audio, or video, that is created with the help of artificial intelligence technology. With the flourishing development of deep learning, the efficiency of AIGC generation has increased, and AI-Generated Image (AGI) is becoming more prevalent in areas such as culture, entertainment, education, social media, etc.
Unlike Natural Scene Images (NSIs) captured from natural scenes, AGIs are generated directly by AI models. As a result, AGIs have some unique quality characteristics, and viewers tend to judge their quality on aspects that differ from those used for NSIs.
Therefore, we propose the first perceptual AGI Quality Assessment (AGIQA-1K) database, which provides 1,080 AGIs along with quality labels, including technical issues, AI artifacts, unnaturalness, discrepancy, and aesthetics as major evaluation aspects.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Protein-Protein, Genetic, and Chemical Interactions for Liu Z (2018): Novel ASK1 inhibitor AGI-1067 improves AGE-induced cardiac dysfunction by inhibiting MKKs/p38 MAPK and NF-κB apoptotic signaling. Curated by BioGRID (https://thebiogrid.org); ABSTRACT: Heart failure has been identified as one of the clinical manifestations of diabetic cardiovascular complications. Excessive myocardium apoptosis characterizes cardiac dysfunctions, which are correlated with an increased level of advanced glycation end products (AGEs). In this study, we investigated the participation of reactive oxygen species (ROS) and the involvements of apoptosis signal-regulating kinase 1 (ASK1)/mitogen-activated protein kinase (MAPK) kinases (MKKs)/p38 MAPK and nuclear factor κB (NF-κB) pathways in AGE-induced apoptosis-mediated cardiac dysfunctions. The antioxidant and therapeutic effects of a novel ASK1 inhibitor, AGI-1067, were also studied. Myocardium and isolated primary myocytes were exposed to AGEs and treated with AGI-1067. Invasive hemodynamic and echocardiographic assessments were used to evaluate the cardiac functions. ROS formation was evaluated by dihydroethidium fluorescence staining. A terminal deoxynucleotidyl transferase dUTP nick end labelling assay was used to detect the apoptotic cells. ASK1 and NADPH activities were determined by kinase assays. The association between ASK1 and thioredoxin 1 (Trx1) was assessed by immunoprecipitation. Western blotting was used to evaluate the phosphorylation and expression levels of proteins. Our results showed that AGE exposure significantly activated ASK1/MKKs/p38 MAPK, which led to increased cardiac apoptosis and cardiac impairments. AGI-1067 administration inhibited the activation of MKKs/p38 MAPK by inhibiting the disassociation of ASK1 and Trx1, which suppressed the AGE-induced myocyte apoptosis. Moreover, the NF-κB activation as well as the ROS generation was inhibited. As a result, cardiac functions were improved. Our findings suggested that AGI-1067 recovered AGE-induced cardiac dysfunction by blocking both ASK1/MKKs/p38 and NF-κB apoptotic signaling pathways.
LangAGI-Lab/step-wise-eval-addtional-with-tao dataset hosted on Hugging Face and contributed by the HF Datasets community
LangAGI-Lab/step-wise-eval-additional-refined-tao dataset hosted on Hugging Face and contributed by the HF Datasets community
RIT4AGI/Eval dataset hosted on Hugging Face and contributed by the HF Datasets community