Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for "alpaca-gpt4"
This dataset contains English instruction-following data generated by GPT-4 using Alpaca prompts, intended for fine-tuning LLMs. The dataset was originally shared in this repository: https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM. This is just a wrapper for compatibility with Hugging Face's datasets library.
Dataset structure
It contains 52K instruction-following examples generated by GPT-4 using the same prompts as in Alpaca. The dataset has… See the full description on the dataset page: https://huggingface.co/datasets/vicgalle/alpaca-gpt4.
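As a hedged illustration, the wrapper can be loaded with the Hugging Face datasets library (assuming the library is installed; the field names are assumptions based on the standard Alpaca schema):

```python
# Minimal loading sketch for the alpaca-gpt4 wrapper.
from datasets import load_dataset

ds = load_dataset("vicgalle/alpaca-gpt4", split="train")
print(ds)      # ~52K instruction-following examples
print(ds[0])   # fields assumed to follow the Alpaca schema (instruction/input/output)
```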
GPT-3's water consumption for the training phase was estimated at roughly 4.8 billion liters of water, assuming the model was trained in Microsoft's Iowa data center (OpenAI has disclosed that the data center was used for training parts of the GPT-4 model). If the model had been fully trained in the Washington data center, water consumption could have been as high as 15 billion liters. That would have amounted to more than Microsoft's total water withdrawals in 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings like healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, it reveals limitations such as restricted access to training data, inadequate confidence and uncertainty estimations, and concerns over privacy and intellectual property rights. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. On the other hand, the lack of clarity in training processes and data raises concerns about encoded biases and interests in GPT-4. The report also lacks confidence and uncertainty estimations, crucial in high-risk areas like healthcare, and fails to address potential privacy and intellectual property issues. Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussions on LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains. The review aims to expand the understanding of LLMs in general and highlights the need for new forms of reflection on how LLMs are reviewed, the data required for effective evaluation, and addressing critical issues like bias and risk.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This study evaluates the diagnostic accuracy of a multimodal large language model (LLM), ChatGPT-4, in recognizing glaucoma using color fundus photographs (CFPs) with a benchmark dataset and without prior training or fine-tuning.
Methods: The publicly accessible Retinal Fundus Glaucoma Challenge "REFUGE" dataset was utilized for the analyses. The input data consisted of the entire 400-image testing set. The task involved classifying fundus images into either "Likely Glaucomatous" or "Likely Non-Glaucomatous". We constructed a confusion matrix to visualize the results of predictions from ChatGPT-4, focusing on the accuracy of binary classifications (glaucoma vs non-glaucoma).
Results: ChatGPT-4 demonstrated an accuracy of 90% with a 95% confidence interval (CI) of 87.06%-92.94%. The sensitivity was found to be 50% (95% CI: 34.51%-65.49%), while the specificity was 94.44% (95% CI: 92.08%-96.81%). The precision was recorded at 50% (95% CI: 34.51%-65.49%), and the F1 score was 0.50.
Conclusion: ChatGPT-4 achieved relatively high diagnostic accuracy without prior fine-tuning on CFPs. Considering the scarcity of data in specialized medical fields, including ophthalmology, the use of advanced AI techniques, such as LLMs, might require less data for training compared to other forms of AI, with potential savings in time and financial resources. It may also pave the way for the development of innovative tools to support specialized medical care, particularly those dependent on multimodal data for diagnosis and follow-up, irrespective of resource constraints.
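To make the reported rates easy to check, here is a small sketch relating them to a 2x2 confusion matrix; the cell counts below are inferred from the reported accuracy, sensitivity, and specificity on the 400-image test set, not quoted from the study:

```python
# Sketch: binary-classification metrics from a confusion matrix (counts inferred, not reported).
tp, fn, fp, tn = 20, 20, 20, 340  # glaucoma is the positive class

accuracy    = (tp + tn) / (tp + tn + fp + fn)                          # 0.90
sensitivity = tp / (tp + fn)                                           # 0.50
specificity = tn / (tn + fp)                                           # 0.9444
precision   = tp / (tp + fp)                                           # 0.50
f1          = 2 * precision * sensitivity / (precision + sensitivity)  # 0.50

print(accuracy, sensitivity, specificity, precision, f1)
```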
Attribution-NonCommercial 2.0 (CC BY-NC 2.0): https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
Data Description
Here, we release the full long SFT training dataset of ChatQA2. It consists of two parts: long_sft and NarrativeQA_131072. The long_sft dataset is built and derived from existing datasets: LongAlpaca12k, GPT-4 samples from Open Orca, and Long Data Collections. The NarrativeQA_131072 dataset is synthetically generated from NarrativeQA by adding related paragraphs to the given ground-truth summary. For the first two training steps of ChatQA-2, we follow ChatQA 1.5.… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA2-Long-SFT-data.
Dataset Card for "ScaleBiO-Train-Open-Orca-1million-gpt-4"
More Information needed
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Advancing Homepage2Vec with LLM-Generated Datasets for Multilingual Website Classification
This dataset contains two subsets of labeled website data, specifically created to enhance the performance of Homepage2Vec, a multi-label model for website classification. The datasets were generated using Large Language Models (LLMs) to provide more accurate and diverse topic annotations for websites, addressing a limitation of existing Homepage2Vec training data.
Key Features:
LLM-generated annotations: Both datasets feature website topic labels generated using LLMs, a novel approach to creating high-quality training data for website classification models.
Improved multi-label classification: Fine-tuning Homepage2Vec with these datasets has been shown to improve its macro F1 score from 38% to 43% when evaluated on a human-labeled dataset, demonstrating their effectiveness in capturing a broader range of website topics (a small macro-F1 computation sketch follows this list).
Multilingual applicability: The datasets facilitate classification of websites in multiple languages, reflecting the inherent multilingual nature of Homepage2Vec.
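A minimal macro-F1 sketch (referenced in the list above) using scikit-learn's multi-label f1_score; the label matrices are toy placeholders, not real Homepage2Vec outputs:

```python
# Sketch: macro-averaged F1 for multi-label website classification (toy data only).
import numpy as np
from sklearn.metrics import f1_score

# Rows = websites, columns = topics (multi-hot encoding); placeholder values.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print(f1_score(y_true, y_pred, average="macro"))  # per-topic F1 scores averaged equally
```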
Dataset Composition:
curlie-gpt3.5-10k: 10,000 websites labeled using GPT-3.5, context 2 and 1-shot
curlie-gpt4-10k: 10,000 websites labeled using GPT-4, context 2 and zero-shot
Intended Use:
Fine-tuning and advancing Homepage2Vec or similar website classification models
Research on LLM-generated datasets for text classification tasks
Exploration of multilingual website classification
Additional Information:
Project and report repository: https://github.com/CS-433/ml-project-2-mlp
Acknowledgments:
This dataset was created as part of a project at EPFL's Data Science Lab (DLab) in collaboration with Prof. Robert West and Tiziano Piccardi.
GPL: https://choosealicense.com/licenses/gpl/
Dataset Overview
The Open Orca Enhanced Dataset is meticulously designed to improve the performance of automated essay grading models using deep learning techniques. This dataset integrates robust data instances from the FLAN collection, augmented with responses generated by GPT-3.5 or GPT-4, creating a diverse and context-rich resource for training models.
Dataset Structure
The dataset is structured in a tabular format, with the following key fields: id: A unique identifier for each data… See the full description on the dataset page: https://huggingface.co/datasets/mohamedemam/Essay-quetions-auto-grading-arabic.
Large-scale text analysis has grown rapidly as a method in political science and beyond. To date, text-as-data methods rely on large volumes of human-annotated training examples, which places a premium on researcher resources. However, advances in large language models (LLMs) may make automated annotation increasingly viable. This paper tests the performance of GPT-4 across a range of scenarios relevant for analysis of political text. We compare GPT-4 coding with human expert coding of tweets and news articles across four variables (whether text is political, its negativity, its sentiment, and its ideology) and across four countries (the United States, Chile, Germany, and Italy). GPT-4 coding is highly accurate, especially for shorter texts such as tweets, correctly classifying texts up to 95% of the time. Performance drops for longer news articles, and very slightly for non-English text. We introduce a "hybrid" coding approach, in which disagreements of multiple GPT-4 runs are adjudicated by a human expert, which boosts accuracy. Finally, we explore downstream effects, finding that transformer models trained on hand-coded or GPT-4-coded data yield almost identical outcomes. Our results suggest that LLM-assisted coding is a viable and cost-efficient approach, although consideration should be given to task complexity.
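As a rough illustration of the described hybrid approach (not the authors' code), the sketch below runs a labeling step several times and flags disagreements for human adjudication; classify_with_gpt4 is a hypothetical placeholder for an actual GPT-4 API call:

```python
# Sketch of the "hybrid" coding idea: multiple GPT-4 runs, a human resolves disagreements.
from collections import Counter

def classify_with_gpt4(text: str) -> str:
    # Hypothetical placeholder for a real GPT-4 classification call.
    raise NotImplementedError

def hybrid_code(texts, n_runs=3):
    decisions = []
    for text in texts:
        labels = [classify_with_gpt4(text) for _ in range(n_runs)]
        counts = Counter(labels)
        if len(counts) == 1:                       # all runs agree: accept the label
            decisions.append(labels[0])
        else:                                      # disagreement: route to a human expert
            decisions.append(("NEEDS_HUMAN_REVIEW", dict(counts)))
    return decisions
```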
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository includes the materials of the work entitled "GPT-Powered Elicitation Interview Script Generator for Requirements Engineering Training", by Binnur Görer and Fatma Başak Aydemir, which has been accepted for presentation in the RE@Next! track of RE'24.
Files to create the Custom GPT: We provide three domain-specific files to augment GPT-4.
common_mistakes.txt: includes the details about the common mistakes encountered in requirements elicitation interviews.
example_conversation_simplified.txt: includes a sample requirements elicitation interview script.
guidelines_for_interviewers_questions_short.docx: includes best practices of requirements elicitation interviewing.
Experiment Output: We provide generated interview scripts for four different domains:
Social housing app: Interview Script Social Housing App.docx
Digital health tracking app: Interview Script for Digital Health Tracking App.docx
Food delivery app: Interview Script for Food Delivery App.docx
Meeting scheduler system: Interview Script for Meeting Scheduler System.docx
gpt_conversation_links.txt includes the ChatGPT links for each generated interview script.
Evaluation: The grading rubric used in the expert study is provided in REI Grading - Expert Study.xlsx.
GPTFuzzer is a project that explores red teaming of large language models (LLMs) using auto-generated jailbreak prompts.
Project Overview: GPTFuzzer aims to assess the security and robustness of LLMs by crafting prompts that can potentially lead to harmful or unintended behavior.
The project targets models such as ChatGPT, Llama-2, and Vicuna.
Datasets:
The datasets used in GPTFuzzer include:
Harmful Questions: sampled from public datasets like llm-jailbreak-study and hh-rlhf.
Human-Written Templates: collected from llm-jailbreak-study.
Responses: gathered by querying models like Vicuna-7B, ChatGPT, and Llama-2-7B-chat.
Models:
The judgment model is a finetuned RoBERTa-large model. The training code and data are available in the repository.
During fuzzing experiments, the model is automatically downloaded and cached.
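A hedged sketch of how a fine-tuned RoBERTa-large judgment classifier could be used with Hugging Face transformers; the checkpoint path and label meaning below are placeholders, see the GPTFuzz repository for the actual model and training code:

```python
# Sketch: scoring an LLM response with a fine-tuned RoBERTa judgment classifier.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "path/to/finetuned-roberta-large"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

response = "Some model response to a jailbreak prompt..."
inputs = tokenizer(response, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # e.g., probability that the response is a successful jailbreak
```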
Updates:
The project has received recognition and awards at conferences like Geekcon 2023. The team continues to improve the codebase and aims to build a general black-box fuzzing framework for LLMs.
Sources: the GPTFuzz repository and README (https://github.com/sherdencooper/GPTFuzz) and the paper (https://arxiv.org/pdf/2309.10253.pdf).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Bing x GPT-4 Synthetic Query Dataset
This dataset was used in the paper GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning. Refer to https://arxiv.org/abs/2402.16829 for details. The code for generating the data is available at https://github.com/avsolatorio/GISTEmbed.
Citation
@article{solatorio2024gistembed, title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}… See the full description on the dataset page: https://huggingface.co/datasets/avsolatorio/covid-bing-query-gpt4.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OpenAI declared that GPT-4 performed better in academic and certain specialty areas. Medical licensing exams assess the clinical competence of doctors. We aimed to investigate for the first time how ChatGPT would perform in the Turkish Neurology Proficiency Exam. The GPT-4 version of ChatGPT was used in the study due to the presence of image-based questions. The multiple-choice sections of the Turkish Neurology Proficiency Exams conducted by the Turkish Neurology Association (TND) in 2021, 2022 and 2023 were applied to ChatGPT-4. Questions and multiple-choice answers were used in their original Turkish forms, following the official national examination standards. The success rate in all three exams ranged from 79% to 82%. There were common and different mistakes across the two trials. When the incorrect answers were re-evaluated, the correct answers were obtained. This is the first study to investigate the performance of ChatGPT on the real Neurology Proficiency Examination. The success rate was shown to be above that of GPT-3.5. Furthermore, this study showed that translating questions from the original language into English did not affect the performance of GPT-4 in medical licensing exams, unlike GPT-3.5. It is therefore very important that the information obtained is accurate and verifiable. ChatGPT-4's ability to find the correct answer after feedback on questions that it initially answered incorrectly may be due to the model's ability to generate flexible and adaptive answers. These models should be used carefully and consciously, knowing that they will not always give the correct answer.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Knowledge-Enhanced Winograd Schema Challenge: KE-WSC is an upgraded version of the original WSC dataset that includes several extensions.
The dataset can be used to study knowledge explanation in models and enables knowledge-enhanced machine learning. It can be used to train classification or generative models. It comprises 601 training samples, 200 validation samples, and 200 test samples, and is released in a tabular TSV format. The README.txt file contains a description of the attributes. The test set labels are private, as the dataset is integrated into the SloBENCH evaluation framework (https://slobench.cjvt.si/). If you use the dataset to train your models, please consider submitting the test set predictions to SloBENCH to get the evaluation score and see how it compares to others.
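A minimal reading sketch for the TSV splits with pandas; the file names used here are assumptions, and the actual attribute names are documented in the included README.txt:

```python
# Sketch: load the KE-WSC tabular TSV splits (file names are assumptions; see README.txt).
import pandas as pd

train = pd.read_csv("train.tsv", sep="\t")
val = pd.read_csv("val.tsv", sep="\t")

print(len(train), len(val))    # expected 601 and 200 rows per the description above
print(train.columns.tolist())  # attribute names are described in README.txt
```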
References: Levesque, H., Davis, E., & Morgenstern, L. (2012, May). The Winograd Schema Challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains scripts for collecting data from TikTok, including videos, metadata, and visual insights. The data collection process involves several steps, each outlined below along with instructions for setup and usage.
Dependencies: pyktok, opencv-python, requests
The collection scripts use the pyktok library to download TikTok videos based on video IDs extracted during metadata collection.
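A hedged usage sketch of the pyktok download step; the exact function names and arguments can differ between pyktok versions, so treat this as illustrative and check the pyktok documentation:

```python
# Sketch: download one TikTok video plus metadata with pyktok (arguments may vary by version).
import pyktok as pyk

# Some pyktok versions expect a browser whose cookies are used for requests.
pyk.specify_browser("chrome")

video_url = "https://www.tiktok.com/@someuser/video/1234567890123456789"  # hypothetical URL
# save_tiktok(url, save_video, metadata_csv): saves the .mp4 and appends metadata rows to the CSV.
pyk.save_tiktok(video_url, True, "tiktok_metadata.csv")
```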
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the research paper entitled:
Harnessing ChatGPT and GPT-4 for Evaluating the Rheumatology Questions of the Spanish Access Exam to Specialized Medical Training.
Alfredo Madrid-García, Zulema Rosales-Rosado, Dalifer Dayanira Freites-Núñez, Inés Pérez-Sancristobal, Esperanza Pato-Cour, Chamaida Plasencia-Rodríguez, Luis Cabeza-Osorio, Lydia Abasolo Alcazar, Leticia Leon Mateos, Benjamín Fernández-Gutiérrez, Luis Rodríguez-Rodríguez
medRxiv 2023.07.21.23292821; doi: https://doi.org/10.1101/2023.07.21.23292821
The dataset contains 145 rheumatology-related questions extracted from the Spanish MIR exams held between the academic years 2009-2010 to 2022-2023. The questions are evaluated by ChatGPT, GPT-4, BARD and CLAUDE. Six rheumatologists assess the clinical reasoning of ChatGPT and GPT-4.
A more detailed description of the dataset can be found in "Dataset Description" Sheet
DART-Math
Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Paper@arXiv | Datasets&Models@HF | Code@GitHub
Thread@X (Twitter) | Chinese Blog@Zhihu | Leaderboard@PapersWithCode | BibTeX
Datasets: DART-Math
DART-Math datasets are the state-of-the-art and data-efficient open-source instruction tuning datasets for mathematical reasoning.
DART-Math-Hard contains ~585k mathematical QA pair samples constructed by applying DARS-Prop2Diff to the query set from the MATH and GSM8K training sets, and achieves SOTA on many challenging mathematical reasoning benchmarks. It introduces a deliberate bias towards hard queries, opposite to vanilla rejection sampling.
Performance produced by DART-Math-Hard is usually but not necessarily slightly better (~1% in absolute terms) than DART-Math-Uniform, which contains ~591k samples constructed by applying DARS-Uniform.
Comparison between Mathematical Instruction Tuning Datasets
Most previous datasets are constructed with ChatGPT, and many of them are not open-source, especially the best-performing ones.
| Math SFT Dataset | # of Samples | MATH | GSM8K | College | Synthesis Agent(s) | Open-Source |
|---|---|---|---|---|---|---|
| WizardMath | 96k | 32.3 | 80.4 | 23.1 | GPT-4 | ✗ |
| MetaMathQA | 395k | 29.8 | 76.5 | 19.3 | GPT-3.5 | ✓ |
| MMIQC | 2294k | 37.4 | 75.4 | 28.5 | GPT-4+GPT-3.5+Human | ✓ |
| Orca-Math | 200k | -- | -- | -- | GPT-4 | ✓ |
| Xwin-Math-V1.1 | 1440k | 45.5 | 84.9 | 27.6 | GPT-4 | ✗ |
| KPMath-Plus | 1576k | 46.8 | 82.1 | -- | GPT-4 | ✗ |
| MathScaleQA | 2021k | 35.2 | 74.8 | 21.8 | GPT-3.5+Human | ✗ |
| DART-Math-Uniform | 591k | 43.5 | 82.6 | 26.9 | DeepSeekMath-7B-RL | ✓ |
| DART-Math-Hard | 585k | 45.5 | 81.1 | 29.4 | DeepSeekMath-7B-RL | ✓ |
MATH and GSM8K are in-domain, while College (Math) is out-of-domain. Performance numbers here are for models fine-tuned from Mistral-7B, except for Xwin-Math-V1.1, which is based on Llama2-7B. Bold/italic marks the best/second-best score.
Dataset Construction: DARS (Difficulty-Aware Rejection Sampling)
Previous works usually synthesize data from proprietary models to augment existing datasets, followed by instruction tuning to achieve top-tier results. However, our analysis of these datasets reveals severe biases towards easy queries, with frequent failures to generate any correct response for the most challenging queries.
Motivated by the observation above, we propose Difficulty-Aware Rejection Sampling (DARS) to collect more responses for more difficult queries. Specifically, we introduce two strategies to increase the number of correct responses for difficult queries:
1) Uniform, which involves sampling responses for each query until each query accumulates $k_u$ correct responses, where $k_u$ is a preset hyperparameter determined by the desired size of the synthetic dataset; 2) Prop2Diff, where we continue sampling responses until the number of correct responses for each query is proportional to its difficulty score. The most challenging queries will receive $k_p$ responses, where $k_p$ is a hyperparameter. This method introduces a deliberate bias in the opposite direction to vanilla rejection sampling, towards more difficult queries, inspired by previous works that demonstrate that difficult samples can be more effective in enhancing model capabilities (Sorscher et al., 2022; Liu et al., 2024b).
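To make the two strategies concrete, below is a simplified sketch (not the released implementation); sample_response and is_correct are hypothetical stand-ins for the DeepSeekMath-7B-RL sampler and the answer checker:

```python
# Simplified sketch of Difficulty-Aware Rejection Sampling (DARS); not the official code.
def sample_response(query):
    # Hypothetical stand-in: one generation from the synthesis agent (e.g., DeepSeekMath-7B-RL).
    raise NotImplementedError

def is_correct(query, response):
    # Hypothetical stand-in: compare the extracted final answer against the reference answer.
    raise NotImplementedError

def dars(queries, difficulty, mode="prop2diff", k_u=4, k_p=8, max_tries=256):
    """difficulty[q] in [0, 1]; higher means harder (e.g., fail rate of a probe model)."""
    dataset = []
    for q in queries:
        # Uniform: every query targets k_u correct responses.
        # Prop2Diff: the target grows with difficulty, up to k_p for the hardest queries.
        target = k_u if mode == "uniform" else max(1, round(k_p * difficulty[q]))
        correct, tries = [], 0
        while len(correct) < target and tries < max_tries:
            r = sample_response(q)
            tries += 1
            if is_correct(q, r):
                correct.append(r)
        dataset.extend((q, r) for r in correct)
    return dataset
```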
See Figure 1 (Right) for examples of DART-Math-Uniform by DARS-Uniform and DART-Math-Hard by DARS-Prop2Diff.
Citation
If you find our data, model or code useful for your work, please kindly cite our paper:
@article{tong2024dartmath,
  title={DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving},
  author={Yuxuan Tong and Xiwen Zhang and Rui Wang and Ruidong Wu and Junxian He},
  year={2024},
  eprint={2407.13690},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.13690},
}
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The importance of drug toxicity assessment lies in ensuring the safety and efficacy of pharmaceutical compounds. Predicting toxicity is crucial in drug development and risk assessment. This study compares the performance of GPT-4 and GPT-4o with traditional deep-learning and machine-learning models, WeaveGNN, MorganFP-MLP, SVC, and KNN, in predicting molecular toxicity, focusing on bone, neuro, and reproductive toxicity. The results indicate that GPT-4 is comparable to deep-learning and machine-learning models in certain areas. We utilized GPT-4 combined with molecular docking techniques to study the cardiotoxicity of three specific targets, examining traditional Chinese medicinal materials listed as both food and medicine. This approach aimed to explore the potential cardiotoxicity and mechanisms of action. The study found that components in Black Sesame, Ginger, Perilla, Sichuan Pagoda Tree Fruit, Galangal, Turmeric, Licorice, Chinese Yam, Amla, and Nutmeg exhibit toxic effects on the cardiac target Cav1.2. The docking results indicated significant binding affinities, supporting the hypothesis of potential cardiotoxic effects. This research highlights the potential of ChatGPT in predicting molecular properties and its significance in medicinal chemistry, demonstrating that it facilitates a new research paradigm: with a dataset, high-accuracy learning models can be generated without requiring computational knowledge or coding skills, making the approach accessible and easy to use.
The OCW dataset is for evaluating creative problem solving tasks by curating the problems and human performance results from the popular British quiz show Only Connect.
The OCW dataset contains 618 connecting wall puzzles and solutions in total from 15 seasons of the show. Each show episode has two walls.
The dataset has two tasks: Task 1 (Grouping) and Task 2 (Connections), which are identical to the quiz show's human participant tasks.
Task 1 (Groupings) is evaluated via six metrics: number of solved walls, number of correct groups (max. four per wall), Adjusted Mutual Information (AMI), Adjusted Rand Index (ARI), Fowlkes-Mallows Score (FMS), and Wasserstein Distance (WD), normalized to the (0, 1) range, between predicted and ground-truth labels.
Task 2 (Connections) is evaluated with three metrics: exact string matching, ROUGE-1 F1, and BERTScore F1.
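A minimal sketch of the Task 1 clustering metrics using scikit-learn and SciPy; the group labels below are toy placeholders for one 16-clue wall, not real puzzle data:

```python
# Sketch: Task 1 (Groupings) metrics for one wall (toy labels only).
from scipy.stats import wasserstein_distance
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             fowlkes_mallows_score)

# 16 clues per wall, 4 groups of 4: ground-truth vs. predicted group ids.
y_true = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 2, 2, 3, 2, 3, 3, 2, 3]

print("AMI:", adjusted_mutual_info_score(y_true, y_pred))
print("ARI:", adjusted_rand_score(y_true, y_pred))
print("FMS:", fowlkes_mallows_score(y_true, y_pred))
print("WD :", wasserstein_distance(y_true, y_pred))  # the paper normalizes WD to (0, 1)
```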
Baseline results with pre-trained language models and with few-shot In-context Learning (ICL) with LLMs such as GPT-4 are available here:
"Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset" Saeid Alavi Naeini, Raeid Saqur, Mozhgan Saeidi, John Giorgi, Babak Taati. 2023 https://neurips.cc/virtual/2023/poster/73547
MATH is a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.