94 datasets found

Test dataset of ChatGPT in medical field
scidb.cn
Updated Mar 3, 2023
robin shen (2023). Test dataset of ChatGPT in medical field [Dataset]. http://doi.org/10.57760/sciencedb.o00130.00001
Unique identifier
https://doi.org/10.57760/sciencedb.o00130.00001
Dataset updated
Mar 3, 2023
Dataset provided by
Science Data Bank
Authors
robin shen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The researcher tests the question-answering (QA) capability of ChatGPT in the medical field in the following respects:
1. Its store of medical knowledge
2. Its ability to read and understand medical literature
3. Its ability to assist diagnosis after reading case data
4. Its ability to correct errors in case data
5. Its ability to standardize medical terms
6. Its ability to evaluate experts
7. Its ability to evaluate medical institutions
The conclusions are:
ChatGPT has great potential in medical and health care applications, and may directly replace humans, or even professionals at a certain level, in some fields;
The researcher's preliminary view is that ChatGPT has basic medical knowledge and can handle multi-round dialogue, and that its ability to understand Chinese is not weak;
ChatGPT can read, understand, and correct cases;
ChatGPT performs very well at information extraction and terminology standardization;
ChatGPT can reason over medical knowledge;
ChatGPT can learn continuously, and after continued training its level improved significantly;
ChatGPT cannot academically evaluate Chinese medical experts, and the results are not ideal;
ChatGPT cannot academically evaluate Chinese medical institutions, and the results are not ideal;
ChatGPT is an epoch-making product that can become a useful assistant for medical diagnosis and treatment, knowledge services, literature reading, and review and paper writing.
ChatGPT Classification Dataset
kaggle.com
zip
Updated Sep 7, 2023
Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
Available download formats: zip (718710 bytes)
Dataset updated
Sep 7, 2023
Authors
Mahdi
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).
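The article-to-sentence step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regex splitter, the function name `to_sentence_level`, and the sample texts are all assumptions made for the example, and the real CSV column names are not given in the description.

```python
import re

def to_sentence_level(articles):
    """Split each (text, label) article into sentences that inherit the article's label."""
    rows = []
    for text, label in articles:
        # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                rows.append((sentence, label))
    return rows

# Made-up sample articles; 0 = Human-generated, 1 = AI-generated, as in the description.
articles = [
    ("A compiler translates source code. It emits machine code.", 0),
    ("A firewall filters network traffic.", 1),
]
sentence_rows = to_sentence_level(articles)
```

A real pipeline would likely use a proper sentence tokenizer, since a punctuation-based split mishandles abbreviations and decimal numbers.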

If you use this dataset in a scientific publication, we would appreciate a citation of the following article:

Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf
Data from: ChatGPT as an education and learning tool for engineering,...
data.mendeley.com
Updated May 14, 2024
RAVINDRA BHARDWAJ (2024). ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India [Dataset]. http://doi.org/10.17632/995zwcz5yt.1
Unique identifier
https://doi.org/10.17632/995zwcz5yt.1
Dataset updated
May 14, 2024
Authors
RAVINDRA BHARDWAJ
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
This is the raw data that is used in the publication: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India.
A dataset to investigate ChatGPT for enhancing Students' Learning Experience...
data.niaid.nih.gov
zenodo.org
Updated Jun 19, 2024
Schicchi, Daniele; Taibi, Davide (2024). A dataset to investigate ChatGPT for enhancing Students' Learning Experience via Concept Maps [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12076680
Dataset updated
Jun 19, 2024
Dataset provided by
Institute for Educational Technology, National Research Council of Italy
Authors
Schicchi, Daniele; Taibi, Davide
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset was compiled to examine the use of ChatGPT 3.5 in educational settings, particularly for creating and personalizing concept maps. The data is organized into three folders: Maps, Texts, and Questionnaires. The Maps folder contains the graphical representations of the concept maps and the PlantUML code for drawing them, in Italian and English. The Texts folder contains the source texts used as input for creating the maps. The Questionnaires folder contains the students' responses to the three administered questionnaires.
Estimated water consumption for training GPT-3 2023
statista.com
Statista, Estimated water consumption for training GPT-3 2023 [Dataset]. https://www.statista.com/statistics/1536925/gpt-3-estimated-water-consumption-training/
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
Jul 2023
Area covered
Worldwide
Description
GPT-3's water consumption during training was estimated at roughly 4.8 billion liters, assuming the model was trained at Microsoft's Iowa data center (OpenAI has disclosed that the data center was used for training parts of the GPT-4 model). Had the model been fully trained in the Washington data center, water consumption could have been as high as 15 billion liters, which would have exceeded Microsoft's total water withdrawals in 2023.
Data Sheet 2_Large language models generating synthetic clinical datasets: a...
frontiersin.figshare.com
figshare.com
xlsx
Updated Feb 5, 2025
Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/frai.2025.1533508.s002
Dataset updated
Feb 5, 2025
Dataset provided by
Frontiers
Authors
Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
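As a rough illustration of the fidelity checks named in the Methods (a two-sample t statistic and 95% CI overlap for a continuous parameter), a standard-library sketch might look like the following. The abstract does not say which t-test variant or CI construction the authors used, so Welch's form and the normal-approximation interval (factor 1.96) are assumptions, as are the function names and all sample values.

```python
import math
import statistics

def mean_ci95(values):
    """Return (low, high) of a normal-approximation 95% CI for the mean."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m - half, m + half

def ci_overlap(a, b):
    """True when the two 95% CIs share at least one point."""
    lo_a, hi_a = mean_ci95(a)
    lo_b, hi_b = mean_ci95(b)
    return lo_a <= hi_b and lo_b <= hi_a

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch's form)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Made-up ages for illustration only; not taken from VitalDB or the study.
real_age = [54, 61, 47, 58, 66, 52, 60, 49]
synthetic_age = [55, 59, 50, 57, 63, 53, 62, 48]
far_off_age = [80, 82, 79, 85, 81, 83, 84, 80]

close_match = ci_overlap(real_age, synthetic_age)
poor_match = ci_overlap(real_age, far_off_age)
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; a statistics package would normally be used for that step.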
Wiki-AI-generated-dataset
kaggle.com
zip
Updated Oct 10, 2023
Arjun Prakash rao (2023). Wiki-AI-generated-dataset [Dataset]. https://www.kaggle.com/datasets/arjunprakashrao/wiki-ai-generated-dataset
Available download formats: zip (128223556 bytes)
Dataset updated
Oct 10, 2023
Authors
Arjun Prakash rao
Description
This is not my dataset; credit goes to:

{aaditya_bhat_2023, author = { {Aaditya Bhat} }, title = { GPT-wiki-intro (Revision 0e458f5) }, year = 2023, url = { https://huggingface.co/datasets/aadityaubhat/GPT-wiki-intro }, doi = { 10.57967/hf/0326 }, publisher = { Hugging Face } }
chinese_chatgpt_corpus
huggingface.co
Updated Apr 29, 2023
zeye sun (2023). chinese_chatgpt_corpus [Dataset]. https://huggingface.co/datasets/sunzeyeah/chinese_chatgpt_corpus
Dataset updated
Apr 29, 2023
Authors
zeye sun
License
Unknown: https://choosealicense.com/licenses/unknown/
Description
Dataset Card for chinese_chatgpt_corpus

Dataset Summary

This repo collects Chinese corpora for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).

Supported Tasks and Leaderboards

More Information Needed

Languages

Chinese

Dataset Structure

Data Instances

train_data_external_v1.jsonl

Size of downloaded dataset files: 5.04 GB
Size of the generated dataset: 0 GB
Total amount of disk used:…

See the full description on the dataset page: https://huggingface.co/datasets/sunzeyeah/chinese_chatgpt_corpus.
Data from: Enhancing self-directed learning with custom GPT AI facilitation...
tandf.figshare.com
docx
Updated Oct 8, 2025
Wang Shalong; Zuo Yi; Zou Bin; Liu Ganglei; Zhou Jinyu; Zheng Yanwen; Zhang Zequn; Yuan Lianwen; Ren Feng (2025). Enhancing self-directed learning with custom GPT AI facilitation among medical students: A randomized controlled trial [Dataset]. http://doi.org/10.6084/m9.figshare.29039163.v1
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.29039163.v1
Dataset updated
Oct 8, 2025
Dataset provided by
Taylor & Francis: https://taylorandfrancis.com/
Authors
Wang Shalong; Zuo Yi; Zou Bin; Liu Ganglei; Zhou Jinyu; Zheng Yanwen; Zhang Zequn; Yuan Lianwen; Ren Feng
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study aims to assess the impact of LearnGuide, a specialized ChatGPT tool designed to support self-directed learning among medical students. In this 14-week randomized controlled trial (ClinicalTrials.gov NCT06276049), 103 medical students were assigned to either an intervention group, which received 12 weeks of problem-based training with LearnGuide support, or a control group, which received identical training without AI assistance. Primary and secondary outcomes, including Self-Directed Learning Scale scores at 6 and 12 weeks, Cornell Critical Thinking Test Level Z scores, and Global Flow Scores, were evaluated with a 14-week follow-up. Mann-Whitney U tests were used for statistical comparisons between the groups. At 6 weeks, the intervention group showed a marginally higher median Self-Directed Learning Scale score, which further improved by 12 weeks (4.15 [95% CI, 0.82 to 7.48]; p = 0.01) and was sustained at the 14-week follow-up. Additionally, this group demonstrated notable improvements in the Cornell Critical Thinking Test Score at 12 weeks (7.11 [95% CI, 4.50 to 9.72]; p
Data Sheet 1_One year in the classroom with ChatGPT: empirical insights and...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated May 27, 2025
Li, Tian; Guo, Feng; Cunningham, Christopher J. L. (2025). Data Sheet 1_One year in the classroom with ChatGPT: empirical insights and transformative impacts.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002075136
Dataset updated
May 27, 2025
Authors
Li, Tian; Guo, Feng; Cunningham, Christopher J. L.
Description
Generative Artificial Intelligence (GAI), such as OpenAI’s ChatGPT, has rapidly emerged as a transformative tool in higher education, offering opportunities to enhance teaching and learning. This paper describes the design and implementation of ChatGPT-integrated curriculum activities, featuring coding learning in psychology and conceptual discussions in physics, and presents the findings of a year-long experimental study in both types of classrooms. Our findings suggest that students generally found ChatGPT easy to use and beneficial to their learning, reporting improved confidence, motivation, and engagement. However, its ability to address individual needs or replace instructors was viewed less favorably. Comparative analyses showed that coding activities in psychology led to higher levels of activity satisfaction and perceived usefulness of ChatGPT compared to the more abstract discussion activities in physics. While graduate students were more enthusiastic about using ChatGPT for skill acquisition than undergraduates, demographic factors such as gender, race, and first-generation college status showed no significant influence on such perceptions. Meanwhile, instructors’ reflections emphasize the importance of thoughtful integration, technical support, and pedagogical balance to maximize GAI’s potential while mitigating its limitations. Recommendations for integrating GAI into teaching practices and future research directions are discussed, contributing to the evolving discourse on GAI’s role in transforming modern classrooms.
DAIGT Proper Train Dataset
kaggle.com
zip
Updated Nov 5, 2023
Darek Kłeczek (2023). DAIGT Proper Train Dataset [Dataset]. https://www.kaggle.com/datasets/thedrcat/daigt-proper-train-dataset
Available download formats: zip (124388618 bytes)
Dataset updated
Nov 5, 2023
Authors
Darek Kłeczek
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Version 2 updated on 11/2/2023:

Since there is no proper training dataset for the LLM - Detect AI Generated Text competition, I decided to create one.

Ingredients (please upvote the included datasets!):
- Text generated with ChatGPT by MOTH (https://www.kaggle.com/datasets/alejopaullier/daigt-external-dataset)
- Persuade corpus contributed by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/persaude-corpus-2/)
- Text generated with Llama-70b and Falcon180b by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b)
- Text generated with ChatGPT by Radek (https://www.kaggle.com/datasets/radek1/llm-generated-essays)
- Official train essays
- Essays I generated with various LLMs

New version includes:
- EssayID if available
- Generation prompt if available
- Random 10-fold split stratified by source dataset

Version 3 updated on 11/3/2023:
- Additional 2400+ AI examples generated with Mistral 7B instruct and a new prompt (let's see how it works!)

Version 4 updated on 11/5/2023:
- Additional 2000 Claude essays generated by @darraghdog (https://www.kaggle.com/datasets/darraghdog/hello-claude-1000-essays-from-anthropic)
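The random 10-fold split stratified by source dataset, mentioned in the version notes above, could be produced along these lines. This is only a sketch of one common approach (shuffle within each source, then deal rows round-robin into folds); the author's actual procedure is not described, and the function name, seed, and source labels below are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(sources, n_folds=10, seed=0):
    """Assign each row a fold id so every source is spread evenly across folds."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for idx, source in enumerate(sources):
        by_source[source].append(idx)
    folds = [None] * len(sources)
    for indices in by_source.values():
        rng.shuffle(indices)
        # Deal the shuffled rows round-robin, starting from a random fold.
        offset = rng.randrange(n_folds)
        for pos, idx in enumerate(indices):
            folds[idx] = (pos + offset) % n_folds
    return folds

# Hypothetical source labels and counts, chosen only to make the example concrete.
sources = ["chatgpt_moth"] * 30 + ["persuade_corpus"] * 50 + ["llama_falcon"] * 20
folds = stratified_folds(sources)
```

Because every source is dealt out separately, each fold keeps roughly the same mix of sources as the full dataset, which is the point of stratifying.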
The Impact of AI and ChatGPT on Bangladeshi University Students
data.mendeley.com
Updated Jan 6, 2025
Md Jhirul Islam (2025). The Impact of AI and ChatGPT on Bangladeshi University Students [Dataset]. http://doi.org/10.17632/zykphpvbr7.2
Unique identifier
https://doi.org/10.17632/zykphpvbr7.2
Dataset updated
Jan 6, 2025
Authors
Md Jhirul Islam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh
Description
The dataset records the perceptions of Bangladeshi university students of the influence that AI tools, especially ChatGPT, have on their academic practices, learning experiences, and problem-solving abilities. It covers the varying role of AI in education: common usage statistics, the effect of AI on creative abilities, its impact on learning, and whether it could invade privacy. The dataset offers a perspective on how AI tools are changing education in the country and provides valuable information for researchers, educators, and policymakers seeking to understand trends, challenges, and opportunities in the adoption of AI in the academic context.

Methodology
Data Collection Method: Online survey using Google Forms
Participants: A total of 3,512 students from various Bangladeshi universities participated.
Survey Questions: The survey included questions on demographic information, frequency of AI tool usage, perceived benefits, concerns regarding privacy, and impacts on creativity and learning.

Sampling Technique: Random sampling of university students
Data Collection Period: June 2024 to December 2024

Privacy Compliance
This dataset has been anonymized to remove any personally identifiable information (PII). It adheres to relevant privacy regulations to ensure the confidentiality of participants.

For further inquiries, please contact: Name: Md Jhirul Islam, Daffodil International University Email: jhirul15-4063@diu.edu.bd Phone: 01316317573
Top web domains cited by LLMs 2025
statista.com
Updated Jun 29, 2025
Statista (2025). Top web domains cited by LLMs 2025 [Dataset]. https://www.statista.com/statistics/1620335/top-web-domains-cited-by-llms/
Dataset updated
Jun 29, 2025
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
Jun 2025
Area covered
Worldwide
Description
A June 2025 study found that ****** was the web domain most frequently cited by large language models (LLMs). The platform was referenced in approximately ** percent of the analyzed cases, likely due to the content licensing agreement reached between Google and Reddit in early 2024 for AI model training. ********* ranked second, mentioned in roughly ** percent of cases, while ****** and ******* were each mentioned in ** percent.
Expert Evaluations of ChatGPT Responses to Adobe Photoshop User Queries...
zenodo.org
csv
Updated Sep 18, 2025
Necati VARDAR; ÇAĞRI GÜMÜŞ (2025). Expert Evaluations of ChatGPT Responses to Adobe Photoshop User Queries (Original Data in Turkish) [Dataset]. http://doi.org/10.5281/zenodo.17152576
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.17152576
Dataset updated
Sep 18, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Necati VARDAR; ÇAĞRI GÜMÜŞ
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the raw survey responses from three experts who evaluated ChatGPT’s answers to Adobe Photoshop user queries. The survey was conducted in Turkish via Google Forms. An English summary of the dataset is provided in the supplementary materials of the related article.
Data Sheet 1_The perceived impact of artificial intelligence on academic...
figshare.com
docx
Updated Oct 3, 2025
Mariana Dogaru; Olivia Pisică; Cosmin-Ștefan Popa; Andrei-Adrian Răgman; Ilinca-Roxana Tololoi (2025). Data Sheet 1_The perceived impact of artificial intelligence on academic learning.docx [Dataset]. http://doi.org/10.3389/frai.2025.1611183.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/frai.2025.1611183.s001
Dataset updated
Oct 3, 2025
Dataset provided by
Frontiers
Authors
Mariana Dogaru; Olivia Pisică; Cosmin-Ștefan Popa; Andrei-Adrian Răgman; Ilinca-Roxana Tololoi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Generative artificial intelligence, such as ChatGPT, is transforming higher education by enabling personalized learning, while raising ethical challenges. This study explores how technical university students perceive and leverage ChatGPT in academic tasks, focusing on motivation, learning outcomes, and ethical awareness. Using the Technology Acceptance Model and Self-Determination Theory, the research surveyed 84 students from a technical university via a 5-point Likert-scale questionnaire. Six salient dimensions of student engagement with ChatGPT emerged: perceived usefulness for problem solving, learning retention and skill acquisition, structured interaction with familiar content, consultation on unfamiliar topics, preference for conciseness, and confidence in the accuracy of AI responses. Students who perceived ChatGPT as a valuable resource for addressing academic problems reported enhanced motivation and competence, and frequent structured interaction was linked to the practice of verifying uncertain information, indicating the emergence of AI literacy. However, extensive reliance was correlated with dependence and limited citation practices, revealing risks to academic integrity. By examining ChatGPT’s role in STEM education, this study substantiates the relevance of AI literacy training and institutional policies to ensure responsible use. The findings offer practical insights for educators to integrate AI tools effectively while fostering critical thinking and academic integrity in technology-driven learning environments.
ChatGPT Evaluation Dataset v.2.0
zenodo.org
data.niaid.nih.gov
Updated Oct 31, 2024
Jan Kocoń; Przemysław Kazienko (2024). ChatGPT Evaluation Dataset v.2.0 [Dataset]. http://doi.org/10.5281/zenodo.14019715
Unique identifier
https://doi.org/10.5281/zenodo.14019715
Dataset updated
Oct 31, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Jan Kocoń; Przemysław Kazienko
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Oct 2023
Description
We tested ChatGPT on 25 tasks focusing on solving common NLP problems and requiring analytical reasoning. These tasks include (1) a relatively simple binary classification of texts like spam, humor, sarcasm, aggression detection, or grammatical correctness of the text; (2) a more complex multiclass and multi-label classification of texts such as sentiment analysis, emotion recognition; (3) reasoning with the personal context, i.e., personalized versions of the problems that make use of additional information about text perception of a given user (user’s examples provided to ChatGPT); (4) semantic annotation and acceptance of the text going towards natural language understanding (NLU) like word sense disambiguation (WSD), and (5) answering questions based on the input text. More information in the paper: https://www.sciencedirect.com/science/article/pii/S156625352300177X
Large Language Model content safety considerations text data
nexdata.ai
m.nexdata.ai
Updated Oct 3, 2023
Nexdata (2023). Large Language Model content safety considerations text data [Dataset]. https://www.nexdata.ai/datasets/llm/1349
Dataset updated
Oct 3, 2023
Dataset authored and provided by
Nexdata
Variables measured
Language, Data size, Data content, Storage format, Collecting type, Collecting method
Description
Text data on content safety considerations for large language models, about 570,000 items in total. This dataset can be used for tasks such as training LLMs like ChatGPT.
Data from: ChatGPT's performance in dentistry and allergy-immunology...
data.niaid.nih.gov
Updated Sep 30, 2023
Eggmann, Florin (2023). ChatGPT's performance in dentistry and allergy-immunology assessments: a comparative study [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8331146
Dataset updated
Sep 30, 2023
Dataset provided by
Eggmann, Florin
Fuchs, Alexander
Weiger, Roland
Trachsel, Tina
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data on ChatGPT 3's and ChatGPT 4's performance on self-assessment questions for dentistry (SFLEDM) and allergy and clinical immunology (EEAACI), sourced from the University of Bern’s Institute for Medical Education platform.
ChatGPT for MLM
kaggle.com
zip
Updated Mar 15, 2023
Evgenii Pishchik (2023). ChatGPT for MLM [Dataset]. https://www.kaggle.com/datasets/pe4eniks/chatgpt-for-mlm/code
Available download formats: zip (20267 bytes)
Dataset updated
Mar 15, 2023
Authors
Evgenii Pishchik
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Description.

This is a small dataset of synthetically generated samples for the MLM task using ChatGPT.

For data construction I use these requests. All requests were generated consistently and within one chat.

140 queries about general CV.
40 queries about datasets for CV.
40 queries about articles in CV.
20 queries about transformers in CV.
20 queries about training pipelines in CV.
20 queries about libraries for CV.
20 queries about hardware for CV.

Training.

Each prompt contains one [MASK] token, and the task is to predict the correct word at that position.
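To make the task format concrete, here is a minimal sketch of how a prompt with a single [MASK] token and its target word could be built from a sentence. How the author actually chose the masked positions is not stated, so the random choice, the function name `mask_one_token`, and the sample sentence are assumptions for illustration.

```python
import random

MASK = "[MASK]"

def mask_one_token(sentence, rng):
    """Replace one randomly chosen whitespace token with [MASK]; return (prompt, target)."""
    tokens = sentence.split()
    pos = rng.randrange(len(tokens))
    target = tokens[pos]
    tokens[pos] = MASK
    return " ".join(tokens), target

rng = random.Random(0)
sentence = "Convolutional networks extract features from images"  # made-up sample
prompt, target = mask_one_token(sentence, rng)

# Filling the mask with the target word recovers the original sentence.
restored = prompt.replace(MASK, target)
```

A model for this task would receive `prompt` as input and be scored on whether it predicts `target`.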

Data structure.

data.csv - main file with all data.

synthetic.txt - raw outputs from ChatGPT.

preprocess.py - conversion from raw to structured data.
Few, But More Orgnized data for train and test!
kaggle.com
zip
Updated Nov 26, 2023
Reza JafariRaviz (2023). Few, But More Orgnized data for train and test! [Dataset]. https://www.kaggle.com/datasets/rezajafariraviz/few-but-more-orgnized-data-for-train-and-test
Available download formats: zip (2185756 bytes)
Dataset updated
Nov 26, 2023
Authors
Reza JafariRaviz
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The data has been created for use in an AI detection competition. Two prompts are passed to chatbots to elicit responses. The chatbots used are Bing, Bard, and ChatGPT. The data is also labeled to indicate whether the prompt includes the source text or not.

Test dataset of ChatGPT in medical field: 315 scholarly articles cite this dataset (per Google Scholar).
