94 datasets found
  1. Test dataset of ChatGPT in medical field

    • scidb.cn
    Updated Mar 3, 2023
    Cite
    robin shen (2023). Test dataset of ChatGPT in medical field [Dataset]. http://doi.org/10.57760/sciencedb.o00130.00001
    Explore at:
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    Science Data Bank
    Authors
    robin shen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The researcher tests the QA capability of ChatGPT in the medical field from the following aspects:

    1. Test its reserve of medical knowledge
    2. Check its ability to read and understand medical literature
    3. Test its ability to assist diagnosis after reading case data
    4. Test its ability to correct errors in case data
    5. Test its ability to standardize medical terms
    6. Test its ability to evaluate experts
    7. Check its ability to evaluate medical institutions

    The conclusions are:

    • ChatGPT has great potential in medical and health care applications, and may directly replace humans, or even professionals at a certain level, in some fields.
    • The researcher preliminarily believes that ChatGPT has basic medical knowledge and the ability to hold multi-round dialogues, and that its ability to understand Chinese is not weak.
    • ChatGPT can read, understand and correct cases.
    • ChatGPT is quite good at information extraction and terminology standardization.
    • ChatGPT can reason over medical knowledge.
    • ChatGPT is capable of continuous learning; after continuous training, its level improved significantly.
    • ChatGPT cannot evaluate the academic standing of Chinese medical talents; the results are not ideal.
    • ChatGPT cannot evaluate the academic standing of Chinese medical institutions; the results are not ideal.
    • ChatGPT is an epoch-making product that can become a useful assistant for medical diagnosis and treatment, knowledge service, literature reading, review and paper writing.

  2. ChatGPT Classification Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2023
    Cite
    Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
    Explore at:
    Available download formats: zip (718710 bytes)
    Dataset updated
    Sep 7, 2023
    Authors
    Mahdi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

    This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).
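The labelling scheme described above (Human-generated = 0, AI-generated = 1, with each sentence inheriting its article's label) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the regex-based sentence splitter, and the sample texts are all assumptions made for the example.

```python
import re

HUMAN, AI = 0, 1  # label convention stated in the dataset description


def build_article_rows(human_texts, ai_texts):
    """Combine both sources into one list of (text, label) rows."""
    rows = [(text, HUMAN) for text in human_texts]
    rows += [(text, AI) for text in ai_texts]
    return rows


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def build_sentence_rows(article_rows):
    """Each sentence inherits the label of its parent article."""
    return [
        (sentence, label)
        for text, label in article_rows
        for sentence in split_sentences(text)
    ]


# Hypothetical sample texts, not taken from the dataset.
articles = build_article_rows(
    ["A compiler translates source code. It emits machine code."],
    ["A firewall filters network traffic."],
)
sentences = build_sentence_rows(articles)
```

A real splitter would need to handle abbreviations and decimals, which may be one reason the sentence-level counts are not balanced between the two classes.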

    We would appreciate it if you cited the following article when using this dataset in a scientific publication:

    Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

    https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf

  3. Data from: ChatGPT as an education and learning tool for engineering,...

    • data.mendeley.com
    Updated May 14, 2024
    + more versions
    Cite
    RAVINDRA BHARDWAJ (2024). ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India [Dataset]. http://doi.org/10.17632/995zwcz5yt.1
    Explore at:
    Dataset updated
    May 14, 2024
    Authors
    RAVINDRA BHARDWAJ
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    This is the raw data that is used in the publication: ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India.

  4. A dataset to investigate ChatGPT for enhancing Students' Learning Experience...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 19, 2024
    Cite
    Schicchi, Daniele; Taibi, Davide (2024). A dataset to investigate ChatGPT for enhancing Students' Learning Experience via Concept Maps [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12076680
    Explore at:
    Dataset updated
    Jun 19, 2024
    Dataset provided by
    Institute for Educational Technology, National Research Council of Italy
    Authors
    Schicchi, Daniele; Taibi, Davide
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was compiled to examine the use of ChatGPT 3.5 in educational settings, particularly for creating and personalizing concept maps. The data is organized into three folders: Maps, Texts, and Questionnaires. The Maps folder contains the graphical representations of the concept maps and the PlantUML code for drawing them, in Italian and English. The Texts folder contains the source text used as input for creating the maps. The Questionnaires folder includes the students' responses to the three administered questionnaires.

  5. Estimated water consumption for training GPT-3 2023

    • statista.com
    Cite
    Statista, Estimated water consumption for training GPT-3 2023 [Dataset]. https://www.statista.com/statistics/1536925/gpt-3-estimated-water-consumption-training/
    Explore at:
    Dataset authored and provided by
    Statista: http://statista.com/
    Time period covered
    Jul 2023
    Area covered
    Worldwide
    Description

    GPT-3's water consumption during training was estimated at roughly 4.8 billion liters, assuming the model was trained at Microsoft's Iowa data center (OpenAI has disclosed that the data center was used for training parts of the GPT-4 model). Had the model been fully trained in the Washington data center, water consumption could have been as high as 15 billion liters. That would have amounted to more than Microsoft's total water withdrawals in 2023.

  6. Data Sheet 2_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background
    Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective
    This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods
    In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results
    In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion
    Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
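For readers unfamiliar with the fidelity checks named in the Methods (two-sample t-tests and 95% CI overlap), the sketch below shows one way they could be computed for a single continuous parameter. It is an illustration under stated assumptions, not the study's code: the Welch (unequal-variance) form of the t statistic, the normal-approximation interval with z = 1.96, and the sample values are all choices made for this example.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def ci95(sample):
    """Normal-approximation 95% CI for the mean (z = 1.96, an assumption)."""
    mean = statistics.mean(sample)
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    return (mean - half_width, mean + half_width)


def intervals_overlap(x, y):
    """True when two closed intervals share at least one point."""
    return x[0] <= y[1] and y[0] <= x[1]


# Hypothetical values for one continuous parameter (e.g. weight in kg).
real = [60.0, 62.0, 64.0, 66.0, 68.0]
synthetic = [61.0, 63.0, 65.0, 67.0, 69.0]

t_stat = welch_t(real, synthetic)
overlap = intervals_overlap(ci95(real), ci95(synthetic))
```

Turning the statistic into a p-value requires the t distribution's CDF, which the standard library does not provide; a statistics package would normally be used for that step.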

  7. Wiki-AI-generated-dataset

    • kaggle.com
    zip
    Updated Oct 10, 2023
    Cite
    Arjun Prakash rao (2023). Wiki-AI-generated-dataset [Dataset]. https://www.kaggle.com/datasets/arjunprakashrao/wiki-ai-generated-dataset
    Explore at:
    Available download formats: zip (128223556 bytes)
    Dataset updated
    Oct 10, 2023
    Authors
    Arjun Prakash rao
    Description

    This is not my dataset, credit goes to

    {aaditya_bhat_2023, author = { {Aaditya Bhat} }, title = { GPT-wiki-intro (Revision 0e458f5) }, year = 2023, url = { https://huggingface.co/datasets/aadityaubhat/GPT-wiki-intro }, doi = { 10.57967/hf/0326 }, publisher = { Hugging Face } }

  8. chinese_chatgpt_corpus

    • huggingface.co
    Updated Apr 29, 2023
    Cite
    zeye sun (2023). chinese_chatgpt_corpus [Dataset]. https://huggingface.co/datasets/sunzeyeah/chinese_chatgpt_corpus
    Explore at:
    Dataset updated
    Apr 29, 2023
    Authors
    zeye sun
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for chinese_chatgpt_corpus

      Dataset Summary
    

    This repo collects Chinese corpora for Supervised Finetuning (SFT) and Reinforcement Learning From Human Feedback (RLHF).

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    Chinese

      Dataset Structure

      Data Instances

      train_data_external_v1.jsonl

    Size of downloaded dataset files: 5.04 GB Size of the generated dataset: 0 GB Total amount of disk used:… See the full description on the dataset page: https://huggingface.co/datasets/sunzeyeah/chinese_chatgpt_corpus.

  9. Data from: Enhancing self-directed learning with custom GPT AI facilitation...

    • tandf.figshare.com
    docx
    Updated Oct 8, 2025
    Cite
    Wang Shalong; Zuo Yi; Zou Bin; Liu Ganglei; Zhou Jinyu; Zheng Yanwen; Zhang Zequn; Yuan Lianwen; Ren Feng (2025). Enhancing self-directed learning with custom GPT AI facilitation among medical students: A randomized controlled trial [Dataset]. http://doi.org/10.6084/m9.figshare.29039163.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    Taylor & Francis: https://taylorandfrancis.com/
    Authors
    Wang Shalong; Zuo Yi; Zou Bin; Liu Ganglei; Zhou Jinyu; Zheng Yanwen; Zhang Zequn; Yuan Lianwen; Ren Feng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study aims to assess the impact of LearnGuide, a specialized ChatGPT tool designed to support self-directed learning among medical students. In this 14-week randomized controlled trial (ClinicalTrials.gov NCT06276049), 103 medical students were assigned to either an intervention group, which received 12 weeks of problem-based training with LearnGuide support, or a control group, which received identical training without AI assistance. Primary and secondary outcomes, including Self-Directed Learning Scale scores at 6 and 12 weeks, Cornell Critical Thinking Test Level Z scores, and Global Flow Scores, were evaluated with a 14-week follow-up. Mann-Whitney U tests were used for statistical comparisons between the groups. At 6 weeks, the intervention group showed a marginally higher median Self-Directed Learning Scale score, which further improved by 12 weeks (4.15 [95% CI, 0.82 to 7.48]; p = 0.01) and was sustained at the 14-week follow-up. Additionally, this group demonstrated notable improvements in the Cornell Critical Thinking Test Score at 12 weeks (7.11 [95% CI, 4.50 to 9.72]; p 

  10. Data Sheet 1_One year in the classroom with ChatGPT: empirical insights and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated May 27, 2025
    Cite
    Li, Tian; Guo, Feng; Cunningham, Christopher J. L. (2025). Data Sheet 1_One year in the classroom with ChatGPT: empirical insights and transformative impacts.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002075136
    Explore at:
    Dataset updated
    May 27, 2025
    Authors
    Li, Tian; Guo, Feng; Cunningham, Christopher J. L.
    Description

    Generative Artificial Intelligence (GAI), such as OpenAI’s ChatGPT, has rapidly emerged as a transformative tool in higher education, offering opportunities to enhance teaching and learning. This paper describes the design and implementation of ChatGPT-integrated curriculum activities, featuring coding learning in psychology and conceptual discussions in physics, and presents the findings of a year-long experimental study in both types of classrooms. Our findings suggest that students generally found ChatGPT easy to use and beneficial to their learning, reporting improved confidence, motivation, and engagement. However, its ability to address individual needs or replace instructors was viewed less favorably. Comparative analyses showed that coding activities in psychology led to higher levels of activity satisfaction and perceived usefulness of ChatGPT compared to the more abstract discussion activities in physics. While graduate students were more enthusiastic about using ChatGPT for skill acquisition than undergraduates, demographic factors such as gender, race, and first-generation college status showed no significant influence on such perceptions. Meanwhile, instructors’ reflections emphasize the importance of thoughtful integration, technical support, and pedagogical balance to maximize GAI’s potential while mitigating its limitations. Recommendations for integrating GAI into teaching practices and future research directions are discussed, contributing to the evolving discourse on GAI’s role in transforming modern classrooms.

  11. DAIGT Proper Train Dataset

    • kaggle.com
    zip
    Updated Nov 5, 2023
    Cite
    Darek Kłeczek (2023). DAIGT Proper Train Dataset [Dataset]. https://www.kaggle.com/datasets/thedrcat/daigt-proper-train-dataset
    Explore at:
    Available download formats: zip (124388618 bytes)
    Dataset updated
    Nov 5, 2023
    Authors
    Darek Kłeczek
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Version 2 updated on 11/2/2023:

    Since there is no proper train dataset for LLM - Detect AI Generated Text competition, I decided to create one.

    Ingredients (please upvote the included datasets!):
    - Text generated with ChatGPT by MOTH (https://www.kaggle.com/datasets/alejopaullier/daigt-external-dataset)
    - Persuade corpus contributed by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/persaude-corpus-2/)
    - Text generated with Llama-70b and Falcon180b by Nicholas Broad (https://www.kaggle.com/datasets/nbroad/daigt-data-llama-70b-and-falcon180b)
    - Text generated with ChatGPT by Radek (https://www.kaggle.com/datasets/radek1/llm-generated-essays)
    - Official train essays
    - Essays I generated with various LLMs

    New version includes:
    - EssayID if available
    - Generation prompt if available
    - Random 10 fold split stratified by source dataset

    Version 3 updated on 11/3/2023:
    - Additional 2400+ AI examples generated with Mistral 7B instruct and a new prompt (let's see how it works!)

    Version 4 updated on 11/5/2023:
    - Additional 2000 Claude essays generated by @darraghdog (https://www.kaggle.com/datasets/darraghdog/hello-claude-1000-essays-from-anthropic)
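The "random 10 fold split stratified by source dataset" mentioned in the version notes could be produced along the following lines. This is only a sketch of the general technique using the standard library; the function name, the seed, and the source labels are hypothetical, and the dataset author may well have used a different tool.

```python
import random
from collections import defaultdict


def stratified_folds(sources, n_folds=10, seed=0):
    """Return one fold index per row, balanced within each source."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for index, source in enumerate(sources):
        by_source[source].append(index)

    folds = [None] * len(sources)
    for source in sorted(by_source):
        indices = by_source[source]
        rng.shuffle(indices)  # random order within the source
        for position, index in enumerate(indices):
            folds[index] = position % n_folds  # deal rows out round-robin
    return folds


# Hypothetical source labels, one per essay.
sources = ["persuade"] * 20 + ["chatgpt"] * 10 + ["claude"] * 10
folds = stratified_folds(sources, n_folds=10)
```

Because the round-robin always starts at fold 0, sources with fewer rows than folds land only in the low-numbered folds; shuffling the starting fold per source would even that out.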

  12. The Impact of AI and ChatGPT on Bangladeshi University Students

    • data.mendeley.com
    Updated Jan 6, 2025
    + more versions
    Cite
    Md Jhirul Islam (2025). The Impact of AI and ChatGPT on Bangladeshi University Students [Dataset]. http://doi.org/10.17632/zykphpvbr7.2
    Explore at:
    Dataset updated
    Jan 6, 2025
    Authors
    Md Jhirul Islam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    The dataset records the perceptions of Bangladeshi university students on the influence that AI tools, especially ChatGPT, have on their academic practices, learning experiences, and problem-solving abilities. It covers the varying role of AI in education, including common usage statistics, the effect of AI on creative abilities, its impact on learning, and whether it could invade privacy. The dataset offers a perspective on how AI tools are changing education in the country and provides valuable information for researchers, educators, and policymakers seeking to understand trends, challenges, and opportunities in the adoption of AI in the academic context.

    Methodology
    Data Collection Method: Online survey using Google Forms
    Participants: A total of 3,512 students from various Bangladeshi universities participated.
    Survey Questions: The survey included questions on demographic information, frequency of AI tool usage, perceived benefits, concerns regarding privacy, and impacts on creativity and learning.
    Sampling Technique: Random sampling of university students
    Data Collection Period: June 2024 to December 2024

    Privacy Compliance
    This dataset has been anonymized to remove any personally identifiable information (PII). It adheres to relevant privacy regulations to ensure the confidentiality of participants.

    For further inquiries, please contact: Name: Md Jhirul Islam, Daffodil International University Email: jhirul15-4063@diu.edu.bd Phone: 01316317573

  13. Top web domains cited by LLMs 2025

    • statista.com
    Updated Jun 29, 2025
    Cite
    Statista (2025). Top web domains cited by LLMs 2025 [Dataset]. https://www.statista.com/statistics/1620335/top-web-domains-cited-by-llms/
    Explore at:
    Dataset updated
    Jun 29, 2025
    Dataset authored and provided by
    Statista: http://statista.com/
    Time period covered
    Jun 2025
    Area covered
    Worldwide
    Description

    A June 2025 study found that ****** was the most frequently cited web domain by large language models (LLMs). The platform was referenced in approximately ** percent of the analyzed cases, likely due to the content licensing agreement between Google and Reddit in early 2024 for AI model training. ********* ranked second, being mentioned roughly ** percent of the time, while ****** and ******* were each mentioned ** percent of the time.

  14. Expert Evaluations of ChatGPT Responses to Adobe Photoshop User Queries...

    • zenodo.org
    csv
    Updated Sep 18, 2025
    Cite
    Necati VARDAR; ÇAĞRI GÜMÜŞ (2025). Expert Evaluations of ChatGPT Responses to Adobe Photoshop User Queries (Original Data in Turkish) [Dataset]. http://doi.org/10.5281/zenodo.17152576
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Necati VARDAR; ÇAĞRI GÜMÜŞ
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the raw survey responses from three experts who evaluated ChatGPT’s answers to Adobe Photoshop user queries. The survey was conducted in Turkish via Google Forms. An English summary of the dataset is provided in the supplementary materials of the related article.

  15. Data Sheet 1_The perceived impact of artificial intelligence on academic...

    • figshare.com
    docx
    Updated Oct 3, 2025
    Cite
    Mariana Dogaru; Olivia Pisică; Cosmin-Ștefan Popa; Andrei-Adrian Răgman; Ilinca-Roxana Tololoi (2025). Data Sheet 1_The perceived impact of artificial intelligence on academic learning.docx [Dataset]. http://doi.org/10.3389/frai.2025.1611183.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    Frontiers
    Authors
    Mariana Dogaru; Olivia Pisică; Cosmin-Ștefan Popa; Andrei-Adrian Răgman; Ilinca-Roxana Tololoi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generative artificial intelligence, such as ChatGPT, is transforming higher education by enabling personalized learning, while raising ethical challenges. This study explores how technical university students perceive and leverage ChatGPT in academic tasks, focusing on motivation, learning outcomes, and ethical awareness. Using the Technology Acceptance Model and Self-Determination Theory, the research surveyed 84 students from a technical university via a 5-point Likert-scale questionnaire. Six salient dimensions of student engagement with ChatGPT emerged: perceived usefulness for problem solving, learning retention and skill acquisition, structured interaction with familiar content, consultation on unfamiliar topics, preference for conciseness, and confidence in the accuracy of AI responses. Students who perceived ChatGPT as a valuable resource for addressing academic problems reported enhanced motivation and competence, and frequent structured interaction was linked to the practice of verifying uncertain information, indicating the emergence of AI literacy. However, extensive reliance was correlated with dependence and limited citation practices, revealing risks to academic integrity. By examining ChatGPT’s role in STEM education, this study substantiates the relevance of AI literacy training and institutional policies to ensure responsible use. The findings offer practical insights for educators to integrate AI tools effectively while fostering critical thinking and academic integrity in technology-driven learning environments.

  16. ChatGPT Evaluation Dataset v.2.0

    • zenodo.org
    • data.niaid.nih.gov
    Updated Oct 31, 2024
    Cite
    Jan Kocoń; Przemysław Kazienko (2024). ChatGPT Evaluation Dataset v.2.0 [Dataset]. http://doi.org/10.5281/zenodo.14019715
    Explore at:
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Jan Kocoń; Przemysław Kazienko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 2023
    Description

    We tested ChatGPT on 25 tasks focusing on solving common NLP problems and requiring analytical reasoning. These tasks include (1) a relatively simple binary classification of texts like spam, humor, sarcasm, aggression detection, or grammatical correctness of the text; (2) a more complex multiclass and multi-label classification of texts such as sentiment analysis, emotion recognition; (3) reasoning with the personal context, i.e., personalized versions of the problems that make use of additional information about text perception of a given user (user’s examples provided to ChatGPT); (4) semantic annotation and acceptance of the text going towards natural language understanding (NLU) like word sense disambiguation (WSD), and (5) answering questions based on the input text. More information in the paper: https://www.sciencedirect.com/science/article/pii/S156625352300177X

  17. Large Language Model content safety considerations text data

    • nexdata.ai
    • m.nexdata.ai
    Updated Oct 3, 2023
    Cite
    Nexdata (2023). Large Language Model content safety considerations text data [Dataset]. https://www.nexdata.ai/datasets/llm/1349
    Explore at:
    Dataset updated
    Oct 3, 2023
    Dataset authored and provided by
    Nexdata
    Variables measured
    Language, Data size, Data content, Storage format, Collecting type, Collecting method
    Description

    Large Language Model content safety considerations text data, about 570,000 in total. This dataset can be used for tasks such as LLM training, ChatGPT

  18. Data from: ChatGPT's performance in dentistry and allergy-immunology...

    • data.niaid.nih.gov
    Updated Sep 30, 2023
    Cite
    Eggmann, Florin (2023). ChatGPT's performance in dentistry and allergy-immunology assessments: a comparative study [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8331146
    Explore at:
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    Eggmann, Florin
    Fuchs, Alexander
    Weiger, Roland
    Trachsel, Tina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data on ChatGPT 3's and ChatGPT 4's performance on self-assessment questions for dentistry (SFLEDM) and allergy and clinical immunology (EEAACI), sourced from the University of Bern’s Institute for Medical Education platform.

  19. ChatGPT for MLM

    • kaggle.com
    zip
    Updated Mar 15, 2023
    Cite
    Evgenii Pishchik (2023). ChatGPT for MLM [Dataset]. https://www.kaggle.com/datasets/pe4eniks/chatgpt-for-mlm/code
    Explore at:
    Available download formats: zip (20267 bytes)
    Dataset updated
    Mar 15, 2023
    Authors
    Evgenii Pishchik
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description.

    This is a small dataset of synthetically generated samples for the MLM task using ChatGPT.

    To construct the data I used the following requests. All requests were issued one after another within a single chat.

    140 queries about general CV.
    40 queries about datasets for CV.
    40 queries about articles in CV.
    20 queries about transformers in CV.
    20 queries about training pipelines in CV.
    20 queries about libraries for CV.
    20 queries about hardware for CV.
    

    Training.

    Each prompt contains one [MASK] token; the task is to predict the correct word at that position.

    Data structure.

    • data.csv - main file with all data.
    • synthetic.txt - raw outputs from ChatGPT.
    • preprocess.py - conversion from raw to structured data.
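
As a rough illustration of the training setup described above (a prompt with exactly one [MASK] token and the word to predict at that position), the sketch below turns a plain sentence into such a pair. It is not the dataset's preprocess.py; the function name, the whitespace tokenization, and the sample sentence are assumptions made for this example.

```python
import random

MASK = "[MASK]"


def make_mlm_sample(sentence, rng):
    """Replace one randomly chosen word with [MASK]; return (prompt, target)."""
    words = sentence.split()
    position = rng.randrange(len(words))
    target = words[position]
    words[position] = MASK
    return " ".join(words), target


# Hypothetical sentence in the computer-vision domain the dataset covers.
original = "Convolutional networks extract local image features"
rng = random.Random(0)  # fixed seed so the example is repeatable
prompt, target = make_mlm_sample(original, rng)
```

Substituting the target back into the prompt recovers the original sentence, which is a convenient sanity check when building such pairs.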
  20. Few, But More Orgnized data for train and test!

    • kaggle.com
    zip
    Updated Nov 26, 2023
    Cite
    Reza JafariRaviz (2023). Few, But More Orgnized data for train and test! [Dataset]. https://www.kaggle.com/datasets/rezajafariraviz/few-but-more-orgnized-data-for-train-and-test
    Explore at:
    Available download formats: zip (2185756 bytes)
    Dataset updated
    Nov 26, 2023
    Authors
    Reza JafariRaviz
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The data has been created for use in an AI detection competition. Two prompts are passed to chatbots to elicit responses. The chatbots used are Bing, Bard, and ChatGPT. The data is also labeled to indicate whether the prompt includes the source text or not.

Cite
robin shen (2023). Test dataset of ChatGPT in medical field [Dataset]. http://doi.org/10.57760/sciencedb.o00130.00001

Test dataset of ChatGPT in medical field

Explore at:
315 scholarly articles cite this dataset (View in Google Scholar)
