90 datasets found
  1. ChatGPT Classification Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
    Explore at:
    zip(718710 bytes)Available download formats
    Dataset updated
    Sep 7, 2023
    Authors
    Mahdi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

    This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).

    We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:

    Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

    https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf

  2. S

    Test dataset of ChatGPT in medical field

    • scidb.cn
    Updated Mar 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    robin shen (2023). Test dataset of ChatGPT in medical field [Dataset]. http://doi.org/10.57760/sciencedb.o00130.00001
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    Science Data Bank
    Authors
    robin shen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The researcher tests the QA capability of ChatGPT in the medical field from the following aspects:1. Test their reserve capacity for medical knowledge2. Check their ability to read literature and understand medical literature3. Test their ability of auxiliary diagnosis after reading case data4. Test its error correction ability for case data5. Test its ability to standardize medical terms6. Test their evaluation ability to experts7. Check their ability to evaluate medical institutionsThe conclusion is:ChatGPT has great potential in the application of medical and health care, and may directly replace human beings or even professionals at a certain level in some fields;The researcher preliminarily believe that ChatGPT has basic medical knowledge and the ability of multiple rounds of dialogue, and its ability to understand Chinese is not weak;ChatGPT has the ability to read, understand and correct cases;ChatGPT has the ability of information extraction and terminology standardization, and is quite excellent;ChatGPT has the reasoning ability of medical knowledge;ChatGPT has the ability of continuous learning. After continuous training, its level has improved significantly;ChatGPT does not have the academic evaluation ability of Chinese medical talents, and the results are not ideal;ChatGPT does not have the academic evaluation ability of Chinese medical institutions, and the results are not ideal;ChatGPT is an epoch-making product, which can become a useful assistant for medical diagnosis and treatment, knowledge service, literature reading, review and paper writing.

  3. ChatGPT User Reviews

    • kaggle.com
    zip
    Updated Jun 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bhavik Jikadara (2024). ChatGPT User Reviews [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/chatgpt-user-feedback
    Explore at:
    zip(5709734 bytes)Available download formats
    Dataset updated
    Jun 30, 2024
    Authors
    Bhavik Jikadara
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    This dataset consists of daily-updated user reviews and ratings for the ChatGPT Android App. The dataset includes several key attributes that capture various aspects of the reviews, providing insights into user experiences and feedback over time.

    Columns Explanation

    • userName: The display name of the user who posted the review.
    • content: The text content of the review. This column contains the actual review text written by the user. It includes user opinions, feedback, and detailed descriptions of their experiences with the ChatGPT app.
    • score: The rating given by the user, typically ranging from 1 to 5. This column captures the numerical rating provided by the user. Higher scores indicate better experiences, while lower scores indicate dissatisfaction.
    • thumbsUpCount: The number of thumbs up (likes) the review received. This column shows how many other users found the review helpful or agreed with the sentiments expressed. It serves as a measure of the review's relevancy and impact.
    • at: The timestamp of when the review was posted. This column includes the date and time when the review was submitted. It is crucial for tracking the temporal distribution of reviews and analyzing trends over time.

    Collection Methods

    • Data Source: The data is collected from user reviews submitted through the ChatGPT Android App's review section on the Google Play Store.
    • Frequency: The dataset is updated daily to capture the most recent user feedback and ratings.
    • Automation: An automated script is used to scrape and compile the reviews, ensuring that the dataset is current and comprehensive.
    • Data Cleaning: Basic preprocessing is performed to ensure data quality, such as removing duplicates and handling missing values.
  4. H

    Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its...

    • dataverse.harvard.edu
    Updated May 31, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jieshu Wang; Elif Kiran; Aurora Mai (also known as Mai P. Trinh); Michael Simeone; José Lobo (2024). Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its Performance in the Public Sector Workforce [Dataset]. http://doi.org/10.7910/DVN/P3CDHS
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 31, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Jieshu Wang; Elif Kiran; Aurora Mai (also known as Mai P. Trinh); Michael Simeone; José Lobo
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This repository contains two datasets used in the study exploring the impact of Generative AI, specifically ChatGPT, on the public sector workforce in the United States. The datasets provide detailed information on the core tasks of public sector occupations and their estimated performance metrics, including potential for automation and augmentation by ChatGPT. These estimations are generated by OpenAI’s GPT-4 model (GPT-4-1106-preview) through OpenAI API.

  5. ChatGPT Reddit

    • kaggle.com
    zip
    Updated Jan 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Armita Razavi (2023). ChatGPT Reddit [Dataset]. https://www.kaggle.com/datasets/armitaraz/chatgpt-reddit/data
    Explore at:
    zip(5282154 bytes)Available download formats
    Dataset updated
    Jan 29, 2023
    Authors
    Armita Razavi
    License

    https://www.reddit.com/wiki/apihttps://www.reddit.com/wiki/api

    Description

    Here you can find about 50K comments on Reddit website regarding ChatGPT . The comments are gathered from Reddit's Posts from 4 subreddits.

    The data includes comment_id, comment_parent_id, comment_body and subreddit

    • comment_id : the comment's id
    • comment_parent_id: the comment's id which the current comment is replied to.
    • comment_body: the comment
    • subreddit: the community/subreddit name of the comment

    The Date and other information related to comments will be added in the next version. This dataset is useful to get insight about the public take on ChatGPT and also for text analysis, text visualizations, Inline Question Answering, Text Summarization, NER and other tasks like clustering and so on.

    Please note that this dataset is not cleaned or preprocessed so if you want to get your hands dirty with data, it's a good practice to level up your skills in data cleaning too :)

    And please don't forget to UPVOTE it in case you find it useful and enjoy it.

  6. s

    Data from: ChatGPT in education: A discourse analysis of worries and...

    • socialmediaarchive.org
    csv, json, txt
    Updated Sep 26, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). ChatGPT in education: A discourse analysis of worries and concerns on social media [Dataset]. https://socialmediaarchive.org/record/54
    Explore at:
    csv(6528597), json(248465998), txt(4908229)Available download formats
    Dataset updated
    Sep 26, 2023
    Description

    The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We collected Twitter data to identify key concerns related to the use of ChatGPT in education. This dataset is used to support the study "ChatGPT in education: A discourse analysis of worries and concerns on social media."

    In this study, we particularly explored two research questions. RQ1 (Concerns): What are the key concerns that Twitter users perceive with using ChatGPT in education? RQ2 (Accounts): Which accounts are implicated in the discussion of these concerns? In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.

  7. h

    awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub

      License
    

    CC-0

  8. Datasets .csv

    • figshare.com
    txt
    Updated Jan 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yaser Alhasawi (2024). Datasets .csv [Dataset]. http://doi.org/10.6084/m9.figshare.25053146.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Yaser Alhasawi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset for this research project was meticulously constructed to investigate the adoption of ChatGPT among students in the United States. The primary objective was to gain insights into the technological barriers and resistances faced by students in integrating ChatGPT into their information systems. The dataset was designed to capture the diverse adoption patterns among students in various public and private schools and universities across the United States. By examining adoption rates, frequency of usage, and the contexts in which ChatGPT is employed, the research sought to provide a comprehensive understanding of how students are incorporating this technology into their information systems. Moreover, by including participants from diverse educational institutions, the research sought to ensure a comprehensive representation of the student population in the United States. This approach aimed to provide nuanced insights into how factors such as educational background, institution type, and technological familiarity influence ChatGPT adoption.

  9. h

    ASRS-ChatGPT

    • huggingface.co
    Updated Jun 29, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Archana Tikayat Ray (2023). ASRS-ChatGPT [Dataset]. http://doi.org/10.57967/hf/0830
    Explore at:
    Dataset updated
    Jun 29, 2023
    Authors
    Archana Tikayat Ray
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Summary

    The dataset contains a total of 9984 incident records and 9 columns. Some of the columns contain ground truth values whereas others contain information generated by ChatGPT based on the incident Narratives. The creation of this dataset is aimed at providing researchers with columns generated by using ChatGPT API which is not freely available.

      Dataset Structure
    

    The column names present in the dataset and their descriptions are provided below:

    Column… See the full description on the dataset page: https://huggingface.co/datasets/archanatikayatray/ASRS-ChatGPT.

  10. m

    Public data files containing the data used for the ChatGPT survey (XLSX) and...

    • figshare.mq.edu.au
    • researchdata.edu.au
    xlsx
    Updated Sep 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Matt Bower; Jodie Torrington; Jennifer Lai; Peter Petocz; Mark Alfano (2023). Public data files containing the data used for the ChatGPT survey (XLSX) and the survey containing variable selection codes (DOCX). [Dataset]. http://doi.org/10.25949/24123306.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    Macquarie University
    Authors
    Matt Bower; Jodie Torrington; Jennifer Lai; Peter Petocz; Mark Alfano
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project investigated teacher attitudes towards Generative Artificial Intelligence Tools (GAITs). In excess of three hundred teachers were surveyed across a broad variety of teaching levels, demographic areas, experience levels, and disciplinary areas, to better understand how they believe teaching and assessment should change as a result of GAITs such as ChatGPT.Teachers were invited to complete an online survey relating to their perceptions of the open Artificial Intelligence (AI) tool ChatGPT, and how it will influence what they teach and how they assess. The purpose of the study is to provide teachers, policymakers, and society at large with an understanding of the potential impact of tools such as ChatGPT on Education.This dataset contains public data files used for the ChatGPT survey (XLSX) and the survey containing variable selection codes (DOCX). See the second sheet of the XLSX file for variable descriptions.

  11. h

    scraped-chatgpt-conversations

    • huggingface.co
    Updated Apr 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Arya Nistane (2023). scraped-chatgpt-conversations [Dataset]. https://huggingface.co/datasets/ar852/scraped-chatgpt-conversations
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 6, 2023
    Authors
    Arya Nistane
    Description

    Dataset Card for Dataset Name

      Dataset Summary
    

    scraped-chatgpt-conversations contains ~100k conversations between a user and chatgpt that were shared online through reddit, twitter, or sharegpt. For sharegpt, the conversations were directly scraped from the website. For reddit and twitter, images were downloaded from submissions, segmented, and run through an OCR pipeline to obtain a conversation list. For information on how the each json file is structured, please see… See the full description on the dataset page: https://huggingface.co/datasets/ar852/scraped-chatgpt-conversations.

  12. m

    The Impact of AI and ChatGPT on Bangladeshi University Students

    • data.mendeley.com
    Updated Jan 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Md Jhirul Islam (2025). The Impact of AI and ChatGPT on Bangladeshi University Students [Dataset]. http://doi.org/10.17632/zykphpvbr7.2
    Explore at:
    Dataset updated
    Jan 6, 2025
    Authors
    Md Jhirul Islam
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    The data set records the perceptions of Bangladeshi university students on the influence that AI tools, especially ChatGPT, have on their academic practices, learning experiences, and problem-solving abilities. The varying role of AI in education, which covers common usage statistics, what AI does to our creative abilities, its impact on our learning, and whether it could invade our privacy. This dataset reveals perspective on how AI tools are changing education in the country and offering valuable information for researchers, educators, policymakers, to understand trends, challenges, and opportunities in the adoption of AI in the academic contex.

    Methodology Data Collection Method: Online survey using google from Participants: A total of 3,512 students from various Bangladeshi universities participated. Survey Questions:The survey included questions on demographic information, frequency of AI tool usage, perceived benefits, concerns regarding privacy, and impacts on creativity and learning.

    Sampling Technique: Random sampling of university students Data Collection Period: June 2024 to December 2024

    Privacy Compliance This dataset has been anonymized to remove any personally identifiable information (PII). It adheres to relevant privacy regulations to ensure the confidentiality of participants.

    For further inquiries, please contact: Name: Md Jhirul Islam, Daffodil International University Email: jhirul15-4063@diu.edu.bd Phone: 01316317573

  13. All GPT-4 Conversations

    • kaggle.com
    Updated Nov 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). All GPT-4 Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/all-gpt-4-synthetic-chat-datasets
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    All GPT-4 Generated Datasets

    Every chat dataset generated by GPT-4 from Huggingface at the same format

    From [Huggingface datasets]

    About this dataset

    How to use the dataset

    The dataset includes all chat conversations generated by GPT-4 that are hosted on open Huggingface datasets. Everything is converted to the same format so the datasets can be easily merged and used for large scale training of LLMs.

    Acknowledgements

    This dataset is a collection of several single chat datasets. If you use this dataset in your research, please credit the original authors of the internal datasets. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

  14. f

    Data Sheet 2_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BackgroundClinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.ObjectiveThis study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.MethodsIn Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.ResultsIn Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs were observed in 6/7 (85.71%) continuous parameters.ConclusionZero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.

  15. h

    ChatGPT-Research-Abstracts

    • huggingface.co
    • opendatalab.com
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nicolai Thorer Sivesind, ChatGPT-Research-Abstracts [Dataset]. https://huggingface.co/datasets/NicolaiSivesind/ChatGPT-Research-Abstracts
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Nicolai Thorer Sivesind
    License

    https://choosealicense.com/licenses/cc/https://choosealicense.com/licenses/cc/

    Description

    ChatGPT-Research-Abstracts

    This is a dataset created in relation to a bachelor thesis written by Nicolai Thorer Sivesind and Andreas Bentzen Winje. It contains human-produced and machine-generated text samples of scientific research abstracts. A reformatted version for text-classification is available in the dataset collection Human-vs-Machine. In this collection, all samples are split into separate data points for real and generated, and labeled either 0 (human-produced) or 1… See the full description on the dataset page: https://huggingface.co/datasets/NicolaiSivesind/ChatGPT-Research-Abstracts.

  16. m

    CHAT GPT

    • data.mendeley.com
    Updated Nov 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dorit Alt (2023). CHAT GPT [Dataset]. http://doi.org/10.17632/7td5x9nvwm.1
    Explore at:
    Dataset updated
    Nov 15, 2023
    Authors
    Dorit Alt
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    chat gpt paper

  17. U

    Data from: Dataset of the study: "Chatbots put to the test in math and logic...

    • researchdata.bath.ac.uk
    Updated May 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios (2023). Dataset of the study: "Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard" [Dataset]. http://doi.org/10.5281/zenodo.7940781
    Explore at:
    Dataset updated
    May 20, 2023
    Dataset provided by
    Zenodo
    Authors
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jimenez Rios
    Dataset funded by
    Oslo Metropolitan University
    Description

    This dataset contains the 30 questions that were posed to the chatbots (i) ChatGPT-3.5; (ii) ChatGPT-4; and (iii) Google Bard, in May 2023 for the study “Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard”. These 30 questions describe mathematics and logic problems that have a unique correct answer. The questions are fully described with plain text only, without the need for any images or special formatting. The questions are divided into two sets of 15 questions each (Set A and Set B). The questions of Set A are 15 “Original” problems that cannot be found online, at least in their exact wording, while Set B contains 15 “Published” problems that one can find online by searching on the internet, usually with their solution. Each question is posed three times to each chatbot.

    This dataset contains the following: (i) The full set of the 30 questions, A01-A15 and B01-B15; (ii) the correct answer for each one of them; (iii) an explanation of the solution, for the problems where such an explanation is needed, (iv) the 30 (questions) × 3 (chatbots) × 3 (answers) = 270 detailed answers of the chatbots. For the published problems of Set B, we also provide a reference to the source where each problem was taken from.

  18. R

    Chatgpt Dataset

    • universe.roboflow.com
    zip
    Updated Oct 17, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    project123 (2025). Chatgpt Dataset [Dataset]. https://universe.roboflow.com/project123-2zib1/chatgpt-36sqv/model/2
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 17, 2025
    Dataset authored and provided by
    project123
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    ChatGPT

    ## Overview
    
    ChatGPT is a dataset for object detection tasks - it contains Objects annotations for 16,696 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  19. R

    New_dataset (400 Images) Chatgpt Agent Mode Version Dataset

    • universe.roboflow.com
    zip
    Updated Oct 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Riga Technical University (2025). New_dataset (400 Images) Chatgpt Agent Mode Version Dataset [Dataset]. https://universe.roboflow.com/riga-technical-university/new_dataset-400-images-chatgpt-agent-mode-version-bvgkw
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 25, 2025
    Dataset authored and provided by
    Riga Technical University
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Mole Idlb LlrU PrSb Bounding Boxes
    Description

    New_dataset (400 Images) Chatgpt Agent Mode Version

    ## Overview
    
    New_dataset (400 Images) Chatgpt Agent Mode Version is a dataset for object detection tasks - it contains Mole Idlb LlrU PrSb annotations for 276 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  20. e

    ChatGPT Usage by Age Group – Survey Data

    • expresslegalfunding.com
    html
    Updated Sep 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Express Legal Funding (2025). ChatGPT Usage by Age Group – Survey Data [Dataset]. https://expresslegalfunding.com/chatgpt-study/
    Explore at:
    htmlAvailable download formats
    Dataset updated
    Sep 10, 2025
    Dataset authored and provided by
    Express Legal Funding
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    60+, 18–29, 30–44, 45–60
    Description

    This dataset presents ChatGPT usage patterns across different age groups, showing the percentage of users who have followed its advice, used it without following advice, or have never used it, based on a 2025 U.S. survey.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
Organization logo

ChatGPT Classification Dataset

Classification of ChatGPT generated text from human generated text

Explore at:
114 scholarly articles cite this dataset (View in Google Scholar)
zip(718710 bytes)Available download formats
Dataset updated
Sep 7, 2023
Authors
Mahdi
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).

We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:

Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf

Search
Clear search
Close search
Google apps
Main menu