License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We have compiled a dataset of textual articles covering common terminology, concepts, and definitions in the fields of computer science, artificial intelligence, and cyber security. The dataset contains both human-generated text and text generated by OpenAI’s ChatGPT. Human-generated answers were collected from several computer science dictionaries and encyclopedias, including “The Encyclopedia of Computer Science and Technology” and “Encyclopedia of Human-Computer Interaction”. AI-generated content was produced by posting questions to OpenAI’s ChatGPT and manually recording the responses. A rigorous data-cleaning process was performed to remove unwanted Unicode characters and styling and formatting tags. To structure the dataset for binary classification, we combined the AI-generated and human-generated answers into a single column and assigned a label to each data point (human-generated = 0, AI-generated = 1).
This yields our article-level dataset (article_level_data.csv) of 1,018 articles: 509 AI-generated and 509 human-generated. Additionally, we split each article into its sentences and labelled each sentence with the label of its article. This is mainly to evaluate classification models and pipelines on shorter, sentence-level data points. The result is our sentence-level dataset (sentence_level_data.csv) of 7,344 entries (4,008 AI-generated and 3,336 human-generated).
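The step from article-level to sentence-level data can be sketched as follows: each article is split into sentences and every sentence inherits its article's label (0 = human-generated, 1 = AI-generated). This is a minimal sketch, not the authors' pipeline. The regex splitter, the function name `to_sentence_level`, and the sample articles are all hypothetical; the tool actually used for sentence segmentation is not stated.

```python
import re

# Naive splitter: break after ".", "!" or "?" followed by whitespace.
# This is an assumption; the authors' sentence segmentation tool is not stated.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def to_sentence_level(articles):
    """Expand (text, label) article pairs into (sentence, label) pairs.

    Labels follow the dataset convention: 0 = human-generated, 1 = AI-generated.
    Each sentence inherits the label of the article it came from.
    """
    rows = []
    for text, label in articles:
        for sentence in SENTENCE_END.split(text.strip()):
            if sentence:  # skip empty fragments
                rows.append((sentence, label))
    return rows


# Hypothetical sample articles, for illustration only.
articles = [
    ("A compiler translates source code. It targets machine code.", 0),
    ("Machine learning finds patterns in data!", 1),
]
```

Calling `to_sentence_level(articles)` yields three sentence rows, two labelled 0 and one labelled 1. A naive regex will mis-split abbreviations such as "e.g.", which is one reason sentence counts can differ between segmentation tools.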
If you use this dataset in a scientific publication, please cite the following article:
Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.
License: CC0 1.0, https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a dataset repository of Awesome ChatGPT Prompts. The full prompt collection can be viewed on GitHub.
License
CC-0
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset consists of daily-updated user reviews and ratings for the ChatGPT Android App. The dataset includes several key attributes that capture various aspects of the reviews, providing insights into user experiences and feedback over time.
The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We collected Twitter data to identify key concerns related to the use of ChatGPT in education. This dataset is used to support the study "ChatGPT in education: A discourse analysis of worries and concerns on social media."
In this study, we particularly explored two research questions. RQ1 (Concerns): What are the key concerns that Twitter users perceive with using ChatGPT in education? RQ2 (Accounts): Which accounts are implicated in the discussion of these concerns? In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The researcher tested the question-answering (QA) capability of ChatGPT in the medical field from the following aspects:
1. Its reserve of medical knowledge
2. Its ability to read and understand medical literature
3. Its ability to assist diagnosis after reading case data
4. Its ability to correct errors in case data
5. Its ability to standardize medical terms
6. Its ability to evaluate experts
7. Its ability to evaluate medical institutions
The conclusions are:
1. ChatGPT has great potential in medical and health care applications, and may directly replace humans, even professionals, to a certain level in some fields.
2. The researcher preliminarily believes that ChatGPT has basic medical knowledge and the ability to hold multi-turn dialogue, and that its understanding of Chinese is not weak.
3. ChatGPT can read, understand, and correct cases.
4. ChatGPT has information-extraction and terminology-standardization abilities, and these are quite strong.
5. ChatGPT can reason with medical knowledge.
6. ChatGPT has the ability to learn continuously; after continued training, its performance improved significantly.
7. ChatGPT does not have the ability to academically evaluate Chinese medical talents; the results were not ideal.
8. ChatGPT does not have the ability to academically evaluate Chinese medical institutions; the results were not ideal.
9. ChatGPT is an epoch-making product that can become a useful assistant for medical diagnosis and treatment, knowledge services, literature reading, reviewing, and paper writing.
ChatGPT-Prompts Dataset
Description
This dataset aims to provide evaluation data for future language models. It was generated using the LearnGPT website.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ChatGPT paper
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project investigated teacher attitudes towards Generative Artificial Intelligence Tools (GAITs). More than three hundred teachers across a broad variety of teaching levels, demographic areas, experience levels, and disciplines were surveyed to better understand how they believe teaching and assessment should change as a result of GAITs such as ChatGPT. Teachers were invited to complete an online survey on their perceptions of the open Artificial Intelligence (AI) tool ChatGPT and how it will influence what they teach and how they assess. The purpose of the study is to provide teachers, policymakers, and society at large with an understanding of the potential impact of tools such as ChatGPT on education. This dataset contains the public data files used for the ChatGPT survey (XLSX) and the survey with its variable selection codes (DOCX). See the second sheet of the XLSX file for variable descriptions.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents ChatGPT usage patterns across different age groups, showing the percentage of users who have followed its advice, used it without following advice, or have never used it, based on a 2025 U.S. survey.
Dataset Card for scraped-chatgpt-conversations
Dataset Summary
scraped-chatgpt-conversations contains ~100k conversations between a user and ChatGPT that were shared online through Reddit, Twitter, or ShareGPT. For ShareGPT, the conversations were scraped directly from the website. For Reddit and Twitter, images were downloaded from submissions, segmented, and run through an OCR pipeline to obtain a conversation list. For information on how each JSON file is structured, please see… See the full description on the dataset page: https://huggingface.co/datasets/ar852/scraped-chatgpt-conversations.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset for this research project was constructed to investigate the adoption of ChatGPT among students in the United States. The primary objective was to gain insight into the technological barriers and resistance students face in integrating ChatGPT into their information systems. The dataset captures adoption patterns among students at various public and private schools and universities across the United States, covering adoption rates, frequency of use, and the contexts in which ChatGPT is employed. By including participants from diverse educational institutions, the research aimed for a broad representation of the U.S. student population and for nuanced insight into how factors such as educational background, institution type, and technological familiarity influence ChatGPT adoption.
Column Descriptions
1. reviewId: A unique ID for each user review.
2. userName: The name of the user who submitted the review.
3. userImage: The URL of the user's profile picture.
4. content: The text content of the review provided by the user.
5. score: The review score given by the user, typically on a scale of 1-5.
6. thumbsUpCount: The number of likes (thumbs up) received by the review.
7. reviewCreatedVersion: The app version used by the user when creating the review (not always available).
8. at: The date and time when the review was submitted.
9. replyContent: The developer's response to the review (no data available in this column).
10. repliedAt: The date and time when the developer's response was submitted (no data available in this column).
11. appVersion: The app version used by the user when submitting the review (not always available).
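Because the reviews are updated daily and each row carries a `score` and an `at` timestamp, a natural first analysis is the mean rating per day, plus a check for columns that are documented as empty (`replyContent`, `repliedAt`). The sketch below uses only the standard library. The column names come from the descriptions above, but the sample rows, the function names, and the `at` format ("YYYY-MM-DD HH:MM:SS") are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample in the documented column layout; the `at` format
# ("YYYY-MM-DD HH:MM:SS") is an assumption, not confirmed by the dataset.
SAMPLE = """reviewId,userName,userImage,content,score,thumbsUpCount,reviewCreatedVersion,at,replyContent,repliedAt,appVersion
r1,alice,https://example.com/a.png,Great app,5,3,1.2024.1,2024-05-01 10:00:00,,,1.2024.1
r2,bob,https://example.com/b.png,Keeps crashing,1,10,,2024-05-01 12:30:00,,,
r3,carol,https://example.com/c.png,Useful,4,0,1.2024.2,2024-05-02 09:15:00,,,1.2024.2
"""


def daily_mean_scores(csv_text):
    """Average the 1-5 `score` per calendar day of the `at` timestamp."""
    totals = defaultdict(lambda: [0, 0])  # day -> [sum of scores, review count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["at"], "%Y-%m-%d %H:%M:%S").date()
        totals[day][0] += int(row["score"])
        totals[day][1] += 1
    return {day: s / n for day, (s, n) in sorted(totals.items())}


def empty_columns(csv_text):
    """Return the columns whose value is blank in every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [c for c in reader.fieldnames if all(not r[c] for r in rows)]
```

On this sample, the daily means are 3.0 for 1 May and 4.0 for 2 May, and only `replyContent` and `repliedAt` are reported as empty, so they can be dropped before analysis.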
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset records the perceptions of Bangladeshi university students on the influence that AI tools, especially ChatGPT, have on their academic practices, learning experiences, and problem-solving abilities. It covers the varying role of AI in education, including common usage statistics, the effect of AI on students' creative abilities, its impact on learning, and whether it could invade privacy. The dataset offers perspective on how AI tools are changing education in Bangladesh and provides valuable information for researchers, educators, and policymakers seeking to understand trends, challenges, and opportunities in the adoption of AI in the academic context.
Methodology
Data Collection Method: Online survey using Google Forms.
Participants: A total of 3,512 students from various Bangladeshi universities participated.
Survey Questions: The survey included questions on demographic information, frequency of AI tool usage, perceived benefits, concerns regarding privacy, and impacts on creativity and learning.
Sampling Technique: Random sampling of university students.
Data Collection Period: June 2024 to December 2024.
Privacy Compliance
This dataset has been anonymized to remove any personally identifiable information (PII). It adheres to relevant privacy regulations to ensure the confidentiality of participants.
For further inquiries, please contact:
Name: Md Jhirul Islam, Daffodil International University
Email: jhirul15-4063@diu.edu.bd
Phone: 01316317573
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset summarizes how ChatGPT users rated the outcomes of the advice they received, including whether it was helpful, harmful, neutral, or uncertain, based on a 2025 U.S. survey.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows how men and women in the U.S. reported using ChatGPT in a 2025 survey, including whether they followed its advice or chose not to use it.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows the types of advice users sought from ChatGPT based on a 2025 U.S. survey, including education, financial, medical, and legal topics.
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
The dataset consists of user reviews of ChatGPT, including textual feedback, ratings, and review dates. The reviews range from brief comments to detailed feedback and cover a wide range of user sentiments. Ratings are on a scale of 1 to 5, representing varying levels of satisfaction. The dataset spans multiple months, and each review is accompanied by a timestamp, enabling time-series analysis of sentiment trends.
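The time-series analysis mentioned above can start with something as simple as the mean rating per month and its month-over-month change. The sketch below is illustrative only: the (review_date, rating) pairs are invented, and the dataset's real column names and timestamp format are not specified here, so parsing them into `date` objects is left as an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (review_date, rating) pairs; the dataset's actual column
# names and timestamp format are not specified in its description.
reviews = [
    (date(2023, 1, 5), 4),
    (date(2023, 1, 20), 2),
    (date(2023, 2, 3), 5),
    (date(2023, 2, 14), 5),
    (date(2023, 3, 9), 3),
]


def monthly_mean_rating(pairs):
    """Group 1-5 ratings by (year, month) and return each month's mean."""
    buckets = defaultdict(list)
    for day, rating in pairs:
        buckets[(day.year, day.month)].append(rating)
    return {month: sum(r) / len(r) for month, r in sorted(buckets.items())}


def month_over_month_change(means):
    """Difference in mean rating between consecutive months present in the data."""
    months = list(means)
    return {
        months[i]: means[months[i]] - means[months[i - 1]]
        for i in range(1, len(months))
    }
```

With the sample above, the monthly means are 3.0, 5.0, and 3.0 for January to March 2023, so the changes are +2.0 and -2.0. Months with no reviews are simply absent, so "consecutive" here means consecutive among observed months.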
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the data needed to reproduce all results and figures described in "ChatGPT performance on radiation technologist and therapist entry to practice exams".
Details about the data collection can be found in the paper referenced below. Briefly, ChatGPT (GPT-4) was prompted with multiple-choice questions from 4 practice exams provided by the Canadian Association of Medical Radiation Technologists (CAMRT). ChatGPT was prompted with the questions from each exam 5 times between July 17 and August 13, 2023. Table 1, below, provides the dates of data collection.
Variable descriptions
question: Question number, provided by CAMRT. Skipped question numbers indicate image-based questions that were excluded from the study.
discipline: Indicates the CAMRT exam discipline, abbreviated as follows:
RAD: radiological technology
MRI: magnetic resonance
NUC: nuclear medicine
RTT: radiation therapy
question_type: Indicates the type of competency being assessed by the question (Knowledge, Application, or Critical thinking). Competency categories were assigned by CAMRT.
corrrect_response: The correct multiple choice response ("A", "B", "C", or "D"), assigned by CAMRT.
attempt1-5: ChatGPT's response to the multiple choice questions for attempts 1 through 5, indicated using the letters "A", "B", "C", or "D". In a few cases, ChatGPT did not provide a reference to a multiple choice response and "NA" is recorded in the dataset.
Note: The long-form questions from CAMRT and answers provided by ChatGPT are not available as a part of this dataset.
Table 1: Dates for data collection
Discipline               Attempt 1    Attempt 2    Attempt 3    Attempt 4    Attempt 5
Radiological technology  2 Aug 2023   2 Aug 2023   8 Aug 2023   9 Aug 2023   11 Aug 2023
Magnetic resonance       17 Jul 2023  18 Jul 2023  18 Jul 2023  9 Aug 2023   12 Aug 2023
Nuclear medicine         8 Aug 2023   9 Aug 2023   12 Aug 2023  12 Aug 2023  12 Aug 2023
Radiation therapy        9 Aug 2023   12 Aug 2023  12 Aug 2023  13 Aug 2023  13 Aug 2023
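The variable descriptions imply a simple scoring scheme: compare each of the attempt1 to attempt5 columns with the correct response. A minimal standard-library sketch of per-discipline accuracy, pooled over all five attempts, follows. The sample rows are invented; the column names are taken from the variable descriptions (including the `corrrect_response` spelling as written there, which may differ in the actual file), and counting "NA" as incorrect is an assumption rather than the paper's stated scoring rule.

```python
import csv
import io

# Hypothetical rows using the documented column names, including the
# "corrrect_response" spelling as it appears in the variable descriptions.
SAMPLE = """question,discipline,question_type,corrrect_response,attempt1,attempt2,attempt3,attempt4,attempt5
1,RAD,Knowledge,A,A,A,B,A,A
2,RAD,Application,C,C,NA,C,C,D
3,MRI,Critical thinking,B,D,D,D,B,D
"""

ATTEMPTS = [f"attempt{i}" for i in range(1, 6)]


def accuracy_by_discipline(csv_text):
    """Fraction of correct answers per discipline, pooled over all five attempts.

    "NA" (no multiple-choice letter given) never matches the correct letter,
    so it is counted as incorrect; this is an assumption, not the paper's rule.
    """
    correct, total = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        d = row["discipline"]
        for col in ATTEMPTS:
            total[d] = total.get(d, 0) + 1
            correct[d] = correct.get(d, 0) + (row[col] == row["corrrect_response"])
    return {d: correct[d] / total[d] for d in total}
```

On the sample, RAD scores 7 of 10 attempts (0.7) and MRI scores 1 of 5 (0.2). Grouping by `question_type` instead of `discipline` only requires changing the key.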
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.
Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.
Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resulting data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, with no statistically significant differences observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets that replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
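The Methods name three kinds of fidelity check: two-sample proportion tests for binary parameters, 95% CI overlap for continuous parameters, and cross-verification of related parameters such as BMI. A minimal standard-library sketch of those checks follows; it is not the study's code. It assumes a pooled two-proportion z-test and a normal-approximation CI for the mean (the study's t-based comparisons are not reproduced, since the standard library has no t distribution), heights in cm and weights in kg, and a 0.1 kg/m² BMI tolerance. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for H0: p1 == p2 (x = successes, n = sample size)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def mean_ci95(values):
    """Normal-approximation 95% CI for a sample mean (assumes a large sample)."""
    m = mean(values)
    half = NormalDist().inv_cdf(0.975) * stdev(values) / sqrt(len(values))
    return m - half, m + half


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def bmi_consistent(height_cm, weight_kg, reported_bmi, tol=0.1):
    """Cross-check a reported BMI against weight / height^2 in kg/m^2."""
    return abs(weight_kg / (height_cm / 100) ** 2 - reported_bmi) <= tol
```

For example, identical proportions (50/100 vs 50/100) give z = 0 and p = 1, while 60/100 vs 40/100 gives p < 0.05. A synthetic continuous parameter would pass the overlap check when `intervals_overlap(mean_ci95(synthetic), mean_ci95(real))` is true.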
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset reflects how Americans perceive ChatGPT's broader societal impact, based on a 2025 survey that asked whether the AI will help or harm humanity.