80 datasets found
  1. 89k ChatGPT conversations

    • kaggle.com
    zip
    Updated May 4, 2023
    Cite
    Noah Persaud (2023). 89k ChatGPT conversations [Dataset]. https://www.kaggle.com/datasets/noahpersaud/89k-chatgpt-conversations
    Explore at:
    zip (681600031 bytes)
    Dataset updated
    May 4, 2023
    Authors
    Noah Persaud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all available conversations between users and ChatGPT from chatlogs.net, in a Vicuna-style format ready to use with FastChat. Version 1 contains all conversations available up to the cutoff date of April 4, 2023; Version 2 extends the cutoff to April 20, 2023.
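
    A minimal loading sketch, assuming the conversations follow the usual Vicuna/ShareGPT JSON layout (a list of records whose `conversations` array alternates human and assistant turns); the file name inside the zip is a placeholder, so check the archive contents first.

    ```python
    import json

    # File name is a placeholder; inspect the extracted archive for the real one.
    with open("chatgpt_conversations.json", encoding="utf-8") as f:
        records = json.load(f)

    # Each record is assumed to hold a list of {"from": ..., "value": ...} turns.
    for turn in records[0].get("conversations", []):
        print(turn["from"], ":", turn["value"][:80])
    ```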

  2. ChatGPT Classification Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2023
    Cite
    Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
    Explore at:
    zip (718710 bytes)
    Dataset updated
    Sep 7, 2023
    Authors
    Mahdi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We have compiled a dataset of textual articles covering common terminology, concepts, and definitions in the fields of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from computer science dictionaries and encyclopedias, including “The Encyclopedia of Computer Science and Technology” and “Encyclopedia of Human-Computer Interaction”. AI-generated content was produced by posing questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process removed unwanted Unicode characters and styling and formatting tags. To structure the dataset for binary classification, we combined AI-generated and human-generated answers into a single column and assigned a label to each data point (human-generated = 0, AI-generated = 1).

    This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).
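
    As a rough illustration of the intended binary-classification use, a baseline could look like the sketch below; the column names `text` and `label` are assumptions, since the description only says both answer types were combined into a single column with 0/1 labels.

    ```python
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("article_level_data.csv")  # 1018 articles, 509 per class

    # Column names are assumptions; check df.columns for the real ones.
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
    )

    vec = TfidfVectorizer(max_features=20_000)
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_train), y_train)
    print("accuracy:", accuracy_score(y_test, clf.predict(vec.transform(X_test))))
    ```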

    We would appreciate it if you cite the following article when using this dataset in any scientific publication:

    Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

    https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf

  3. Data from: ChatGPT performance on radiation technologist and therapist entry to practice exams

    • data.niaid.nih.gov
    Updated May 26, 2024
    Cite
    Tsuruda, Kaitlyn M; Duggan, Ryan (2024). Data from: ChatGPT performance on radiation technologist and therapist entry to practice exams [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10819235
    Explore at:
    Dataset updated
    May 26, 2024
    Dataset provided by
    Dalhousie University
    Authors
    Tsuruda, Kaitlyn M; Duggan, Ryan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the data needed to reproduce all results and figures described in "ChatGPT performance on radiation technologist and therapist entry to practice exams".

    Details about the data collection can be found in the paper referenced below. Briefly, ChatGPT (GPT-4) was prompted with multiple-choice questions from 4 practice exams provided by the Canadian Association of Medical Radiation Technologists (CAMRT). ChatGPT was prompted with the questions from each exam 5 times between July 17 and August 13, 2023. Table 1, below, provides details about the dates of data collection.

    Variable descriptions

    question: Question number, provided by CAMRT. Skipped question numbers indicate image-based questions that were excluded from the study.

    discipline: Indicates the CAMRT exam discipline, abbreviated as follows

    RAD: radiological technology

    MRI: magnetic resonance

    NUC: nuclear medicine

    RTT: radiation therapy

    question_type: Indicates the type of competency being assessed by the question (Knowledge, Application, or Critical thinking). Competency categories were assigned by CAMRT.

    corrrect_response: The correct multiple choice response ("A", "B", "C", or "D"), assigned by CAMRT.

    attempt1-5: ChatGPT's response to the multiple choice questions for attempts 1 through 5, indicated using the letters "A", "B", "C", or "D". In a few cases, ChatGPT did not provide a reference to a multiple choice response and "NA" is recorded in the dataset.

    Note: The long-form questions from CAMRT and answers provided by ChatGPT are not available as a part of this dataset.

    Table 1: Dates for data collection

    | Exam | Attempt 1 | Attempt 2 | Attempt 3 | Attempt 4 | Attempt 5 |
    |------|-----------|-----------|-----------|-----------|-----------|
    | Radiological technology | 2 Aug 2023 | 2 Aug 2023 | 8 Aug 2023 | 9 Aug 2023 | 11 Aug 2023 |
    | Magnetic resonance | 17 Jul 2023 | 18 Jul 2023 | 18 Jul 2023 | 9 Aug 2023 | 12 Aug 2023 |
    | Nuclear medicine | 8 Aug 2023 | 9 Aug 2023 | 12 Aug 2023 | 12 Aug 2023 | 12 Aug 2023 |
    | Radiation therapy | 9 Aug 2023 | 12 Aug 2023 | 12 Aug 2023 | 13 Aug 2023 | 13 Aug 2023 |
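
    A short scoring sketch based on the variable list above; the CSV file name is an assumption, and the `corrrect_response` spelling follows the variable description.

    ```python
    import pandas as pd

    # File name is an assumption; columns follow the variable list above.
    df = pd.read_csv("chatgpt_camrt_responses.csv")

    attempts = [f"attempt{i}" for i in range(1, 6)]
    # "corrrect_response" spelling is as documented in the variable list.
    correct = df[attempts].eq(df["corrrect_response"], axis=0)

    # Share of correct responses per discipline, one column per attempt.
    print(correct.join(df["discipline"]).groupby("discipline").mean())
    ```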

  4. ChatGPT User Reviews

    • kaggle.com
    zip
    Updated Jun 30, 2024
    Cite
    Bhavik Jikadara (2024). ChatGPT User Reviews [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/chatgpt-user-feedback
    Explore at:
    zip (5709734 bytes)
    Dataset updated
    Jun 30, 2024
    Authors
    Bhavik Jikadara
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    This dataset consists of daily-updated user reviews and ratings for the ChatGPT Android App. The dataset includes several key attributes that capture various aspects of the reviews, providing insights into user experiences and feedback over time.

    Columns Explanation

    • userName: The display name of the user who posted the review.
    • content: The text content of the review. This column contains the actual review text written by the user. It includes user opinions, feedback, and detailed descriptions of their experiences with the ChatGPT app.
    • score: The rating given by the user, typically ranging from 1 to 5. This column captures the numerical rating provided by the user. Higher scores indicate better experiences, while lower scores indicate dissatisfaction.
    • thumbsUpCount: The number of thumbs up (likes) the review received. This column shows how many other users found the review helpful or agreed with the sentiments expressed. It serves as a measure of the review's relevancy and impact.
    • at: The timestamp of when the review was posted. This column includes the date and time when the review was submitted. It is crucial for tracking the temporal distribution of reviews and analyzing trends over time.

    Collection Methods

    • Data Source: The data is collected from user reviews submitted through the ChatGPT Android App's review section on the Google Play Store.
    • Frequency: The dataset is updated daily to capture the most recent user feedback and ratings.
    • Automation: An automated script is used to scrape and compile the reviews (see the sketch after this list), ensuring that the dataset is current and comprehensive.
    • Data Cleaning: Basic preprocessing is performed to ensure data quality, such as removing duplicates and handling missing values.
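
    The maintainer does not name the scraping tool; the sketch below shows the kind of automation described, using the google-play-scraper package (the app id is an assumption; verify it against the Play Store listing).

    ```python
    from google_play_scraper import Sort, reviews

    # App id is an assumption; confirm it on the Play Store listing.
    batch, _token = reviews(
        "com.openai.chatgpt",
        lang="en",
        country="us",
        sort=Sort.NEWEST,
        count=200,  # most recent 200 reviews
    )

    # Each result dict carries the columns described above.
    for r in batch[:3]:
        print(r["userName"], r["score"], r["thumbsUpCount"], r["at"], r["content"][:60])
    ```
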
  5. Composing alt text using large language models: dataset in English

    • data.mendeley.com
    Updated Jun 17, 2024
    + more versions
    Cite
    Yekaterina Kosova (2024). Composing alt text using large language models: dataset in English [Dataset]. http://doi.org/10.17632/szh5zhpgxh.1
    Explore at:
    Dataset updated
    Jun 17, 2024
    Authors
    Yekaterina Kosova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the results of developing alternative text for images using chatbots based on large language models. The study was carried out in April-June 2024. Microsoft Copilot, Google Gemini, and YandexGPT chatbots were used to generate 108 text descriptions for 12 images. Descriptions were generated by chatbots using keywords specified by a person. The experts then rated the resulting descriptions on a Likert scale (from 1 to 5). The data set is presented in a Microsoft Excel table on the “Data” sheet with the following fields: record number; image number; chatbot; image type (photo, logo); request date; list of keywords; number of keywords; length of keywords; time of compilation of keywords; generated descriptions; required length of descriptions; actual length of descriptions; description generation time; usefulness; reliability; completeness; accuracy; literacy. The “Images” sheet contains links to the original images. Alternative descriptions are presented in English.
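
    A minimal reading sketch with pandas; the workbook file name is an assumption, while the sheet names and rating fields come from the description above.

    ```python
    import pandas as pd

    # File name is an assumption; sheet names follow the description.
    data = pd.read_excel("alt_text_dataset.xlsx", sheet_name="Data")

    # Mean expert ratings (1-5 Likert) per chatbot; column names are assumed
    # to match the field list in the description.
    rating_cols = ["usefulness", "reliability", "completeness", "accuracy", "literacy"]
    print(data.groupby("chatbot")[rating_cols].mean())
    ```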

  6. Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
    Explore at:
    xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
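
    The study's exact prompts are not reproduced here; the sketch below only illustrates the zero-shot idea with the OpenAI Python client, and the parameter list in the prompt is illustrative rather than the paper's 13 parameters.

    ```python
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Illustrative zero-shot request in the spirit of Phase 1: qualitative
    # parameter descriptions only, no example rows or fine-tuning.
    prompt = (
        "Generate 20 rows of synthetic perioperative patient data as CSV with "
        "columns: age (adult years), sex (M/F), height_cm, weight_kg, bmi. "
        "Values must be clinically plausible, and bmi must equal "
        "weight_kg / (height_cm / 100) ** 2 rounded to one decimal."
    )

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)
    ```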

  7. chatGPT reviews from google play store

    • kaggle.com
    zip
    Updated Dec 13, 2024
    Cite
    Ahmad Selo Abadi (2024). chatGPT reviews from google play store [Dataset]. https://www.kaggle.com/datasets/ahmadseloabadi/chatgpt-reviews-from-google-play-store
    Explore at:
    zip (10517568 bytes)
    Dataset updated
    Dec 13, 2024
    Authors
    Ahmad Selo Abadi
    Description

    Don't forget to upvote, comment, and follow if you are using this dataset. If you have any questions about the dataset I uploaded, feel free to leave them in the comments. Thank you! :)


    Column Descriptions (English)
    1. reviewId: A unique ID for each user review.
    2. userName: The name of the user who submitted the review.
    3. userImage: The URL of the user's profile picture.
    4. content: The text content of the review provided by the user.
    5. score: The review score given by the user, typically on a scale of 1-5.
    6. thumbsUpCount: The number of likes (thumbs up) received by the review.
    7. reviewCreatedVersion: The app version used by the user when creating the review (not always available).
    8. at: The date and time when the review was submitted.
    9. replyContent: The developer's response to the review (no data available in this column).
    10. repliedAt: The date and time when the developer's response was submitted (no data available in this column).
    11. appVersion: The app version used by the user when submitting the review (not always available).

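    A short usage sketch for the columns above; the CSV file name is an assumption.

    ```python
    import pandas as pd

    # File name is an assumption; columns follow the description above.
    df = pd.read_csv("chatgpt_reviews.csv", parse_dates=["at"])

    # replyContent and repliedAt are documented as empty, so drop them.
    df = df.drop(columns=["replyContent", "repliedAt"])

    # Monthly average score as a simple satisfaction trend.
    print(df.set_index("at")["score"].resample("MS").mean().tail())
    ```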

  8. PROSPECT: Professional Role Effects on Specialized Perspective Enhancement in Conversational Task

    • zenodo.org
    zip
    Updated Dec 29, 2024
    Cite
    Keisuke Sato (2024). PROSPECT: Professional Role Effects on Specialized Perspective Enhancement in Conversational Task [Dataset]. http://doi.org/10.5281/zenodo.14567800
    Explore at:
    zip
    Dataset updated
    Dec 29, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Keisuke Sato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 29, 2024
    Description

    ### Data Availability Statement (for the paper)

    All dialogue logs and final responses collected in this study are publicly available in the PROSPECT repository on Zenodo (DOI: [to be assigned]). The repository contains PDF files of complete dialogue histories and Markdown files of final comprehensive analyses for all conditions and models used in this study, allowing for reproducibility and further analysis.

    ### README.md for Zenodo

    # PROSPECT: Professional Role Effects on Specialized Perspective Enhancement in Conversational Task

    ## Overview
    This repository (PROSPECT) contains the dataset associated with the paper:
    > "Empirical Investigation of Expertise, Multiperspectivity, and Abstraction Induction in Conversational AI Outputs through Professional Role Assignment to Both User and AI"

    This research analyzed changes in dialogue logs and final responses when professional roles were assigned to both user and AI sides across multiple Large Language Models (LLMs). This repository provides the complete dialogue logs (PDF format) and final responses (Markdown format) used in the analysis.

    ## Directory Structure
    The repository structure under the top directory (`PROSPECT/`) is as follows:

    ```
    PROSPECT/
    ├── dialogue/        # Dialogue histories (PDF)
    │   ├── none/
    │   ├── ai_only/
    │   ├── user_only/
    │   └── both/
    └── final_answers/   # Final responses (Markdown)
        ├── none/
        ├── ai_only/
        ├── user_only/
        └── both/
    ```

    - **dialogue/**: Contains raw dialogue logs in PDF format. Subdirectories represent role assignment conditions:
      - `none/`: No roles assigned to either user or AI
      - `ai_only/`: Role assigned to AI only
      - `user_only/`: Role assigned to user only
      - `both/`: Roles assigned to both user and AI
    - **final_answers/**: Contains final comprehensive analysis responses in Markdown format. Directory structure mirrors that of `dialogue/`.

    ## File Naming Convention
    Files in each directory follow this naming convention:
    ```
    [AI]_[conditionNumber]-[roleNumber].pdf
    [AI]_[conditionNumber]-[roleNumber].md
    ```
    - `[AI]`: AI model name used for dialogue (e.g., ChatGPT, ChatGPT-o1, Claude, Gemini)
    - `[conditionNumber]`: Number indicating role assignment condition
    - 0: none
    - 1: ai_only
    - 2: user_only
    - 3: both
    - `[roleNumber]`: Professional role number
    - 0: No role
    - 1: Detective
    - 2: Psychologist
    - 3: Artist
    - 4: Architect
    - 5: Natural Scientist

    ### Examples:
    - `ChatGPT_3-1.pdf`: Dialogue log with ChatGPT-4o model under "both" condition (3) with detective role (1)
    - `Gemini_1-4.md`: Final response from Gemini model under "ai_only" condition (1) with architect role (4)
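
    A small parsing helper of the kind one might use when batch-processing these files (not part of the repository; the mappings simply mirror the convention above):

    ```python
    import re
    from pathlib import Path

    CONDITIONS = {0: "none", 1: "ai_only", 2: "user_only", 3: "both"}
    ROLES = {0: "No role", 1: "Detective", 2: "Psychologist",
             3: "Artist", 4: "Architect", 5: "Natural Scientist"}

    def parse_name(path: str):
        """Split e.g. 'ChatGPT_3-1.pdf' into (model, condition, role)."""
        m = re.match(r"(?P<ai>.+)_(?P<cond>\d)-(?P<role>\d)\.(?:pdf|md)$",
                     Path(path).name)
        if m is None:
            raise ValueError(f"unexpected file name: {path}")
        return m["ai"], CONDITIONS[int(m["cond"])], ROLES[int(m["role"])]

    print(parse_name("dialogue/both/ChatGPT_3-1.pdf"))
    # -> ('ChatGPT', 'both', 'Detective')
    ```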

    ## Role Number Reference
    | roleNumber | Professional Role |
    |-----------:|:-----------------|
    | 0 | No role |
    | 1 | Detective |
    | 2 | Psychologist |
    | 3 | Artist |
    | 4 | Architect |
    | 5 | Natural Scientist|

    ## Data Description
    - **Dialogue Histories (PDF format)**
    Complete logs of questions and responses from each session, preserved as captured during the research. All dialogues were conducted in Japanese. While assistant version information is not included, implementation dates and model names are recorded within the files.
    - **Final Responses (Markdown format)**
    Excerpted responses to the final "comprehensive analysis request" as Markdown files, intended for text analysis and keyword extraction. All responses are in Japanese.

    *Note:* This dataset contains dialogues and responses exclusively in Japanese. Researchers interested in lexical analysis or content analysis should consider this language specification.

    ## How to Use
    1. Please maintain the folder hierarchy after downloading.
    2. For meta-analysis or lexical analysis, refer to PDFs for complete dialogues and Markdown files for final responses.
    3. Utilize for research reproduction, secondary analysis, or meta-analysis.

    ## License
    This dataset is released under the **CC BY 4.0** License.
    - Free to use and modify, but please cite this repository (DOI) and the associated paper when using the data.

    ## Related Publication


    ## Disclaimer
    - The dialogue logs contain no personal information or confidential data.
    - The provided logs and responses reflect the research timing; identical prompts may yield different responses due to AI model updates.
    - The creators assume no responsibility for any damages resulting from the use of this dataset.

    ## Contact
    For questions or requests, please contact skeisuke@ibaraki-ct.ac.jp.

  9. Acceptance and Use of ChatGPT Among College Students: A Dataset from Select Higher Education Institutions in Luzon, Visayas, and Mindanao

    • data.mendeley.com
    Updated Dec 4, 2023
    Cite
    Celbert M. Himang (2023). Acceptance and Use of ChatGPT Among College Students: A Dataset from Select Higher Education Institutions in Luzon, Visayas, and Mindanao [Dataset]. http://doi.org/10.17632/7fz5rkz6bw.1
    Explore at:
    Dataset updated
    Dec 4, 2023
    Authors
    Celbert M. Himang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Luzon, Mindanao, Visayas
    Description

    This study, anchored in the Unified Theory of Acceptance and Utilization of Technology 2 (UTAUT2), investigates the determinants of the acceptance and use of ChatGPT among college students from select Philippine Higher Education Institutions in Luzon, Visayas, and Mindanao. Using a comprehensive online survey, 2,254 responses from 95 institutions were collected from July 22 to October 29, 2023; rigorous data cleansing yielded 1,242 valid responses. Analyzed with WarpPLS 8.0, the data cover factors such as Performance Expectancy, Effort Expectancy, Social Influence, Facilitating Conditions, Hedonic Motivation, Price Value, Habit, and demographic variables. The data were intended to be analyzed using the PLS-SEM approach to reveal direct effects as well as the moderating roles of age, gender, and experience. This research contributes a valuable dataset, shedding light on AI-enabled chatbot adoption in higher education, with implications for global comparative analyses in similar contexts.

  10. ChatGPT-Prompts

    • huggingface.co
    Updated Oct 11, 2024
    Cite
    waqashayder (2024). ChatGPT-Prompts [Dataset]. https://huggingface.co/datasets/waqashayder/ChatGPT-Prompts
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 11, 2024
    Authors
    waqashayder
    Description

    🧠 Awesome ChatGPT Prompts - Dataset Repository This repository contains a dataset of curated prompts designed for use with ChatGPT. These prompts cover a wide range of tasks and scenarios, making it easier for developers, researchers, and enthusiasts to experiment with and extend ChatGPT’s capabilities. 📑 Dataset Overview The dataset includes a variety of prompts categorized by the type of role or task that ChatGPT can act out. Each entry in the dataset includes: Act: The role that ChatGPT… See the full description on the dataset page: https://huggingface.co/datasets/waqashayder/ChatGPT-Prompts.

  11. Chatgpt Dataset

    • universe.roboflow.com
    zip
    Updated Oct 17, 2025
    + more versions
    Cite
    project123 (2025). Chatgpt Dataset [Dataset]. https://universe.roboflow.com/project123-2zib1/chatgpt-36sqv/model/2
    Explore at:
    zip
    Dataset updated
    Oct 17, 2025
    Dataset authored and provided by
    project123
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    ChatGPT

    ## Overview
    
    ChatGPT is a dataset for object detection tasks - it contains Objects annotations for 16,696 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
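
    A minimal download sketch using the roboflow Python package, with the workspace and project slugs taken from the citation URL above (the API key is a placeholder, and the export format is one of several Roboflow supports):

    ```python
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")  # placeholder key
    project = rf.workspace("project123-2zib1").project("chatgpt-36sqv")
    dataset = project.version(2).download("coco")  # COCO-format export
    print(dataset.location)
    ```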
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  12. Data from: Higher Education Students’ Evolving Perceptions of ChatGPT: Global Survey Data from the Academic Year 2024–2025

    • data.mendeley.com
    Updated Apr 21, 2025
    Cite
    Aleksander Aristovnik (2025). Higher Education Students’ Evolving Perceptions of ChatGPT: Global Survey Data from the Academic Year 2024–2025 [Dataset]. http://doi.org/10.17632/nv2343nwsb.1
    Explore at:
    Dataset updated
    Apr 21, 2025
    Authors
    Aleksander Aristovnik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The introduction of ChatGPT in November 2022 marked a significant milestone in the application of artificial intelligence in higher education. Due to its advanced natural language processing capabilities, ChatGPT quickly became popular among students worldwide. However, the increasing acceptance of ChatGPT among students has attracted significant attention, sparking both excitement and scepticism globally. Building on an earlier survey of students' perceptions of ChatGPT after its first year, a comprehensive, large-scale global survey was repeated between October 2024 and February 2025. The questionnaire was distributed in seven languages: English, Italian, Spanish, Turkish, Japanese, Arabic, and Hebrew. It covered several aspects relevant to ChatGPT, including sociodemographic characteristics, usage, capabilities, regulation and ethical concerns, satisfaction and attitude, study issues and outcomes, skills development, labour market and skills mismatch, emotions, study and personal information, and general reflections. The survey targeted higher education students who are currently enrolled at any level in a higher education institution, are at least 18 years old, and have the legal capacity to provide free and voluntary consent to participate in an anonymous survey. Survey participants were recruited using a convenience sampling method, which involved promoting the survey in classrooms and through advertisements on university communication systems. The final dataset consists of 22,963 student responses from 120 countries and territories. The data may prove useful for researchers studying students' perceptions of ChatGPT and its implications across various aspects. Other higher education stakeholders may also benefit: educators may draw on the data when formulating curricula, including designing teaching methods and assessment tools, while policymakers may consider it when formulating strategies for future higher education system development.

  13. Trends in research on ChatGPT and adoption-related issues discussed in articles: a narrative review

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Dec 19, 2023
    Cite
    Sang-Jun Kim (2023). Trends in research on ChatGPT and adoption-related issues discussed in articles: a narrative review [Dataset]. http://doi.org/10.7910/DVN/LMPTQH
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 19, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Sang-Jun Kim
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Dataset 1. List of 1,105 unique articles related to ChatGPT for this review. Suppl. 1. Summary table of 140 research articles evaluating ChatGPT with reference numbers.

  14. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a dataset repository of Awesome ChatGPT Prompts; all prompts can also be viewed on GitHub.
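
    Loading it with the Hugging Face datasets library is a one-liner (the act/prompt field names follow the sibling ChatGPT-Prompts listing above):

    ```python
    from datasets import load_dataset

    ds = load_dataset("fka/awesome-chatgpt-prompts", split="train")
    print(ds[0])  # each row pairs an "act" with its "prompt"
    ```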

      License
    

    CC-0

  15. Dataset 1.sav

    • figshare.com
    bin
    Updated Mar 13, 2024
    Cite
    Al Wardha Zahoor (2024). Dataset 1.sav [Dataset]. http://doi.org/10.6084/m9.figshare.25398004.v1
    Explore at:
    bin
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Al Wardha Zahoor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A survey conducted in Pakistan to determine the perceptions of healthcare professionals on the use of ChatGPT in clinical decision making. The survey was conducted online via Google Forms between March and April 2023. The target population was any healthcare professional practicing in Pakistan, of any age group, including doctors, paramedic staff, and allied health professionals (physiotherapists, occupational and speech therapists, and nurses), who was familiar with ChatGPT and had used it in daily clinical decision making. Undergraduate students in clinical training were excluded because of their limited experience.

  16. ChatGPT Evaluation Dataset v.2.0

    • zenodo.org
    • data.niaid.nih.gov
    Updated Oct 31, 2024
    Cite
    Jan Kocoń; Przemysław Kazienko (2024). ChatGPT Evaluation Dataset v.2.0 [Dataset]. http://doi.org/10.5281/zenodo.14019715
    Explore at:
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Jan Kocoń; Przemysław Kazienko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 2023
    Description

    We tested ChatGPT on 25 tasks focusing on solving common NLP problems and requiring analytical reasoning. These tasks include (1) a relatively simple binary classification of texts like spam, humor, sarcasm, aggression detection, or grammatical correctness of the text; (2) a more complex multiclass and multi-label classification of texts such as sentiment analysis, emotion recognition; (3) reasoning with the personal context, i.e., personalized versions of the problems that make use of additional information about text perception of a given user (user’s examples provided to ChatGPT); (4) semantic annotation and acceptance of the text going towards natural language understanding (NLU) like word sense disambiguation (WSD), and (5) answering questions based on the input text. More information in the paper: https://www.sciencedirect.com/science/article/pii/S156625352300177X

  17. Data from: ChatGPT to generate clinical vignettes for teaching and multiple-choice questions for assessment: A randomized controlled experiment

    • zenodo.org
    bin
    Updated Mar 21, 2024
    Cite
    Özlem Coşkun; Yavuz Selim Kıyak; Işıl İrem Budakoğlu (2024). Data from: ChatGPT to generate clinical vignettes for teaching and multiple-choice questions for assessment: A randomized controlled experiment [Dataset]. http://doi.org/10.5281/zenodo.7920149
    Explore at:
    bin
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Özlem Coşkun; Yavuz Selim Kıyak; Işıl İrem Budakoğlu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Dataset Info
    This dataset is based on the study published in Medical Teacher:

    Coşkun, Ö., Kıyak, Y. S., & Budakoğlu, I. İ. (2024). ChatGPT to generate clinical vignettes for teaching and multiple-choice questions for assessment: A randomized controlled experiment. Medical teacher, 1-7. https://www.tandfonline.com/doi/full/10.1080/0142159X.2024.2327477

    1) Case Evaluation

    # The first row of the dataset identifies the columns.
    # The first column represents the participant ids in the case evaluation.
    # The second column represents the participants' groups (1: ChatGPT group, 0: Control (Human-Written) Group)
    # The third column and the following nine columns reveal the responses of the medical students to the statements using a Likert scale (1: Definitely not agree, 5: Definitely agree).

    2) Test Results

    # The first row of the dataset identifies the columns.
    # The first column represents the participant ids in the exam.
    # The second column represents the participants' groups (1: ChatGPT group, 0: Control (Human-Written) Group)
    # The third column and the following 14 columns reveal the responses of the medical students to the single-best answer multiple-choice questions (1: Correct, 0: Incorrect).
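
    A sketch for comparing the two groups on the exam results, assuming a CSV export laid out as described above (the file name is a placeholder):

    ```python
    import pandas as pd

    # Placeholder file name; layout per the description: column 1 = id,
    # column 2 = group (1: ChatGPT, 0: human-written), rest = item scores.
    df = pd.read_csv("test_results.csv")

    group_col = df.columns[1]
    items = df.columns[2:]                      # question columns
    df["total"] = df[items].sum(axis=1)         # per-participant exam score

    print(df.groupby(group_col)["total"].agg(["mean", "std", "count"]))
    ```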

  18. ChatGPT-medical use

    • data.mendeley.com
    Updated Feb 20, 2023
    + more versions
    Cite
    Hang Dong (2023). ChatGPT-medical use [Dataset]. http://doi.org/10.17632/kh4vf897fp.1
    Explore at:
    Dataset updated
    Feb 20, 2023
    Authors
    Hang Dong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The purpose of this study was to assess ChatGPT's ability, and its potential risks, in providing medical guidance for patients with MAFLD. The study was conducted in February 2023. We simulated the questions raised by patients with MAFLD and tested 110 questions across six aspects: concept, diagnosis, progression, prevention, treatment, and comorbidity. Each question was submitted to ChatGPT three times, and appropriateness and consistency were evaluated. The dataset presents the questions and corresponding answers, together with the judgments of consistency and appropriateness and the reasons for those judgments.

  19. Data from: Comparison of AI-generated and human-made animated videos for medical education: experts and students preferred AI over humans

    • zenodo.org
    Updated Sep 25, 2025
    Cite
    Abdullah Bedir Kaya; Yavuz Selim Kıyak; Özlem Coşkun; İrem Budakoğlu (2025). Data from: Comparison of AI-generated and human-made animated videos for medical education: experts and students preferred AI over humans [Dataset]. http://doi.org/10.5281/zenodo.16926141
    Explore at:
    Dataset updated
    Sep 25, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Abdullah Bedir Kaya; Yavuz Selim Kıyak; Özlem Coşkun; İrem Budakoğlu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To access the full manuscript (open-access): https://revistas.um.es/edumed/article/view/677591

    Title: Expert Dataset (Excel)

    Description:
    This dataset contains anonymized responses from an expert panel that evaluated an AI-generated educational video intended for use in Problem-Based Learning (PBL) tutorials in medical education. The Excel file includes one worksheet (“Expert Responses”) with item-level entries for each respondent.

    File contents (variables):

    · ID: Anonymized respondent identifier.

    · Expert group: Group membership coded as 0 = Medical Faculty, 1 = Medical Educators, 2 = Non-medical experts.

    · Expert subgroup: Disciplinary area for applicable groups (0 = Basic Sciences, 1 = Internal Sciences, 2 = Surgical Sciences, 3 = Educational Sciences, 4 = Information Technology). (Note: the Medical Educators group has no subgroups.)

    · PBL facilitation experience: 0 = Yes, 1 = No. (Note in the header indicates that only the Medical Faculty group responded to this item.)

    · Paid AI subscription: Prior paid subscription to AI tools (e.g., ChatGPT, Gemini): 0 = Yes, 1 = No.

    · Video evaluation items (M1–M10): Ten Likert-type statements covering medical appropriateness, visual quality, freedom from distraction, naturalness/appropriateness of voices, perceived professional competence of characters, realism of clinical settings, suitability for the preclinical level, visual consistency, overall structure as an educational material, and potential to encourage critical thinking and discussion.

    o Scale: 5-point Likert, typically interpreted from Strongly disagree (1) to Strongly agree (5).

    Title: Student Dataset (Excel)

    Description:
    This dataset contains anonymized responses from medical students who evaluated two instructional videos based on the same scenario for use in Problem-Based Learning (PBL) tutorials: an AI-generated video and an animated video. The Excel file includes one worksheet (“Student Response”) with item-level entries per respondent. No personally identifying information is included.

    File contents (variables):

    · ID: Anonymized respondent identifier.

    · Year of study: Academic year/level (2025–2026 context).

    · Gender: 0 = Female, 1 = Male, 2 = Prefer not to say.

    · Paid AI subscription: Prior paid subscription to AI tools (e.g., ChatGPT, Gemini): 0 = Yes, 1 = No.

    · Belief that AI will transform healthcare: 0 = Yes, 1 = No.

    Video evaluation items (paired, Likert-type):
    Each construct is rated separately for the AI-generated and the Animated video.

    · I-1 (Medical appropriateness): AI and Animated

    · I-2 (Visual quality): AI and Animated

    · I-3 (Freedom from distraction / watchability): AI and Animated

    · I-4 (Audio naturalness; appropriateness of voice tone): AI and Animated

    · I-5 (Perceived professional competence of characters): AI and Animated

    · I-6 (Realism of clinical settings): AI and Animated

    · I-7 (Appropriateness for preclinical level): AI and Animated

    · I-8 (Visual consistency): AI and Animated

    Scale: 5-point Likert, typically interpreted from Strongly disagree (1) to Strongly agree (5).

    Preference/forced-choice items:

    · I-9 — Which video was more engaging? (0 = Animated, 1 = AI-generated)

    · I-10 — Which video evoked more emotion? (0 = Animated, 1 = AI-generated)

    · I-11 — Which video would you prefer to use in PBL tutorials? (0 = Animated, 1 = AI-generated)

    · I-12 — Prior exposure to this scenario in a PBL tutorial (0 = Yes, 1 = No, 2 = I don’t remember).
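
    A sketch for summarizing the forced-choice items above; the worksheet name comes from the description, while the file name and exact column labels are assumptions.

    ```python
    import pandas as pd

    # File name is an assumption; worksheet name is from the description.
    students = pd.read_excel("student_dataset.xlsx", sheet_name="Student Response")

    # I-9..I-11 are coded 0 = Animated, 1 = AI-generated, so each column
    # mean is the share of students choosing the AI-generated video.
    for item in ["I-9", "I-10", "I-11"]:
        print(item, round(students[item].mean(), 2))
    ```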

  20. arena-hard-auto-v0.1

    • huggingface.co
    Updated Jun 21, 2024
    Cite
    LMArena (2024). arena-hard-auto-v0.1 [Dataset]. https://huggingface.co/datasets/lmarena-ai/arena-hard-auto-v0.1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 21, 2024
    Dataset authored and provided by
    LMArena: https://lmarena.ai/
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Arena-Hard-Auto

    Arena-Hard-Auto-v0.1 (See Paper) is an automatic evaluation tool for instruction-tuned LLMs. It contains 500 challenging user queries sourced from Chatbot Arena. We prompt GPT-4-Turbo as judge to compare the models' responses against a baseline model (default: GPT-4-0314). Notably, Arena-Hard-Auto has the highest correlation and separability to Chatbot Arena among popular open-ended LLM benchmarks (See Paper). If you are curious to see how well your model might… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/arena-hard-auto-v0.1.
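
    To inspect the 500 queries locally, the dataset loads like any other Hugging Face dataset (the split name is an assumption):

    ```python
    from datasets import load_dataset

    ds = load_dataset("lmarena-ai/arena-hard-auto-v0.1", split="train")
    print(len(ds), ds[0])
    ```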
