71 datasets found
  1. Test dataset of ChatGPT in medical field

    • scidb.cn
    Updated Mar 3, 2023
    Cite
    robin shen (2023). Test dataset of ChatGPT in medical field [Dataset]. http://doi.org/10.57760/sciencedb.o00130.00001
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    Science Data Bank
    Authors
    robin shen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The researcher tests the question-answering (QA) capability of ChatGPT in the medical field from the following aspects:

    1. Its store of medical knowledge.
    2. Its ability to read and understand medical literature.
    3. Its ability to assist diagnosis after reading case data.
    4. Its ability to correct errors in case data.
    5. Its ability to standardize medical terms.
    6. Its ability to evaluate experts.
    7. Its ability to evaluate medical institutions.

    The conclusions are:

    • ChatGPT has great potential in medical and health care applications, and may directly replace humans, or even professionals, at a certain level in some fields.
    • The researcher's preliminary view is that ChatGPT has basic medical knowledge and can sustain multiple rounds of dialogue, and that its ability to understand Chinese is not weak.
    • ChatGPT can read, understand, and correct cases.
    • ChatGPT is quite good at information extraction and terminology standardization.
    • ChatGPT can reason about medical knowledge.
    • ChatGPT has the ability of continuous learning; after continuous training, its level improved significantly.
    • ChatGPT cannot academically evaluate Chinese medical talent; the results are not ideal.
    • ChatGPT cannot academically evaluate Chinese medical institutions; the results are not ideal.
    • ChatGPT is an epoch-making product that can become a useful assistant for medical diagnosis and treatment, knowledge service, literature reading, review, and paper writing.

  2. "ChatGPT vs. Student: A Dataset for Source Classification of Computer...

    • ieee-dataport.org
    Updated Jul 19, 2023
    Cite
    ALI ABDULLAH S ALQAHTANI (2023). "ChatGPT vs. Student: A Dataset for Source Classification of Computer Science Answers [Dataset]. https://ieee-dataport.org/documents/chatgpt-vs-student-dataset-source-classification-computer-science-answers
    Explore at:
    Dataset updated
    Jul 19, 2023
    Authors
    ALI ABDULLAH S ALQAHTANI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    along with the corresponding answers from students and ChatGPT.

  3. ASRS-ChatGPT

    • huggingface.co
    Updated Jun 29, 2023
    Cite
    Archana Tikayat Ray (2023). ASRS-ChatGPT [Dataset]. http://doi.org/10.57967/hf/0830
    Explore at:
    Dataset updated
    Jun 29, 2023
    Authors
    Archana Tikayat Ray
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Summary

    The dataset contains a total of 9984 incident records and 9 columns. Some of the columns contain ground truth values, whereas others contain information generated by ChatGPT from the incident narratives. The dataset was created to give researchers access to columns generated with the ChatGPT API, which is not freely available.

    Dataset Structure

    The column names present in the dataset and their descriptions are provided below:

    Column… See the full description on the dataset page: https://huggingface.co/datasets/archanatikayatray/ASRS-ChatGPT.

  4. Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its...

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    Wang, Jieshu; Kiran, Elif; S.R. Aurora (also known as Mai P. Trinh); Simeone, Michael; Lobo, José (2024). Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its Performance in the Public Sector Workforce [Dataset]. http://doi.org/10.7910/DVN/P3CDHS
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Wang, Jieshu; Kiran, Elif; S.R. Aurora (also known as Mai P. Trinh); Simeone, Michael; Lobo, José
    Description

    This repository contains two datasets used in the study exploring the impact of Generative AI, specifically ChatGPT, on the public sector workforce in the United States. The datasets provide detailed information on the core tasks of public sector occupations and their estimated performance metrics, including potential for automation and augmentation by ChatGPT. These estimations are generated by OpenAI’s GPT-4 model (GPT-4-1106-preview) through OpenAI API.

  5. The Impact of AI and ChatGPT on Bangladeshi University Students

    • data.mendeley.com
    Updated Jan 6, 2025
    Cite
    Md Jhirul Islam (2025). The Impact of AI and ChatGPT on Bangladeshi University Students [Dataset]. http://doi.org/10.17632/zykphpvbr7.2
    Explore at:
    Dataset updated
    Jan 6, 2025
    Authors
    Md Jhirul Islam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    The dataset records the perceptions of Bangladeshi university students on the influence that AI tools, especially ChatGPT, have on their academic practices, learning experiences, and problem-solving abilities. It covers the varying role of AI in education, including common usage statistics, the effect of AI on creative abilities, its impact on learning, and whether it could invade privacy. The dataset offers a perspective on how AI tools are changing education in the country and provides valuable information for researchers, educators, and policymakers seeking to understand trends, challenges, and opportunities in the adoption of AI in the academic context.

    Methodology

    • Data Collection Method: Online survey using Google Forms.
    • Participants: A total of 3,512 students from various Bangladeshi universities participated.
    • Survey Questions: The survey included questions on demographic information, frequency of AI tool usage, perceived benefits, concerns regarding privacy, and impacts on creativity and learning.
    • Sampling Technique: Random sampling of university students.
    • Data Collection Period: June 2024 to December 2024.

    Privacy Compliance

    This dataset has been anonymized to remove any personally identifiable information (PII). It adheres to relevant privacy regulations to ensure the confidentiality of participants.

    For further inquiries, please contact: Name: Md Jhirul Islam, Daffodil International University Email: jhirul15-4063@diu.edu.bd Phone: 01316317573

  6. Data Sheet 1_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 1_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
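
    The fidelity checks named in this description (two-sample proportion tests and 95% CI overlap) can be illustrated with a small sketch. This is not the authors' code: the function names are hypothetical, the CI uses a normal approximation (z = 1.96) rather than a t-based interval, and the t-test itself is left out because the standard library has no t-distribution.

```python
import math
from statistics import mean, stdev


def mean_ci95(values):
    """Normal-approximation 95% CI for the mean (assumes a reasonably large sample)."""
    centre = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return (centre - half_width, centre + half_width)


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-sample proportion z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical values standing in for one continuous and one binary parameter.
real_ages = [54, 61, 47, 66, 58, 63, 49, 70]
synthetic_ages = [55, 60, 50, 64, 59, 62, 52, 68]
ci_overlap = intervals_overlap(mean_ci95(real_ages), mean_ci95(synthetic_ages))
z_stat, p_val = two_proportion_z_test(50, 100, 50, 100)
```

    Under this sketch, a continuous parameter would count as similar when the two intervals overlap, and a binary parameter when the proportion test finds no significant difference at the chosen level.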

  7. Using code from ChatGPT: Finding patterns in the developers’ interaction...

    • b2find.eudat.eu
    Updated Jan 4, 2013
    Cite
    (2013). Using code from ChatGPT: Finding patterns in the developers’ interaction with ChatGPT - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/4c49482c-2770-504e-8b1c-f111341624f6
    Explore at:
    Dataset updated
    Jan 4, 2013
    Description

    ChatGPT can advise developers and provide code on how to fix bugs, add new features, refactor, reuse, and secure their code but currently, there is little knowledge about whether the developers trust ChatGPT’s responses and actually use the provided code. In this context, this study aims to identify patterns that describe the interaction of developers with ChatGPT with respect to the characteristics of the prompts and the actual use of the provided code by the developer. We performed a case study on 267,098 lines of code provided by ChatGPT related to commits, pull requests, files of code, and discussions between ChatGPT and developers. Our findings show that developers are more likely to integrate the given code snapshot in their code base when they have provided information to ChatGPT through several rounds of brief prompts that include problem-related specific words instead of using large textual or code prompts. Results also highlight the ability of ChatGPT to handle efficiently different types of problems across different programming languages.

  8. ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset...

    • huggingface.co
    Updated Nov 12, 2024
    Cite
    DeepNLP (2024). ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset [Dataset]. https://huggingface.co/datasets/DeepNLP/ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 12, 2024
    Authors
    DeepNLP
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ChatGPT Gemini Claude Perplexity Human Evaluation Multi Aspect Review Dataset

    Introduction

    Human evaluations and reviews with scalar scores of AI service responses are very useful in LLM Finetuning, Human Preference Alignment, Few-Shot Learning, Bad Case Shooting, etc., but extremely difficult to collect. This dataset is collected from DeepNLP AI Service User Review panel (http://www.deepnlp.org/store), which is an open review website for users to give reviews and upload… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset.

  9. Public data files containing the data used for the ChatGPT survey (XLSX) and...

    • researchdata.edu.au
    • figshare.mq.edu.au
    Updated Sep 21, 2023
    Cite
    Peter Petocz; Matt Bower; Mark Alfano; Jodie Torrington; Jennifer Lai (2023). Public data files containing the data used for the ChatGPT survey (XLSX) and the survey containing variable selection codes (DOCX). [Dataset]. http://doi.org/10.25949/24123306.V1
    Explore at:
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Macquarie University
    Authors
    Peter Petocz; Matt Bower; Mark Alfano; Jodie Torrington; Jennifer Lai
    Description

    This project investigated teacher attitudes towards Generative Artificial Intelligence Tools (GAITs). In excess of three hundred teachers were surveyed across a broad variety of teaching levels, demographic areas, experience levels, and disciplinary areas, to better understand how they believe teaching and assessment should change as a result of GAITs such as ChatGPT.

    Teachers were invited to complete an online survey relating to their perceptions of the open Artificial Intelligence (AI) tool ChatGPT, and how it will influence what they teach and how they assess. The purpose of the study is to provide teachers, policymakers, and society at large with an understanding of the potential impact of tools such as ChatGPT on Education.

    This dataset contains public data files used for the ChatGPT survey (XLSX) and the survey containing variable selection codes (DOCX). See the second sheet of the XLSX file for variable descriptions.


  10. Data from: How generative AI models such as ChatGPT can be (mis)used in SPC...

    • tandf.figshare.com
    html
    Updated Mar 6, 2024
    Cite
    Fadel M. Megahed; Ying-Ju Chen; Joshua A. Ferris; Sven Knoth; L. Allison Jones-Farmer (2024). How generative AI models such as ChatGPT can be (mis)used in SPC practice, education, and research? An exploratory study [Dataset]. http://doi.org/10.6084/m9.figshare.23532743.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Fadel M. Megahed; Ying-Ju Chen; Joshua A. Ferris; Sven Knoth; L. Allison Jones-Farmer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generative Artificial Intelligence (AI) models such as OpenAI’s ChatGPT have the potential to revolutionize Statistical Process Control (SPC) practice, learning, and research. However, these tools are in the early stages of development and can be easily misused or misunderstood. In this paper, we give an overview of the development of Generative AI. Specifically, we explore ChatGPT’s ability to provide code, explain basic concepts, and create knowledge related to SPC practice, learning, and research. By investigating responses to structured prompts, we highlight the benefits and limitations of the results. Our study indicates that the current version of ChatGPT performs well for structured tasks, such as translating code from one language to another and explaining well-known concepts but struggles with more nuanced tasks, such as explaining less widely known terms and creating code from scratch. We find that using new AI tools may help practitioners, educators, and researchers to be more efficient and productive. However, in their current stages of development, some results are misleading and wrong. Overall, the use of generative AI models in SPC must be properly validated and used in conjunction with other methods to ensure accurate results.

  11. Data from: Dataset of the study: "Chatbots put to the test in math and logic...

    • zenodo.org
    • researchdata.bath.ac.uk
    • +1more
    Updated Jul 12, 2024
    Cite
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jiménez Rios (2024). Dataset of the study: "Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard" [Dataset]. http://doi.org/10.5281/zenodo.7951690
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Vagelis Plevris; George Papazafeiropoulos; Alejandro Jiménez Rios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the 30 questions that were posed to the chatbots (i) ChatGPT-3.5; (ii) ChatGPT-4; and (iii) Google Bard, in May 2023 for the study “Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard”. These 30 questions describe mathematics and logic problems that have a unique correct answer. The questions are fully described with plain text only, without the need for any images or special formatting. The questions are divided into two sets of 15 questions each (Set A and Set B). The questions of Set A are 15 “Original” problems that cannot be found online, at least in their exact wording, while Set B contains 15 “Published” problems that one can find online by searching on the internet, usually with their solution. Each question is posed three times to each chatbot. This dataset contains the following: (i) The full set of the 30 questions, A01-A15 and B01-B15; (ii) the correct answer for each one of them; (iii) an explanation of the solution, for the problems where such an explanation is needed; (iv) the 30 (questions) × 3 (chatbots) × 3 (answers) = 270 detailed answers of the chatbots. For the published problems of Set B, we also provide a reference to the source where each problem was taken from.

  12. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a dataset repository of Awesome ChatGPT Prompts. View all prompts on GitHub.

    License

    CC-0

  13. DAIGT | External Dataset

    • kaggle.com
    Updated Oct 31, 2023
    Cite
    moth (2023). DAIGT | External Dataset [Dataset]. https://www.kaggle.com/datasets/alejopaullier/daigt-external-dataset/discussion
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    moth
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Important Note: the text column is NOT AI generated. However, the source_text is, which can still be used as AI generated text. I will update the dataset accordingly. Consequently, this dataset provides 2421 student generated texts (text column) and 2421 AI generated texts (source_text column). I will update as soon as possible.

    In the "LLM - Detect AI Generated Text" competition you are required to distinguish between student-made and AI-generated texts. However, the competition's data only provides student-made texts.

    Luckily, for CommonLit's competition I made a dataset with AI-generated texts. Surprisingly, it is very similar to the data we need in this competition!

    My dataset not only has 2421 ChatGPT-generated texts but also their prompts and source texts! That's double the data we are given in this competition!

    Also, it's very diverse since the texts are generated from unique prompts.

    The best of luck to all of you in this competition! 🍀

    Dataset Description

    • id: unique identifier for each text.
    • text: extracted text from FeedBack Prize 3 competition. Can be used as student text.
    • instructions: the instruction for ChatGPT to generate the text.
    • source_text: AI generated text.
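
    A minimal sketch of how the columns above could be turned into a labeled training set, assuming rows arrive as dictionaries keyed by the column names listed (for example from `csv.DictReader`). The helper name and the sample rows are hypothetical and only illustrate the idea in the note that `text` is student-written and `source_text` is AI-generated.

```python
def build_labeled_examples(rows):
    """Turn DAIGT-style rows into (text, label) pairs: 0 = student, 1 = AI."""
    examples = []
    for row in rows:
        student = (row.get("text") or "").strip()
        generated = (row.get("source_text") or "").strip()
        if student:
            examples.append((student, 0))  # `text` column: student-written
        if generated:
            examples.append((generated, 1))  # `source_text` column: AI-generated
    return examples


# Hypothetical rows standing in for the real CSV contents.
sample_rows = [
    {
        "id": "a1",
        "text": "My essay about recycling.",
        "instructions": "Write an essay about recycling.",
        "source_text": "Recycling is beneficial because it reduces waste.",
    },
    {
        "id": "a2",
        "text": "",
        "instructions": "Summarize the passage.",
        "source_text": "In summary, the passage argues for reform.",
    },
]
labeled = build_labeled_examples(sample_rows)
```

    Empty cells are skipped, so a row contributes one or two examples depending on which columns are filled.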
  14. DORIS-MAE-v1

    • zenodo.org
    • data.niaid.nih.gov
    bin, json
    Updated Oct 17, 2023
    + more versions
    Cite
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi (2023). DORIS-MAE-v1 [Dataset]. http://doi.org/10.5281/zenodo.8299749
    Explore at:
    Available download formats: bin, json
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.

    Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving a distinct purpose.

    The Query dataset contains 100 human-crafted complex queries spanning across five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.

    The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011-2021, and sourced from arXiv. Each entry includes title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.

    The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (e.g., 0,1,2) representing ChatGPT's final decision.

    Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (e.g., 0,1,2).

    The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:

    ├── ada_embedding_for_DORIS-MAE_v1.pickle
    ├── "Query"
    │   ├── query_id_1 (Embedding of query_1)
    │   ├── query_id_2 (Embedding of query_2)
    │   └── query_id_3 (Embedding of query_3)
    │   .
    │   .
    │   .
    └── "Corpus"
        ├── corpus_id_1 (Embedding of abstract_1)
        ├── corpus_id_2 (Embedding of abstract_2)
        └── corpus_id_3 (Embedding of abstract_3)
        .
        .
        .
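
    The layout above suggests a nested mapping with "Query" and "Corpus" keys, each mapping an id to an embedding vector. The sketch below assumes exactly that (a dict of dicts of float lists) and uses a tiny in-memory stand-in instead of the real file; `rank_corpus` and `cosine_similarity` are hypothetical helpers, not part of the dataset's tooling. Note that unpickling executes arbitrary code, so only load pickle files from a source you trust.

```python
import math
import pickle


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_corpus(embeddings, query_id):
    """Return corpus ids sorted by descending similarity to the given query."""
    query_vec = embeddings["Query"][query_id]
    scores = {
        corpus_id: cosine_similarity(query_vec, vec)
        for corpus_id, vec in embeddings["Corpus"].items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy stand-in with the assumed structure; real ada-002 vectors are much longer.
toy = {
    "Query": {"query_id_1": [1.0, 0.0]},
    "Corpus": {"corpus_id_1": [0.9, 0.1], "corpus_id_2": [0.0, 1.0]},
}
loaded = pickle.loads(pickle.dumps(toy))  # round-trip mimics reading the file
ranking = rank_corpus(loaded, "query_id_1")
```

    With the real file, the round-trip line would be replaced by opening the pickle in binary mode and calling `pickle.load` on it.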

  15. AI Tool Usage by Indian College Students 2025

    • kaggle.com
    Updated Jun 9, 2025
    Cite
    Rakesh Kapilavayi (2025). AI Tool Usage by Indian College Students 2025 [Dataset]. https://www.kaggle.com/datasets/rakeshkapilavai/ai-tool-usage-by-indian-college-students-2025
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    Kaggle
    Authors
    Rakesh Kapilavayi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AI Tool Usage by Indian College Students 2025

    This unique dataset, collected via a May 2025 survey, captures how 496 Indian college students use AI tools (e.g., ChatGPT, Gemini, Copilot) in academics. It includes 16 attributes like AI tool usage, trust, impact on grades, and internet access, ideal for education analytics and machine learning.

    Columns

    • Student_Name: Anonymized student name.
    • College_Name: College attended.
    • Stream: Academic discipline (e.g., Engineering, Arts).
    • Year_of_Study: Year of study (1–4).
    • AI_Tools_Used: Tools used (e.g., ChatGPT, Gemini).
    • Daily_Usage_Hours: Hours spent daily on AI tools.
    • Use_Cases: Purposes (e.g., Assignments, Exam Prep).
    • Trust_in_AI_Tools: Trust level (1–5).
    • Impact_on_Grades: Grade impact (-3 to +3).
    • Do_Professors_Allow_Use: Professor approval (Yes/No).
    • Preferred_AI_Tool: Preferred tool.
    • Awareness_Level: AI awareness (1–10).
    • Willing_to_Pay_for_Access: Willingness to pay (Yes/No).
    • State: Indian state.
    • Device_Used: Device (e.g., Laptop, Mobile).
    • Internet_Access: Access quality (Poor/Medium/High).

    Use Cases

    • Predict academic performance using AI tool usage.
    • Analyze trust in AI across streams or regions.
    • Cluster students by usage patterns.
    • Study digital divide via Internet_Access.
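
    As an illustration of the "analyze trust in AI across streams" use case, here is a minimal sketch that averages `Trust_in_AI_Tools` per `Stream`. It assumes rows are dictionaries keyed by the column names listed above with values read as strings (as `csv.DictReader` would give); the helper name and the sample rows are hypothetical.

```python
from collections import defaultdict


def mean_trust_by_stream(rows):
    """Average the Trust_in_AI_Tools score (1-5) for each Stream."""
    totals = defaultdict(lambda: [0.0, 0])  # stream -> [sum, count]
    for row in rows:
        bucket = totals[row["Stream"]]
        bucket[0] += float(row["Trust_in_AI_Tools"])
        bucket[1] += 1
    return {stream: total / count for stream, (total, count) in totals.items()}


# Hypothetical rows standing in for the survey CSV.
sample = [
    {"Stream": "Engineering", "Trust_in_AI_Tools": "4"},
    {"Stream": "Engineering", "Trust_in_AI_Tools": "2"},
    {"Stream": "Arts", "Trust_in_AI_Tools": "5"},
]
trust = mean_trust_by_stream(sample)
```

    The same grouping pattern would work for `State` in place of `Stream` to compare regions.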

    Source: Collected via Google Forms survey in May 2025, ensuring diverse representation across India. Note: First dataset of its kind on Kaggle!

  16. Datasets .csv

    • figshare.com
    txt
    Updated Jan 24, 2024
    Cite
    Yaser Alhasawi (2024). Datasets .csv [Dataset]. http://doi.org/10.6084/m9.figshare.25053146.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    figshare
    Authors
    Yaser Alhasawi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset for this research project was meticulously constructed to investigate the adoption of ChatGPT among students in the United States. The primary objective was to gain insights into the technological barriers and resistances faced by students in integrating ChatGPT into their information systems. The dataset was designed to capture the diverse adoption patterns among students in various public and private schools and universities across the United States. By examining adoption rates, frequency of usage, and the contexts in which ChatGPT is employed, the research sought to provide a comprehensive understanding of how students are incorporating this technology into their information systems. Moreover, by including participants from diverse educational institutions, the research sought to ensure a comprehensive representation of the student population in the United States. This approach aimed to provide nuanced insights into how factors such as educational background, institution type, and technological familiarity influence ChatGPT adoption.

  17. 🤖 ChatGPT App Google Store Reviews

    • kaggle.com
    Updated Nov 17, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BwandoWando (2023). 🤖 ChatGPT App Google Store Reviews [Dataset]. http://doi.org/10.34740/kaggle/ds/4017553
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context


    According to its Wikipedia page:

    ChatGPT (Chat Generative Pre-trained Transformer) is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, that enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.

    These reviews were extracted from the ChatGPT app's Google Store listing.

    Usage

    This dataset should give a good picture of how the public's perception of the app has changed over time. Using this dataset, we can do the following:

    1. Extract sentiments and trends.
    2. Identify which versions of the app received the most positive and the most negative feedback.
    3. Use topic modeling to identify the pain points of the application.

    (AND MANY MORE!)
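
As a rough illustration of the second use case, the sketch below averages star ratings per app version and picks the best and worst versions. The field names `app_version` and `score` and the sample rows are assumptions made for illustration; the actual column names in the dataset may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only; the real dataset's column
# names and values may differ.
reviews = [
    {"app_version": "1.0.0", "score": 5},
    {"app_version": "1.0.0", "score": 3},
    {"app_version": "1.1.0", "score": 1},
    {"app_version": "1.1.0", "score": 2},
]


def mean_score_by_version(rows):
    """Return {version: mean star rating} for the given review rows."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["app_version"]].append(row["score"])
    return {version: mean(values) for version, values in scores.items()}


averages = mean_score_by_version(reviews)
best = max(averages, key=averages.get)    # version with the highest mean rating
worst = min(averages, key=averages.get)   # version with the lowest mean rating
```

A mean rating is only a crude proxy for sentiment; a text-based sentiment model over the review content would be a natural next step.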

    Note

    Images generated using Bing Image Generator

  18. #ChatGPT 1000 Daily 🐦 Tweets

    • kaggle.com
    Updated May 14, 2023
    Cite
    Enric Domingo (2023). #ChatGPT 1000 Daily 🐦 Tweets [Dataset]. http://doi.org/10.34740/kaggle/dsv/5685262
    Dataset updated
    May 14, 2023
    Dataset provided by
    Kaggle
    Authors
    Enric Domingo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    UPDATE: Due to the new Twitter API conditions introduced under Elon Musk, the Twitter (X) API is no longer free to use, and the hobby plan costs $100/month. As a result, my automated ETL notebook stopped adding new tweets to this dataset on May 13th, 2023.

    This dataset was updated every day with 1000 tweets/day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from the 3rd of April 2023. Each day's tweets were uploaded 24-72h later, so that the counts of likes, retweets, messages and impressions had enough time to become meaningful. Tweets are in any language and are selected randomly from all hours of the day. Some basic filters are applied to try to discard sensitive tweets and spam.

    This dataset can be used for many different applications in data analysis and visualization, as well as NLP sentiment analysis techniques and more.

    Consider upvoting this dataset and the scheduled ETL notebook that fed new data into it every day if you found them interesting, thanks! 🤗

    Columns Description:

    • tweet_id: Integer. Unique identifier for each tweet. Older tweets have smaller IDs.

    • tweet_created: Timestamp. Time of the tweet's creation.

    • tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).

    • text: String. The raw payload text from the tweet.

    • lang: String. Short name for the Tweet text's language.

    • user_id: Integer. Twitter's unique user id.

    • user_name: String. The author's public name on Twitter.

    • user_username: String. The author's Twitter account username (@example).

    • user_location: String. The author's public location.

    • user_description: String. The author's public profile's bio.

    • user_created: Timestamp. Timestamp of user's Twitter account creation.

    • user_followers_count: Integer. The number of followers of the author's account at the moment of the Tweet extraction.

    • user_following_count: Integer. The number of accounts followed by the author's account at the moment of the Tweet extraction.

    • user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.

    • user_verified: Boolean. True if the user is verified (blue mark).

    • source: The device/app used to publish the tweet (apparently not working; all values are NaN so far).

    • retweet_count: Integer. Number of retweets to the Tweet at the moment of the Tweet extraction.

    • like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.

    • reply_count: Integer. Number of reply messages to the Tweet.

    • impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.

    More info:

    • Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

    • Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
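
The column list above can be turned into a small typed loader. This is a minimal sketch using only the Python standard library: the column names come from the description, but the inline sample row, the ISO-style timestamp format, and the `True`/`False` text encoding of booleans are assumptions, not facts about the published files.

```python
import csv
import io
from datetime import datetime

# Column names are taken from the description above; the type mapping
# follows the documented Integer / Timestamp / Boolean labels.
INT_COLUMNS = {
    "tweet_id", "user_id", "user_followers_count", "user_following_count",
    "user_tweet_count", "retweet_count", "like_count", "reply_count",
    "impression_count",
}
TIMESTAMP_COLUMNS = {"tweet_created", "tweet_extracted", "user_created"}
BOOL_COLUMNS = {"user_verified"}


def convert_row(row):
    """Convert the string values of one CSV row to the documented types."""
    typed = {}
    for name, value in row.items():
        if name in INT_COLUMNS:
            typed[name] = int(value)
        elif name in TIMESTAMP_COLUMNS:
            # Assumption: timestamps are stored in an ISO 8601-like format.
            typed[name] = datetime.fromisoformat(value)
        elif name in BOOL_COLUMNS:
            # Assumption: booleans are stored as the text "True" / "False".
            typed[name] = value.strip().lower() == "true"
        else:
            typed[name] = value
    return typed


# Made-up sample with a subset of the columns, for illustration only.
SAMPLE_CSV = (
    "tweet_id,tweet_created,text,lang,user_verified,like_count\n"
    "1,2023-04-03 10:00:00,Trying ChatGPT today,en,False,12\n"
)

rows = [convert_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

To read a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an opened file handle; the conversion step may need adjusting if the actual formats differ from the assumptions above.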

  19. chats-data-2023-10-16

    • huggingface.co
    Updated Oct 16, 2023
    + more versions
    Cite
    Collective Cognition (2023). chats-data-2023-10-16 [Dataset]. https://huggingface.co/datasets/CollectiveCognition/chats-data-2023-10-16
    Dataset updated
    Oct 16, 2023
    Dataset authored and provided by
    Collective Cognition
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for "Collective Cognition ChatGPT Conversations"

    Dataset Description

    Dataset Summary

    The "Collective Cognition ChatGPT Conversations" dataset is a collection of chat logs between users and the ChatGPT model. These conversations have been shared by users on the "Collective Cognition" website. The dataset provides insights into user interactions with language models and can be utilized for multiple purposes, including training, research, and… See the full description on the dataset page: https://huggingface.co/datasets/CollectiveCognition/chats-data-2023-10-16.

  20. Composing alt text using large language models: dataset in Russian

    • data.mendeley.com
    Updated Jun 17, 2024
    + more versions
    Cite
    Yekaterina Kosova (2024). Composing alt text using large language models: dataset in Russian [Dataset]. http://doi.org/10.17632/73dptbyxbb.1
    Dataset updated
    Jun 17, 2024
    Authors
    Yekaterina Kosova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset contains the results of developing alternative text for images using chatbots based on large language models. The study was carried out in April-June 2024. Microsoft Copilot, Google Gemini, and YandexGPT chatbots were used to generate 108 text descriptions for 12 images. Descriptions were generated by chatbots using keywords specified by a person. The experts then rated the resulting descriptions on a Likert scale (from 1 to 5). The data set is presented in a Microsoft Excel table on the “Data” sheet with the following fields: record number; image number; chatbot; image type (photo, logo); request date; list of keywords; number of keywords; length of keywords; time of compilation of keywords; generated descriptions; required length of descriptions; actual length of descriptions; description generation time; usefulness; reliability; completeness; accuracy; literacy. The “Images” sheet contains links to the original images. The data set is presented in Russian.
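
One way to summarize the expert ratings described above is to average each Likert criterion per chatbot. The sketch below assumes the "Data" sheet has already been exported to a list of dictionaries; the English field names and the rating values are invented for illustration (the published table is in Russian and its exact headers are not given here).

```python
from collections import defaultdict
from statistics import mean

# The five rating criteria named in the description.
CRITERIA = ("usefulness", "reliability", "completeness", "accuracy", "literacy")

# Made-up records for illustration; these are not values from the dataset.
records = [
    {"chatbot": "Microsoft Copilot", "usefulness": 4, "reliability": 5,
     "completeness": 3, "accuracy": 4, "literacy": 5},
    {"chatbot": "Microsoft Copilot", "usefulness": 2, "reliability": 3,
     "completeness": 3, "accuracy": 4, "literacy": 5},
    {"chatbot": "YandexGPT", "usefulness": 5, "reliability": 4,
     "completeness": 4, "accuracy": 3, "literacy": 5},
]


def mean_ratings_by_chatbot(rows):
    """Return {chatbot: {criterion: mean Likert rating}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for criterion in CRITERIA:
            grouped[row["chatbot"]][criterion].append(row[criterion])
    return {
        chatbot: {c: mean(values) for c, values in per_criterion.items()}
        for chatbot, per_criterion in grouped.items()
    }


summary = mean_ratings_by_chatbot(records)
```

Because Likert ratings are ordinal, medians or rating distributions may be more appropriate than means for a formal analysis; the mean is used here only to keep the sketch short.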

robin shen (2023). Test dataset of ChatGPT in medical field [Dataset]. http://doi.org/10.57760/sciencedb.o00130.00001

Test dataset of ChatGPT in medical field

258 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Mar 3, 2023
Dataset provided by
Science Data Bank
Authors
robin shen
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

The researcher tests the QA capability of ChatGPT in the medical field from the following aspects:

1. Test its store of medical knowledge.
2. Check its ability to read and understand medical literature.
3. Test its ability to assist with diagnosis after reading case data.
4. Test its ability to correct errors in case data.
5. Test its ability to standardize medical terms.
6. Test its ability to evaluate experts.
7. Check its ability to evaluate medical institutions.

The conclusions are:

• ChatGPT has great potential in medical and health care applications, and may directly replace human beings, or even professionals at a certain level, in some fields.
• The researcher preliminarily believes that ChatGPT has basic medical knowledge and the ability to hold multi-round dialogues, and that its ability to understand Chinese is not weak.
• ChatGPT has the ability to read, understand and correct cases.
• ChatGPT has the ability to extract information and standardize terminology, and performs these tasks very well.
• ChatGPT has the ability to reason about medical knowledge.
• ChatGPT has the ability to learn continuously. After continuous training, its level improved significantly.
• ChatGPT does not have the ability to academically evaluate Chinese medical professionals, and the results are not ideal.
• ChatGPT does not have the ability to academically evaluate Chinese medical institutions, and the results are not ideal.
• ChatGPT is an epoch-making product, which can become a useful assistant for medical diagnosis and treatment, knowledge service, literature reading, review and paper writing.
