CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains two datasets used in the study exploring the impact of Generative AI, specifically ChatGPT, on the public sector workforce in the United States. The datasets provide detailed information on the core tasks of public sector occupations and their estimated performance metrics, including potential for automation and augmentation by ChatGPT. These estimates were generated by OpenAI’s GPT-4 model (GPT-4-1106-preview) through the OpenAI API.
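As an illustration of how such task-level estimates can be collected, the sketch below sends a single occupational task to the named GPT-4 model through the OpenAI Python SDK; the prompt wording, scoring scale, and response format are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: scoring one occupational task for automation/augmentation
# potential with GPT-4 via the OpenAI API. Prompt text and the 0-100 scale
# are illustrative assumptions, not the study's actual rubric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

task = "Prepare meeting agendas and record minutes for city council sessions."
prompt = (
    "On a scale of 0-100, rate how much of the following public-sector task "
    "ChatGPT could (a) automate and (b) augment. Reply as 'automate=<n>, augment=<n>'.\n"
    f"Task: {task}"
)

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```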
The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We collected Twitter data to identify key concerns related to the use of ChatGPT in education. This dataset is used to support the study "ChatGPT in education: A discourse analysis of worries and concerns on social media."
In this study, we particularly explored two research questions. RQ1 (Concerns): What are the key concerns that Twitter users perceive with using ChatGPT in education? RQ2 (Accounts): Which accounts are implicated in the discussion of these concerns? In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.
Analysis by WebFX of 13,252 publicly shared ChatGPT conversations to uncover usage statistics: prompt length, message count, question vs. command distribution, and use-case categories.
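As an illustration of how such statistics can be derived from exported conversations, the sketch below assumes a simple JSON layout (a list of conversations, each holding messages with a role and content); this schema and the question/command heuristic are assumptions, not WebFX's documented method.

```python
# Sketch: per-conversation usage statistics from shared ChatGPT conversations.
# The input schema (list of {"messages": [{"role", "content"}, ...]}) is assumed.
import json
from statistics import mean

with open("conversations.json", encoding="utf-8") as f:
    conversations = json.load(f)

prompt_lengths, message_counts, questions, commands = [], [], 0, 0
for convo in conversations:
    user_msgs = [m["content"] for m in convo["messages"] if m["role"] == "user"]
    message_counts.append(len(convo["messages"]))
    prompt_lengths.extend(len(m.split()) for m in user_msgs)
    for m in user_msgs:
        if m.strip().endswith("?"):
            questions += 1
        else:
            commands += 1  # crude heuristic: non-questions treated as commands

print(f"mean prompt length (words): {mean(prompt_lengths):.1f}")
print(f"mean messages per conversation: {mean(message_counts):.1f}")
print(f"questions vs commands: {questions} / {commands}")
```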
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Discover how AI code interpreters are revolutionizing data visualization, reducing chart creation time from 20 to 5 minutes while simplifying complex statistical analysis.
ChatGPT and AI have the potential to revolutionize the way we interact with computers, specifically in the field of medical diagnostics. ChatGPT can make conversations between doctors and patients more natural, while AI can analyze vast amounts of patient data to identify trends and estimate a patient’s health. Patients can use ChatGPT to better understand their medical conditions, and both ChatGPT and AI can be used to automate tasks such as scheduling appointments and processing test results. However, there are limitations to using AI, including data bias, complex results, and analysis errors. To reduce errors, it is important to validate findings using various techniques and ensure that data is accurate and up-to-date. ChatGPT also employs security measures to protect patient data privacy and confidentiality.
https://www.reddit.com/wiki/api
Here you can find about 50K comments from Reddit regarding ChatGPT. The comments are gathered from posts in four subreddits.
The data includes the fields comment_id, comment_parent_id, comment_body, and subreddit.
The date and other comment-related information will be added in the next version. This dataset is useful for getting insight into the public take on ChatGPT, and also for text analysis, text visualization, inline question answering, text summarization, NER, and other tasks such as clustering.
Please note that this dataset is not cleaned or preprocessed, so if you want to get your hands dirty with the data, it's a good opportunity to level up your data-cleaning skills too :)
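As a starting point for that cleaning, the sketch below loads the comments with pandas and applies a very light first pass; the file name and cleaning rules are illustrative assumptions.

```python
# Sketch: load the Reddit comments and do a first, very light cleaning pass.
# "chatgpt_reddit_comments.csv" is a placeholder file name.
import pandas as pd

df = pd.read_csv("chatgpt_reddit_comments.csv")

# Drop deleted/removed comments and obvious duplicates.
df = df[~df["comment_body"].isin(["[deleted]", "[removed]"])]
df = df.drop_duplicates(subset="comment_id")

# Normalise whitespace and strip URLs before downstream text analysis.
df["clean_body"] = (
    df["comment_body"]
    .str.replace(r"http\S+", "", regex=True)
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

print(df.groupby("subreddit")["comment_id"].count())
```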
And please don't forget to UPVOTE if you find it useful.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To assess the competence of students and academic staff to use generative artificial intelligence (GenAI) as a tool in epidemiological data analyses in a randomised controlled trial (RCT).
Methods: We invited postgraduate students and academic staff at the Swiss Tropical and Public Health Institute to the RCT. Participants were randomized to analyse a simulated cross-sectional dataset using ChatGPT’s code interpreter (integrated analysis arm) vs. statistical software (R/Stata) with ChatGPT as a support tool (distributed analysis arm). The primary outcome was the trial task score (out of 17, using an assessment rubric). The secondary outcome was the time to complete the task.
Results: We invited 338 individuals, randomized 31 participants equally to the two study arms, and 30 participants submitted results. Overall, there was no statistically significant difference in mean task scores between the distributed analysis arm (8.5, ±4.6) and the integrated analysis arm (9.4, ±3.8), with a mean difference of 0.93 (p = 0.55). Mean task completion time was significantly shorter in the integrated analysis arm compared to the distributed analysis arm.
Conclusion: While ChatGPT offers advantages, its effective use requires a careful balance of GenAI capabilities and human expertise.
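The arm comparison reported above is the kind of analysis a standard two-sample t-test covers; the sketch below shows it on made-up score arrays, not the trial data.

```python
# Sketch: comparing task scores between two study arms with a two-sample
# t-test (scipy). The scores below are made-up placeholders, not trial data.
import numpy as np
from scipy import stats

integrated = np.array([9, 12, 7, 11, 10, 8, 13, 6, 9, 11, 10, 8, 12, 9, 7])
distributed = np.array([8, 10, 5, 9, 7, 12, 6, 11, 8, 9, 4, 10, 7, 13, 8])

t_stat, p_value = stats.ttest_ind(integrated, distributed, equal_var=False)
diff = integrated.mean() - distributed.mean()
print(f"mean difference: {diff:.2f}, t = {t_stat:.2f}, p = {p_value:.2f}")
```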
https://tickertrends.io/terms
Monthly dataset tracking topic frequency, keyword volume, and conversation patterns across ChatGPT discussions. Data is normalized on a 0 to 100 scale for easy comparison. Aggregates millions of AI interactions to reveal emerging trends, user interests, and discussion momentum across technology, finance, health, education, and business categories.
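One common way to produce such a 0 to 100 scale is min-max normalisation; the sketch below applies it to made-up monthly keyword volumes, without implying this is the provider's exact method.

```python
# Sketch: min-max normalisation of monthly keyword volumes onto a 0-100 scale.
# The raw volumes are illustrative placeholders.
import pandas as pd

volumes = pd.Series(
    [1200, 3400, 2800, 5100, 4600, 6900],
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    name="keyword_volume",
)

normalised = 100 * (volumes - volumes.min()) / (volumes.max() - volumes.min())
print(normalised.round(1))
```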
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a CSV file of tweets related to ChatGPT, OpenAI's conversational AI model, collected using the keywords (chatgpt, chat gpt), #hashtags, and @mentions about ChatGPT. The file includes information on 500,000 tweets. The dataset aims to help understand public opinion, trends, and potential applications of ChatGPT by analyzing tweet volume, sentiment, user engagement, and the influence of key AI events. It offers valuable insights for companies, researchers, and policymakers, allowing them to make informed decisions and shape the future of AI-powered conversational technologies.
Check out my comprehensive analysis of this dataset in the Medium article "Cracking the ChatGPT Code: A Deep Dive into 500,000 Tweets using Advanced NLP Techniques".
Learn about the collection process in the Medium article "Effortlessly Scraping Massive Twitter Data".
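As one illustration of sentiment scoring on such tweets, the sketch below uses NLTK's VADER analyser; the file path, column name, and choice of VADER are assumptions and do not necessarily reflect the methods in the linked articles.

```python
# Sketch: sentiment scoring of tweet text with NLTK's VADER analyser.
# The CSV path and "tweet_text" column are placeholders.
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

tweets = pd.read_csv("chatgpt_tweets.csv")
tweets["compound"] = tweets["tweet_text"].astype(str).map(
    lambda t: sia.polarity_scores(t)["compound"]
)
tweets["sentiment"] = pd.cut(
    tweets["compound"], bins=[-1, -0.05, 0.05, 1],
    labels=["negative", "neutral", "positive"],
)
print(tweets["sentiment"].value_counts())
```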
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains links to all the experiments I ran related to my article titled "Using LLM for finding security vulnerabilities."
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the dataset and code used in the study titled “Academic Discourse on ChatGPT in Social Sciences: A Topic Modeling and Sentiment Analysis of Research Article Abstracts.” The study explores how social science scholars frame and evaluate ChatGPT by analyzing 1,227 SSCI-indexed abstracts using Latent Dirichlet Allocation (LDA) topic modeling and lexicon-based sentiment analysis. The data include the collected abstracts (with metadata), while the code files provide the full analytical pipeline in Python and R, covering preprocessing, topic modeling, sentiment scoring using the NRC Emotion Lexicon, and visualization scripts. This repository supports transparency, reproducibility, and reuse of the study’s computational methods and underlying materials.
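As an illustration of the topic-modeling step, the sketch below fits an LDA model to abstract text with scikit-learn; the file name, column name, and number of topics are assumptions, and the repository's own pipeline may use different libraries.

```python
# Sketch: LDA topic modeling on article abstracts with scikit-learn.
# File name, column name, and the number of topics are illustrative assumptions.
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = pd.read_csv("ssci_abstracts.csv")["abstract"].dropna()

vectorizer = CountVectorizer(stop_words="english", max_df=0.95, min_df=5)
dtm = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=8, random_state=42)
lda.fit(dtm)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-10:][::-1]]
    print(f"Topic {k}: {', '.join(top_terms)}")
```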
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Artificial Intelligence (AI) applications are expected to promote government service delivery and quality, more efficient handling of cases, and bias reduction in decision-making. One potential benefit of the AI tool ChatGPT is that it may support governments in the anonymization of data. However, it is not clear whether ChatGPT is appropriate to support data anonymization for public organizations. Hence, this study examines the possibilities, risks, and ethical implications for government organizations to employ ChatGPT in the anonymization of personal data. We use a case study approach, combining informal conversations, formal interviews, a literature review, document analysis and experiments to conduct a three-step study. First, we describe the technology behind ChatGPT and its operation. Second, experiments with three types of data (fake data, original literature and modified literature) show that ChatGPT exhibits strong performance in anonymizing these three types of texts. Third, an overview of significant risks and ethical issues related to ChatGPT and its use for anonymization within a specific government organization was generated, including themes such as privacy, responsibility, transparency, bias, human intervention, and sustainability. One significant risk in the current form of ChatGPT is a privacy risk, as inputs are stored and forwarded to OpenAI and potentially other parties. This is unacceptable if texts containing personal data are anonymized with ChatGPT. We discuss several potential solutions to address these risks and ethical issues. This study contributes to the scarce scientific literature on the potential value of employing ChatGPT for personal data anonymization in government. In addition, this study has practical value for civil servants who face the challenges of data anonymization in practice including resource-intensive and costly processes.
ChatGPT is widely used for writing tasks, yet its effects on medical students’ academic writing remain underexplored. This study aims to elucidate ChatGPT’s impact on academic writing efficiency and quality among medical students, while also evaluating students’ attitudes towards its use in academic writing. We collected systematic reviews from 130 third-year medical students and administered a questionnaire to assess ChatGPT usage and student attitudes. Three independent reviewers graded the papers using EASE guidelines, and statistical analysis compared articles generated with or without ChatGPT assistance across various parameters, with rigorous quality control ensuring survey reliability and validity. In this study, 33 students (25.8%) utilized ChatGPT for writing (ChatGPT group) and 95 (74.2%) did not (Control group). The ChatGPT group exhibited significantly higher daily technology use and prior experience with ChatGPT (p < 0.05). Writing time was significantly reduced in the ChatGPT group (p = 0.04), with 69.7% completing tasks within 2–3 days compared to 48.4% in the control group. They also achieved higher article quality scores (p < 0.0001) with improvements in completeness, credibility, and scientific content. Self-assessment indicated enhanced writing skills (p < 0.01), confidence (p < 0.001), satisfaction (p < 0.001) and a positive attitude toward its future use in the ChatGPT group. Integrating ChatGPT in medical academic writing, with proper guidance, improves efficiency and quality, illustrating artificial intelligence’s potential in shaping medical education methodologies.
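The completion-rate comparison above (69.7% vs. 48.4%) can be illustrated with a two-proportion z-test; the sketch below back-calculates counts from the reported percentages and group sizes, and the choice of test is an assumption rather than the study's documented analysis.

```python
# Sketch: two-proportion z-test on task completion within 2-3 days.
# Counts are back-calculated from the percentages in the abstract (23/33 vs 46/95);
# the test itself is illustrative and may not match the study's own analysis.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

completed = np.array([23, 46])   # ChatGPT group, control group
totals = np.array([33, 95])

z_stat, p_value = proportions_ztest(completed, totals)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```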
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.
Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.
Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
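As an illustration of the fidelity checks described above for a single continuous parameter, the sketch below runs a two-sample t-test and a 95% CI overlap check on simulated placeholder values, not VitalDB data or the study's actual pipeline.

```python
# Sketch: fidelity check for one continuous parameter between a synthetic and a
# real dataset: two-sample t-test plus 95% CI overlap.
# The values are simulated placeholders, not VitalDB or GPT-4o output.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
real = rng.normal(loc=58.0, scale=15.0, size=6000)       # "real" sample
synthetic = rng.normal(loc=58.5, scale=15.3, size=6000)  # "synthetic" sample

t_stat, p_value = stats.ttest_ind(real, synthetic, equal_var=False)

def ci95(x):
    se = stats.sem(x)
    return x.mean() - 1.96 * se, x.mean() + 1.96 * se

lo_r, hi_r = ci95(real)
lo_s, hi_s = ci95(synthetic)
overlap = max(lo_r, lo_s) <= min(hi_r, hi_s)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, 95% CI overlap: {overlap}")
```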
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all available conversations from chatlogs.net between users and ChatGPT. Version 1 contains all conversations available up to the cutoff date of April 4, 2023; Version 2 contains all conversations available up to the cutoff date of April 20, 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: ChatGPT, developed by OpenAI, is an artificial intelligence software designed to generate text-based responses. The objective of this study is to evaluate the accuracy and consistency of ChatGPT’s responses to single-choice questions pertaining to carbon monoxide poisoning. This evaluation will contribute to our understanding of the reliability of ChatGPT-generated information in the medical field.
Methods: The questions utilized in this study were selected from the "Medical Exam Assistant (Yi Kao Bang)" application and encompassed a range of topics related to carbon monoxide poisoning. A total of 44 single-choice questions were included in the study following a screening process. Each question was entered into ChatGPT ten times in Chinese and, after translation into English, ten times in English. The responses generated by ChatGPT were subjected to statistical analysis with the objective of assessing their accuracy and consistency in both languages. In this assessment process, the "Medical Exam Assistant (Yi Kao Bang)" reference responses were employed as benchmarks. The data analysis was conducted using Python.
Results: In approximately 50% of the cases, the responses generated by ChatGPT exhibited a high degree of consistency, whereas in approximately one-third of the cases, the responses exhibited unacceptable blurring of the answers. Meanwhile, the accuracy of these responses was less favorable, with an accuracy rate of 61.1% in Chinese and 57% in English. This indicates that ChatGPT could be enhanced with respect to both consistency and accuracy in responding to queries pertaining to carbon monoxide poisoning.
Conclusions: It is currently evident that the consistency and accuracy of responses generated by ChatGPT regarding carbon monoxide poisoning are inadequate. Although it offers significant insights, it should not supersede the role of healthcare professionals in making clinical decisions.
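As an illustration of how accuracy and consistency can be summarised when each question is posed ten times per language, the sketch below assumes a long-format response table; the file and column names are placeholders, not this dataset's documented structure.

```python
# Sketch: accuracy and consistency of repeated ChatGPT answers per question.
# Assumes one row per (question, repetition) holding the chosen option and the
# reference answer; the file and column names are placeholders.
import pandas as pd

df = pd.read_csv("chatgpt_co_poisoning_responses.csv")
# expected columns: question_id, language, chatgpt_answer, reference_answer

def summarise(group):
    modal_share = group["chatgpt_answer"].value_counts(normalize=True).iloc[0]
    accuracy = (group["chatgpt_answer"] == group["reference_answer"]).mean()
    return pd.Series({"consistency": modal_share, "accuracy": accuracy})

summary = df.groupby(["language", "question_id"]).apply(summarise)
print(summary.groupby("language").mean())
```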
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the raw data used in the publication "ChatGPT as an education and learning tool for engineering, technology and general studies: performance analysis of ChatGPT 3.0 on CSE, GATE and JEE examinations of India."
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset corresponds to a study carried out to analyse 10 bibliographic references for each of 10 Spanish authors in the field of Information Sciences, requested from the ChatGPT chatbot.
The file "Bibliographic_references_ analysis" contains the 10 references returned by ChatGPT for each of the 10 authors (a total of 100 references), together with the variables analysed to check their authenticity.
The "Keywords_analysis" file contains the normalisation carried out on the words considered to be key words extracted from the titles of the works, according to which a word cloud showing the frequency of occurrence could be drawn up.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ChatGPT has forever changed the way that many industries operate. Much of the focus on Artificial Intelligence (AI) has been on its ability to generate text. However, it is likely that its ability to generate computer code and scripts will also have a major impact. We demonstrate the use of ChatGPT to generate Python scripts to perform hydrological analyses and highlight the opportunities, limitations and risks that AI poses in the hydrological sciences.
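As a flavour of the kind of script involved, the sketch below computes a flow duration curve from a daily streamflow series; it is an illustrative example in the same spirit, not one of the worked examples in this dataset, and the file and column names are assumptions.

```python
# Sketch: flow duration curve from daily streamflow, the kind of routine
# hydrological analysis a ChatGPT-generated script might perform.
# "daily_flow.csv" with columns date, discharge_m3s is a placeholder.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

flow = pd.read_csv("daily_flow.csv", parse_dates=["date"])
q = flow["discharge_m3s"].dropna().sort_values(ascending=False).to_numpy()

# Exceedance probability: rank / (n + 1), expressed as a percentage.
exceedance = 100 * np.arange(1, len(q) + 1) / (len(q) + 1)

plt.semilogy(exceedance, q)
plt.xlabel("Exceedance probability (%)")
plt.ylabel("Discharge (m$^3$ s$^{-1}$)")
plt.title("Flow duration curve")
plt.savefig("flow_duration_curve.png", dpi=200)
```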
Here, we provide four worked examples of the use of ChatGPT to generate scripts to conduct hydrological analyses. We also provide a full list of the libraries available to the ChatGPT Advanced Data Analysis plugin (only available in the paid version). These files relate to a manuscript that is to be submitted to Hydrological Processes. The authors of the manuscript are Dylan J. Irvine, Landon J.S. Halloran and Philip Brunner.
If you find these examples useful and/or use them, we would appreciate if you could cite the associated publication in Hydrological Processes. Details to be made available upon final publication.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study explores the potential of ChatGPT, a large language model, in scientometrics by assessing its ability to predict citation counts, Mendeley readers, and social media engagement. In this study, 2222 abstracts from PLOS ONE articles published during the initial months of 2022 were analyzed with ChatGPT-4, which assessed each abstract against a set of 60 criteria. Using a principal component analysis, three components were identified: Quality and Reliability, Accessibility and Understandability, and Novelty and Engagement. The Accessibility and Understandability of the abstracts correlated with higher Mendeley readership, while Novelty and Engagement and Accessibility and Understandability were linked to citation counts (Dimensions, Scopus, Google Scholar) and social media attention. Quality and Reliability showed minimal correlation with citation and altmetrics outcomes. Finally, it was found that the predictive correlations of the ChatGPT-based assessments surpassed traditional readability metrics. The findings highlight the potential of large language models in scientometrics and possibly pave the way for AI-assisted peer review.
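As an illustration of the dimension-reduction step, the sketch below standardises per-abstract criterion scores and extracts three principal components with scikit-learn; the input file and column layout are assumptions, not the study's actual materials.

```python
# Sketch: PCA on ChatGPT-assigned criterion scores (one row per abstract,
# one column per criterion). File name and layout are placeholders.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scores = pd.read_csv("abstract_criterion_scores.csv", index_col="abstract_id")

X = StandardScaler().fit_transform(scores)   # 60 criteria, standardised
pca = PCA(n_components=3)
components = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
loadings = pd.DataFrame(pca.components_.T, index=scores.columns,
                        columns=["PC1", "PC2", "PC3"])
print(loadings.abs().idxmax())   # criterion with the largest loading per component
```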