58 datasets found
  1. Chat-GPT Generated Sample Weather Data

    • ieee-dataport.org
    Updated Mar 27, 2023
    Cite
    Alexander Outman (2023). Chat-GPT Generated Sample Weather Data [Dataset]. https://ieee-dataport.org/documents/chat-gpt-generated-sample-weather-data
    Explore at:
    Dataset updated
    Mar 27, 2023
    Authors
    Alexander Outman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    humidity

  2. Text sample datasets and AI detectors test results

    • figshare.com
    txt
    Updated Oct 18, 2023
    Cite
    Andrey Popkov (2023). Text sample datasets and AI detectors test results [Dataset]. http://doi.org/10.6084/m9.figshare.24208443.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andrey Popkov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes three distinct subsets of text:

    Open Access Academic Articles: A collection of 100 open-access articles from various academic journals focused on mental health and psychiatry, published between 2016 and 2018. The articles are selected from reputable journals including JAMA, The Lancet Psychiatry, WPJ, and AM J Psy.

    ChatGPT-Generated Texts: Discussion-section samples generated by ChatGPT (GPT-4 model, version as of August 3, 2023, OpenAI), designed to imitate the style and content of academic articles in the field of mental health and psychiatry.

    Claude-Generated Texts: Discussion-section samples generated by Claude (Version 2, Anthropic) with the aim of imitating academic articles in the same field.

    Additionally, the dataset contains the results of tests performed using ZeroGPT and Originality.AI to evaluate the AI texts against the academic articles for the percentage of text identified as AI-generated.

    Please cite this dataset if you make use of it in your research.

  3. gpt-generated-news-sentences

    • huggingface.co
    + more versions
    Cite
    joshua, gpt-generated-news-sentences [Dataset]. https://huggingface.co/datasets/joshuapsa/gpt-generated-news-sentences
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    joshua
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card

    This dataset was created solely for the purpose of code testing. It was generated by prompting ChatGPT to create sample news sentences on a given topic. Sample prompt: "generate 50 sentences on the topic of 'very recent breaking news on wars and conflicts events' with some sample location names. One example: 'a missile struck near a residential building in Kiev last night; Russia denied Ukraine's accusations of attacking non-military targets'"
 See the full description on the dataset page: https://huggingface.co/datasets/joshuapsa/gpt-generated-news-sentences.

  4. chatgpt-paraphrases

    • huggingface.co
    Updated Mar 17, 2023
    Cite
    Humarin (2023). chatgpt-paraphrases [Dataset]. https://huggingface.co/datasets/humarin/chatgpt-paraphrases
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 17, 2023
    Dataset authored and provided by
    Humarin
    License

    https://choosealicense.com/licenses/openrail/

    Description

    This is a dataset of paraphrases created by ChatGPT. A model based on this dataset is available: model

      We used this prompt to generate paraphrases
    

    Generate 5 similar paraphrases for this question, show it like a numbered list without commentaries: {text}

    This dataset is based on the Quora paraphrase questions, texts from SQuAD 2.0, and the CNN news dataset. We generated 5 paraphrases for each sample; in total this dataset has about 420k data rows. You can make 30 rows from a row from…
 See the full description on the dataset page: https://huggingface.co/datasets/humarin/chatgpt-paraphrases.
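    The "30 rows from a row" figure follows from simple pairing: one source text plus its 5 paraphrases gives 6 texts, and every ordered pair of distinct texts yields one training example (6 × 5 = 30). A minimal sketch of that expansion (the example texts are invented):

```python
from itertools import permutations

def expand_row(text, paraphrases):
    """Expand one dataset row (a text plus its paraphrases) into
    ordered (source, target) paraphrase-training pairs."""
    texts = [text] + list(paraphrases)
    return list(permutations(texts, 2))

# One row with 5 paraphrases -> 6 texts -> 6 * 5 = 30 ordered pairs.
pairs = expand_row("How do I learn Python?",
                   [f"paraphrase {i}" for i in range(1, 6)])
print(len(pairs))  # 30
```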

  5. Table_1_Evaluation of the quality and quantity of artificial...

    • frontiersin.figshare.com
    docx
    Updated Jul 11, 2024
    Cite
    Jisun Choi; Ah Ran Oh; Jungchan Park; Ryung A. Kang; Seung Yeon Yoo; Dong Jae Lee; Kwangmo Yang (2024). Table_1_Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0.DOCX [Dataset]. http://doi.org/10.3389/fmed.2024.1400153.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Frontiers
    Authors
    Jisun Choi; Ah Ran Oh; Jungchan Park; Ryung A. Kang; Seung Yeon Yoo; Dong Jae Lee; Kwangmo Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide data quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures.

    Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were inputted into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired sample t-test compared ChatGPT 3.5 and 4.0.

    Results: Regarding quality, “appropriate” was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed “insufficient” in 59% of cases for 3.5 and “adequate” in 69% for 4.0. In the overall assessment, 3 points were most common for 3.5 (36%), while 4 points were predominant for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were −0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas.

    Conclusion: ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.

  6. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub

      License
    

    CC-0

  7. Production Support Jira tickets samples

    • kaggle.com
    Updated Apr 6, 2025
    Cite
    Ravindra Singh (2025). Production Support Jira tickets samples [Dataset]. https://www.kaggle.com/datasets/ravindrasingh/production-support-jira-tickets-samples
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ravindra Singh
    Description

    ChatGPT generated dataset - Jira Tickets samples from production support, you can use this sample to perform vector search or any other AI or model testing.
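    As a sketch of the vector search the description suggests, the snippet below ranks invented ticket texts against a query using bag-of-words vectors and cosine similarity; a real setup would use an embedding model and a vector store instead:

```python
import math
from collections import Counter

def vectorize(text):
    # Bag-of-words term counts as a sparse vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented ticket texts standing in for dataset rows.
tickets = [
    "Database connection pool exhausted in payment service",
    "Login page returns 500 error after deployment",
    "Nightly batch job stuck on stale database lock",
]
query = "database lock during batch processing"
ranked = sorted(tickets,
                key=lambda t: cosine(vectorize(query), vectorize(t)),
                reverse=True)
print(ranked[0])  # the stale-database-lock ticket ranks first
```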

  8. Data from: AI vs academia: Experimental study on AI text detectors’ accuracy...

    • tandf.figshare.com
    docx
    Updated May 12, 2025
    Cite
    Andrey A. Popkov; Tyson S. Barrett (2025). AI vs academia: Experimental study on AI text detectors’ accuracy in behavioral health academic writing [Dataset]. http://doi.org/10.6084/m9.figshare.25459810.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    May 12, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Andrey A. Popkov; Tyson S. Barrett
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare which have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in identifying human-written versus AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles from 2016–2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by “ChatGPT” and 100 by “Claude”). The free AI detector showed a median of 27.2% for the proportion of academic text identified as AI-generated, while commercial software Originality.AI demonstrated better performance but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.
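    Given per-text detector scores and ground-truth labels like those in this dataset, the false positive and false negative rates reported above reduce to simple counting. A sketch with invented scores and an assumed 50% decision threshold (not the study's):

```python
def error_rates(scores, labels, threshold=50.0):
    """scores: detector's % AI-generated per text.
    labels: True if the text is actually AI-generated."""
    fp = sum(1 for s, ai in zip(scores, labels) if not ai and s >= threshold)
    fn = sum(1 for s, ai in zip(scores, labels) if ai and s < threshold)
    humans = sum(1 for ai in labels if not ai)
    ais = sum(1 for ai in labels if ai)
    return fp / humans, fn / ais

# Invented example: 4 human-written texts, then 4 AI-generated texts.
scores = [12.0, 61.0, 27.2, 8.0, 95.0, 40.0, 88.0, 73.0]
labels = [False, False, False, False, True, True, True, True]
fpr, fnr = error_rates(scores, labels)
print(fpr, fnr)  # 0.25 0.25
```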

  9. ChatGPT examples in the hydrological sciences

    • dataone.org
    • hydroshare.org
    • +1more
    Updated Dec 30, 2023
    Cite
    Dylan Irvine (2023). ChatGPT examples in the hydrological sciences [Dataset]. http://doi.org/10.4211/hs.fc0552275ea14c7082218c42ebd63da6
    Explore at:
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Hydroshare
    Authors
    Dylan Irvine
    Description

    ChatGPT has forever changed the way that many industries operate. Much of the focus of Artificial Intelligence (AI) has been on their ability to generate text. However, it is likely that their ability to generate computer codes and scripts will also have a major impact. We demonstrate the use of ChatGPT to generate Python scripts to perform hydrological analyses and highlight the opportunities, limitations and risks that AI poses in the hydrological sciences.

    Here, we provide four worked examples of the use of ChatGPT to generate scripts to conduct hydrological analyses. We also provide a full list of the libraries available to the ChatGPT Advanced Data Analysis plugin (only available in the paid version). These files relate to a manuscript that is to be submitted to Hydrological Processes. The authors of the manuscript are Dylan J. Irvine, Landon J.S. Halloran and Philip Brunner.

    If you find these examples useful and/or use them, we would appreciate if you could cite the associated publication in Hydrological Processes. Details to be made available upon final publication.

  10. Human-ChatGPT texts Dataset

    • paperswithcode.com
    Updated Nov 25, 2023
    Cite
    Raghav Gaggar; Ashish Bhagchandani; Harsh Oza (2023). Human-ChatGPT texts Dataset [Dataset]. https://paperswithcode.com/dataset/human-chatgpt-texts
    Explore at:
    Dataset updated
    Nov 25, 2023
    Authors
    Raghav Gaggar; Ashish Bhagchandani; Harsh Oza
    Description

    A dataset including texts by humans (labeled 0) and then rephrased by ChatGPT (labeled 1), created to train models for machine-generated text detection.

    It is a robust dataset: it includes texts of various lengths, and the human texts are drawn from multiple sources.

  11. ChatGPT-4o-Writing-Prompts

    • huggingface.co
    Updated Sep 13, 2024
    Cite
    Gryphe Padar (2024). ChatGPT-4o-Writing-Prompts [Dataset]. https://huggingface.co/datasets/Gryphe/ChatGPT-4o-Writing-Prompts
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 13, 2024
    Authors
    Gryphe Padar
    License

    https://choosealicense.com/licenses/unknown/

    Description

    ChatGPT-4o Writing Prompts

    This is a dataset containing 3746 short stories, generated with OpenAI's chatgpt-4o-latest model and using Reddit's Writing Prompts subreddit as a source. Each sample is generally between 6000-8000 characters long. These stories were thoroughly cleaned and then further enriched with a title and a series of applicable genres.
    Note that I did not touch the Markdown ChatGPT-4o produced by itself to enrich its output, as I very much enjoy the added flavour
 See the full description on the dataset page: https://huggingface.co/datasets/Gryphe/ChatGPT-4o-Writing-Prompts.

  12. Awesome ChatGPT Prompts

    • opendatabay.com
    .csv
    Updated Jun 20, 2025
    Cite
    Datasimple (2025). Awesome ChatGPT Prompts [Dataset]. https://www.opendatabay.com/data/ai-ml/b19fe949-9f50-4a6e-ba87-7318e75458c2
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    Welcome to the "Awesome ChatGPT Prompts" dataset on Kaggle! This is a collection of prompt examples to be used with the ChatGPT model.

    The ChatGPT model is a large language model trained by OpenAI that is capable of generating human-like text. By providing it with a prompt, it can generate responses that continue the conversation or expand on the given prompt.

    License

    CC0

    Original Data Source: Awesome ChatGPT Prompts

  13. Data Sheet 1_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    xlsx
    Updated Feb 5, 2025
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 1_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset from qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on the respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity: it demonstrated statistical similarity in 12/13 (92.31%) parameters, with no statistically significant differences observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and to investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
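    The Phase 1 cross-verification of related parameters can be reproduced mechanically for BMI, since BMI = weight (kg) / height (m)². A sketch with invented case files and an assumed rounding tolerance:

```python
def bmi_consistent(case, tol=0.1):
    """Check a case file's reported BMI against weight / height^2
    (weight in kg, height in cm), within a rounding tolerance."""
    height_m = case["height_cm"] / 100
    expected = case["weight_kg"] / height_m ** 2
    return abs(expected - case["bmi"]) <= tol

# Invented case files; field names are illustrative assumptions.
cases = [
    {"height_cm": 170, "weight_kg": 65, "bmi": 22.5},  # 65 / 1.70^2 = 22.49
    {"height_cm": 160, "weight_kg": 80, "bmi": 28.0},  # actual 31.25 -> flagged
]
print([bmi_consistent(c) for c in cases])  # [True, False]
```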

  14. Prompts generated from ChatGPT3.5, ChatGPT4, LLama3-8B, and Mistral-7B with...

    • data.niaid.nih.gov
    • portaldelaciencia.uva.es
    Updated Nov 16, 2024
    + more versions
    Cite
    Javier, Conde (2024). Prompts generated from ChatGPT3.5, ChatGPT4, LLama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameters configurations [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10646081
    Explore at:
    Dataset updated
    Nov 16, 2024
    Dataset provided by
    Pedro, Reviriego
    Gonzalo, Martínez
    Elena, Merino
    Javier, Conde
    José Alberto, Hernández
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations.

    The dataset is useful to study lexical aspects of LLMs with different parameters/roles configurations.

    The 0_Base_Topics.xlsx file lists the topics used for the dataset generation

    The rest of the files collect the answers of ChatGPT to these topics with different configurations of parameters/context:

    Temperature (parameter): Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

    Frequency penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

    Top probability (parameter): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.

    Presence penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

    Roles (context)

    Default: No role is assigned to the LLM, the default role is used.

    Child: The LLM is requested to answer as a five-year-old child.

    Young adult male: The LLM is requested to answer as a young male adult.

    Young adult female: The LLM is requested to answer as a young female adult.

    Elderly adult male: The LLM is requested to answer as an elderly male adult.

    Elderly adult female: The LLM is requested to answer as an elderly female adult.

    Affluent adult male: The LLM is requested to answer as an affluent male adult.

    Affluent adult female: The LLM is requested to answer as an affluent female adult.

    Lower-class adult male: The LLM is requested to answer as a lower-class male adult.

    Lower-class adult female: The LLM is requested to answer as a lower-class female adult.

    Erudite: The LLM is requested to answer as an erudite who uses a rich vocabulary.
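    Taken together, the parameters and roles above map directly onto an OpenAI chat-completions request. A sketch of one such configuration (the model name, values, and prompt are illustrative assumptions, not read from the dataset):

```python
# Illustrative request payload; all values are assumptions for demonstration.
request = {
    "model": "gpt-3.5-turbo",
    "temperature": 0.8,        # higher -> more random output
    "top_p": 1.0,              # nucleus-sampling probability mass
    "frequency_penalty": 0.5,  # positive values discourage verbatim repetition
    "presence_penalty": 0.5,   # positive values encourage new topics
    "messages": [
        # The "Child" role/context described above.
        {"role": "system", "content": "Answer as a five-year-old child."},
        {"role": "user", "content": "Tell me about space travel."},
    ],
}
# The penalties must stay within the documented [-2.0, 2.0] range.
assert -2.0 <= request["frequency_penalty"] <= 2.0
assert -2.0 <= request["presence_penalty"] <= 2.0
```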

    Paper

    Paper: Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study

    Cite:

    @article{10.1145/3696459,
      author = {Mart\'{\i}nez, Gonzalo and Hern\'{a}ndez, Jos\'{e} Alberto and Conde, Javier and Reviriego, Pedro and Merino-G\'{o}mez, Elena},
      title = {Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study},
      year = {2024},
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      issn = {2157-6904},
      url = {https://doi.org/10.1145/3696459},
      doi = {10.1145/3696459},
      note = {Just Accepted},
      journal = {ACM Trans. Intell. Syst. Technol.},
      month = sep,
      keywords = {LLM, Lexical diversity, ChatGPT, Evaluation}
    }

  15. ChatGPT Statistics By Market, User, Price And Performance (2025)

    • electroiq.com
    Updated Jul 2, 2025
    Cite
    Electro IQ (2025). ChatGPT Statistics By Market, User, Price And Performance (2025) [Dataset]. https://electroiq.com/stats/chatgpt-statistics/
    Explore at:
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Electro IQ
    License

    https://electroiq.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    ChatGPT Statistics: In today’s technologically advancing world, Artificial Intelligence (AI) is no longer just science fiction; it has become an integral part of everyday life. One of the most exciting examples of AI in action is ChatGPT, a powerful language model developed by OpenAI. ChatGPT is a conversational AI tool capable of generating human-like responses, assisting with a variety of tasks ranging from writing to coding, customer service, and education. Its use in everyday life is growing enormously, as it makes communication faster, smarter, and more intuitive.

    This article examines how ChatGPT operates and its statistical analysis from various perspectives, including its practical applications, and the evolving conversations surrounding its benefits, limitations, and future potential.

  16. Replication Package for "Improving the Readability of Generated Tests Using...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 5, 2023
    Cite
    Gregory Gay (2023). Replication Package for "Improving the Readability of Generated Tests Using GPT-4 and ChatGPT Code Interpreter" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8289841
    Explore at:
    Dataset updated
    Oct 5, 2023
    Dataset authored and provided by
    Gregory Gay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While automated test generation can decrease the human burden associated with testing, it does not eliminate this burden. Humans must still work with generated test cases to interpret testing results, debug the code, build and maintain a comprehensive test suite, and perform many other tasks. Therefore, a major challenge with automated test generation is the understandability of generated test cases.

    Large language models (LLMs), machine learning models trained on massive corpora of textual data - including both natural language and programming languages - are an emerging technology with great potential for performing language-related predictive tasks such as translation, summarization, and decision support.

    In this study, we are exploring the capabilities of LLMs with regard to improving test case understandability.

    This package contains the data produced during this exploration:

    The examples directory contains the three case studies we tested our transformation process on:

    queue_example: Tests of a basic queue data structure

    httpie_sessions: Tests of the sessions module from the httpie project.

    string_utils_validation: Tests of the validation module from the python-string-utils project.

    Each directory contains the modules-under-test, the original test cases generated by Pynguin, and the transformed test cases.

    Two trials were performed per case example of the transformation technique to assess the impact of different results from the LLM.

    The survey directory contains the survey that was sent to assess the impact of the transformation on test readability.

    survey.pdf contains the survey questions.

    responses.xlsx contains the survey results.

  17. AH&AITD – Arslan’s Human and AI Text Database

    • figshare.com
    xlsx
    Updated May 24, 2025
    Cite
    Arslan Akram (2025). AH&AITD – Arslan’s Human and AI Text Database [Dataset]. http://doi.org/10.6084/m9.figshare.29144348.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 24, 2025
    Dataset provided by
    figshare
    Authors
    Arslan Akram
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AH&AITD is a comprehensive benchmark dataset designed to support the evaluation of AI-generated text detection tools. The dataset contains 11,580 samples spanning both human-written and AI-generated content across multiple domains. It was developed to address limitations in previous datasets, particularly in terms of diversity, scale, and real-world applicability. Its goal is to facilitate research in the detection of AI-generated text by providing a diverse, multi-domain dataset that enables fair benchmarking of detection tools across various writing styles and content categories.

    Composition:

    1. Human-Written Samples (total: 5,790), collected from: Open Web Text (2,343 samples), Blogs (196 samples), Web Text (397 samples), Q&A Platforms (670 samples), News Articles (430 samples), Opinion Statements (1,549 samples), Scientific Research Abstracts (205 samples).

    2. AI-Generated Samples (total: 5,790), generated using: ChatGPT (1,130 samples), GPT-4 (744 samples), Paraphrase Models (1,694 samples), GPT-2 (328 samples), GPT-3 (296 samples), DaVinci (GPT-3.5 variant) (433 samples), GPT-3.5 (364 samples), OPT-IML (406 samples), Flan-T5 (395 samples).

    Citation: Akram, A. (2023). AH&AITD: Arslan’s Human and AI Text Database [Dataset]. Associated with the article: An Empirical Study of AI-Generated Text Detection Tools. Advances in Machine Learning & Artificial Intelligence, 4(2), 44–55.

  18. code_exercises

    • huggingface.co
    Updated Sep 7, 2023
    Cite
    Jina AI (2023). code_exercises [Dataset]. https://huggingface.co/datasets/jinaai/code_exercises
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 7, 2023
    Dataset authored and provided by
    Jina AI
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for "code_exercises"

      Code exercise
    

    This dataset is composed of a diverse set of ~120k Python code exercises (~120m total tokens) generated by ChatGPT 3.5. It is designed to distill ChatGPT 3.5 knowledge about Python coding tasks into other (potentially smaller) models. The exercises have been generated by following the steps described in the related GitHub repository. The generated exercises follow the format of the Human Eval benchmark. Each training sample
 See the full description on the dataset page: https://huggingface.co/datasets/jinaai/code_exercises.

  19. AI for Identifying Dismissive and Acceptive Acts in a Conversation Dataset...

    • figshare.com
    zip
    Updated Mar 5, 2025
    Cite
    Neuman Yair; Yohai Cohen (2025). AI for Identifying Dismissive and Acceptive Acts in a Conversation Dataset and model code [Dataset]. http://doi.org/10.6084/m9.figshare.28539149.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 5, 2025
    Dataset provided by
    figshare
    Authors
    Neuman Yair; Yohai Cohen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Human interactions involve dialogue acts that can be responded to by acceptive or dismissive acts. For example, a person sharing a painful past event uses the dialogue act of disclosure. The disclosure can be responded to empathically or by dismissing the pain. Here, we address the challenge of automatically identifying dismissive and acceptive dialogue acts in a conversation. We used massive AI-generated datasets of utterances and dialogues expressing acceptive/dismissive behavior to address the challenge. Next, we trained and tested several machine-learning models that performed highly in classifying utterances as acceptive or dismissive. The basic approach described in this paper can empower the development of automatic interactive systems in contexts ranging from artificial therapists to assistant robots for the elderly.

  20. ChatGPT-Jailbreak-Prompts

    • huggingface.co
    Updated Jun 19, 2023
    Cite
    Rubén Darío Jaramillo Romero (2023). ChatGPT-Jailbreak-Prompts [Dataset]. https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 19, 2023
    Authors
    Rubén Darío Jaramillo Romero
    Description

    Dataset Card for Dataset Name

      Name
    

    ChatGPT Jailbreak Prompts

      Dataset Summary
    

    ChatGPT Jailbreak Prompts is a complete collection of jailbreak-related prompts for ChatGPT. This dataset is intended to provide a valuable resource for understanding and generating text in the context of jailbreaking ChatGPT.

      Languages
    

    [English]
