21 datasets found
  1. d

    How are Chat GPT and AI used in medical diagnosis

    • dataone.org
    Updated Nov 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Maher Asaad Baker (2023). How are Chat GPT and AI used in medical diagnosis [Dataset]. http://doi.org/10.7910/DVN/2HMJ58
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Maher Asaad Baker
    Description

    The potential of using Chat GPT and AI to revolutionize the way we interact with computers, specifically in the field of medical diagnostics. Chat GPT can make conversations between doctors and patients more natural, while AI can analyze vast amounts of patient data to identify trends and estimate a patient’s health. Patients can use Chat GPT to better understand their medical conditions, and both Chat GPT and AI can be used to automate tasks such as scheduling appointments and processing test results. However, there are limitations to using AI, including data bias, complex results, and analysis errors. To reduce errors, it is important to validate findings using various techniques and ensure that data is accurate and up-to-date. Chat GPT also employs security measures to protect patient data privacy and confidentiality.

  2. t

    ChatGPT Discussion Trends

    • tickertrends.io
    html
    Updated Oct 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TickerTrends (2025). ChatGPT Discussion Trends [Dataset]. https://tickertrends.io/chatgpt-trends
    Explore at:
    htmlAvailable download formats
    Dataset updated
    Oct 11, 2025
    Dataset authored and provided by
    TickerTrends
    License

    https://tickertrends.io/termshttps://tickertrends.io/terms

    Time period covered
    Nov 2022 - Present
    Area covered
    Global
    Variables measured
    Keyword Volume, Topic Mentions, Trend Momentum
    Description

    Monthly dataset tracking topic frequency, keyword volume, and conversation patterns across ChatGPT discussions. Data is normalized on a 0 to 100 scale for easy comparison. Aggregates millions of AI interactions to reveal emerging trends, user interests, and discussion momentum across technology, finance, health, education, and business categories.

  3. 500k ChatGPT-related Tweets Jan-Mar 2023

    • kaggle.com
    zip
    Updated Apr 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Khalid Ansari (2023). 500k ChatGPT-related Tweets Jan-Mar 2023 [Dataset]. https://www.kaggle.com/datasets/khalidryder777/500k-chatgpt-tweets-jan-mar-2023/code
    Explore at:
    zip(49816658 bytes)Available download formats
    Dataset updated
    Apr 11, 2023
    Authors
    Khalid Ansari
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a CSV file related to ChatGPT including keywords(chatgpt, chat gpt) #hashtags and @mentions about ChatGPT. OpenAI's conversational AI model. The file includes information on 500,000 tweets. The dataset aims to help understand public opinion, trends, and potential applications of ChatGPT by analyzing tweet volume, sentiment, user engagement, and the influence of key AI events. The dataset offers valuable insights for companies, researchers, and policymakers, allowing them to make informed decisions and shape the future of AI-powered conversational technologies.

    Check out my Comprehensive Analysis on this dataset: Medium article "Cracking the ChatGPT Code: A Deep Dive into 500,000 Tweets using Advanced NLP Techniques"

    Learn about the collection process in Medium article "Effortlessly Scraping Massive Twitter Data"

  4. DeepSeek vs ChatGPT: AI Platform Comparison

    • kaggle.com
    zip
    Updated Feb 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aakif Khan (2025). DeepSeek vs ChatGPT: AI Platform Comparison [Dataset]. https://www.kaggle.com/datasets/khanaakif/deepseek-vs-chatgpt-ai-platform-comparison
    Explore at:
    zip(529634 bytes)Available download formats
    Dataset updated
    Feb 24, 2025
    Authors
    Aakif Khan
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    DeepSeek vs. ChatGPT: AI Performance & User Behavior (July 2023 - Feb 2025)

    This synthetically generated dataset provides a realistic AI performance comparison between ChatGPT (GPT-4-turbo) and DeepSeek (DeepSeek-Chat 1.5) over a 1.5-year period. With 10,000+ rows, it captures key user interaction metrics, platform performance indicators, and AI response characteristics to analyze trends in accuracy, engagement, and adoption.

    Key Features:

    • Time-Series Ready – Granular date and time columns for trend and seasonality analysis.
    • Comparative AI Analysis – Compare user engagement, retention rates, and response quality.
    • User Behavior Insights – Analyze session durations, input text complexity, and user ratings.
    • Technical Performance Metrics – Evaluate AI response accuracy and processing speed.
    • Data Cleaning Practice – Includes intentionally introduced null values for preprocessing exercises.

    Ideal For:

    • AI benchmarking and platform performance studies
    • Time-series forecasting and trend analysis
    • Data preprocessing and feature engineering
    • Power BI, SQL, and Python-based analytical dashboards

    📜 License: MIT – Free for research, projects, and analysis.

  5. S

    Chat GPT Data

    • scidb.cn
    Updated Aug 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Emmanuel Mensah Kparl; Iddris Faisal (2024). Chat GPT Data [Dataset]. http://doi.org/10.57760/sciencedb.11927
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 14, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Emmanuel Mensah Kparl; Iddris Faisal
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This if the data we used for our analysis

  6. b

    ChatGPT Revenue and Usage Statistics (2025)

    • businessofapps.com
    Updated Feb 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/
    Explore at:
    Dataset updated
    Feb 9, 2023
    Dataset authored and provided by
    Business of Apps
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...

  7. 89k ChatGPT conversations

    • kaggle.com
    zip
    Updated May 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Noah Persaud (2023). 89k ChatGPT conversations [Dataset]. https://www.kaggle.com/datasets/noahpersaud/89k-chatgpt-conversations
    Explore at:
    zip(681600031 bytes)Available download formats
    Dataset updated
    May 4, 2023
    Authors
    Noah Persaud
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all available conversations from chatlogs.net between users and ChatGPT. Version 1 contains all conversations available up to the cutoff date of April 4, 2023. Version 1 contains all conversations available up to the cutoff date of April 20, 2023.

  8. f

    Data Sheet 1_Free word association analysis of students' perception of...

    • frontiersin.figshare.com
    pdf
    Updated May 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Marvin Henrich; Sandra Formella-Zimmermann; Sebastian Schneider; Paul Wilhelm Dierkes (2025). Data Sheet 1_Free word association analysis of students' perception of artificial intelligence.pdf [Dataset]. http://doi.org/10.3389/feduc.2025.1543746.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 21, 2025
    Dataset provided by
    Frontiers
    Authors
    Marvin Henrich; Sandra Formella-Zimmermann; Sebastian Schneider; Paul Wilhelm Dierkes
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study aims to explore students' associations with Artificial Intelligence (AI) and how these perceptions have evolved following the release of Chat GPT. A free word association test was conducted with 836 German high school students aged 10–20. Associations were collected before and after the release of Chat GPT, processed, cleaned, and inductively categorized into nine groups: technical association, assistance system, future, human, negative, positive, artificial, others, and no association. In total, 355 distinct terms were mentioned, with “robot” emerging as the most frequently cited, followed by “computer” and “Chat GPT,” indicating a strong connection between AI and technological applications. The release of Chat GPT had a significant impact on students' associations, with a marked increase in mentions of Chat GPT and related assistance systems, such as Siri and Snapchat AI. The results reveal a shift in students' perception of AI-from abstract, futuristic concepts to more immediate, application-based associations. Network analysis further demonstrated how terms were semantically clustered, emphasizing the prominence of assistance systems in students' conceptions. The findings underscore the importance of integrating AI education that fosters both critical reflection and practical understanding of AI, encouraging responsible engagement with the technology. These insights are crucial for shaping the future of AI literacy in schools and universities.

  9. ChatGPT global web traffic 2022-2024

    • statista.com
    Updated Nov 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). ChatGPT global web traffic 2022-2024 [Dataset]. https://www.statista.com/statistics/1463713/chatgpt-chat-openai-com-monthly-visits/
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Apr 2023 - Mar 2025
    Area covered
    Worldwide
    Description

    In March 2025, ChatGPT.com received approximately *** billion visits from users worldwide. The most recent year under analysis has seen an increase in traffic to OpenAI's artificial intelligence chatbot. This is the highest traffic volume achieved by the site to date, with values for the most recent analyzed month exceeding twice the average monthly visits for the entire examined period between April 2023 and April 2024.

  10. Z

    Data from: HoneyBee: Progressive Instruction Finetuning of Large Language...

    • data.niaid.nih.gov
    Updated Nov 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Song, Yu; Miret, Santiago; Zhang, Huan; Liu, Bang (2023). HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10119841
    Explore at:
    Dataset updated
    Nov 13, 2023
    Authors
    Song, Yu; Miret, Santiago; Zhang, Huan; Liu, Bang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose an instruction-based process for trustworthy data curation in materials science (MatSci-Instruct), which we then apply to finetune a LLaMa-based language model targeted for materials science (HoneyBee). MatSci-Instruct helps alleviate the scarcity of relevant, high-quality materials science textual data available in the open literature, and HoneyBee is the first billion-parameter language model specialized to materials science. In MatSci-Instruct we improve the trustworthiness of generated data by prompting multiple commercially available large language models for generation with an Instructor module (e.g. Chat-GPT) and verification from an independent Verifier module (e.g. Claude). Using MatSci-Instruct, we construct a dataset of multiple tasks and measure the quality of our dataset along multiple dimensions, including accuracy against known facts, relevance to materials science, as well as completeness and reasonableness of the data. Moreover, we iteratively generate more targeted instructions and instruction-data in a finetuning-evaluation-feedback loop leading to progressively better performance for our finetuned HoneyBee models. Our evaluation on the MatSci-NLP benchmark shows HoneyBee's outperformance of existing language models on materials science tasks and iterative improvement in successive stages of instruction-data refinement. We study the quality of HoneyBee's language modeling through automatic evaluation and analyze case studies to further understand the model's capabilities and limitations. Our code and relevant datasets are publicly available at https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee.

  11. Green Future Data

    • kaggle.com
    zip
    Updated Feb 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Patricia Webb (2024). Green Future Data [Dataset]. https://www.kaggle.com/datasets/patriciawebb/green-future-data
    Explore at:
    zip(28896 bytes)Available download formats
    Dataset updated
    Feb 25, 2024
    Authors
    Patricia Webb
    Description

    This is a mock dataset from Chat GPT

    You are a data analyst hired by GreenFuture Inc., an environmental consultancy firm. GreenFuture Inc. specializes in assessing the environmental impact of businesses in various sectors, focusing on carbon footprint, waste management, and energy efficiency. They've collected data from several companies and want to analyze this data to identify trends, potential areas for improvement, and industry benchmarks.

    Suggested Analysis Tasks:

    Emission Trends Analysis: Calculate total emissions (CO2, CH4, N2O) per year for all companies. Identify the top 5 companies with the highest CO2 emissions in the latest year available. Sector-wise Energy Use Analysis

    Aggregate energy use (Electricity, Fossil Fuels, Renewables) by sector and year: Determine the sector with the highest reliance on fossil fuels versus renewables.

    Waste Management Efficiency: Calculate the percentage of waste recycled for each company and find the top 3 companies with the best recycling rates. Analyze trends in waste management practices over the years.

    Correlation Analysis: Explore the correlation between energy use types (Electricity, Fossil Fuels, Renewables) and emissions (CO2) to identify patterns.

    Actionable Insights: Based on the analysis, suggest actionable insights for companies to reduce their environmental impact. This might involve recommendations for sectors or companies lagging in renewable energy use or those with inefficient waste management practices.

  12. f

    Data Sheet 1_A comparison of the diagnostic ability of large language models...

    • frontiersin.figshare.com
    application/csv
    Updated Aug 5, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Maria Palwasha Khan; Eoin Daniel O’Sullivan (2024). Data Sheet 1_A comparison of the diagnostic ability of large language models in challenging clinical cases.csv [Dataset]. http://doi.org/10.3389/frai.2024.1379297.s001
    Explore at:
    application/csvAvailable download formats
    Dataset updated
    Aug 5, 2024
    Dataset provided by
    Frontiers
    Authors
    Maria Palwasha Khan; Eoin Daniel O’Sullivan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    IntroductionThe rise of accessible, consumer facing large language models (LLM) provides an opportunity for immediate diagnostic support for clinicians.ObjectivesTo compare the different performance characteristics of common LLMS utility in solving complex clinical cases and assess the utility of a novel tool to grade LLM output.MethodsUsing a newly developed rubric to assess the models’ diagnostic utility, we measured to models’ ability to answer cases according to accuracy, readability, clinical interpretability, and an assessment of safety. Here we present a comparative analysis of three LLM models—Bing, Chat GPT, and Gemini—across a diverse set of clinical cases as presented in the New England Journal of Medicines case series.ResultsOur results suggest that models performed differently when presented with identical clinical information, with Gemini performing best. Our grading tool had low interobserver variability and proved a reliable tool to grade LLM clinical output.ConclusionThis research underscores the variation in model performance in clinical scenarios and highlights the importance of considering diagnostic model performance in diverse clinical scenarios prior to deployment. Furthermore, we provide a new tool to assess LLM output.

  13. A

    AI Skincare Advisor Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Aug 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data Insights Market (2025). AI Skincare Advisor Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-skincare-advisor-504886
    Explore at:
    pdf, doc, pptAvailable download formats
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming AI Skincare Advisor market! Learn about its projected $2.5 billion valuation by 2033, key growth drivers, leading companies like Reveive and Bioderma, and the challenges ahead. Get insights into personalized skincare, AI technology, and market trends.

  14. 🚐 AutoScout Data

    • kaggle.com
    zip
    Updated Jun 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mexwell (2024). 🚐 AutoScout Data [Dataset]. https://www.kaggle.com/datasets/mexwell/autoscout-data/data
    Explore at:
    zip(505880 bytes)Available download formats
    Dataset updated
    Jun 26, 2024
    Authors
    mexwell
    Description

    The AutoScout Data dataset is a comprehensive collection of information on various vehicles, capturing a wide array of attributes that can be used for analysis in automotive research, market analysis, and machine learning projects related to car sales and valuations. Here is an overview of the dataset columns:

    make_model: This column contains the make and model of the vehicle. It is a categorical variable that identifies the brand and specific model of the car.

    body_type: This column describes the body type of the vehicle (e.g., sedan, SUV, hatchback). It is a categorical variable representing the shape and style of the car.

    price: This column lists the price of the vehicle in the respective currency. It is a numerical variable crucial for market valuation studies.

    vat: This column indicates whether the price includes VAT (Value Added Tax). It is a categorical variable, typically showing "Yes" or "No."

    km: This column shows the mileage of the vehicle in kilometers. It is a numerical variable that indicates the usage level of the car.

    Type: This column describes the type of car, such as "new," "used," or "demonstration." It is a categorical variable.

    Fuel: This column specifies the type of fuel the vehicle uses, such as petrol, diesel, electric, etc. It is a categorical variable.

    Gears: This column indicates the number of gears the vehicle has. It is a numerical variable.

    Comfort_Convenience: This column lists comfort and convenience features of the vehicle, such as air conditioning, heated seats, etc. It is a categorical variable with multiple entries.

    Entertainment_Media: This column describes the entertainment and media features available in the car, like Bluetooth, radio, CD player, etc. It is a categorical variable with multiple entries.

    Extras: This column details any extra features of the car, which might include things like alloy wheels, roof rails, etc. It is a categorical variable with multiple entries.

    Safety_Security: This column includes information on safety and security features of the vehicle, such as airbags, ABS, etc. It is a categorical variable with multiple entries.

    age: This column indicates the age of the vehicle in years. It is a numerical variable.

    Previous_Owners: This column shows the number of previous owners of the vehicle. It is a numerical variable.

    hp_kW: This column lists the horsepower of the vehicle in kilowatts. It is a numerical variable indicative of the car’s power.

    Inspection_new: This column indicates whether the vehicle has a new inspection. It is a categorical variable, typically showing "Yes" or "No."

    Paint_Type: This column specifies the type of paint of the vehicle, such as metallic, matte, etc. It is a categorical variable.

    Upholstery_type: This column describes the type of upholstery in the vehicle, such as leather, fabric, etc. It is a categorical variable.

    Gearing_Type: This column indicates the type of gearing system, such as manual, automatic, etc. It is a categorical variable.

    Displacement_cc: This column lists the engine displacement in cubic centimeters (cc). It is a numerical variable.

    Weight_kg: This column shows the weight of the vehicle in kilograms. It is a numerical variable.

    Drive_chain: This column describes the drive chain of the vehicle, such as front-wheel drive, rear-wheel drive, etc. It is a categorical variable.

    cons_comb: This column indicates the combined fuel consumption of the vehicle. It is a numerical variable representing fuel efficiency.

    Usage and Potential Analyses This dataset can be used for various analyses, including:

    • Price Prediction: Building models to predict vehicle prices based on various attributes.
    • Market Analysis: Understanding trends in vehicle types, features, and pricing.
    • Customer Preferences: Analyzing which features are most common or popular among different types of vehicles.
    • Vehicle Performance: Studying the relationship between engine displacement, horsepower, and fuel consumption.

    The dataset's rich set of features allows for detailed exploration and insights into the automotive market, providing valuable information for consumers, dealers, and manufacturers alike.

    Text generated with Chat-GPT

    Acknowlegement

    Foto von Alev Takil auf Unsplash

  15. h

    awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub

      License
    

    CC-0

  16. h

    Turkish-Chat_GPT-4O

    • huggingface.co
    Updated Nov 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Asha (2024). Turkish-Chat_GPT-4O [Dataset]. https://huggingface.co/datasets/Quardo/Turkish-Chat_GPT-4O
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 30, 2024
    Authors
    Asha
    License

    https://choosealicense.com/licenses/wtfpl/https://choosealicense.com/licenses/wtfpl/

    Area covered
    Türkiye
    Description

    Quardo/Turkish-Chat_GPT-4O

      Description
    

    This is a simple dataset generated by OpenAI's GPT-4O (gpt-4o-2024-08-06). The dataset includes various entries created and evaluated by the AI model, providing a unique collection of Turkish chat data for analysis and research.

      Warning
    

    Please note that this dataset may contain errors or inconsistencies as it is fully generated by an AI model. It is highly recommended to check and edit the data before usage, as AI can… See the full description on the dataset page: https://huggingface.co/datasets/Quardo/Turkish-Chat_GPT-4O.

  17. S

    ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends

    • sqmagazine.co.uk
    Updated Oct 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SQ Magazine (2025). ChatGPT vs. Google Gemini Statistics 2025: Head-to-Head AI Trends [Dataset]. https://sqmagazine.co.uk/chatgpt-vs-google-gemini-statistics/
    Explore at:
    Dataset updated
    Oct 7, 2025
    Dataset authored and provided by
    SQ Magazine
    License

    https://sqmagazine.co.uk/privacy-policy/https://sqmagazine.co.uk/privacy-policy/

    Time period covered
    Jan 1, 2024 - Dec 31, 2025
    Area covered
    Global
    Description

    The rivalry between ChatGPT and Google Gemini defines the generative AI landscape. ChatGPT remains the leader in active engagement, while Gemini closes the gap through mass distribution. From corporate reports to web traffic studies, figures speak clearly about adoption, reach, and momentum. Explore what makes each platform stand out, and what...

  18. ChatGPT Classification Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mahdi (2023). ChatGPT Classification Dataset [Dataset]. https://www.kaggle.com/datasets/mahdimaktabdar/chatgpt-classification-dataset
    Explore at:
    zip(718710 bytes)Available download formats
    Dataset updated
    Sep 7, 2023
    Authors
    Mahdi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).

    This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).

    We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:

    Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.

    https://www.techrxiv.org/users/692552/articles/682641/master/file/data/ChatGPT_generated_Content_Detection/ChatGPT_generated_Content_Detection.pdf

  19. h

    Bitext-telco-llm-chatbot-training-dataset

    • huggingface.co
    Updated Jan 10, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bitext (2025). Bitext-telco-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-telco-llm-chatbot-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 10, 2025
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Telco Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [telco] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An overview of… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-telco-llm-chatbot-training-dataset.

  20. Data from: AstroChat

    • kaggle.com
    • huggingface.co
    zip
    Updated Jun 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    astro_pat (2024). AstroChat [Dataset]. https://www.kaggle.com/datasets/patrickfleith/astrochat
    Explore at:
    zip(1214166 bytes)Available download formats
    Dataset updated
    Jun 9, 2024
    Authors
    astro_pat
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose and Scope

    The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.

    Intended Use

    The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).

    Quickstart

    To be completed

    DATASET DESCRIPTION

    Access

    Structure

    901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column): - id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets. - topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split. - subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc. - persona: description of the persona used to simulate a user - opening_question: the first question asked by the user to start a conversation with the AI-assistant - messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields: - role: the role of the speaker, either user or assistant - content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.

    Important See the full list of topics and subtopics covered below.

    Metadata

    Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main

    Generation Method

    We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:

    Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.

    Step-by-step description

    • Defined a set of user persona
    • Defined a set of topics/ disciplines within the domain of Astronautics / Space Mission Engineering
    • For each topics, we defined a set of subtopics to narrow down the conversation to more specific and niche conversations (see below the full list)
    • For each subtopic we generate a set of opening questions that the user could ask to start a conversation (see below the full list)
    • We then distil the knowledge of an strong Chat Model (in our case ChatGPT through then api with gpt-4-turbo model) to generate the answers to the opening questions
    • We simulate follow-up questions from the user to the assistant, and the assistant's answers to these questions which builds up the messages.

    Future work and contributions appreciated

    • Distil knowledge from more models (Anthropic, Mixtral, GPT-4o, etc...)
    • Implement more creativity in the opening questions and follow-up questions
    • Filter-out questions and conversations which are too similar
    • Ask topic and subtopic expert to validate the generated conversations to have a sense on how reliable is the overall dataset

    Languages

    All instances in the dataset are in english

    Size

    901 synthetically-generated dialogue

    USAGE AND GUIDELINES

    License

    AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International

    Restrictions

    No restriction. Please provide the correct attribution following the license terms.

    Citation

    Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579

    Update Frequency

    Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)

    Have a feedback or spot an error?

    Use the ...

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Maher Asaad Baker (2023). How are Chat GPT and AI used in medical diagnosis [Dataset]. http://doi.org/10.7910/DVN/2HMJ58

How are Chat GPT and AI used in medical diagnosis

Explore at:
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Maher Asaad Baker
Description

The potential of using Chat GPT and AI to revolutionize the way we interact with computers, specifically in the field of medical diagnostics. Chat GPT can make conversations between doctors and patients more natural, while AI can analyze vast amounts of patient data to identify trends and estimate a patient’s health. Patients can use Chat GPT to better understand their medical conditions, and both Chat GPT and AI can be used to automate tasks such as scheduling appointments and processing test results. However, there are limitations to using AI, including data bias, complex results, and analysis errors. To reduce errors, it is important to validate findings using various techniques and ensure that data is accurate and up-to-date. Chat GPT also employs security measures to protect patient data privacy and confidentiality.

Search
Clear search
Close search
Google apps
Main menu