Facebook
TwitterThe potential of using Chat GPT and AI to revolutionize the way we interact with computers, specifically in the field of medical diagnostics. Chat GPT can make conversations between doctors and patients more natural, while AI can analyze vast amounts of patient data to identify trends and estimate a patient’s health. Patients can use Chat GPT to better understand their medical conditions, and both Chat GPT and AI can be used to automate tasks such as scheduling appointments and processing test results. However, there are limitations to using AI, including data bias, complex results, and analysis errors. To reduce errors, it is important to validate findings using various techniques and ensure that data is accurate and up-to-date. Chat GPT also employs security measures to protect patient data privacy and confidentiality.
Facebook
Twitterhttps://tickertrends.io/termshttps://tickertrends.io/terms
Monthly dataset tracking topic frequency, keyword volume, and conversation patterns across ChatGPT discussions. Data is normalized on a 0 to 100 scale for easy comparison. Aggregates millions of AI interactions to reveal emerging trends, user interests, and discussion momentum across technology, finance, health, education, and business categories.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a CSV file related to ChatGPT including keywords(chatgpt, chat gpt) #hashtags and @mentions about ChatGPT. OpenAI's conversational AI model. The file includes information on 500,000 tweets. The dataset aims to help understand public opinion, trends, and potential applications of ChatGPT by analyzing tweet volume, sentiment, user engagement, and the influence of key AI events. The dataset offers valuable insights for companies, researchers, and policymakers, allowing them to make informed decisions and shape the future of AI-powered conversational technologies.
Check out my Comprehensive Analysis on this dataset: Medium article "Cracking the ChatGPT Code: A Deep Dive into 500,000 Tweets using Advanced NLP Techniques"
Learn about the collection process in Medium article "Effortlessly Scraping Massive Twitter Data"
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This synthetically generated dataset provides a realistic AI performance comparison between ChatGPT (GPT-4-turbo) and DeepSeek (DeepSeek-Chat 1.5) over a 1.5-year period. With 10,000+ rows, it captures key user interaction metrics, platform performance indicators, and AI response characteristics to analyze trends in accuracy, engagement, and adoption.
📜 License: MIT – Free for research, projects, and analysis.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This if the data we used for our analysis
Facebook
TwitterAttribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
ChatGPT was the chatbot that kickstarted the generative AI revolution, which has been responsible for hundreds of billions of dollars in data centres, graphics chips and AI startups. Launched by...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains all available conversations from chatlogs.net between users and ChatGPT. Version 1 contains all conversations available up to the cutoff date of April 4, 2023. Version 1 contains all conversations available up to the cutoff date of April 20, 2023.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study aims to explore students' associations with Artificial Intelligence (AI) and how these perceptions have evolved following the release of Chat GPT. A free word association test was conducted with 836 German high school students aged 10–20. Associations were collected before and after the release of Chat GPT, processed, cleaned, and inductively categorized into nine groups: technical association, assistance system, future, human, negative, positive, artificial, others, and no association. In total, 355 distinct terms were mentioned, with “robot” emerging as the most frequently cited, followed by “computer” and “Chat GPT,” indicating a strong connection between AI and technological applications. The release of Chat GPT had a significant impact on students' associations, with a marked increase in mentions of Chat GPT and related assistance systems, such as Siri and Snapchat AI. The results reveal a shift in students' perception of AI-from abstract, futuristic concepts to more immediate, application-based associations. Network analysis further demonstrated how terms were semantically clustered, emphasizing the prominence of assistance systems in students' conceptions. The findings underscore the importance of integrating AI education that fosters both critical reflection and practical understanding of AI, encouraging responsible engagement with the technology. These insights are crucial for shaping the future of AI literacy in schools and universities.
Facebook
TwitterIn March 2025, ChatGPT.com received approximately *** billion visits from users worldwide. The most recent year under analysis has seen an increase in traffic to OpenAI's artificial intelligence chatbot. This is the highest traffic volume achieved by the site to date, with values for the most recent analyzed month exceeding twice the average monthly visits for the entire examined period between April 2023 and April 2024.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We propose an instruction-based process for trustworthy data curation in materials science (MatSci-Instruct), which we then apply to finetune a LLaMa-based language model targeted for materials science (HoneyBee). MatSci-Instruct helps alleviate the scarcity of relevant, high-quality materials science textual data available in the open literature, and HoneyBee is the first billion-parameter language model specialized to materials science. In MatSci-Instruct we improve the trustworthiness of generated data by prompting multiple commercially available large language models for generation with an Instructor module (e.g. Chat-GPT) and verification from an independent Verifier module (e.g. Claude). Using MatSci-Instruct, we construct a dataset of multiple tasks and measure the quality of our dataset along multiple dimensions, including accuracy against known facts, relevance to materials science, as well as completeness and reasonableness of the data. Moreover, we iteratively generate more targeted instructions and instruction-data in a finetuning-evaluation-feedback loop leading to progressively better performance for our finetuned HoneyBee models. Our evaluation on the MatSci-NLP benchmark shows HoneyBee's outperformance of existing language models on materials science tasks and iterative improvement in successive stages of instruction-data refinement. We study the quality of HoneyBee's language modeling through automatic evaluation and analyze case studies to further understand the model's capabilities and limitations. Our code and relevant datasets are publicly available at https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee.
Facebook
TwitterThis is a mock dataset from Chat GPT
You are a data analyst hired by GreenFuture Inc., an environmental consultancy firm. GreenFuture Inc. specializes in assessing the environmental impact of businesses in various sectors, focusing on carbon footprint, waste management, and energy efficiency. They've collected data from several companies and want to analyze this data to identify trends, potential areas for improvement, and industry benchmarks.
Suggested Analysis Tasks:
Emission Trends Analysis: Calculate total emissions (CO2, CH4, N2O) per year for all companies. Identify the top 5 companies with the highest CO2 emissions in the latest year available. Sector-wise Energy Use Analysis
Aggregate energy use (Electricity, Fossil Fuels, Renewables) by sector and year: Determine the sector with the highest reliance on fossil fuels versus renewables.
Waste Management Efficiency: Calculate the percentage of waste recycled for each company and find the top 3 companies with the best recycling rates. Analyze trends in waste management practices over the years.
Correlation Analysis: Explore the correlation between energy use types (Electricity, Fossil Fuels, Renewables) and emissions (CO2) to identify patterns.
Actionable Insights: Based on the analysis, suggest actionable insights for companies to reduce their environmental impact. This might involve recommendations for sectors or companies lagging in renewable energy use or those with inefficient waste management practices.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IntroductionThe rise of accessible, consumer facing large language models (LLM) provides an opportunity for immediate diagnostic support for clinicians.ObjectivesTo compare the different performance characteristics of common LLMS utility in solving complex clinical cases and assess the utility of a novel tool to grade LLM output.MethodsUsing a newly developed rubric to assess the models’ diagnostic utility, we measured to models’ ability to answer cases according to accuracy, readability, clinical interpretability, and an assessment of safety. Here we present a comparative analysis of three LLM models—Bing, Chat GPT, and Gemini—across a diverse set of clinical cases as presented in the New England Journal of Medicines case series.ResultsOur results suggest that models performed differently when presented with identical clinical information, with Gemini performing best. Our grading tool had low interobserver variability and proved a reliable tool to grade LLM clinical output.ConclusionThis research underscores the variation in model performance in clinical scenarios and highlights the importance of considering diagnostic model performance in diverse clinical scenarios prior to deployment. Furthermore, we provide a new tool to assess LLM output.
Facebook
Twitterhttps://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Discover the booming AI Skincare Advisor market! Learn about its projected $2.5 billion valuation by 2033, key growth drivers, leading companies like Reveive and Bioderma, and the challenges ahead. Get insights into personalized skincare, AI technology, and market trends.
Facebook
TwitterThe AutoScout Data dataset is a comprehensive collection of information on various vehicles, capturing a wide array of attributes that can be used for analysis in automotive research, market analysis, and machine learning projects related to car sales and valuations. Here is an overview of the dataset columns:
make_model: This column contains the make and model of the vehicle. It is a categorical variable that identifies the brand and specific model of the car.
body_type: This column describes the body type of the vehicle (e.g., sedan, SUV, hatchback). It is a categorical variable representing the shape and style of the car.
price: This column lists the price of the vehicle in the respective currency. It is a numerical variable crucial for market valuation studies.
vat: This column indicates whether the price includes VAT (Value Added Tax). It is a categorical variable, typically showing "Yes" or "No."
km: This column shows the mileage of the vehicle in kilometers. It is a numerical variable that indicates the usage level of the car.
Type: This column describes the type of car, such as "new," "used," or "demonstration." It is a categorical variable.
Fuel: This column specifies the type of fuel the vehicle uses, such as petrol, diesel, electric, etc. It is a categorical variable.
Gears: This column indicates the number of gears the vehicle has. It is a numerical variable.
Comfort_Convenience: This column lists comfort and convenience features of the vehicle, such as air conditioning, heated seats, etc. It is a categorical variable with multiple entries.
Entertainment_Media: This column describes the entertainment and media features available in the car, like Bluetooth, radio, CD player, etc. It is a categorical variable with multiple entries.
Extras: This column details any extra features of the car, which might include things like alloy wheels, roof rails, etc. It is a categorical variable with multiple entries.
Safety_Security: This column includes information on safety and security features of the vehicle, such as airbags, ABS, etc. It is a categorical variable with multiple entries.
age: This column indicates the age of the vehicle in years. It is a numerical variable.
Previous_Owners: This column shows the number of previous owners of the vehicle. It is a numerical variable.
hp_kW: This column lists the horsepower of the vehicle in kilowatts. It is a numerical variable indicative of the car’s power.
Inspection_new: This column indicates whether the vehicle has a new inspection. It is a categorical variable, typically showing "Yes" or "No."
Paint_Type: This column specifies the type of paint of the vehicle, such as metallic, matte, etc. It is a categorical variable.
Upholstery_type: This column describes the type of upholstery in the vehicle, such as leather, fabric, etc. It is a categorical variable.
Gearing_Type: This column indicates the type of gearing system, such as manual, automatic, etc. It is a categorical variable.
Displacement_cc: This column lists the engine displacement in cubic centimeters (cc). It is a numerical variable.
Weight_kg: This column shows the weight of the vehicle in kilograms. It is a numerical variable.
Drive_chain: This column describes the drive chain of the vehicle, such as front-wheel drive, rear-wheel drive, etc. It is a categorical variable.
cons_comb: This column indicates the combined fuel consumption of the vehicle. It is a numerical variable representing fuel efficiency.
Usage and Potential Analyses This dataset can be used for various analyses, including:
The dataset's rich set of features allows for detailed exploration and insights into the automotive market, providing valuable information for consumers, dealers, and manufacturers alike.
Text generated with Chat-GPT
Foto von Alev Takil auf Unsplash
Facebook
Twitterhttps://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub
License
CC-0
Facebook
Twitterhttps://choosealicense.com/licenses/wtfpl/https://choosealicense.com/licenses/wtfpl/
Quardo/Turkish-Chat_GPT-4O
Description
This is a simple dataset generated by OpenAI's GPT-4O (gpt-4o-2024-08-06). The dataset includes various entries created and evaluated by the AI model, providing a unique collection of Turkish chat data for analysis and research.
Warning
Please note that this dataset may contain errors or inconsistencies as it is fully generated by an AI model. It is highly recommended to check and edit the data before usage, as AI can… See the full description on the dataset page: https://huggingface.co/datasets/Quardo/Turkish-Chat_GPT-4O.
Facebook
Twitterhttps://sqmagazine.co.uk/privacy-policy/https://sqmagazine.co.uk/privacy-policy/
The rivalry between ChatGPT and Google Gemini defines the generative AI landscape. ChatGPT remains the leader in active engagement, while Gemini closes the gap through mass distribution. From corporate reports to web traffic studies, figures speak clearly about adoption, reach, and momentum. Explore what makes each platform stand out, and what...
Facebook
TwitterAttribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
We have compiled a dataset that consists of textual articles including common terminology, concepts and definitions in the field of computer science, artificial intelligence, and cyber security. This dataset consists of both human-generated text and OpenAI’s ChatGPT-generated text. Human-generated answers were collected from different computer science dictionaries and encyclopedias including “The Encyclopedia of Computer Science and Technology” and "Encyclopedia of Human-Computer Interaction". AI-generated content in our dataset was produced by simply posting questions to OpenAI’s ChatGPT and manually documenting the resulting responses. A rigorous data-cleaning process has been performed to remove unwanted Unicode characters, styling and formatting tags. To structure our dataset for binary classification, we combined both AI-generated and Human-generated answers into a single column and assigned appropriate labels to each data point (Human-generated = 0 and AI-generated = 1).
This creates our article-level dataset (article_level_data.csv) which consists of a total of 1018 articles, 509 AI-generated and 509 Human-generated. Additionally, we have divided each article into its sentences and labelled them accordingly. This is mainly to evaluate the performance of classification models and pipelines when it comes to shorter sentence-level data points. This constructs our sentence-level dataset (sentence_level_data.csv) which consists of a total of 7344 entries (4008 AI-generated and 3336 Human-generated).
We appreciate it, if you cite the following article if you happen to use this dataset in any scientific publication:
Maktab Dar Oghaz, M., Dhame, K., Singaram, G., & Babu Saheer, L. (2023). Detection and Classification of ChatGPT Generated Contents Using Deep Transformer Models. Frontiers in Artificial Intelligence.
Facebook
Twitterhttps://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Telco Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [telco] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An overview of… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-telco-llm-chatbot-training-dataset.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The AstroChat dataset is a collection of 901 dialogues, synthetically generated, tailored to the specific domain of Astronautics / Space Mission Engineering. This dataset will be frequently updated following feedback from the community. If you would like to contribute, please reach out in the community discussion.
The dataset is intended to be used for supervised fine-tuning of chat LLMs (Large Language Models). Due to its currently limited size, you should use a pre-trained instruct model and ideally augment the AstroChat dataset with other datasets in the area of (Science Technology, Engineering and Math).
To be completed
python
from datasets import load_dataset
dataset = load_dataset("patrickfleith/AstroChat")901 generated conversations between a simulated user and AI-assistant (more on the generation method below). Each instance is made of the following field (column):
- id: a unique identifier to refer to this specific conversation. Useeful for traceability purposes, especially for further processing task or merge with other datasets.
- topic: a topic within the domain of Astronautics / Space Mission Engineering. This field is useful to filter the dataset by topic, or to create a topic-based split.
- subtopic: a subtopic of the topic. For instance in the topic of Propulsion, there are subtopics like Injector Design, Combustion Instability, Electric Propulsion, Chemical Propulsion, etc.
- persona: description of the persona used to simulate a user
- opening_question: the first question asked by the user to start a conversation with the AI-assistant
- messages: the whole conversation messages between the user and the AI assistant in already nicely formatted for rapid use with the transformers library. A list of messages where each message is a dictionary with the following fields:
- role: the role of the speaker, either user or assistant
- content: the message content. For the assistant, it is the answer to the user's question. For the user, it is the question asked to the assistant.
Important See the full list of topics and subtopics covered below.
Dataset is version controlled and commits history is available here: https://huggingface.co/datasets/patrickfleith/AstroChat/commits/main
We used a method inspired from Ultrachat dataset. Especially, we implemented our own version of Human-Model interaction from Sector I: Questions about the World of their paper:
Ding, N., Chen, Y., Xu, B., Qin, Y., Zheng, Z., Hu, S., ... & Zhou, B. (2023). Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.
gpt-4-turbo model) to generate the answers to the opening questionsAll instances in the dataset are in english
901 synthetically-generated dialogue
AstroChat © 2024 by Patrick Fleith is licensed under Creative Commons Attribution 4.0 International
No restriction. Please provide the correct attribution following the license terms.
Patrick Fleith. (2024). AstroChat - A Dataset of synthetically generated conversations for LLM supervised fine-tuning in the domain of Space Mission Engineering and Astronautics (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11531579
Will be updated based on feedbacks. I am also looking for contributors. Help me create more datasets for Space Engineering LLMs :)
Use the ...
Facebook
TwitterThe potential of using Chat GPT and AI to revolutionize the way we interact with computers, specifically in the field of medical diagnostics. Chat GPT can make conversations between doctors and patients more natural, while AI can analyze vast amounts of patient data to identify trends and estimate a patient’s health. Patients can use Chat GPT to better understand their medical conditions, and both Chat GPT and AI can be used to automate tasks such as scheduling appointments and processing test results. However, there are limitations to using AI, including data bias, complex results, and analysis errors. To reduce errors, it is important to validate findings using various techniques and ensure that data is accurate and up-to-date. Chat GPT also employs security measures to protect patient data privacy and confidentiality.