Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive ChatGPT statistics covering 800 million weekly users, $300 billion valuation, market share, demographics, and technical specifications for 2025.
Google Gemini Statistics: In 2023, Google unveiled Gemini, which it described as its most powerful AI model to date, claiming it pulls ahead of GPT-4. Gemini comes in three model sizes, each suited to tasks of a different scale. According to Google Gemini Statistics, these models can understand and solve complex problems across virtually any domain. Google has also said it will develop the AI in such a way that it demonstrates how helpful AI can be in our daily routine. Well, we hope the next generation won't become fully dependent on such technologies; otherwise, we risk losing our natural talents!
Editor's Choice
- Google Gemini can hold natural and engaging conversations.
- According to Google Gemini Statistics, Gemini Ultra scores 90.0% on the MMLU benchmark, which tests knowledge and problem-solving across subjects including history, physics, math, law, ethics, and medicine.
- If you ask Gemini what to do with your raw material, it can provide ideas as text or images, depending on the given input.
- Gemini has outperformed GPT-4 in the majority of benchmark tests.
- According to the report, this LLM is said to be unique because it can process multiple types of data at the same time, including video, images, computer code, and text.
- Google frames this development as "The Gemini Era", underscoring how significant it believes AI is for improving our daily lives.
- Google Gemini can talk like a real person.
- Gemini Ultra is the largest model and can solve extremely complex problems.
- Gemini models are trained on multilingual and multimodal datasets.
- Gemini Ultra's performance on the MMMU benchmark also outperformed GPT-4V, with the following results: Art and Design (74.2), Business (62.7), Health and Medicine (71.3), Humanities and Social Science (78.3), and Technology and Engineering (53.0).
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
UPDATE: Due to the new Twitter (X) API conditions introduced by Elon Musk, the API is no longer free to use; the Hobby plan costs $100/month. As a result, my automated ETL notebook stopped adding new tweets to this dataset on May 13th, 2023.
This dataset was updated every day with 1,000 new tweets/day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from April 3rd, 2023. Each day's tweets are uploaded 24-72 hours later, so the counters for likes, retweets, replies, and impressions have enough time to become meaningful. Tweets are in any language, sampled randomly from all hours of the day. Some basic filters are applied to discard sensitive tweets and spam.
This dataset can be used for many different applications in Data Analysis and Visualization, as well as NLP Sentiment Analysis techniques and more.
Consider upvoting this Dataset and the ETL scheduled Notebook providing new data everyday into it if you found them interesting, thanks! 🤗
Columns Description:
tweet_id: Integer. Unique identifier for each tweet. Older tweets have smaller IDs.
tweet_created: Timestamp. Time of the tweet's creation.
tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).
text: String. The raw payload text from the tweet.
lang: String. Short name for the Tweet text's language.
user_id: Integer. Twitter's unique user id.
user_name: String. The author's public name on Twitter.
user_username: String. The author's Twitter account username (@example)
user_location: String. The author's public location.
user_description: String. The author's public profile's bio.
user_created: Timestamp. Timestamp of user's Twitter account creation.
user_followers_count: Integer. The number of followers of the author's account at the moment of the Tweet extraction.
user_following_count: Integer. The number of accounts the author follows at the moment of the Tweet extraction.
user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.
user_verified: Boolean. True if the user is verified (blue mark).
source: String. The device/app used to publish the tweet (apparently not working; all values are NaN so far).
retweet_count: Integer. Number of retweets of the Tweet at the moment of the Tweet extraction.
like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.
reply_count: Integer. Number of reply messages to the Tweet.
impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.
More info: Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
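As a starting point for the analyses mentioned above, the sketch below computes a simple per-tweet engagement rate from the engagement columns in this schema. The metric definition and the inline sample rows are illustrative assumptions, not part of the dataset; real data comes from the dataset's CSV.

```python
import pandas as pd

def engagement_rate(df: pd.DataFrame) -> pd.Series:
    """Per-tweet engagement: (likes + retweets + replies) / impressions."""
    interactions = df["like_count"] + df["retweet_count"] + df["reply_count"]
    # Mask zero-impression tweets so we don't divide by zero:
    impressions = df["impression_count"].mask(df["impression_count"] == 0)
    return interactions / impressions

# Tiny inline sample mirroring the column schema described above:
sample = pd.DataFrame({
    "tweet_id": [1, 2],
    "like_count": [10, 0],
    "retweet_count": [5, 0],
    "reply_count": [5, 0],
    "impression_count": [200, 0],
})
print(engagement_rate(sample))  # 0.1 for the first tweet, NaN for the unseen one
```

The same function applies unchanged after `pd.read_csv` on the dataset's daily files, since it only relies on the four engagement columns.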
CC0
Original Data Source: https://www.kaggle.com/datasets/edomingo/chatgpt-1000-daily-tweets
https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub
License
CC0
HaluEval is a large-scale hallucination evaluation benchmark designed for Large Language Models (LLMs). It provides a comprehensive collection of generated and human-annotated hallucinated samples to evaluate how well LLMs recognize hallucinations.
Here are the key details about the HaluEval dataset:
Purpose and Overview:
Purpose: HaluEval aims to understand the types of content and the extent to which LLMs are prone to hallucinate.
Content: It includes both general user queries with ChatGPT responses and task-specific examples from three tasks: question answering, knowledge-grounded dialogue, and text summarization.
Data Sources:
For general user queries, HaluEval adopts the 52K instruction tuning dataset from Alpaca. Task-specific examples are generated based on existing task datasets (e.g., HotpotQA, OpenDialKG, CNN/Daily Mail) as seed data.
Data Composition:
General User Queries: 5,000 user queries paired with ChatGPT responses. Queries are selected based on low-similarity responses to identify potential hallucinations.
Task-Specific Examples:
30,000 examples from three tasks:
Question Answering: based on HotpotQA as seed data.
Knowledge-Grounded Dialogue: based on OpenDialKG as seed data.
Text Summarization: based on CNN/Daily Mail as seed data.
Data Release:
The dataset contains 35,000 generated and human-annotated hallucinated samples used in experiments. JSON files include:
qa_data.json: hallucinated QA samples.
dialogue_data.json: hallucinated dialogue samples.
summarization_data.json: hallucinated summarization samples.
general_data.json: human-annotated ChatGPT responses to general user queries.
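A minimal loading sketch for the released files named above. The one-JSON-object-per-line layout matches the HaluEval GitHub repository's data files, but verify it against your local copy, since packaging can differ across mirrors.

```python
import json
from pathlib import Path

# File names as listed in the dataset card:
FILES = ["qa_data.json", "dialogue_data.json",
         "summarization_data.json", "general_data.json"]

def load_samples(path: Path) -> list[dict]:
    """Parse a HaluEval data file, assuming one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

With the repo cloned locally, `sum(len(load_samples(Path(name))) for name in FILES)` should then total the 35,000 samples described above.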
Source: Conversation with Bing, 3/17/2024.
(1) HaluEval: A Hallucination Evaluation Benchmark for LLMs. https://github.com/RUCAIBox/HaluEval
(2) jzjiao/halueval-sft · Datasets at Hugging Face. https://huggingface.co/datasets/jzjiao/halueval-sft
(3) HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. https://aclanthology.org/2023.emnlp-main.397/
(4) HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. https://arxiv.org/abs/2305.11747
The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.
This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.
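A simple aggregation sketch over the features described above. The column names in the inline sample are assumptions inferred from the description and may not match the actual CSV headers exactly; the rows are made up for illustration.

```python
import pandas as pd

# Inline stand-in for the 3,900-record CSV described above:
shopping = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Purchase Amount (USD)": [53.0, 64.0, 73.0, 90.0],
    "Season": ["Winter", "Spring", "Winter", "Fall"],
})

# Average spend per preferred shopping season, a typical first cut:
avg_by_season = shopping.groupby("Season")["Purchase Amount (USD)"].mean()
print(avg_by_season)  # Fall 90.0, Spring 64.0, Winter 63.0
```

The same `groupby` pattern extends to the other categorical features (payment method, item type, promotional-offer interaction) for the customer-segmentation analyses the description suggests.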
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.
Cover Photo by: Freepik
Thumbnail by: Clothing icons created by Flat Icons - Flaticon
https://choosealicense.com/licenses/odc-by/
Dataset Card for WildChat
Note: a newer version with 1 million conversations and demographic information can be found here.
Dataset Description
Paper: https://arxiv.org/abs/2405.01470
Interactive Search Tool: https://wildvisualizer.com (paper)
License: ODC-BY
Language(s) (NLP): multi-lingual
Point of Contact: Yuntian Deng
Dataset Summary
WildChat is a collection of 650K conversations between human users and ChatGPT. We collected WildChat… See the full description on the dataset page: https://huggingface.co/datasets/allenai/WildChat.
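The collection can be pulled from the Hugging Face Hub with the `datasets` library, as sketched below. The `conversation`/`role`/`content` field names follow the dataset card and should be checked against the downloaded data; the helper and inline sample here are illustrative, not part of the dataset.

```python
# Loading from the Hub (network and disk required):
#     from datasets import load_dataset
#     wildchat = load_dataset("allenai/WildChat", split="train")
# Each record carries a "conversation" list of {"role", "content"} turns.

def user_turns(record: dict) -> int:
    """Count the human turns in one WildChat-style conversation record."""
    return sum(1 for turn in record["conversation"] if turn["role"] == "user")

# Inline sample shaped like a WildChat record, for illustration only:
sample = {"conversation": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Summarize this article."},
]}
print(user_turns(sample))  # 2
```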
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
115 privacy policies from the OPP-115 corpus have been re-annotated with the specific data retention periods they disclose, in line with the GDPR requirement set out in Art. 13(2)(a). The retention periods have been categorized into the following 6 distinct cases:
C0: No data retention period is indicated in the privacy policy/segment.
C1: A specific data retention period is indicated (e.g., days, weeks, months...).
C2: It is indicated that the data will be stored indefinitely.
C3: A criterion is given from which a defined storage period can be inferred (e.g., as long as the user has an active account).
C4: It is indicated that personal data will be stored for an unspecified period, for fraud prevention, legal, or security reasons.
C5: It is indicated that personal data will be stored for an unspecified period, for purposes other than fraud prevention, legal, or security.
Note: If the privacy policy or segment matches more than one case, the case with the highest value was annotated (e.g., if both C2 and C4 apply, C4 is annotated).
Then, the ground truth dataset served as validation for our proposed ChatGPT-based method, the results of which have also been included in this dataset.
Columns description:
- policy_id: ID of the policy in the OPP-115 dataset
- policy_name: Domain of the privacy policy
- policy_text: Privacy policy collected at the time of OPP-115 dataset creation
- info_type_value: Type of personal data to which data retention refers
- retention_period: Period of retention annotated by OPP-115 annotators
- actual_case: Our annotated case, ranging from C0 to C5
- GPT_case: ChatGPT's classification of the case identified in the segment
- actual_Comply_GDPR: Boolean denoting True if the policy apparently complies with GDPR (cases C1-C5) or False if not (case C0)
- GPT_Comply_GDPR: Boolean denoting True if ChatGPT judged the policy to apparently comply with GDPR (cases C1-C5) or False if not (case C0)
- paragraphs_retention_period: List containing the paragraphs annotated as Data Retention by OPP-115 annotators, with the relevant information used for our annotation decision marked in red text
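Since the dataset pairs the manual label (actual_case) with ChatGPT's label (GPT_case), a natural first validation step is their raw agreement rate, as sketched below. The rows in the inline sample are made up for illustration; the real values come from the dataset itself.

```python
import pandas as pd

# Inline stand-in mirroring the two case columns described above:
annotations = pd.DataFrame({
    "actual_case": ["C0", "C1", "C4", "C3"],
    "GPT_case":    ["C0", "C2", "C4", "C3"],
})

# Fraction of segments where ChatGPT assigned the same C0-C5 case:
agreement = (annotations["actual_case"] == annotations["GPT_case"]).mean()
print(f"Case-level agreement: {agreement:.2f}")  # 0.75
```

The same comparison on actual_Comply_GDPR versus GPT_Comply_GDPR gives the coarser binary compliance agreement.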