Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset builds upon the original ChatGPT Tweets dataset, adding preprocessing and analysis to improve its usability and the insights it supports. It retains the original data collection period, from Nov 30, 2022, to Feb 11, 2023, capturing global public reactions, opinions, and discussions during ChatGPT’s initial launch phase.
Enhancements include:
- Data Cleaning: encoding errors were corrected, invalid rows removed, and unnecessary elements such as hashtags and URLs cleaned and standardized.
- Sentiment Analysis: new columns were added, providing sentiment labels (Positive, Neutral, Negative) for each tweet using a Hugging Face pre-trained model (see the sketch below).
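A minimal, hedged sketch of how such sentiment labels could be produced; the card does not name the model used, so a widely used tweet-sentiment checkpoint is assumed here:

```python
# Hedged sketch: the card does not name the model, so a common
# tweet-sentiment checkpoint is assumed here.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # assumption
)
tweets = [
    "ChatGPT just wrote my essay outline. Impressive!",
    "Not sure how I feel about AI chatbots yet.",
]
for tweet in tweets:
    print(tweet, "->", sentiment(tweet)[0]["label"])  # positive / neutral / negative
```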
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset consists of daily-updated user reviews and ratings for the ChatGPT Android App. The dataset includes several key attributes that capture various aspects of the reviews, providing insights into user experiences and feedback over time.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Artificial Intelligence (AI) applications are expected to improve government service delivery and quality, enable more efficient handling of cases, and reduce bias in decision-making. One potential benefit of the AI tool ChatGPT is that it may support governments in the anonymization of data. However, it is not clear whether ChatGPT is appropriate to support data anonymization for public organizations. Hence, this study examines the possibilities, risks, and ethical implications for government organizations of employing ChatGPT in the anonymization of personal data. We use a case study approach, combining informal conversations, formal interviews, a literature review, document analysis, and experiments to conduct a three-step study. First, we describe the technology behind ChatGPT and its operation. Second, experiments with three types of data (fake data, original literature, and modified literature) show that ChatGPT exhibits strong performance in anonymizing these three types of texts. Third, an overview of significant risks and ethical issues related to ChatGPT and its use for anonymization within a specific government organization was generated, covering themes such as privacy, responsibility, transparency, bias, human intervention, and sustainability. One significant risk in the current form of ChatGPT is a privacy risk, as inputs are stored and forwarded to OpenAI and potentially other parties. This is unacceptable if texts containing personal data are anonymized with ChatGPT. We discuss several potential solutions to address these risks and ethical issues. This study contributes to the scarce scientific literature on the potential value of employing ChatGPT for personal data anonymization in government. In addition, it has practical value for civil servants who face the challenges of data anonymization in practice, including resource-intensive and costly processes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🤖 LMSYS-Chat-GPT-5-Chat-Response
The dataset used in the paper Black-Box On-Policy Distillation of Large Language Models (see the project homepage for details). This dataset extends the LMSYS-Chat-1M-Clean corpus and was curated by collecting high-quality, non-refusal responses from the GPT-5-Chat API. The underlying LMSYS-Chat-1M dataset collects real-world user queries from the Chatbot Arena. The GPT-5-Chat responses contain no tool calls or reasoning.
💾 Dataset Structure
The… See the full description on the dataset page: https://huggingface.co/datasets/ytz20/LMSYS-Chat-GPT-5-Chat-Response.
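A quick, hedged way to peek at the data, assuming the standard Hugging Face datasets loader and a train split:

```python
from datasets import load_dataset

# Load the corpus (split name assumed to be "train") and inspect the schema.
ds = load_dataset("ytz20/LMSYS-Chat-GPT-5-Chat-Response", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # first record, whatever its exact schema is
```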
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Personality research has traditionally relied on questionnaires, which bring with them inherent limitations, such as response style bias. With the emergence of large language models such as ChatGPT, the question arises as to what extent these models can be used in personality research. In this study, ChatGPT (GPT-4) generated 2000 text-based personas. Next, for each persona, ChatGPT completed a short form of the Big Five Inventory (BFI-10), the Brief Sensation Seeking Scale (BSSS), and a Short Dark Triad (SD3). The mean scores on the BFI-10 items were found to correlate strongly with means from previously published research, and principal component analysis revealed a clear five-component structure. Certain relationships between traits, such as a negative correlation between the age of the persona and the BSSS score, were clearly interpretable, while some other correlations diverged from the literature. An additional analysis using four new sets of 2000 personas each, including a set of ‘realistic’ personas and a set of cinematic personas, showed that the correlation matrix among personality constructs was affected by the persona set. It is concluded that evaluating questionnaires and research hypotheses prior to engaging with real individuals holds promise.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We explore the transformative impact of non-decision models, specifically ChatGPT, on the traditional decision-centric task of Android malware detection. Through a series of carefully designed experiments using publicly available datasets, this study reveals a paradigm shift: decision-driven solutions exhibit a serious lack of interpretability, raising concerns about their reliability. In contrast, ChatGPT, as a non-decision-making model, excels at providing comprehensive analysis reports and significantly enhances interpretability, giving developers more insight from a non-decision-making perspective.

ChatGPT: ChatGPT’s analysis reports can be found in [APK_Analysis].

Project Structure
- APK List: contains the SHA256 hashes of the malicious and benign samples.
- Dataset: all samples used in our experiments are available at [kronodroid].
- Survey Results: we collected responses from 101 participants and processed their data by removing personal or sensitive information. Data preparation included converting speech to text, translating Chinese responses to English, and removing redundant modal particles.

These processes ensure that the data is clean and structured, allowing for accurate and efficient analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As ChatGPT emerges as a potential ally in healthcare decision-making, it is imperative to investigate how users leverage and perceive it. The repurposing of technology is innovative but brings risks, especially since AI’s effectiveness depends on the data it is fed. In healthcare, ChatGPT might provide sound advice based on current medical knowledge, which could turn into misinformation if its data sources later include erroneous information. Our study assesses user perceptions of ChatGPT, particularly of those who used ChatGPT for healthcare-related queries. By examining factors such as competence, reliability, transparency, trustworthiness, security, and persuasiveness of ChatGPT, the research aimed to understand how users rely on ChatGPT for health-related decision-making. A web-based survey was distributed to U.S. adults using ChatGPT at least once a month. Bayesian Linear Regression was used to understand how much ChatGPT aids in informed decision-making. This analysis was conducted on subsets of respondents, both those who used ChatGPT for healthcare decisions and those who did not. Qualitative data from open-ended questions were analyzed using content analysis, with thematic coding to extract public opinions on urban environmental policies. Six hundred and seven individuals responded to the survey. Respondents were distributed across 306 US cities, of which 20 participants were from rural cities. Of all the respondents, 44 used ChatGPT for health-related queries and decision-making. In the healthcare context, the most effective model highlights ’Competent + Trustworthy + ChatGPT for healthcare queries’, underscoring the critical importance of perceived competence and trustworthiness specifically in the realm of healthcare applications of ChatGPT. The non-healthcare context, on the other hand, reveals a broader spectrum of influential factors in its best model, which includes ’Trustworthy + Secure + Benefits outweigh risks + Satisfaction + Willing to take decisions + Intent to use + Persuasive’. In conclusion, our findings suggest a clear demarcation in user expectations and requirements from AI systems based on the context of their use. We advocate for a balanced approach in which technological advancement and user readiness are harmonized.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
If you find this dataset useful, a quick upvote would be greatly appreciated 🙌 It helps more learners discover it!
Explore how students at different academic levels use AI tools like ChatGPT for tasks such as coding, writing, studying, and brainstorming. Designed for learning, EDA, and ML experimentation.
This dataset simulates 10,000 sessions of students interacting with an AI assistant (like ChatGPT or similar tools) for various academic tasks. Each row represents a single session, capturing the student’s level, discipline, type of task, session length, AI effectiveness, satisfaction rating, and whether they reused the AI tool later.
As AI tools become mainstream in education, there's a need to analyze and model how students interact with them. However, no public datasets exist for this behavior. This dataset fills that gap by providing a safe, fully synthetic yet realistic simulation for learning, EDA, and ML experimentation.
It’s ideal for students, data science learners, and researchers who want real-world use cases without privacy or copyright constraints.
| Column | Description |
|---|---|
| SessionID | Unique session identifier |
| StudentLevel | Academic level: High School, Undergraduate, or Graduate |
| Discipline | Student’s field of study (e.g., CS, Psychology) |
| SessionDate | Date of the session |
| SessionLengthMin | Length of the AI interaction in minutes |
| TotalPrompts | Number of prompts/messages used |
| TaskType | Nature of the task (e.g., Coding, Writing, Research) |
| AI_AssistanceLevel | 1–5 scale of how helpful the AI was perceived to be |
| FinalOutcome | What the student achieved (e.g., Assignment Completed, Idea Drafted) |
| UsedAgain | Whether the student returned to use the assistant again |
| SatisfactionRating | 1–5 rating of overall satisfaction with the session |
All data is synthetically generated using controlled distributions, real-world logic, and behavioral modeling to reflect realistic usage patterns.
This dataset is rich with potential for EDA and ML experimentation, for example predicting reuse (UsedAgain) or the final outcome; a baseline sketch follows.
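The baseline below is a hedged sketch: the CSV file name is hypothetical, and the column names follow the table above.

```python
# Baseline sketch: predict UsedAgain from session features.
# The CSV file name is hypothetical; column names follow the table above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("student_ai_sessions.csv")  # hypothetical file name
features = ["StudentLevel", "Discipline", "TaskType", "SessionLengthMin",
            "TotalPrompts", "AI_AssistanceLevel", "SatisfactionRating"]
X = pd.get_dummies(df[features])  # one-hot encode the categorical columns
y = df["UsedAgain"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
print("Holdout accuracy:", clf.score(X_te, y_te))
```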
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A comprehensive, research-grade dataset capturing the adoption, usage, and impact of leading AI tools—such as ChatGPT, Midjourney, Stable Diffusion, Bard, and Claude—across multiple industries, countries, and user demographics. This dataset is designed for advanced analytics, machine learning, natural language processing, and business intelligence applications.
This dataset provides a panoramic view of how AI technologies are transforming business, industry, and society worldwide. Drawing inspiration from real-world adoption surveys, academic research, and industry reports, it enables users to analyze AI adoption across industries, countries, and user demographics.
Column Descriptions:
| Column Name | Description |
|---|---|
| country | Country where the organization or user is located (e.g., USA, India, China) |
| industry | Industry sector of the organization (e.g., Technology, Healthcare, Retail) |
| ai_tool | Name of the AI tool used (e.g., ChatGPT, Midjourney, Bard, Stable Diffusion, Claude) |
| adoption_rate | Percentage adoption rate of the AI tool within the sector or company (0–100) |
| daily_active_users | Estimated number of daily active users for the AI tool in the given context |
| year | Year in which the data was recorded (2023 or 2024) |
| user_feedback | Free-text feedback from users about their experience with the AI tool (up to 150 characters) |
| age_group | Age group of users (e.g., 18-24, 25-34, 35-44, 45-54, 55+) |
| company_size | Size category of the organization (Startup, SME, Enterprise) |
Sample rows:

```csv
country,industry,ai_tool,adoption_rate,daily_active_users,year,user_feedback,age_group,company_size
USA,Technology,ChatGPT,78.5,5423,2024,"Great productivity boost for our team!",25-34,Enterprise
India,Healthcare,Midjourney,62.3,2345,2024,"Improved patient engagement and workflow.",35-44,SME
Germany,Manufacturing,Stable Diffusion,45.1,1842,2023,"Enhanced our design process.",45-54,Enterprise
Brazil,Retail,Bard,33.2,1200,2024,"Helped automate our customer support.",18-24,Startup
UK,Finance,Claude,55.7,2100,2023,"Increased accuracy in financial forecasting.",25-34,SME
```
Example analysis workflow:

```python
import pandas as pd
import matplotlib.pyplot as plt
from textblob import TextBlob

# Load the dataset and inspect its structure
df = pd.read_csv('/path/to/ai_adoption_dataset.csv')
print(df.head())
print(df.info())

# Average adoption rate by industry and country, highest first
industry_adoption = df.groupby(['industry', 'country'])['adoption_rate'].mean().reset_index()
print(industry_adoption.sort_values(by='adoption_rate', ascending=False).head(10))

# Distribution of records across AI tools
tool_counts = df['ai_tool'].value_counts()
tool_counts.plot(kind='bar', title='AI Tool Usage Distribution')
plt.xlabel('AI Tool')
plt.ylabel('Number of Records')
plt.show()

# Sentiment polarity of the free-text user feedback
df['feedback_sentiment'] = df['user_feedback'].apply(lambda x: TextBlob(x).sentiment.polarity)
print(df[['user_feedback', 'feedback_sentiment']].head())

# Average adoption rate per tool over time
yearly_trends = df.groupby(['year', 'ai_tool'])['adoption_rate'].mean().unstack()
yearly_trends.plot(marker='o', title='AI Tool Adoption Rate Over Time')
plt.xlabel('Year')
plt.ylabel('Average Adoption Rate (%)')
plt.show()
```
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The birth of ChatGPT, a cutting-edge language model-based chatbot developed by OpenAI, ushered in a new era in AI. However, due to potential pitfalls, its role in rigorous scientific research is not clear yet. This paper vividly showcases its innovative application within the field of drug discovery. Focused specifically on developing anticocaine addiction drugs, the study employs GPT-4 as a virtual guide, offering strategic and methodological insights to researchers working on generative models for drug candidates. The primary objective is to generate optimal drug-like molecules with desired properties. By leveraging the capabilities of ChatGPT, the study introduces a novel approach to the drug discovery process. This symbiotic partnership between AI and researchers transforms how drug development is approached. Chatbots become facilitators, steering researchers toward innovative methodologies and productive paths for creating effective drug candidates. This research sheds light on the collaborative synergy between human expertise and AI assistance, wherein ChatGPT’s cognitive abilities enhance the design and development of pharmaceutical solutions. This paper not only explores the integration of advanced AI in drug discovery but also reimagines the landscape by advocating for AI-powered chatbots as trailblazers in revolutionizing therapeutic innovation.
All data of experiences (reviews), comments, and companies on the WikiTajrobe website.
Experiences (reviews):

| Field | Description | Data Type |
|---|---|---|
| id | Primary key | integer |
| wt_id | Review ID on the WikiTajrobe website | integer |
| company_id | Company ID on the WikiTajrobe website | integer |
| default_tag | Is the experience related to work experience or a job interview? | varchar |
| danger_tag | Does the experience contain a report of sexual harassment? | varchar |
| title | Experience title | varchar |
| text | Experience text | varchar |
| job_title | Job title of the experience submitter | varchar |
| status | Employment status of the experience submitter | varchar |
| score | Rating given to the company | integer |
| salary_offer | Salary offer in the job interview | varchar |
| salary | Salary received in the work experience | varchar |
| publish_date | Date of experience publication | varchar |
| interview_date | Date of interview | varchar |
| employment_start_date | Start date of the job | varchar |
| cell_group | The fields of this JSON format are presented as separate fields | jsonb |
| created_at | Record timestamp in the database | datetime |
Comments:

| Field | Description | Data Type |
|---|---|---|
| id | Primary key | integer |
| review_id | Review ID on the WikiTajrobe website | integer |
| company_id | Company ID on the WikiTajrobe website | integer |
| text | Comment text | varchar |
| time_elapsed | Time elapsed since the comment was created | varchar |
| created_at | Record timestamp in the database | datetime |
Companies:

| Field | Description | Data Type |
|---|---|---|
| id | Primary key | integer |
| wt_id | Company ID on the WikiTajrobe website | integer |
| name | Company name | varchar |
| username | Company username | varchar |
| created_at | Record timestamp in the database | datetime |
Created with ChatGPT for data cleaning:

| Field | Description | Data Type |
|---|---|---|
| job_title | Job title of the experience submitter | varchar |
| categorized_job_title | Categorized job titles for mapping | varchar |
| Field | Description | Data Type |
|---|---|---|
| company_name | Company name | varchar |
| company_size | Company size | varchar |
| company_industry | Company industry | varchar |
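A hedged illustration of how the tables relate; the file names are hypothetical exports, and the join follows the documented keys (company_id in the reviews table references wt_id in the companies table):

```python
# Hypothetical file names for exports of the tables above; the join follows
# the documented keys (reviews.company_id references companies.wt_id).
import pandas as pd

reviews = pd.read_csv("reviews.csv")
companies = pd.read_csv("companies.csv")
joined = reviews.merge(companies, left_on="company_id", right_on="wt_id",
                       suffixes=("", "_company"))
print(joined[["title", "score", "name"]].head())  # experience title, rating, company name
```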
1. Project Purpose and Value
Modern customer communication systems generate thousands of messages every day: operational notices, reminders, support responses, delivery updates, billing notifications, and more. While these messages reach their recipients, their clarity, tone, and emotional impact vary widely depending on the sender and the situation.
The purpose of this project is simple: to help companies automatically improve outgoing communication so it becomes clearer, more professional, and more aligned with brand expectations — without changing the core content or meaning.
The system evaluates a message, highlights potential issues (tone, sentiment, clarity), and offers improved variants instantly. This solves the common problem of inconsistent communication quality across teams, countries, and channels.
For businesses using Product solutions — especially Inspire, Parcel Locker software, or customer service platforms — the value is significant:
In other words, the system acts as a communication quality amplifier, which matches perfectly with this year’s internal mission: “The AI Amplifier: Doubling Our Customers’ World.”
2. Why This Solution Is Better Than Using ChatGPT Directly
Many companies already use ChatGPT or similar tools to rewrite messages. However, ChatGPT has several limitations when used in an enterprise workflow:
2.1. Lack of Integration
ChatGPT does not plug into:
Our project provides an API-driven solution that can be embedded anywhere in the company’s ecosystem.
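For illustration, a hedged sketch of what such an embedding could look like; the endpoint URL, payload shape, and response fields are assumptions, not the real service contract:

```python
# Hypothetical sketch of embedding the optimizer; the endpoint URL, payload
# shape, and response fields are assumptions, not the real service contract.
import requests

resp = requests.post(
    "https://optimizer.internal.example/api/v1/rewrite",  # placeholder URL
    json={"message": "Ur package is late, come get it asap.",
          "tone": "professional"},
    timeout=10,
)
print(resp.json())  # e.g. {"rewritten": "...", "issues": ["informal tone"]}
```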
2.2. Consistency and Control
Generic LLMs write messages in unpredictable styles.
Our model is tuned for:
This results in stable output, which is important for enterprise communication.
2.3. Privacy and Data Governance
Our system runs on-premises or in a private cloud, ensuring that:
This aligns with enterprise governance standards.
2.4. Domain-Specific Optimization
Generic LLMs are not optimized for:
Our project can be fine-tuned for industry-specific scenarios, which significantly increases relevance and practicality.
3. AI Models and Approaches Used
The solution uses a combination of lightweight and efficient NLP components:
3.1. Text Analysis
These components allow the system to highlight issues such as problems of tone, sentiment, and clarity; a minimal sketch follows.
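A minimal sketch of this analysis step, assuming an off-the-shelf Hugging Face sentiment pipeline stands in for the unspecified lightweight components:

```python
# Minimal sketch: an off-the-shelf sentiment pipeline stands in for the
# unspecified lightweight NLP components.
from transformers import pipeline

analyze = pipeline("sentiment-analysis")  # small default English model
message = "Your parcel is late AGAIN. Pick it up or it will be returned."
result = analyze(message)[0]
if result["label"] == "NEGATIVE":
    print("Flag for rewriting (tone issue):", result)
```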
3.2. Message Rewriting
Instead of heavy GPU-based LLMs, the MVP uses:
This approach provides:
In future iterations we can fine-tune the model on anonymized internal communication samples for even more accuracy.
4. Machine Learning Concepts Applied
The project incorporates several ML principles:
Even as an MVP, it demonstrates practical use of ML without requiring large computational resources.
5. Key Benefits for Product
5.1. Immediately Applicable
The tool can be embedded into:
5.2. Aligns with Product’s Vision
The system amplifies customer value by:
This directly supports the theme of AI-powered enhancement.
5.3. Enhances Existing Products
Product Inspire customers can embed the optimizer via REST API, enabling:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Does ChatGPT deliver on its explicit claim to be culturally sensitive and its implicit claim to be a friendly digital person when conversing with human users? These claims are investigated from the perspective of linguistic pragmatics, particularly Grice's cooperative principle in communication. Following the pattern of real-life communication, turn-taking conversations reveal limitations in the LLM's grasp of the entire contextual setting described in the prompt. The prompts included ethical issues, a hiking adventure, geographical orientation, and bodily movement. For cultural sensitivity, the prompts came from a Pakistani Muslim in English, from a Hindu in English, and from a Chinese speaker in Chinese. The issues were deeply cultural, involving feelings and affects. Qualitative analysis of the conversation pragmatics showed that ChatGPT is often unable to conduct conversations according to the pragmatic principles of quantity, reliable quality, remaining in focus, and being clear in expression. We conclude that ChatGPT should not be presented as a global LLM but should be subdivided into several culture-specific modules.
AI in Consumer Decision-Making: Global Zero-Party Dataset
This dataset captures how consumers around the world are using AI tools like ChatGPT, Perplexity, Gemini, Claude, and Copilot to guide their purchase decisions. It spans multiple product categories, demographics, and geographies, mapping the emerging role of AI as a decision-making companion across the consumer journey.
What Makes This Dataset Unique
Unlike datasets inferred from digital traces or modeled from third-party assumptions, this collection is built entirely on zero-party data: direct responses from consumers who voluntarily share their habits and preferences. That means the insights come straight from the people making the purchases, ensuring unmatched accuracy and relevance.
For FMCG leaders, retailers, and financial services strategists, this dataset provides the missing piece: visibility into how often consumers are letting AI shape their decisions, and where that influence is strongest.
Dataset Structure
Each record is enriched with:
- Product Category – from high-consideration items like electronics to daily staples such as groceries and snacks.
- AI Tool Used – identifying whether consumers turn to ChatGPT, Gemini, Perplexity, Claude, or Copilot.
- Influence Level – the percentage of consumers in a given context who rely on AI to guide their choices.
- Demographics – generational breakdowns from Gen Z through Boomers.
- Geographic Detail – city- and country-level coverage across Africa, LATAM, Asia, Europe, and North America.
This structure allows filtering and comparison across categories, age groups, and markets, giving users a multidimensional view of AI’s impact on purchasing.
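As an illustration, a hedged pandas sketch of that kind of slicing; the file and column names are assumptions based on the field list above:

```python
# Column names are guesses based on the field descriptions above.
import pandas as pd

df = pd.read_csv("ai_consumer_decisions.csv")  # hypothetical file name
electronics = df[df["product_category"] == "Electronics"]
influence = (electronics.groupby(["country", "ai_tool"])["influence_level"]
             .mean()
             .sort_values(ascending=False))
print(influence.head(10))  # markets and tools with the strongest AI influence
```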
Why It Matters
AI has become a trusted voice in consumers’ daily lives. From meal planning to product comparisons, many people now consult AI before making a purchase—often without realizing how much it shapes the options they consider. For brands, this means that the path to purchase increasingly runs through an AI filter.
This dataset provides a comprehensive view of that hidden step in the consumer journey, enabling decision-makers to quantify:
- How much AI shapes consumer thinking before they even reach the shelf or checkout.
- Which product categories are most influenced by AI consultation.
- How adoption varies by geography and generation.
- Which AI platforms are most commonly trusted by consumers.
Opportunities for Business Leaders
- FMCG & Retail Brands: Understand where AI-driven decision-making is already reshaping category competition.
- Marketers: Identify demographic segments most likely to consult AI, enabling targeted strategies.
- Retailers: Align assortments and promotions with the purchase patterns influenced by AI queries.
- Investors & Innovators: Gauge market readiness for AI-integrated commerce solutions.
The dataset doesn’t just describe what’s happening—it opens doors to the “so what” questions that define strategy. Which categories are becoming algorithm-driven? Which markets are shifting fastest? Where is the opportunity to get ahead of competitors in an AI-shaped funnel?
Why Now
Consumer AI adoption is no longer a forecast; it is a daily behavior. Just as search engines once rewrote the rules of marketing, conversational AI is quietly rewriting how consumers decide what to buy. This dataset offers an early, detailed view into that change, giving brands the ability to act while competitors are still guessing.
What You Get
Users gain:
- A global, city-level view of AI adoption in consumer decision-making.
- Cross-category comparability to see where AI influence is strongest and weakest.
- Generational breakdowns that show how adoption differs between younger and older cohorts.
- AI platform analysis, highlighting how tool preferences vary by region and category.

Every row is powered by zero-party input, ensuring the insights reflect actual consumer behavior—not modeled assumptions.
How It’s Used
Leverage this data to:
- Validate strategies before entering new markets or categories.
- Benchmark competitors on AI readiness and influence.
- Identify growth opportunities in categories where AI-driven recommendations are rapidly shaping decisions.
- Anticipate risks where brand visibility could be disrupted by algorithmic mediation.
Core Insights
The full dataset reveals:
- Surprising adoption curves across categories where AI wasn’t expected to play a role.
- Geographic pockets where AI has already become a standard step in purchase decisions.
- Demographic contrasts showing who trusts AI most—and where skepticism still holds.
- Clear differences between AI platforms and the consumer profiles most drawn to each.
These patterns are not visible in traditional retail data, sales reports, or survey summaries. They are only captured here, directly from the consumers themselves.
Summary
Winning in FMCG and retail today means more than getting on shelves, capturing price points, or running promotions. It means understanding the invisible algorithms consumers are ...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A curated database of legal cases where generative AI produced hallucinated citations submitted in court filings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
What was the educational challenge? Reflective practice (RP) fosters professional growth but is often hindered by unclear purpose and minimal guidance. Designing comprehensive RP assignments is challenging because it takes time and not all faculty possess the skills to generate effective assignments. This innovation addressed two challenges: (1) creating clear, scaffolded reflective assignments using the transparent assessment framework (TAF) and (2) reducing faculty workload in assignment design. What was the solution? ChatGPT 4o was used to design RP assignments in health professions education (HPE). The TAF guided the design of the assessment. How was the solution implemented? A four-step process was followed to facilitate the design of the assignment. ChatGPT 4o was prompted to design the assignment and refined for the course. What lessons were learned that are relevant to a wider global audience? A pilot study in three graduate-level HPE courses showed ChatGPT-assisted assignments improved clarity, structure, and student performance while decreasing faculty preparation time. What are the next steps? We plan to expand this research to obtain student feedback on the effectiveness of the redesigned assignment and to explore the effectiveness of AI-generated reflective assignments with medical students who may require more structured guidance and contextualized prompts.
As of October 2025, Google represented ***** percent of global online search engine referrals on desktop devices. Although Google is far ahead of its competitors, this represents a modest increase from the previous months. Meanwhile, its longtime competitor Bing accounted for ***** percent, while tools like Yahoo and Yandex held shares of over **** percent and **** percent, respectively.

Google and the global search market

Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the market has remained rather lopsided, with all other tools holding small shares. The majority of Google's revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2024, with a market capitalization of **** trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2024, at roughly ****** billion U.S. dollars.

Search engine usage in different countries

Google is the most frequently used search engine worldwide, but in some countries its alternatives lead or compete with it to some extent. As of the last quarter of 2023, more than ** percent of internet users in Russia used Yandex, whereas Google users represented a little over ** percent. Meanwhile, Baidu was the most used search engine in China, despite a strong decrease in the percentage of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. By the end of 2024, nearly half of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over ** percent of users in Mexico said they used Yahoo.
This dataset was created using data collected from ESPNcricinfo, with assistance from ChatGPT and Google Gemini to organize, clean, and verify player statistics. It focuses on the ODI careers of MS Dhoni, Virat Kohli, and Rohit Sharma — three of India’s most consistent performers.
The goal was to build a structured dataset that helps in analyzing:
Year-wise performance trends
Career summaries and key metrics (Runs, Average, Strike Rate, etc.)
Dismissal patterns and bowler matchups
This dataset serves as a foundation for exploring sports analytics, practicing data cleaning, and creating interactive visualizations using Pandas, Seaborn, and Plotly.
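As a starting point, a hedged sketch of a year-wise trend plot; the file and column names are assumptions:

```python
# File and column names are assumptions; adapt them to the actual CSVs.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("odi_yearwise_stats.csv")  # hypothetical file name
sns.lineplot(data=df, x="Year", y="Runs", hue="Player", marker="o")
plt.title("Year-wise ODI Runs: Dhoni, Kohli, Rohit")
plt.show()
```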
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
📘 Overview
This dataset consists of augmented Azerbaijani text pairs (clean & masked) that contain personally identifiable information (PII). All content has been automatically generated using ChatGPT to simulate sensitive data scenarios for tasks like PII detection, anonymization, entity masking, and secure data handling.
🔍 Dataset Structure
Each example is a paired record:
- original: The full augmented Azerbaijani text containing PII.
- masked: The same text with PII… See the full description on the dataset page: https://huggingface.co/datasets/aimtune/az_personal_info_aug_masked.
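A quick, hedged look at the pairs, using the repo ID from the description (the split name is assumed):

```python
from datasets import load_dataset

# Split name is assumed; the original/masked fields come from the card above.
ds = load_dataset("aimtune/az_personal_info_aug_masked", split="train")
example = ds[0]
print("original:", example["original"])
print("masked:  ", example["masked"])
```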
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.
We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.
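For context, a hedged sketch of the kind of Faker-based generation described; the exact fields are illustrative assumptions, not the dataset's schema:

```python
# Illustrative only: these fields are assumptions, not the dataset's schema.
from faker import Faker

fake = Faker()
posting = {
    "title": fake.job(),
    "company": fake.company(),
    "location": f"{fake.city()}, {fake.country()}",
    "posted_on": fake.date_this_year().isoformat(),
    "description": fake.paragraph(nb_sentences=3),
}
print(posting)
```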
Please note that the examples provided are fictional and for illustrative purposes only. The dataset is not suitable for real-world applications and should only be used within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com