23 datasets found
  1. ChatGPT Tweets Sentiment Analysis

    • kaggle.com
    zip
    Updated Jan 23, 2025
    Cite
    Fanta Sea (2025). ChatGPT Tweets Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/ryosyo0/chatgpt-tweets-sentiment-analysis-clean-data/discussion
    Explore at:
    Available download formats: zip (167724620 bytes)
    Dataset updated
    Jan 23, 2025
    Authors
    Fanta Sea
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset builds upon the original ChatGPT Tweets dataset, adding preprocessing and analysis to further its usability and insights. It retains the original data collection period from Nov 30, 2022, to Feb 11, 2023, capturing global public reactions, opinions, and discussions during ChatGPT’s initial launch phase.

    Enhancements include:

    • Data Cleaning: Encoding errors were corrected, invalid rows were removed, and unnecessary elements such as hashtags and URLs were cleaned and standardized.
    • Sentiment Analysis: New columns were added, providing sentiment labels (Positive, Neutral, Negative) for each tweet using a Hugging Face pre-trained model (a minimal sketch of this step follows below).
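    The description does not name the specific Hugging Face model, so the following is only a minimal sketch of how such sentiment labels could be reproduced with the transformers pipeline; the checkpoint, file name, and column name are illustrative assumptions.

    # Minimal sketch of the sentiment-labeling step with the Hugging Face pipeline.
    # The checkpoint and the "tweet_text" column name are assumptions, not the
    # dataset author's exact setup.
    import pandas as pd
    from transformers import pipeline

    df = pd.read_csv("chatgpt_tweets_clean.csv")  # hypothetical file name

    sentiment = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # assumed checkpoint
    )

    sample = df["tweet_text"].head(100).tolist()
    labels = [r["label"] for r in sentiment(sample, truncation=True)]
    df.loc[df.index[:100], "predicted_sentiment"] = labels
    print(df[["tweet_text", "predicted_sentiment"]].head())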

  2. ChatGPT User Reviews

    • kaggle.com
    zip
    Updated Jun 30, 2024
    Cite
    Bhavik Jikadara (2024). ChatGPT User Reviews [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/chatgpt-user-feedback
    Explore at:
    Available download formats: zip (5709734 bytes)
    Dataset updated
    Jun 30, 2024
    Authors
    Bhavik Jikadara
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    This dataset consists of daily-updated user reviews and ratings for the ChatGPT Android App. The dataset includes several key attributes that capture various aspects of the reviews, providing insights into user experiences and feedback over time.

    Columns Explanation

    • userName: The display name of the user who posted the review.
    • content: The text content of the review. This column contains the actual review text written by the user. It includes user opinions, feedback, and detailed descriptions of their experiences with the ChatGPT app.
    • score: The rating given by the user, typically ranging from 1 to 5. This column captures the numerical rating provided by the user. Higher scores indicate better experiences, while lower scores indicate dissatisfaction.
    • thumbsUpCount: The number of thumbs up (likes) the review received. This column shows how many other users found the review helpful or agreed with the sentiments expressed. It serves as a measure of the review's relevancy and impact.
    • at: The timestamp of when the review was posted. This column includes the date and time when the review was submitted. It is crucial for tracking the temporal distribution of reviews and analyzing trends over time.

    Collection Methods

    • Data Source: The data is collected from user reviews submitted through the ChatGPT Android App's review section on the Google Play Store.
    • Frequency: The dataset is updated daily to capture the most recent user feedback and ratings.
    • Automation: An automated script is used to scrape and compile the reviews, ensuring that the dataset is current and comprehensive.
    • Data Cleaning: Basic preprocessing is performed to ensure data quality, such as removing duplicates and handling missing values.
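    A minimal sketch of loading and exploring the columns described above; the CSV file name is an assumption.

    import pandas as pd

    df = pd.read_csv("chatgpt_reviews.csv", parse_dates=["at"])  # hypothetical file name

    # Rating distribution and the most "helpful" reviews by thumbs-up count
    print(df["score"].value_counts().sort_index())
    print(df.nlargest(5, "thumbsUpCount")[["userName", "score", "thumbsUpCount"]])

    # Average daily rating over time, using the review timestamp
    daily_avg = df.set_index("at")["score"].resample("D").mean()
    print(daily_avg.tail())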
  3. Data associated with the article: "Exploring the Viability of ChatGPT for...

    • data.4tu.nl
    zip
    Cite
    Nina van Staalduine, Data associated with the article: "Exploring the Viability of ChatGPT for Personal Data Anonymization in Government: A Comprehensive Analysis of Possibilities, Risks, and Ethical Implications" [Dataset]. http://doi.org/10.4121/a1dfacbe-b463-404f-a3d7-dab8485e6458.v1
    Explore at:
    Available download formats: zip
    Dataset provided by
    4TU.ResearchData
    Authors
    Nina van Staalduine
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Feb 2023 - Jul 2023
    Dataset funded by
    Justitiële Informatiedienst
    Description

    Artificial Intelligence (AI) applications are expected to promote government service delivery and quality, more efficient handling of cases, and bias reduction in decision-making. One potential benefit of the AI tool ChatGPT is that it may support governments in the anonymization of data. However, it is not clear whether ChatGPT is appropriate to support data anonymization for public organizations. Hence, this study examines the possibilities, risks, and ethical implications for government organizations to employ ChatGPT in the anonymization of personal data. We use a case study approach, combining informal conversations, formal interviews, a literature review, document analysis and experiments to conduct a three-step study. First, we describe the technology behind ChatGPT and its operation. Second, experiments with three types of data (fake data, original literature and modified literature) show that ChatGPT exhibits strong performance in anonymizing these three types of texts. Third, an overview of significant risks and ethical issues related to ChatGPT and its use for anonymization within a specific government organization was generated, including themes such as privacy, responsibility, transparency, bias, human intervention, and sustainability. One significant risk in the current form of ChatGPT is a privacy risk, as inputs are stored and forwarded to OpenAI and potentially other parties. This is unacceptable if texts containing personal data are anonymized with ChatGPT. We discuss several potential solutions to address these risks and ethical issues. This study contributes to the scarce scientific literature on the potential value of employing ChatGPT for personal data anonymization in government. In addition, this study has practical value for civil servants who face the challenges of data anonymization in practice including resource-intensive and costly processes.

  4. LMSYS-Chat-GPT-5-Chat-Response

    • huggingface.co
    Updated Nov 17, 2025
    Cite
    ytz (2025). LMSYS-Chat-GPT-5-Chat-Response [Dataset]. https://huggingface.co/datasets/ytz20/LMSYS-Chat-GPT-5-Chat-Response
    Explore at:
    Dataset updated
    Nov 17, 2025
    Authors
    ytz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🤖 LMSYS-Chat-GPT-5-Chat-Response

    The dataset used in the Black-Box On-Policy Distillation of Large Language Models paper (see the linked project homepage). This dataset is an extension of the LMSYS-Chat-1M-Clean corpus, specifically curated by collecting high-quality, non-refusal responses from the GPT-5-Chat API. The LMSYS-Chat-1M dataset collects real-world user queries from the Chatbot Arena. There are no tool calls or reasoning traces in the GPT-5-Chat responses.

      💾 Dataset Structure
    

    The… See the full description on the dataset page: https://huggingface.co/datasets/ytz20/LMSYS-Chat-GPT-5-Chat-Response.
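    A minimal sketch of loading the dataset with the datasets library; the split name is an assumption, so inspect the repository for the actual splits and fields.

    from datasets import load_dataset

    # "train" is an assumed split name; check the dataset page for the real layout.
    ds = load_dataset("ytz20/LMSYS-Chat-GPT-5-Chat-Response", split="train")
    print(ds)      # row count and column names
    print(ds[0])   # first example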

  5. Supplementary data for the article: 'The use of ChatGPT for personality...

    • data.4tu.nl
    zip
    Updated May 8, 2024
    Cite
    Joost de Winter; Tom Driessen; Dimitra Dodou (2024). Supplementary data for the article: 'The use of ChatGPT for personality research: Administering questionnaires using generated personas' [Dataset]. http://doi.org/10.4121/6e0f2f2b-f1fc-4300-b8ca-eb9031a7b257.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 8, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Joost de Winter; Tom Driessen; Dimitra Dodou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Personality research has traditionally relied on questionnaires, which bring with them inherent limitations, such as response style bias. With the emergence of large language models such as ChatGPT, the question arises as to what extent these models can be used in personality research. In this study, ChatGPT (GPT-4) generated 2000 text-based personas. Next, for each persona, ChatGPT completed a short form of the Big Five Inventory (BFI-10), the Brief Sensation Seeking Scale (BSSS), and a Short Dark Triad (SD3). The mean scores on the BFI-10 items were found to correlate strongly with means from previously published research, and principal component analysis revealed a clear five-component structure. Certain relationships between traits, such as a negative correlation between the age of the persona and the BSSS score, were clearly interpretable, while some other correlations diverged from the literature. An additional analysis using four new sets of 2000 personas each, including a set of ‘realistic’ personas and a set of cinematic personas, showed that the correlation matrix among personality constructs was affected by the persona set. It is concluded that evaluating questionnaires and research hypotheses prior to engaging with real individuals holds promise.

  6. Enhancing Android Malware Detection: The Influence of ChatGPT on...

    • figshare.com
    txt
    Updated Sep 13, 2024
    Cite
    123 (2024). Enhancing Android Malware Detection: The Influence of ChatGPT on Decision-centric Task [Dataset]. http://doi.org/10.6084/m9.figshare.27004879.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    123
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We explore the transformative impact of non-decision models, specifically ChatGPT, on the traditional decision-centric task of Android malware detection. Through a series of carefully designed experiments using publicly available datasets, this study reveals a paradigm shift: decision-driven solutions show a serious lack of interpretability, raising concerns about their reliability. In contrast, ChatGPT, as a non-decision-making model, excels at providing comprehensive analysis reports and significantly enhances interpretability, giving developers more insight from a non-decision-making perspective.

    ChatGPT: ChatGPT's analysis reports can be found under [APK_Analysis].

    Project Structure: The APK List contains the SHA256 hashes of the malicious and benign samples.

    Dataset: All samples used in our experiments are available at [kronodroid].

    Survey Results: We collected responses from 101 participants and processed their data by removing personal or sensitive information. Data preparation included:

    • Converting speech to text
    • Translating Chinese responses to English
    • Removing redundant modal particles

    These steps ensure that the data is clean and structured, allowing for accurate and efficient analysis.

  7. Minimal dataset.

    • plos.figshare.com
    txt
    Updated Mar 8, 2024
    Cite
    Avishek Choudhury; Safa Elkefi; Achraf Tounsi (2024). Minimal dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0296151.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Avishek Choudhury; Safa Elkefi; Achraf Tounsi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As ChatGPT emerges as a potential ally in healthcare decision-making, it is imperative to investigate how users leverage and perceive it. The repurposing of technology is innovative but brings risks, especially since AI’s effectiveness depends on the data it’s fed. In healthcare, ChatGPT might provide sound advice based on current medical knowledge, which could turn into misinformation if its data sources later include erroneous information. Our study assesses user perceptions of ChatGPT, particularly of those who used ChatGPT for healthcare-related queries. By examining factors such as competence, reliability, transparency, trustworthiness, security, and persuasiveness of ChatGPT, the research aimed to understand how users rely on ChatGPT for health-related decision-making. A web-based survey was distributed to U.S. adults using ChatGPT at least once a month. Bayesian Linear Regression was used to understand how much ChatGPT aids in informed decision-making. This analysis was conducted on subsets of respondents, both those who used ChatGPT for healthcare decisions and those who did not. Qualitative data from open-ended questions were analyzed using content analysis, with thematic coding to extract public opinions on urban environmental policies. Six hundred and seven individuals responded to the survey. Respondents were distributed across 306 US cities of which 20 participants were from rural cities. Of all the respondents, 44 used ChatGPT for health-related queries and decision-making. In the healthcare context, the most effective model highlights ’Competent + Trustworthy + ChatGPT for healthcare queries’, underscoring the critical importance of perceived competence and trustworthiness specifically in the realm of healthcare applications of ChatGPT. On the other hand, the non-healthcare context reveals a broader spectrum of influential factors in its best model, which includes ’Trustworthy + Secure + Benefits outweigh risks + Satisfaction + Willing to take decisions + Intent to use + Persuasive’. In conclusion our study findings suggest a clear demarcation in user expectations and requirements from AI systems based on the context of their use. We advocate for a balanced approach where technological advancement and user readiness are harmonized.

  8. AI Assistant Usage in Student Life

    • kaggle.com
    Updated Jun 25, 2025
    Cite
    Ayesha Saleem (2025). AI Assistant Usage in Student Life [Dataset]. https://www.kaggle.com/datasets/ayeshasal89/ai-assistant-usage-in-student-life-synthetic/code
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ayesha Saleem
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    If you find this dataset useful, a quick upvote would be greatly appreciated 🙌 It helps more learners discover it!
    

    AI Assistant Usage in Student Life

    Explore how students at different academic levels use AI tools like ChatGPT for tasks such as coding, writing, studying, and brainstorming. Designed for learning, EDA, and ML experimentation.

    What is this dataset?

    This dataset simulates 10,000 sessions of students interacting with an AI assistant (like ChatGPT or similar tools) for various academic tasks. Each row represents a single session, capturing the student’s level, discipline, type of task, session length, AI effectiveness, satisfaction rating, and whether they reused the AI tool later.

    Why was this dataset created?

    As AI tools become mainstream in education, there's a need to analyze and model how students interact with them. However, no public datasets exist for this behavior. This dataset fills that gap by providing a safe, fully synthetic yet realistic simulation for:

    • EDA and visualization practice
    • Machine learning modeling
    • Feature engineering workflows
    • Educational data science exploration

    It’s ideal for students, data science learners, and researchers who want real-world use cases without privacy or copyright constraints.

    How is the dataset structured?

    | Column | Description |
    | --- | --- |
    | SessionID | Unique session identifier |
    | StudentLevel | Academic level: High School, Undergraduate, Graduate |
    | Discipline | Student’s field of study (e.g., CS, Psychology, etc.) |
    | SessionDate | Date of the session |
    | SessionLengthMin | Length of AI interaction in minutes |
    | TotalPrompts | Number of prompts/messages used |
    | TaskType | Nature of the task (e.g., Coding, Writing, Research) |
    | AI_AssistanceLevel | 1–5 scale on how helpful the AI was perceived to be |
    | FinalOutcome | What the student achieved: Assignment Completed, Idea Drafted, etc. |
    | UsedAgain | Whether the student returned to use the assistant again |
    | SatisfactionRating | 1–5 rating of overall satisfaction with the session |

    All data is synthetically generated using controlled distributions, real-world logic, and behavioral modeling to reflect realistic usage patterns.

    Possible Use Cases

    This dataset is rich with potential for:

    • EDA: Visualize session behavior across levels, tasks, or disciplines
    • Classification: Predict likelihood of reuse (UsedAgain) or final outcome (a minimal sketch follows this list)
    • Regression: Model satisfaction or session length based on context
    • Clustering: Segment students by AI interaction behavior
    • Feature engineering practice: Derive prompt density, session efficiency, or task difficulty
    • Survey-style analysis: Discover what makes students satisfied or frustrated
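
    As a concrete starting point for the classification use case above, here is a minimal sketch that predicts UsedAgain from the documented columns; the CSV file name is an assumption.

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    df = pd.read_csv("ai_assistant_usage.csv")  # hypothetical file name

    categorical = ["StudentLevel", "Discipline", "TaskType"]
    numeric = ["SessionLengthMin", "TotalPrompts", "AI_AssistanceLevel", "SatisfactionRating"]

    X = df[categorical + numeric]
    y = df["UsedAgain"]

    model = Pipeline([
        ("encode", ColumnTransformer(
            [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
            remainder="passthrough")),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model.fit(X_train, y_train)
    print("Hold-out accuracy:", model.score(X_test, y_test))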

    Key Features

    • Clean and ready-to-use CSV
    • Balanced and realistic distributions
    • No missing values
    • Highly relatable academic context
  9. Global AI Tool Adoption Across Industries

    • kaggle.com
    zip
    Updated Jun 3, 2025
    Cite
    Rishi (2025). Global AI Tool Adoption Across Industries [Dataset]. https://www.kaggle.com/tfisthis/global-ai-tool-adoption-across-industries
    Explore at:
    Available download formats: zip (18481524 bytes)
    Dataset updated
    Jun 3, 2025
    Authors
    Rishi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Global AI Tool Adoption Across Industries and Regions (2023–2025)

    A comprehensive, research-grade dataset capturing the adoption, usage, and impact of leading AI tools—such as ChatGPT, Midjourney, Stable Diffusion, Bard, and Claude—across multiple industries, countries, and user demographics. This dataset is designed for advanced analytics, machine learning, natural language processing, and business intelligence applications.

    Dataset Overview

    This dataset provides a panoramic view of how AI technologies are transforming business, industry, and society worldwide. Drawing inspiration from real-world adoption surveys, academic research, and industry reports, it enables users to:

    • Analyze adoption rates of popular AI tools across regions and sectors.
    • Study user demographics and company profiles influencing AI integration.
    • Explore textual user feedback for sentiment and topic modeling.
    • Perform time series analysis on AI adoption trends from 2023 to 2025.
    • Benchmark industries, countries, and company sizes for AI readiness.

    The table below describes each column in the dataset.

    Column Descriptions

    | Column Name | Description |
    | --- | --- |
    | country | Country where the organization or user is located (e.g., USA, India, China, etc.) |
    | industry | Industry sector of the organization (e.g., Technology, Healthcare, Retail, etc.) |
    | ai_tool | Name of the AI tool used (e.g., ChatGPT, Midjourney, Bard, Stable Diffusion, Claude) |
    | adoption_rate | Percentage representing the adoption rate of the AI tool within the sector or company (0–100) |
    | daily_active_users | Estimated number of daily active users for the AI tool in the given context |
    | year | Year in which the data was recorded (2023 or 2024) |
    | user_feedback | Free-text feedback from users about their experience with the AI tool (up to 150 characters) |
    | age_group | Age group of users (e.g., 18-24, 25-34, 35-44, 45-54, 55+) |
    | company_size | Size category of the organization (Startup, SME, Enterprise) |

    Example Data

    country,industry,ai_tool,adoption_rate,daily_active_users,year,user_feedback,age_group,company_size
    USA,Technology,ChatGPT,78.5,5423,2024,"Great productivity boost for our team!",25-34,Enterprise
    India,Healthcare,Midjourney,62.3,2345,2024,"Improved patient engagement and workflow.",35-44,SME
    Germany,Manufacturing,Stable Diffusion,45.1,1842,2023,"Enhanced our design process.",45-54,Enterprise
    Brazil,Retail,Bard,33.2,1200,2024,"Helped automate our customer support.",18-24,Startup
    UK,Finance,Claude,55.7,2100,2023,"Increased accuracy in financial forecasting.",25-34,SME
    

    How to Use This Dataset

    1. Load and Preview the Data

    import pandas as pd
    
    df = pd.read_csv('/path/to/ai_adoption_dataset.csv')
    print(df.head())
    print(df.info())
    

    2. Analyze Adoption Rates by Industry and Country

    industry_adoption = df.groupby(['industry', 'country'])['adoption_rate'].mean().reset_index()
    print(industry_adoption.sort_values(by='adoption_rate', ascending=False).head(10))
    

    3. Visualize AI Tool Popularity

    import matplotlib.pyplot as plt
    
    tool_counts = df['ai_tool'].value_counts()
    tool_counts.plot(kind='bar', title='AI Tool Usage Distribution')
    plt.xlabel('AI Tool')
    plt.ylabel('Number of Records')
    plt.show()
    

    4. Sentiment Analysis on User Feedback

    from textblob import TextBlob
    
    df['feedback_sentiment'] = df['user_feedback'].apply(lambda x: TextBlob(x).sentiment.polarity)
    print(df[['user_feedback', 'feedback_sentiment']].head())
    

    5. Time Series Analysis of Adoption Trends

    yearly_trends = df.groupby(['year', 'ai_tool'])['adoption_rate'].mean().unstack()
    yearly_trends.plot(marker='o', title='AI Tool Adoption Rate Over Time')
    plt.xlabel('Year')
    plt.ylabel('Average Adoption Rate (%)')
    plt.show()
    

    **6. Demographic Insights*...

  10. Data from: ChatGPT in Drug Discovery: A Case Study on Anticocaine Addiction...

    • acs.figshare.com
    zip
    Updated Nov 13, 2023
    Cite
    Rui Wang; Hongsong Feng; Guo-Wei Wei (2023). ChatGPT in Drug Discovery: A Case Study on Anticocaine Addiction Drug Development with Chatbots [Dataset]. http://doi.org/10.1021/acs.jcim.3c01429.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    ACS Publications
    Authors
    Rui Wang; Hongsong Feng; Guo-Wei Wei
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The birth of ChatGPT, a cutting-edge language model-based chatbot developed by OpenAI, ushered in a new era in AI. However, due to potential pitfalls, its role in rigorous scientific research is not clear yet. This paper vividly showcases its innovative application within the field of drug discovery. Focused specifically on developing anticocaine addiction drugs, the study employs GPT-4 as a virtual guide, offering strategic and methodological insights to researchers working on generative models for drug candidates. The primary objective is to generate optimal drug-like molecules with desired properties. By leveraging the capabilities of ChatGPT, the study introduces a novel approach to the drug discovery process. This symbiotic partnership between AI and researchers transforms how drug development is approached. Chatbots become facilitators, steering researchers toward innovative methodologies and productive paths for creating effective drug candidates. This research sheds light on the collaborative synergy between human expertise and AI assistance, wherein ChatGPT’s cognitive abilities enhance the design and development of pharmaceutical solutions. This paper not only explores the integration of advanced AI in drug discovery but also reimagines the landscape by advocating for AI-powered chatbots as trailblazers in revolutionizing therapeutic innovation.

  11. WikiTajrobe Dataset

    • kaggle.com
    zip
    Updated Aug 22, 2024
    Cite
    Erfan Tangestani (2024). WikiTajrobe Dataset [Dataset]. https://www.kaggle.com/datasets/thisiserfan/wikitajrobe-dataset
    Explore at:
    Available download formats: zip (7334206 bytes)
    Dataset updated
    Aug 22, 2024
    Authors
    Erfan Tangestani
    Description

    All data of experiences (reviews), comments, and companies on WikiTajrobe website.

    File Descriptions

    • reviews.csv (All reviews on WikiTajrobe)
    • comments.csv (All comments on WikiTajrobe)
    • companies.csv (All companies on WikiTajrobe)
    • companies_info.csv (Additional information about companies)
    • mapping_job_titles.csv (To clean the text data)
    • persian_stop_words.txt (To clean the text data)

    Data Fields

    1. reviews.csv

    | Field | Description | Data Type |
    | --- | --- | --- |
    | id | Primary key | integer |
    | wt_id | Review ID on the WikiTajrobe website | integer |
    | company_id | Company ID on the WikiTajrobe website | integer |
    | default_tag | Is the experience related to work experience or a job interview? | varchar |
    | danger_tag | Does the experience contain a report of sexual harassment? | varchar |
    | title | Experience title | varchar |
    | text | Experience text | varchar |
    | job_title | Job title of the experience submitter | varchar |
    | status | Employment status of the experience submitter | varchar |
    | score | Rating given to the company | integer |
    | salary_offer | Salary offer in the job interview | varchar |
    | salary | Salary received in the work experience | varchar |
    | publish_date | Date of experience publication | varchar |
    | interview_date | Date of interview | varchar |
    | employment_start_date | Start date of the job | varchar |
    | cell_group | The fields of this JSON format are presented as separate fields | jsonb |
    | created_at | Record timestamp in the database | datetime |

    2. comments.csv

    | Field | Description | Data Type |
    | --- | --- | --- |
    | id | Primary key | integer |
    | review_id | Review ID on the WikiTajrobe website | integer |
    | company_id | Company ID on the WikiTajrobe website | integer |
    | text | Comment text | varchar |
    | time_elapsed | Time elapsed since the comment was created | varchar |
    | created_at | Record timestamp in the database | datetime |

    3. companies.csv

    | Field | Description | Data Type |
    | --- | --- | --- |
    | id | Primary key | integer |
    | wt_id | Company ID on the WikiTajrobe website | integer |
    | name | Company name | varchar |
    | username | Company username | varchar |
    | created_at | Record timestamp in the database | datetime |

    4. mapping_job_titles.csv

    Created with ChatGPT for Data Cleaning.

    | Field | Description | Data Type |
    | --- | --- | --- |
    | job_title | Job title of the experience submitter | varchar |
    | categorized_job_title | Categorized job titles for mapping | varchar |

    5. companies_info.csv

    | Field | Description | Data Type |
    | --- | --- | --- |
    | company_name | Company name | varchar |
    | company_size | Company size | varchar |
    | company_industry | Company industry | varchar |
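
    A minimal sketch of joining reviews with company names and ranking companies by average score. The join key (reviews.company_id against companies.wt_id) is inferred from the field descriptions above and should be verified against the actual files.

    import pandas as pd

    reviews = pd.read_csv("reviews.csv")
    companies = pd.read_csv("companies.csv")

    # Assumed join: the review's company_id refers to the company's WikiTajrobe ID
    merged = reviews.merge(
        companies[["wt_id", "name"]],
        left_on="company_id", right_on="wt_id", how="left",
    )

    avg_scores = (
        merged.groupby("name")["score"]
        .agg(["mean", "count"])
        .sort_values("mean", ascending=False)
    )
    print(avg_scores.head(10))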

  12. AI-Driven Customer Communication Optimizer

    • kaggle.com
    zip
    Updated Nov 13, 2025
    Cite
    Andrii Siryi (2025). AI-Driven Customer Communication Optimizer [Dataset]. https://www.kaggle.com/datasets/asiryi/ai-driven-customer-communication-optimizer
    Explore at:
    Available download formats: zip (12070 bytes)
    Dataset updated
    Nov 13, 2025
    Authors
    Andrii Siryi
    Description

    1. Project Purpose and Value

    Modern customer communication systems generate thousands of messages every day: operational notices, reminders, support responses, delivery updates, billing notifications, and more. While these messages reach their recipients, their clarity, tone, and emotional impact vary widely depending on the sender and the situation.

    The purpose of this project is simple: to help companies automatically improve outgoing communication so it becomes clearer, more professional, and more aligned with brand expectations — without changing the core content or meaning.

    The system evaluates a message, highlights potential issues (tone, sentiment, clarity), and offers improved variants instantly. This solves the common problem of inconsistent communication quality across teams, countries, and channels.

    For businesses using Product solutions — especially Inspire, Parcel Locker software, or customer service platforms — the value is significant:

    • Higher engagement thanks to more readable and polite messages
    • More consistent tone across teams and geographies
    • Reduced load on support teams caused by miscommunication
    • Faster message preparation, especially for non-native English speakers
    • No need for employees to rely on external tools like ChatGPT

    In other words, the system acts as a communication quality amplifier, which matches perfectly with this year’s internal mission: “The AI Amplifier: Doubling Our Customers’ World.”

    2. Why This Solution Is Better Than Using ChatGPT Directly

    Many companies already use ChatGPT or similar tools to rewrite messages. However, ChatGPT has several limitations when used in an enterprise workflow:

    2.1. Lack of Integration

    ChatGPT does not plug into:

    • Product Inspire workflows
    • Parcel locker notification systems
    • Back-office CRM
    • Ticketing systems
    • Internal applications

    Our project provides an API-driven solution that can be embedded anywhere in the company’s ecosystem.

    2.2. Consistency and Control

    Generic LLMs write messages in unpredictable styles.

    Our model is tuned for:

    • professional tone
    • short, actionable sentences
    • business communication etiquette
    • avoiding unnecessary embellishments

    This results in stable output, which is important for enterprise communication.

    2.3. Privacy and Data Governance

    Our system runs on-premises or in a private cloud, ensuring that:

    • customer data never leaves the company’s environment
    • compliance rules and retention policies are respected
    • sensitive communication is not processed by 3rd-party services

    This aligns with enterprise governance standards.

    2.4. Domain-Specific Optimization

    Generic LLMs are not optimized for:

    • postal services
    • logistics
    • payment reminders
    • support ticket replies
    • delivery notifications
    • B2B communication templates

    Our project can be fine-tuned for industry-specific scenarios, which significantly increases relevance and practicality.

    3. AI Models and Approaches Used

    The solution uses a combination of lightweight and efficient NLP components:

    3.1. Text Analysis

    • spaCy — entity extraction, linguistic analysis
    • TextBlob — sentiment scoring
    • Custom heuristics — readability, clarity estimation

    These components allow the system to highlight issues such as:

    • negative tone
    • overly long sentences
    • unclear structure
    • missing context
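
    A minimal sketch of this analysis step, combining spaCy entity extraction, TextBlob sentiment scoring, and a simple sentence-length heuristic; the threshold and example message are illustrative assumptions.

    import spacy
    from textblob import TextBlob

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

    def analyze_message(text: str) -> dict:
        doc = nlp(text)
        return {
            "entities": [(ent.text, ent.label_) for ent in doc.ents],
            "sentiment": TextBlob(text).sentiment.polarity,                # < 0 flags negative tone
            "long_sentences": [s.text for s in doc.sents if len(s) > 30],  # > 30 tokens
        }

    print(analyze_message(
        "Your parcel was not delivered because the locker was full. "
        "Contact support if the problem persists."
    ))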

    3.2. Message Rewriting

    Instead of heavy GPU-based LLMs, the MVP uses:

    • FLAN-T5-Base (or Small) for rewriting
    • beam search for generating clean and controlled text
    • tone-conditioning prompts (“professional”, “friendly”, “neutral”)

    This approach provides:

    • deterministic output
    • small model size
    • fast inference
    • no GPU requirement
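
    A minimal sketch of the rewriting step with FLAN-T5-Base and beam search; the prompt wording and generation settings are illustrative assumptions rather than the tuned production configuration.

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def rewrite(message: str, tone: str = "professional") -> str:
        prompt = f"Rewrite the following message in a {tone} tone, keeping its meaning:\n{message}"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        outputs = model.generate(
            **inputs,
            num_beams=4,           # beam search for stable, controlled output
            max_new_tokens=128,
            early_stopping=True,
        )
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    print(rewrite("u need to pay the invoice asap or we stop the service!!"))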

    In future iterations we can fine-tune the model on anonymized internal communication samples for even more accuracy.

    4. Machine Learning Concepts Applied

    The project incorporates several ML principles:

    • Sequence-to-sequence text transformation
    • Prompt-conditioning and tone control
    • Sentiment and semantic scoring
    • Text summarization-style rewriting
    • NLP pipelines for error detection

    Even as an MVP, it demonstrates practical use of ML without requiring large computational resources.

    5. Key Benefits for Product

    5.1. Immediately Applicable

    The tool can be embedded into:

    • customer communication preparation workflows
    • template editing tools
    • agent-assist dashboards
    • locker notification systems
    • CRM and ERP integrations
    • internal documentation platforms

    5.2. Aligns with Product’s Vision

    The system amplifies customer value by:

    • doubling communication clarity
    • reducing friction
    • improving professionalism
    • speeding up message creation

    This directly supports the theme of AI-powered enhancement.

    5.3. Enhances Existing Products

    Product Inspire customers can embed the optimizer via REST API, enabling:

    • auto-rewriting message templates
    • adjusting...
  13. Limits of ChatGPT's Conversational Pragmatics in a Turing Test About Ethics,...

    • data-staging.niaid.nih.gov
    Updated Jan 31, 2025
    Cite
    Wagner, Wolfgang; Gaskell, George; Paraschou, Eva; Lyu, Siqi; Michali, Maria; Vakali, Athina (2025). Limits of ChatGPT's Conversational Pragmatics in a Turing Test About Ethics, Commonsense, and Cultural Sensitivity [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14762323
    Explore at:
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    London School of Economics and Political Science
    University of Tartu
    Aristotle University of Thessaloniki
    South East European Research Centre
    Authors
    Wagner, Wolfgang; Gaskell, George; Paraschou, Eva; Lyu, Siqi; Michali, Maria; Vakali, Athina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Does ChatGPT deliver on its explicit claim to be culturally sensitive and its implicit claim to be a friendly digital person when conversing with human users? These claims are investigated from the perspective of linguistic pragmatics, particularly Grice's cooperative principle in communication. Following the pattern of real-life communication, turn-taking conversations reveal limitations in the LLM's grasp of the entire contextual setting described in the prompt. The prompts included ethical issues, a hiking adventure, geographical orientation, and bodily movement. For cultural sensitivity, the prompts came from a Pakistani Muslim in English, from a Hindu in English, and from a Chinese speaker in Chinese. The issues were deeply cultural, involving feelings and affects. Qualitative analysis of the conversation pragmatics showed that ChatGPT is often unable to conduct conversations according to the pragmatic principles of quantity, reliable quality, remaining in focus, and being clear in expression. We conclude that ChatGPT should not be presented as a global LLM but should be subdivided into several culture-specific modules.

  14. AI in Consumer Decision Making | Global Coverage | 190+ Countries

    • datarade.ai
    .json, .csv, .xls
    Updated Aug 21, 2025
    Cite
    Rwazi (2025). AI in Consumer Decision Making | Global Coverage | 190+ Countries [Dataset]. https://datarade.ai/data-products/ai-in-consumer-decision-making-global-coverage-190-count-rwazi
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Aug 21, 2025
    Dataset authored and provided by
    Rwazi (http://rwazi.com/)
    Area covered
    United Kingdom
    Description

    AI in Consumer Decision-Making: Global Zero-Party Dataset

    This dataset captures how consumers around the world are using AI tools like ChatGPT, Perplexity, Gemini, Claude, and Copilot to guide their purchase decisions. It spans multiple product categories, demographics, and geographies, mapping the emerging role of AI as a decision-making companion across the consumer journey.

    What Makes This Dataset Unique

    Unlike datasets inferred from digital traces or modeled from third-party assumptions, this collection is built entirely on zero-party data: direct responses from consumers who voluntarily share their habits and preferences. That means the insights come straight from the people making the purchases, ensuring unmatched accuracy and relevance.

    For FMCG leaders, retailers, and financial services strategists, this dataset provides the missing piece: visibility into how often consumers are letting AI shape their decisions, and where that influence is strongest.

    Dataset Structure

    Each record is enriched with:

    • Product Category – from high-consideration items like electronics to daily staples such as groceries and snacks.
    • AI Tool Used – identifying whether consumers turn to ChatGPT, Gemini, Perplexity, Claude, or Copilot.
    • Influence Level – the percentage of consumers in a given context who rely on AI to guide their choices.
    • Demographics – generational breakdowns from Gen Z through Boomers.
    • Geographic Detail – city- and country-level coverage across Africa, LATAM, Asia, Europe, and North America.

    This structure allows filtering and comparison across categories, age groups, and markets, giving users a multidimensional view of AI’s impact on purchasing.

    Why It Matters

    AI has become a trusted voice in consumers’ daily lives. From meal planning to product comparisons, many people now consult AI before making a purchase—often without realizing how much it shapes the options they consider. For brands, this means that the path to purchase increasingly runs through an AI filter.

    This dataset provides a comprehensive view of that hidden step in the consumer journey, enabling decision-makers to quantify:

    • How much AI shapes consumer thinking before they even reach the shelf or checkout.
    • Which product categories are most influenced by AI consultation.
    • How adoption varies by geography and generation.
    • Which AI platforms are most commonly trusted by consumers.

    Opportunities for Business Leaders

    • FMCG & Retail Brands: Understand where AI-driven decision-making is already reshaping category competition.
    • Marketers: Identify demographic segments most likely to consult AI, enabling targeted strategies.
    • Retailers: Align assortments and promotions with the purchase patterns influenced by AI queries.
    • Investors & Innovators: Gauge market readiness for AI-integrated commerce solutions.

    The dataset doesn’t just describe what’s happening—it opens doors to the “so what” questions that define strategy. Which categories are becoming algorithm-driven? Which markets are shifting fastest? Where is the opportunity to get ahead of competitors in an AI-shaped funnel?

    Why Now

    Consumer AI adoption is no longer a forecast; it is a daily behavior. Just as search engines once rewrote the rules of marketing, conversational AI is quietly rewriting how consumers decide what to buy. This dataset offers an early, detailed view into that change, giving brands the ability to act while competitors are still guessing.

    What You Get

    Users gain:

    • A global, city-level view of AI adoption in consumer decision-making.
    • Cross-category comparability to see where AI influence is strongest and weakest.
    • Generational breakdowns that show how adoption differs between younger and older cohorts.
    • AI platform analysis, highlighting how tool preferences vary by region and category.

    Every row is powered by zero-party input, ensuring the insights reflect actual consumer behavior, not modeled assumptions.

    How It’s Used

    Leverage this data to:

    • Validate strategies before entering new markets or categories.
    • Benchmark competitors on AI readiness and influence.
    • Identify growth opportunities in categories where AI-driven recommendations are rapidly shaping decisions.
    • Anticipate risks where brand visibility could be disrupted by algorithmic mediation.

    Core Insights

    The full dataset reveals:

    • Surprising adoption curves across categories where AI wasn’t expected to play a role.
    • Geographic pockets where AI has already become a standard step in purchase decisions.
    • Demographic contrasts showing who trusts AI most, and where skepticism still holds.
    • Clear differences between AI platforms and the consumer profiles most drawn to each.

    These patterns are not visible in traditional retail data, sales reports, or survey summaries. They are only captured here, directly from the consumers themselves.

    Summary

    Winning in FMCG and retail today means more than getting on shelves, capturing price points, or running promotions. It means understanding the invisible algorithms consumers are ...

  15. AI Hallucination Cases Database

    • damiencharlotin.com
    Updated Nov 17, 2025
    Cite
    Damien Charlotin (2025). AI Hallucination Cases Database [Dataset]. https://www.damiencharlotin.com/hallucinations/
    Explore at:
    Dataset updated
    Nov 17, 2025
    Authors
    Damien Charlotin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A curated database of legal cases where generative AI produced hallucinated citations submitted in court filings.

  16. Data from: Enhancing reflective practice with ChatGPT: A new approach to...

    • tandf.figshare.com
    docx
    Updated Aug 17, 2025
    Cite
    Anita Samuel; Michael Soh; Eulho Jung (2025). Enhancing reflective practice with ChatGPT: A new approach to assignment design [Dataset]. http://doi.org/10.6084/m9.figshare.28359807.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Aug 17, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Anita Samuel; Michael Soh; Eulho Jung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What was the educational challenge? Reflective practice (RP) fosters professional growth but is often hindered by unclear purpose and minimal guidance. Designing comprehensive RP assignments is challenging because it takes time and not all faculty possess the skills to generate effective assignments. This innovation addressed two challenges: (1) creating clear, scaffolded reflective assignments using the transparent assessment framework (TAF) and (2) reducing faculty workload in assignment design.

    What was the solution? ChatGPT 4o was used to design RP assignments in health professions education (HPE). The TAF guided the design of the assessment.

    How was the solution implemented? A four-step process was followed to facilitate the design of the assignment. ChatGPT 4o was prompted to design the assignment and refined for the course.

    What lessons were learned that are relevant to a wider global audience? A pilot study in three graduate-level HPE courses showed ChatGPT-assisted assignments improved clarity, structure, and student performance while decreasing faculty preparation time.

    What are the next steps? We plan to expand this research to obtain student feedback on the effectiveness of the redesigned assignment and to explore the effectiveness of AI-generated reflective assignments with medical students who may require more structured guidance and contextualized prompts.

  17. Market share of leading desktop search engines worldwide monthly 2015-2025

    • statista.com
    • freeagenlt.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Market share of leading desktop search engines worldwide monthly 2015-2025 [Dataset]. https://www.statista.com/statistics/216573/worldwide-market-share-of-search-engines/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2015 - Oct 2025
    Area covered
    Worldwide
    Description

    As of October 2025, Google represented ***** percent of the global online search engine referrals on desktop devices. Despite being far ahead of its competitors, this represents a modest increase from the previous months. Meanwhile, its longtime competitor Bing accounted for ***** percent, while tools like Yahoo and Yandex held shares of over **** percent and **** percent respectively.

    Google and the global search market: Ever since the introduction of Google Search in 1997, the company has dominated the search engine market, while the shares of all other tools have been rather lopsided. The majority of Google revenues are generated through advertising. Its parent corporation, Alphabet, was one of the biggest internet companies worldwide as of 2024, with a market capitalization of **** trillion U.S. dollars. The company has also expanded its services to mail, productivity tools, enterprise products, mobile devices, and other ventures. As a result, Google earned one of the highest tech company revenues in 2024, with roughly ****** billion U.S. dollars.

    Search engine usage in different countries: Google is the most frequently used search engine worldwide, but in some countries its alternatives are leading or competing with it to some extent. As of the last quarter of 2023, more than ** percent of internet users in Russia used Yandex, whereas Google users represented little over ** percent. Meanwhile, Baidu was the most used search engine in China, despite a strong decrease in the percentage of internet users in the country accessing it. In other countries, like Japan and Mexico, people tend to use Yahoo along with Google. By the end of 2024, nearly half of the respondents in Japan said that they had used Yahoo in the past four weeks. In the same year, over ** percent of users in Mexico said they used Yahoo.

  18. Rohit_sharma_Virat_Dhoni_ODI_stats

    • kaggle.com
    zip
    Updated Nov 7, 2025
    Cite
    Sukhdev Gujjar (2025). Rohit_sharma_Virat_Dhoni_ODI_stats [Dataset]. https://www.kaggle.com/datasets/sukhigujjar/rohit-sharma-virat-dhoni-odi-stats
    Explore at:
    Available download formats: zip (2805 bytes)
    Dataset updated
    Nov 7, 2025
    Authors
    Sukhdev Gujjar
    Description

    This dataset was created using data collected from ESPNcricinfo with assistance from ChatGPT and Google Gemini to organize, clean, and verify player statistics. It focuses on the ODI careers of MS Dhoni, Virat Kohli, and Rohit Sharma — three of India’s most consistent performers.

    The goal was to build a structured dataset that helps in analyzing:

    • Year-wise performance trends
    • Career summaries and key metrics (Runs, Average, Strike Rate, etc.)
    • Dismissal patterns and bowler matchups

    This dataset serves as a foundation for exploring sports analytics, practicing data cleaning, and creating interactive visualizations using Pandas, Seaborn, and Plotly.
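
    Since the exact schema is not documented here, the following is only a minimal sketch of a year-wise trend plot; the file and column names (Player, Year, Runs) are hypothetical.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("odi_stats.csv")  # hypothetical file name

    # Hypothetical columns: Player, Year, Runs
    sns.lineplot(data=df, x="Year", y="Runs", hue="Player", marker="o")
    plt.title("Year-wise ODI runs: Dhoni vs Kohli vs Rohit")
    plt.show()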

  19. az_personal_info_aug_masked

    • huggingface.co
    Cite
    Hamza Agar, az_personal_info_aug_masked [Dataset]. https://huggingface.co/datasets/aimtune/az_personal_info_aug_masked
    Explore at:
    Authors
    Hamza Agar
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📘 Overview

    This dataset consists of augmented Azerbaijani text pairs (clean & masked) that contain personally identifiable information (PII). All content has been automatically generated using ChatGPT to simulate sensitive data scenarios for tasks like PII detection, anonymization, entity masking, and secure data handling.

      🔍 Dataset Structure
    

    Each example is a paired record:

    • original: The full augmented Azerbaijani text containing PII.
    • masked: The same text with PII… See the full description on the dataset page: https://huggingface.co/datasets/aimtune/az_personal_info_aug_masked.
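
    A minimal sketch of loading the dataset and inspecting one clean/masked pair; the split name is an assumption, while the field names follow the structure described above.

    from datasets import load_dataset

    ds = load_dataset("aimtune/az_personal_info_aug_masked", split="train")  # assumed split
    example = ds[0]
    print("Original:", example["original"])
    print("Masked:  ", example["masked"])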

  20. Job Dataset

    • kaggle.com
    zip
    Updated Sep 17, 2023
    Cite
    Ravender Singh Rana (2023). Job Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
    Explore at:
    Available download formats: zip (479575920 bytes)
    Dataset updated
    Sep 17, 2023
    Authors
    Ravender Singh Rana
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Dataset

    This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.

    Descriptions for each of the columns in the dataset:

    1. Job Id: A unique identifier for each job posting.
    2. Experience: The required or preferred years of experience for the job.
    3. Qualifications: The educational qualifications needed for the job.
    4. Salary Range: The range of salaries or compensation offered for the position.
    5. Location: The city or area where the job is located.
    6. Country: The country where the job is located.
    7. Latitude: The latitude coordinate of the job location.
    8. Longitude: The longitude coordinate of the job location.
    9. Work Type: The type of employment (e.g., full-time, part-time, contract).
    10. Company Size: The approximate size or scale of the hiring company.
    11. Job Posting Date: The date when the job posting was made public.
    12. Preference: Special preferences or requirements for applicants (e.g., Only Male or Only Female, or Both)
    13. Contact Person: The name of the contact person or recruiter for the job.
    14. Contact: Contact information for job inquiries.
    15. Job Title: The job title or position being advertised.
    16. Role: The role or category of the job (e.g., software developer, marketing manager).
    17. Job Portal: The platform or website where the job was posted.
    18. Job Description: A detailed description of the job responsibilities and requirements.
    19. Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).
    20. Skills: The skills or qualifications required for the job.
    21. Responsibilities: Specific responsibilities and duties associated with the job.
    22. Company Name: The name of the hiring company.
    23. Company Profile: A brief overview of the company's background and mission.

    Potential Use Cases:

    • Building predictive models to forecast job market trends.
    • Enhancing job recommendation systems for job seekers.
    • Developing NLP models for resume parsing and job matching.
    • Analyzing regional job market disparities and opportunities.
    • Exploring salary prediction models for various job roles.
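
    As a starting point for the NLP-oriented use cases above, here is a minimal sketch that loads the postings and builds TF-IDF features over the job descriptions; the CSV file name is an assumption and the column names follow this card.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    df = pd.read_csv("job_descriptions.csv")  # hypothetical file name

    # Quick look at employment types and the most represented countries
    print(df["Work Type"].value_counts())
    print(df["Country"].value_counts().head(10))

    # TF-IDF features over the free-text descriptions (e.g., for matching models)
    vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
    X = vectorizer.fit_transform(df["Job Description"].fillna(""))
    print("TF-IDF matrix shape:", X.shape)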

    Acknowledgements:

    We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.

    Note:

    Please note that the examples provided are fictional and for illustrative purposes. This dataset is not suitable for real-world applications and should only be used within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com
