46 datasets found
  1. Producing Charts with AI - Data Analysis

    • tomtunguz.com
    Updated Jul 17, 2023
    Cite
    Tomasz Tunguz (2023). Producing Charts with AI - Data Analysis [Dataset]. https://tomtunguz.com/data-analysis-gpt/
    Dataset updated
    Jul 17, 2023
    Dataset provided by
    Theory Ventures
    Authors
    Tomasz Tunguz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Discover how AI code interpreters are revolutionizing data visualization, reducing chart creation time from 20 to 5 minutes while simplifying complex statistical analysis.

  2. ChatGPT Reddit

    • kaggle.com
    zip
    Updated Jan 29, 2023
    Cite
    Armita Razavi (2023). ChatGPT Reddit [Dataset]. https://www.kaggle.com/datasets/armitaraz/chatgpt-reddit/data
    Available download formats: zip (5,282,154 bytes)
    Dataset updated
    Jan 29, 2023
    Authors
    Armita Razavi
    License

    https://www.reddit.com/wiki/api

    Description

    Here you can find about 50K Reddit comments about ChatGPT, gathered from posts in 4 subreddits.

    The data includes comment_id, comment_parent_id, comment_body and subreddit:

    • comment_id: the comment's id
    • comment_parent_id: the id of the comment that the current comment replies to
    • comment_body: the comment text
    • subreddit: the community (subreddit) where the comment was posted
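
    Because each row carries comment_parent_id, reply threads can be rebuilt by grouping comments under their parents. Below is a minimal Python sketch. It assumes the rows are already loaded as dictionaries keyed by the four column names above, and the sample rows are hypothetical. It also assumes that any comment whose parent id does not match a comment_id in the data (for example, a top-level reply to a post) is a thread root.

```python
from collections import defaultdict


def build_threads(rows):
    """Group comments by parent and return (roots, children).

    A comment is treated as a root when its comment_parent_id does not
    match any comment_id in the data (assumption: it replies to a post).
    """
    ids = {row["comment_id"] for row in rows}
    children = defaultdict(list)
    roots = []
    for row in rows:
        parent = row["comment_parent_id"]
        if parent in ids:
            children[parent].append(row["comment_id"])
        else:
            roots.append(row["comment_id"])
    return roots, dict(children)


def thread_depths(roots, children):
    """Return the reply depth of every comment (roots have depth 0)."""
    depths = {}
    stack = [(root, 0) for root in roots]
    while stack:
        comment_id, depth = stack.pop()
        depths[comment_id] = depth
        for child in children.get(comment_id, []):
            stack.append((child, depth + 1))
    return depths


# Hypothetical sample rows using the dataset's column names.
rows = [
    {"comment_id": "c1", "comment_parent_id": "p1", "comment_body": "Top-level", "subreddit": "ChatGPT"},
    {"comment_id": "c2", "comment_parent_id": "c1", "comment_body": "Reply", "subreddit": "ChatGPT"},
    {"comment_id": "c3", "comment_parent_id": "c2", "comment_body": "Nested reply", "subreddit": "ChatGPT"},
]
roots, children = build_threads(rows)
depths = thread_depths(roots, children)
print(roots, depths)  # ['c1'] {'c1': 0, 'c2': 1, 'c3': 2}
```

    Depth per comment is a simple starting point for studying how far discussions branch within each subreddit.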

    The date and other information related to comments will be added in the next version. This dataset is useful for getting insight into the public's take on ChatGPT, and also for text analysis, text visualization, inline question answering, text summarization, NER, and other tasks such as clustering.

    Please note that this dataset is not cleaned or preprocessed, so if you want to get your hands dirty with data, it's a good chance to level up your data-cleaning skills too :)

    And please don't forget to UPVOTE it if you find it useful and enjoy it.

  3. Data from: ChatGPT in education: A discourse analysis of worries and...

    • socialmediaarchive.org
    csv, json, txt
    Updated Sep 26, 2023
    Cite
    (2023). ChatGPT in education: A discourse analysis of worries and concerns on social media [Dataset]. https://socialmediaarchive.org/record/54
    Available download formats: csv (6528597), json (248465998), txt (4908229)
    Dataset updated
    Sep 26, 2023
    Description

    The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We collected Twitter data to identify key concerns related to the use of ChatGPT in education. This dataset is used to support the study "ChatGPT in education: A discourse analysis of worries and concerns on social media."

    In this study, we particularly explored two research questions. RQ1 (Concerns): What are the key concerns that Twitter users perceive with using ChatGPT in education? RQ2 (Accounts): Which accounts are implicated in the discussion of these concerns? In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.

  4. ChatGPT Ratings & Reviews

    • kaggle.com
    zip
    Updated Dec 28, 2024
    Cite
    Shahzad Aslam (2024). ChatGPT Ratings & Reviews [Dataset]. https://www.kaggle.com/datasets/zeesolver/chatgpt
    Available download formats: zip (9,587,639 bytes)
    Dataset updated
    Dec 28, 2024
    Authors
    Shahzad Aslam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description

    The dataset contains a total of 193,154 unique reviews about ChatGPT collected between 2023-07-25 and 2024-08-23, providing insight into user feedback and ratings. The reviews are categorized into three groups: 4% are classified as "Good," 1% as "Nice," and the remaining 95% fall into the "Other" category. Ratings range from 1 (minimum) to 5 (maximum) and reflect user satisfaction. The dataset has 4 columns: Review ID, Review, Ratings, and Review Date. It offers a comprehensive overview of user sentiment towards ChatGPT, making it well suited to analyzing satisfaction trends, identifying areas for improvement, and understanding user experiences with the AI service.

    Columns Explanation

    • Review ID: A unique identifier for each review, ensuring distinct entries in the dataset.
    • Review: The textual content of the feedback provided by the user about ChatGPT.
    • Ratings: A numerical score (1 to 5) reflecting user satisfaction, with 1 being the lowest and 5 the highest.
    • Review Date: The date on which the review was submitted, in YYYY-MM-DD format (e.g., 2023-07-25). Dates range from 2023-07-25 to 2024-08-23.

    Suggested Analyses

    • Data Categorization: Analyze the distribution of reviews across categories (Good, Nice, Other) and calculate percentages to understand user sentiment trends.
    • Rating Analysis: Evaluate the frequency and distribution of ratings (1 to 5) to identify user satisfaction levels.
    • Time-Series Analysis: Examine the Review Date column to track trends and changes in feedback over time.
    • Sentiment Analysis: Perform text-based analysis on the Review column to uncover key themes and sentiments associated with user feedback.
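
    Both the rating analysis and the time-series analysis listed above come down to counting and grouping. Below is a minimal Python sketch. It assumes each review is a dict keyed by the column names above (Review ID, Review, Ratings, Review Date), with dates stored as YYYY-MM-DD strings. The sample records are hypothetical.

```python
from collections import Counter, defaultdict


def rating_distribution(reviews):
    """Return {rating: percentage of reviews} for ratings 1-5."""
    counts = Counter(r["Ratings"] for r in reviews)
    total = len(reviews)
    return {star: round(100 * counts.get(star, 0) / total, 1) for star in range(1, 6)}


def monthly_mean_rating(reviews):
    """Average rating per calendar month, keyed by 'YYYY-MM'."""
    sums = defaultdict(lambda: [0, 0])  # month -> [sum of ratings, count]
    for r in reviews:
        month = r["Review Date"][:7]  # assumes 'YYYY-MM-DD' strings
        sums[month][0] += r["Ratings"]
        sums[month][1] += 1
    return {m: s / n for m, (s, n) in sorted(sums.items())}


# Hypothetical sample records.
reviews = [
    {"Review ID": "a", "Review": "Good", "Ratings": 5, "Review Date": "2023-07-25"},
    {"Review ID": "b", "Review": "Nice", "Ratings": 4, "Review Date": "2023-07-30"},
    {"Review ID": "c", "Review": "Crashes often", "Ratings": 1, "Review Date": "2023-08-02"},
    {"Review ID": "d", "Review": "Helpful", "Ratings": 5, "Review Date": "2023-08-10"},
]
print(rating_distribution(reviews))  # {1: 25.0, 2: 0.0, 3: 0.0, 4: 25.0, 5: 50.0}
print(monthly_mean_rating(reviews))  # {'2023-07': 4.5, '2023-08': 3.0}
```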

  5. ChatGPT Usage Survey Data

    • webfx.com
    Updated Sep 2, 2025
    Cite
    WebFX (2025). ChatGPT Usage Survey Data [Dataset]. https://www.webfx.com/blog/ai/chatgpt-usage-statistics/
    Dataset updated
    Sep 2, 2025
    Dataset authored and provided by
    WebFX
    Variables measured
    Average words in first message, Average words per ChatGPT conversation, Average number of messages per conversation, Percentage of conversations that are commands, Percentage of conversations that start as questions, Percentage of conversations in the "learning & understanding" category, Percentage of conversations using advanced features (persona assignment / data upload)
    Description

    Analysis by WebFX of 13,252 publicly shared ChatGPT conversations, uncovering usage statistics: prompt length, message count, question vs. command distribution, and use-case categories.
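
    Several of the variables measured above are simple aggregates over conversation transcripts. The sketch below is a minimal Python version and rests on several assumptions. Each conversation is modeled as a list of the user's messages only. The question test (a first message ending in "?") is a hypothetical heuristic, since WebFX's classification method is not described here. The sample conversations are invented.

```python
from statistics import mean


def usage_metrics(conversations):
    """Summarize conversations, each a list of user message strings."""
    first_messages = [conv[0] for conv in conversations if conv]
    return {
        # Average words in the opening message of each conversation.
        "avg_words_first_message": mean(len(m.split()) for m in first_messages),
        # Average user-written words per conversation (assumption: replies excluded).
        "avg_user_words_per_conversation": mean(
            sum(len(m.split()) for m in conv) for conv in conversations
        ),
        "avg_messages_per_conversation": mean(len(conv) for conv in conversations),
        # Hypothetical heuristic: a conversation "starts as a question"
        # if its first message ends with a question mark.
        "pct_start_as_question": 100
        * sum(m.rstrip().endswith("?") for m in first_messages)
        / len(first_messages),
    }


conversations = [
    ["What is a p-value?", "Give an example"],
    ["Write a haiku about rain"],
]
print(usage_metrics(conversations))
```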

  6. How are Chat GPT and AI used in medical diagnosis

    • dataone.org
    Updated Nov 8, 2023
    Cite
    Maher Asaad Baker (2023). How are Chat GPT and AI used in medical diagnosis [Dataset]. http://doi.org/10.7910/DVN/2HMJ58
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Maher Asaad Baker
    Description

    Describes the potential of ChatGPT and AI to revolutionize the way we interact with computers, specifically in medical diagnostics. ChatGPT can make conversations between doctors and patients more natural, while AI can analyze vast amounts of patient data to identify trends and estimate a patient's health. Patients can use ChatGPT to better understand their medical conditions, and both ChatGPT and AI can be used to automate tasks such as scheduling appointments and processing test results. However, AI has limitations, including data bias, complex results, and analysis errors. To reduce errors, it is important to validate findings using various techniques and to ensure that data is accurate and up to date. ChatGPT also employs security measures to protect patient data privacy and confidentiality.

  7. Data from: Academic Discourse on ChatGPT in Social Sciences: A Topic...

    • figshare.com
    zip
    Updated Jul 23, 2025
    Cite
    Qian Shen (2025). Academic Discourse on ChatGPT in Social Sciences: A Topic Modeling and Sentiment Analysis of Research Article Abstracts [Dataset]. http://doi.org/10.6084/m9.figshare.29625773.v1
    Available download formats: zip
    Dataset updated
    Jul 23, 2025
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Qian Shen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the dataset and code used in the study titled “Academic Discourse on ChatGPT in Social Sciences: A Topic Modeling and Sentiment Analysis of Research Article Abstracts.” The study explores how social science scholars frame and evaluate ChatGPT by analyzing 1,227 SSCI-indexed abstracts using Latent Dirichlet Allocation (LDA) topic modeling and lexicon-based sentiment analysis. The data include the collected abstracts (with metadata), while the code files provide the full analytical pipeline in Python and R, covering preprocessing, topic modeling, sentiment scoring using the NRC Emotion Lexicon, and visualization scripts. This repository supports transparency, reproducibility, and reuse of the study’s computational methods and underlying materials.
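
    Lexicon-based sentiment scoring, as used in the study, counts how many words in a text fall under each emotion or polarity label of a word list such as the NRC Emotion Lexicon. Below is a minimal Python sketch of that idea. The tiny lexicon is invented for illustration and is not taken from the NRC Emotion Lexicon. The study's own Python and R scripts may tokenize and normalize differently.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> set of labels. The real NRC Emotion
# Lexicon maps words to eight emotions plus positive/negative polarity.
TOY_LEXICON = {
    "promising": {"positive", "anticipation"},
    "risk": {"negative", "fear"},
    "concern": {"negative", "fear"},
    "improve": {"positive"},
    "trust": {"positive", "trust"},
}


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_text(text, lexicon=TOY_LEXICON):
    """Count label hits for every token found in the lexicon."""
    counts = Counter()
    for token in tokenize(text):
        for label in lexicon.get(token, ()):
            counts[label] += 1
    return counts


def net_polarity(counts, n_tokens):
    """(positive - negative) hits, normalized by text length."""
    return (counts["positive"] - counts["negative"]) / max(n_tokens, 1)


abstract = "ChatGPT is promising and may improve teaching, but plagiarism is a risk."
counts = score_text(abstract)
n_tokens = len(tokenize(abstract))
print(dict(counts), round(net_polarity(counts, n_tokens), 3))
```

    Normalizing by length keeps long and short abstracts comparable when sentiment is aggregated by topic.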

  8. Data associated with the article: "Exploring the Viability of ChatGPT for...

    • data.4tu.nl
    zip
    Cite
    Nina van Staalduine, Data associated with the article: "Exploring the Viability of ChatGPT for Personal Data Anonymization in Government: A Comprehensive Analysis of Possibilities, Risks, and Ethical Implications" [Dataset]. http://doi.org/10.4121/a1dfacbe-b463-404f-a3d7-dab8485e6458.v1
    Available download formats: zip
    Dataset provided by
    4TU.ResearchData
    Authors
    Nina van Staalduine
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Feb 2023 - Jul 2023
    Dataset funded by
    Justitiële Informatiedienst
    Description

    Artificial Intelligence (AI) applications are expected to promote government service delivery and quality, more efficient handling of cases, and bias reduction in decision-making. One potential benefit of the AI tool ChatGPT is that it may support governments in the anonymization of data. However, it is not clear whether ChatGPT is appropriate to support data anonymization for public organizations. Hence, this study examines the possibilities, risks, and ethical implications for government organizations to employ ChatGPT in the anonymization of personal data. We use a case study approach, combining informal conversations, formal interviews, a literature review, document analysis and experiments to conduct a three-step study. First, we describe the technology behind ChatGPT and its operation. Second, experiments with three types of data (fake data, original literature and modified literature) show that ChatGPT exhibits strong performance in anonymizing these three types of texts. Third, an overview of significant risks and ethical issues related to ChatGPT and its use for anonymization within a specific government organization was generated, including themes such as privacy, responsibility, transparency, bias, human intervention, and sustainability. One significant risk in the current form of ChatGPT is a privacy risk, as inputs are stored and forwarded to OpenAI and potentially other parties. This is unacceptable if texts containing personal data are anonymized with ChatGPT. We discuss several potential solutions to address these risks and ethical issues. This study contributes to the scarce scientific literature on the potential value of employing ChatGPT for personal data anonymization in government. In addition, this study has practical value for civil servants who face the challenges of data anonymization in practice including resource-intensive and costly processes.

  9. ChatGPT User Reviews

    • kaggle.com
    zip
    Updated Jun 30, 2024
    Cite
    Bhavik Jikadara (2024). ChatGPT User Reviews [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/chatgpt-user-feedback
    Available download formats: zip (5,709,734 bytes)
    Dataset updated
    Jun 30, 2024
    Authors
    Bhavik Jikadara
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    This dataset consists of daily-updated user reviews and ratings for the ChatGPT Android App. The dataset includes several key attributes that capture various aspects of the reviews, providing insights into user experiences and feedback over time.

    Columns Explanation

    • userName: The display name of the user who posted the review.
    • content: The text content of the review. This column contains the actual review text written by the user. It includes user opinions, feedback, and detailed descriptions of their experiences with the ChatGPT app.
    • score: The rating given by the user, typically ranging from 1 to 5. This column captures the numerical rating provided by the user. Higher scores indicate better experiences, while lower scores indicate dissatisfaction.
    • thumbsUpCount: The number of thumbs up (likes) the review received. This column shows how many other users found the review helpful or agreed with the sentiments expressed. It serves as a measure of the review's relevancy and impact.
    • at: The timestamp of when the review was posted. This column includes the date and time when the review was submitted. It is crucial for tracking the temporal distribution of reviews and analyzing trends over time.

    Collection Methods

    • Data Source: The data is collected from user reviews submitted through the ChatGPT Android App's review section on the Google Play Store.
    • Frequency: The dataset is updated daily to capture the most recent user feedback and ratings.
    • Automation: An automated script is used to scrape and compile the reviews, ensuring that the dataset is current and comprehensive.
    • Data Cleaning: Basic preprocessing is performed to ensure data quality, such as removing duplicates and handling missing values.
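
    The basic cleaning step described above (removing duplicates, handling missing values) and the over-time trend analysis that the at column supports can be sketched as follows. The sketch assumes rows are dicts keyed by the column names above, with at as an ISO 8601 timestamp string. Two choices are assumptions, because the dataset's own scraping script is not shown: a duplicate is defined as the same userName, content, and at, and rows missing text or score are dropped.

```python
from collections import defaultdict
from datetime import datetime


def clean_reviews(rows):
    """Drop rows missing text or score, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row.get("content") or row.get("score") is None:
            continue  # handle missing values by skipping the row
        key = (row.get("userName"), row["content"], row.get("at"))
        if key in seen:
            continue  # assumed duplicate definition
        seen.add(key)
        cleaned.append(row)
    return cleaned


def daily_mean_score(rows):
    """Mean star score per calendar day of the 'at' timestamp."""
    by_day = defaultdict(list)
    for row in rows:
        day = datetime.fromisoformat(row["at"]).date().isoformat()
        by_day[day].append(row["score"])
    return {day: sum(s) / len(s) for day, s in sorted(by_day.items())}


# Hypothetical sample rows.
rows = [
    {"userName": "A", "content": "Great app", "score": 5, "thumbsUpCount": 2, "at": "2024-06-01 09:15:00"},
    {"userName": "A", "content": "Great app", "score": 5, "thumbsUpCount": 2, "at": "2024-06-01 09:15:00"},
    {"userName": "B", "content": "", "score": 1, "thumbsUpCount": 0, "at": "2024-06-01 10:00:00"},
    {"userName": "C", "content": "Too slow", "score": 2, "thumbsUpCount": 5, "at": "2024-06-02 08:00:00"},
]
cleaned = clean_reviews(rows)
print(len(cleaned), daily_mean_score(cleaned))  # 2 {'2024-06-01': 5.0, '2024-06-02': 2.0}
```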

  10. Data and code on the Moral Machine experiment on large language models...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2 more
    Updated Jul 29, 2025
    Cite
    Kazuhiro Takemoto (2025). Data and code on the Moral Machine experiment on large language models (LLMs) [Dataset]. http://doi.org/10.5061/dryad.d7wm37q6v
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Kazuhiro Takemoto
    Time period covered
    Sep 21, 2023
    Description

    As large language models (LLMs) have become more deeply integrated into various sectors, understanding how they make moral judgements has become crucial, particularly in the realm of autonomous driving. This study used the Moral Machine framework to investigate the ethical decision-making tendencies of prominent LLMs, including GPT-3.5, GPT-4, PaLM 2 and Llama 2, and compared their responses with human preferences. While LLMs' and humans' preferences, such as prioritizing humans over pets and favouring saving more lives, are broadly aligned, PaLM 2 and Llama 2 in particular show distinct deviations. Additionally, despite the qualitative similarities between LLM and human preferences, there are significant quantitative disparities, suggesting that LLMs might lean toward more uncompromising decisions than the milder inclinations of humans. These insights elucidate the ethical frameworks of LLMs and their potential implications for autonomous driving.

    Using the Moral Machine (MM) methodology detailed in the supplementary information of https://www.nature.com/articles/s41586-018-0637-6, we implemented code for generating Moral Machine scenarios. After generating the MM scenarios, responses from GPT-3.5, GPT-4, PaLM 2, and Llama 2 were collected using the application programming interface (API) and relevant code. We applied the conjoint analysis framework to evaluate the relative importance of the nine preferences.

    # Data and Code on the Moral Machine Experiment on Large Language Models

    https://doi.org/10.5061/dryad.d7wm37q6v

    Requirements

    • Python 3.9

        pip install -r requirements.txt

    NOTE: The script run_chatgpt.py requires an OpenAI API key. Please obtain your API key by following OpenAI's instructions. To run the script run_palm2.py, setup is required. Please refer to the Google Cloud instructions. Specifically, follow these sections in the given order: 1) Set up a project and a development environment and 2) Install the Vertex AI SDK for Python. Before running run_llama2.py, the Llama2 model files must be downloaded. Please follow [the instructi...
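
    The conjoint analysis mentioned above estimates how much each scenario attribute shifts the probability that an option is spared. In the original Moral Machine study this quantity is the average marginal component effect (AMCE). Below is a minimal sketch under a simplifying assumption: each attribute is binary and randomized independently of the others, so the AMCE reduces to a difference in mean choice rates between options with and without the attribute. The record format and attribute names are hypothetical, and the repository's own code may instead estimate AMCEs with regression and weighting.

```python
def amce(options, attribute):
    """Difference in choice rate between options with and without `attribute`.

    `options` holds one dict per option (side) shown to the model, with a
    boolean per attribute and `chosen` = 1 if the model spared that side.
    This is a valid AMCE estimate only if the attribute was randomized
    independently of the other attributes.
    """
    with_attr = [o["chosen"] for o in options if o[attribute]]
    without_attr = [o["chosen"] for o in options if not o[attribute]]
    return sum(with_attr) / len(with_attr) - sum(without_attr) / len(without_attr)


# Hypothetical responses: each scenario contributes two options (sides).
options = [
    {"humans": True, "more_lives": True, "chosen": 1},
    {"humans": False, "more_lives": False, "chosen": 0},
    {"humans": True, "more_lives": False, "chosen": 1},
    {"humans": False, "more_lives": True, "chosen": 0},
    {"humans": True, "more_lives": True, "chosen": 1},
    {"humans": True, "more_lives": False, "chosen": 0},
]
for attr in ("humans", "more_lives"):
    print(attr, round(amce(options, attr), 3))
```

    Comparing such per-attribute effects between each LLM and the human baseline is what reveals the quantitative disparities described above.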

  11. Composing alt text using large language models: dataset in Russian

    • data.mendeley.com
    Updated Jun 17, 2024
    + more versions
    Cite
    Yekaterina Kosova (2024). Composing alt text using large language models: dataset in Russian [Dataset]. http://doi.org/10.17632/73dptbyxbb.1
    Dataset updated
    Jun 17, 2024
    Authors
    Yekaterina Kosova
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the results of composing alternative text for images using chatbots based on large language models. The study was carried out in April-June 2024. Microsoft Copilot, Google Gemini, and YandexGPT chatbots were used to generate 108 text descriptions for 12 images. The chatbots generated the descriptions from keywords specified by a person, and experts then rated the resulting descriptions on a Likert scale (from 1 to 5). The dataset is provided as a Microsoft Excel table. The “Data” sheet has the following fields: record number; image number; chatbot; image type (photo, logo); request date; list of keywords; number of keywords; length of keywords; time spent compiling keywords; generated descriptions; required length of descriptions; actual length of descriptions; description generation time; usefulness; reliability; completeness; accuracy; literacy. The “Images” sheet contains links to the original images. The dataset is in Russian.
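
    Each generated description was rated on five Likert criteria (usefulness, reliability, completeness, accuracy, literacy), so a natural first summary is the mean score per chatbot and criterion. Below is a minimal Python sketch. It assumes the “Data” sheet has been exported to rows with hypothetical English column names (the actual headers are in Russian), and the ratings are invented.

```python
from collections import defaultdict

CRITERIA = ("usefulness", "reliability", "completeness", "accuracy", "literacy")


def mean_by_chatbot(rows):
    """Return {chatbot: {criterion: mean Likert score}}."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        counts[row["chatbot"]] += 1
        for c in CRITERIA:
            totals[row["chatbot"]][c] += row[c]
    return {
        bot: {c: totals[bot][c] / counts[bot] for c in CRITERIA}
        for bot in totals
    }


# Hypothetical ratings (1-5) for two descriptions per chatbot.
rows = [
    {"chatbot": "Copilot", "usefulness": 4, "reliability": 5, "completeness": 3, "accuracy": 4, "literacy": 5},
    {"chatbot": "Copilot", "usefulness": 2, "reliability": 3, "completeness": 3, "accuracy": 4, "literacy": 5},
    {"chatbot": "YandexGPT", "usefulness": 5, "reliability": 4, "completeness": 4, "accuracy": 3, "literacy": 4},
    {"chatbot": "YandexGPT", "usefulness": 3, "reliability": 4, "completeness": 2, "accuracy": 5, "literacy": 4},
]
for bot, scores in mean_by_chatbot(rows).items():
    print(bot, scores)
```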

  12. ChatGPT examples in the hydrological sciences

    • hydroshare.org
    • beta.hydroshare.org
    • +1 more
    zip
    Updated Oct 9, 2023
    Cite
    Dylan Irvine (2023). ChatGPT examples in the hydrological sciences [Dataset]. http://doi.org/10.4211/hs.fc0552275ea14c7082218c42ebd63da6
    Available download formats: zip (1.3 MB)
    Dataset updated
    Oct 9, 2023
    Dataset provided by
    HydroShare
    Authors
    Dylan Irvine
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    WGS 84 (EPSG:4326)
    Description

    ChatGPT has forever changed the way that many industries operate. Much of the focus on Artificial Intelligence (AI) tools has been on their ability to generate text. However, their ability to generate computer code and scripts is also likely to have a major impact. We demonstrate the use of ChatGPT to generate Python scripts to perform hydrological analyses and highlight the opportunities, limitations and risks that AI poses in the hydrological sciences.

    Here, we provide four worked examples of the use of ChatGPT to generate scripts to conduct hydrological analyses. We also provide a full list of the libraries available to the ChatGPT Advanced Data Analysis plugin (only available in the paid version). These files relate to a manuscript that is to be submitted to Hydrological Processes. The authors of the manuscript are Dylan J. Irvine, Landon J.S. Halloran and Philip Brunner.

    If you find these examples useful and/or use them, we would appreciate it if you could cite the associated publication in Hydrological Processes. Details will be made available upon final publication.

  13. A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Jun 4, 2024
    Cite
    Scott McGrath (2024). A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions - Full study data [Dataset]. http://doi.org/10.5061/dryad.s4mw6m9cv
    Available download formats: zip
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    University of California, Berkeley
    Authors
    Scott McGrath
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: Our objective is to evaluate the efficacy of ChatGPT 4 in accurately and effectively delivering genetic information, building on previous findings with ChatGPT 3.5. We focus on assessing the utility, limitations, and ethical implications of using ChatGPT in medical settings.

    Materials and Methods: A structured questionnaire, including the Brief User Survey (BUS-15) and custom questions, was developed to assess ChatGPT 4's clinical value. An expert panel of genetic counselors and clinical geneticists independently evaluated ChatGPT 4's responses to these questions. We also performed a comparative analysis with ChatGPT 3.5 using descriptive statistics, with data analysis conducted in R.

    Results: ChatGPT 4 demonstrated improvements over 3.5 in context recognition, relevance, and informativeness. However, performance variability and concerns about the naturalness of the output were noted. No significant difference in accuracy was found between ChatGPT 3.5 and 4.0. Notably, the efficacy of ChatGPT 4 varied significantly across different genetic conditions, with specific differences identified between responses related to BRCA1 and HFE.

    Discussion and Conclusion: This study highlights ChatGPT 4's potential in genomics, noting significant advancements over its predecessor. Despite these improvements, challenges remain, including the risk of outdated information and the need for ongoing refinement. The variability in performance across different genetic conditions underscores the need for expert oversight and continuous AI training. While ChatGPT 4 shows promise, the findings emphasize the importance of balancing technological innovation with ethical responsibility in healthcare information delivery.

    Methods

    Study Design: This study was conducted to evaluate the performance of the ChatGPT 4 model (March 23rd, 2023) in the context of genetic counseling and education. The evaluation involved a structured questionnaire, which included questions selected from the Brief User Survey (BUS-15) and additional custom questions designed to assess the clinical value of ChatGPT 4's responses.

    Questionnaire Development: The questionnaire was built on Qualtrics and comprised twelve questions: seven selected from the BUS-15, preceded by two additional questions that we designed. The initial questions focused on quality and answer relevancy:

    1. The overall quality of the Chatbot’s response is: (5-point Likert: Very poor to Very good)
    2. The Chatbot delivered an answer that provided the relevant information you would include if asked the question. (5-point Likert: Strongly disagree to Strongly agree)

    The BUS-15 questions (7-point Likert: Strongly disagree to Strongly agree) focused on:

    1. Recognition and facilitation of users’ goal and intent: Chatbot seems able to recognize the user’s intent and guide the user to its goals.
    2. Relevance of information: The chatbot provides relevant and appropriate information/answer to people at each stage to make them closer to their goal.
    3. Maxim of quantity: The chatbot responds in an informative way without adding too much information.
    4. Resilience to failure: Chatbot seems able to find ways to respond appropriately even when it encounters situations or arguments it is not equipped to handle.
    5. Understandability and politeness: The chatbot seems able to understand input and convey correct statements and answers without ambiguity and with acceptable manners.
    6. Perceived conversational credibility: The chatbot responds in a credible and informative way without adding too much information.
    7. Meet the neurodiverse needs: Chatbot seems able to meet needs and be used by users independently of their health conditions, well-being, age, etc.

    Expert Panel and Data Collection: A panel of experts (two genetic counselors and two clinical geneticists) was provided with a link to the survey containing the questions. They independently evaluated the responses from ChatGPT 4 without discussing the questions or answers among themselves until after the survey submission. This approach ensured unbiased evaluation.
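
    The study compares ChatGPT 3.5 and 4 using descriptive statistics computed in R. The sketch below shows the same kind of summary in Python rather than R, using invented scores. It reports the median, mean, and standard deviation of one 7-point BUS-15 item per model across four expert raters. It is not the authors' analysis code.

```python
from statistics import mean, median, stdev


def describe(scores):
    """Descriptive statistics for a list of Likert ratings."""
    return {
        "n": len(scores),
        "median": median(scores),
        "mean": round(mean(scores), 2),
        "sd": round(stdev(scores), 2),  # sample standard deviation
    }


# Hypothetical 7-point ratings from the four experts for one BUS-15 item.
ratings = {
    "ChatGPT 3.5": [4, 5, 3, 4],
    "ChatGPT 4": [6, 5, 6, 7],
}
for model, scores in ratings.items():
    print(model, describe(scores))
```

    For ordinal Likert data the median is usually the safer headline figure, with the mean reported alongside for comparison.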

  14. Text Analytics Market Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Jun 20, 2025
    + more versions
    Cite
    Market Report Analytics (2025). Text Analytics Market Report [Dataset]. https://www.marketreportanalytics.com/reports/text-analytics-market-89598
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The text analytics market is experiencing robust growth, projected to reach $10.49 billion in 2025 and exhibiting a remarkable Compound Annual Growth Rate (CAGR) of 39.90% from 2019 to 2033. This expansion is fueled by several key drivers. The increasing volume of unstructured data generated across various industries, including healthcare, finance, and customer service, necessitates sophisticated tools for extracting actionable insights. Furthermore, advancements in natural language processing (NLP), machine learning (ML), and artificial intelligence (AI) are empowering text analytics solutions with enhanced capabilities, such as sentiment analysis, topic modeling, and entity recognition. The rising adoption of cloud-based solutions also contributes to market growth, offering scalability, cost-effectiveness, and ease of access. Major industry players like IBM, Microsoft, and SAP are actively investing in research and development, driving innovation and expanding the market's capabilities. Competitive pressures are fostering a continuous improvement in the accuracy and efficiency of text analytics tools, making them increasingly attractive to businesses of all sizes. The growing demand for real-time insights and improved customer experience further propels market expansion. While the market enjoys significant growth momentum, certain challenges persist. Data security and privacy concerns remain paramount, necessitating robust security measures within text analytics platforms. The complexity of implementing and integrating these solutions into existing IT infrastructures can also pose a barrier to adoption, particularly for smaller businesses lacking dedicated data science teams. Furthermore, the accuracy and reliability of text analytics outputs can be affected by the quality and consistency of the input data. 
    Overcoming these challenges through improved data governance, user-friendly interfaces, and robust customer support will be crucial for continued market expansion. Despite these restraints, the overall market outlook remains positive, driven by the continuous evolution of technology and the growing reliance on data-driven decision-making across diverse sectors. Recent developments include: January 2023 - Microsoft announced a new multibillion-dollar investment in ChatGPT maker OpenAI. ChatGPT automatically generates text based on written prompts in a more creative and advanced way than other chatbots. Through this investment, the company will accelerate breakthroughs in AI, and both companies will commercialize advanced technologies. November 2022 - Tntra and Invenio partnered to develop a platform that offers comprehensive data analysis on a firm. Throughout the process, Tntra offered complete engineering support and cooperation to Invenio. Tntra offers feeds, knowledge graphs, intelligent text extraction, and analytics, which enable Invenio to provide information on seven parts of the business, such as false news identification, subject categorization, dynamic data extraction, article summaries, sentiment analysis, and keyword extraction. Key drivers for this market are: Growing Demand for Social Media Analytics, Rising Practice of Predictive Analytics. Potential restraints include: Growing Demand for Social Media Analytics, Rising Practice of Predictive Analytics. Notable trends are: Retail and E-commerce to Hold a Significant Share in Text Analytics Market.
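
    A compound annual growth rate compounds multiplicatively: a value V in a base year grows to V × (1 + r)^t after t years. The minimal sketch below shows what the report's headline figures imply. It assumes the 39.90% CAGR applies uniformly from the 2025 base of $10.49 billion. Under that assumption the market reaches roughly $154 billion by 2033. The report's own year-by-year forecasts are not reproduced here and may differ.

```python
def project(base_value, cagr, years):
    """Value after `years` of compounding at annual rate `cagr`."""
    return base_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Invert the compounding formula to recover the annual rate."""
    return (end_value / start_value) ** (1 / years) - 1


base_2025 = 10.49  # USD billions, from the report
rate = 0.3990      # assumed to apply uniformly from 2025 onward
for year in (2026, 2030, 2033):
    print(year, round(project(base_2025, rate, year - 2025), 2))
```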

  15. Data associated with the publication: Does chatting with chatbots improve...

    • archive.data.jhu.edu
    Updated May 31, 2024
    Cite
    Feifei Wang; Amanda J. Neitzel; Ching Sing Chai (2024). Data associated with the publication: Does chatting with chatbots improve language learning performance? A meta-analysis of chatbot-assisted language learning [Dataset]. http://doi.org/10.7281/T1/XOL4BR
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 31, 2024
    Dataset provided by
    Johns Hopkins Research Data Repository
    Authors
    Feifei Wang; Amanda J. Neitzel; Ching Sing Chai
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Given the importance of conversation practice in language learning, chatbots, especially ChatGPT, have attracted considerable attention for their ability to converse with learners using natural language. This review contributes to the literature by examining the currently unclear overall effect of using chatbots on language learning performance and comprehensively identifying important study characteristics that affect the overall effectiveness. We meta-analyzed 70 effect sizes from 28 studies, using robust variance estimation. The effects were assessed based on 18 study characteristics about learners, chatbots, learning objectives, context, communication/interaction, and methodological and pedagogical designs. Results indicated that using chatbots produced a positive overall effect on language learning performance (g = 0.486), compared to non-chatbot conditions. Moreover, four characteristics (i.e., educational level, language level, interface design, and interaction capability) affected the overall effectiveness. In an in-depth discussion on how the 18 characteristics are related to the effectiveness, future implications for practice and research are presented.
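
    The pooled effect (g = 0.486) is a Hedges' g: a standardized mean difference with a small-sample correction. The meta-analysis pooled 70 such effect sizes with robust variance estimation, which this sketch does not attempt. It only shows how a single g is typically computed from one study's group means, standard deviations, and sample sizes. All input values below are invented.

```python
from math import sqrt


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference (group 1 minus group 2) with Hedges' correction."""
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd        # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    return j * d


# Invented example: chatbot group vs. non-chatbot comparison group.
print(round(hedges_g(m1=78.0, s1=10.0, n1=30, m2=73.0, s2=10.0, n2=30), 3))  # 0.494
```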

  16. S1 Data

    • plos.figshare.com
    xlsx
    Updated Nov 20, 2024
    Jun Qiu; Youlian Zhou (2024). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0311937.s003
    Available download formats: xlsx
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Jun Qiu; Youlian Zhou
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: ChatGPT, developed by OpenAI, is artificial intelligence software designed to generate text-based responses. The objective of this study is to evaluate the accuracy and consistency of ChatGPT’s responses to single-choice questions pertaining to carbon monoxide poisoning. This evaluation will contribute to our understanding of the reliability of ChatGPT-generated information in the medical field.

    Methods: The questions used in this study were selected from the "Medical Exam Assistant (Yi Kao Bang)" application and covered a range of topics related to carbon monoxide poisoning. After a screening process, 44 single-choice questions were included in the study. Each question was entered into ChatGPT ten times in Chinese, then translated into English and entered ten more times. The responses generated by ChatGPT were statistically analyzed to assess their accuracy and consistency in both languages, with the "Medical Exam Assistant (Yi Kao Bang)" reference answers serving as benchmarks. The data analysis was conducted in Python.

    Results: In approximately 50% of cases, ChatGPT’s responses were highly consistent, whereas in approximately one-third of cases they showed an unacceptable blurring of the answers. Accuracy was less favorable: 61.1% in Chinese and 57% in English. This indicates that both the consistency and the accuracy of ChatGPT’s answers to questions about carbon monoxide poisoning could be improved.

    Conclusions: The consistency and accuracy of ChatGPT’s responses regarding carbon monoxide poisoning are currently inadequate. Although ChatGPT offers significant insights, it should not supersede healthcare professionals in making clinical decisions.

  17. Data Sheet 1_On the emergent capabilities of ChatGPT 4 to estimate...

    • frontiersin.figshare.com
    zip
    Updated Feb 13, 2025
    Marco Piastra; Patrizia Catellani (2025). Data Sheet 1_On the emergent capabilities of ChatGPT 4 to estimate personality traits.zip [Dataset]. http://doi.org/10.3389/frai.2025.1484260.s001
    Available download formats: zip
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Frontiers
    Authors
    Marco Piastra; Patrizia Catellani
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the potential of ChatGPT 4 in the assessment of personality traits based on written texts. Using two publicly available datasets containing both written texts and self-assessments of the authors’ psychological traits based on the Big Five model, we aimed to evaluate the predictive performance of ChatGPT 4. For each sample text, we asked for numerical predictions on an eleven-point scale and compared them with the self-assessments. We also asked for ChatGPT 4 confidence scores on an eleven-point scale for each prediction. To keep the study within a manageable scope, a zero-prompt modality was chosen, although more sophisticated prompting strategies could potentially improve performance. The results show that ChatGPT 4 has moderate but significant abilities to automatically infer personality traits from written text. However, it also shows limitations in recognizing whether the input text is appropriate or representative enough to make accurate inferences, which could hinder practical applications. Furthermore, the results suggest that improved benchmarking methods could increase the efficiency and reliability of the evaluation process. These results pave the way for a more comprehensive evaluation of the capabilities of Large Language Models in assessing personality traits from written texts.

  18. 500k ChatGPT-related Tweets Jan-Mar 2023

    • kaggle.com
    zip
    Updated Apr 11, 2023
    Khalid Ansari (2023). 500k ChatGPT-related Tweets Jan-Mar 2023 [Dataset]. https://www.kaggle.com/datasets/khalidryder777/500k-chatgpt-tweets-jan-mar-2023/code
    Available download formats: zip (49,816,658 bytes)
    Dataset updated
    Apr 11, 2023
    Authors
    Khalid Ansari
    License

    CC0 1.0 (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a CSV file of tweets related to ChatGPT, OpenAI's conversational AI model, including keywords (chatgpt, chat gpt), #hashtags, and @mentions. The file includes information on 500,000 tweets. The dataset aims to help understand public opinion, trends, and potential applications of ChatGPT by analyzing tweet volume, sentiment, user engagement, and the influence of key AI events. It offers insights for companies, researchers, and policymakers, helping them make informed decisions about AI-powered conversational technologies.

    Check out my comprehensive analysis of this dataset in the Medium article "Cracking the ChatGPT Code: A Deep Dive into 500,000 Tweets using Advanced NLP Techniques".

    Learn about the collection process in the Medium article "Effortlessly Scraping Massive Twitter Data".

  19. Toy Qualitative Data Project (Interview Transcripts)

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 17, 2024
    Curty, Renata Gonçalves (2024). Toy Qualitative Data Project (Interview Transcripts) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14043000
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    University of California, Santa Barbara
    Authors
    Curty, Renata Gonçalves
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please be advised that this project is intended solely for instructional purposes and should not be used for actual research. This dataset is intended to complement the instructional material and provide a hands-on learning experience for the workshop: Handling and Sharing Qualitative Data Responsibly and Effectively.

    This hypothetical research project is designed to demonstrate key concepts related to human subject qualitative data management and thematic analysis coding. It includes interview transcripts generated with ChatGPT 4.0 Mini for a fictional graduate student in Communication named Sarah, whose main research question is: How do content creators/digital influencers view their role in shaping their followers' consumer behavior, and what ethical dilemmas do they face when promoting products?

    Given the novelty of this research topic and the limited academic literature available, Sarah hopes that the insights gained from this small-scale qualitative exploratory study will help identify key variables for a larger survey study with a representative sample of content creators/digital influencers across the U.S.

    Sarah has previous experience with quantitative methods but is very new to qualitative research and could use our help in handling the data better. Having already conducted six short structured interviews with subjects from top revenue niches (i.e., Home Decor and DIY, Travel & Adventure, Fashion & Style, Health & Wellness, Finance & Investment, Beauty & Skincare) and planning to conduct a dozen more, Sarah is eager to begin engaging with the data she has collected so far and to decide how best to organize and interpret it. We’ll be walking her through this process, providing the necessary guidance and support for effective and responsible data management.

    Interviews were conducted over Zoom and audio recorded with participants' consent. The interview included four main questions, which were consistent across all interviews:

    Q1. Please tell me a little about your work as a content creator/digital influencer, how it started, and how you have established yourself in your current niche.

    Q2. In what ways do you believe content creators/digital influencers shape consumer behavior? Could you share any examples?

    Q3. What strategies would you say content creators/digital influencers typically use to increase sales of sponsored products and services? Which ones have you used? What worked and what did not work for you? Why?

    Q4. In your view, what are the essential ethical responsibilities that content creators and digital influencers should uphold? Can you share any personal experiences that illustrate these responsibilities in action?

    Each interview generated approximately 15 minutes of audio recording, which Sarah manually transcribed. Sarah decided to keep the transcription true to the recordings and seek assistance to mitigate any risk of identification.

  20. Shopping Mall Customer Data Segmentation Analysis

    • kaggle.com
    zip
    Updated Aug 4, 2024
    DataZng (2024). Shopping Mall Customer Data Segmentation Analysis [Dataset]. https://www.kaggle.com/datasets/datazng/shopping-mall-customer-data-segmentation-analysis
    Available download formats: zip (5,890,828 bytes)
    Dataset updated
    Aug 4, 2024
    Authors
    DataZng
    License

    CC0 1.0 (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Demographic Analysis of Shopping Behavior: Insights and Recommendations

    Dataset Information: The Shopping Mall Customer Segmentation Dataset comprises 15,079 unique entries, featuring Customer ID, age, gender, annual income, and spending score. This dataset assists in understanding customer behavior for strategic marketing planning.

    Cleaned Data Details: The data has been cleaned and standardized: 15,079 unique entries with the attributes Customer ID, age, gender, annual income, and spending score. Marketing analysts can use it to develop better mall-specific marketing strategies.

    Challenges Faced:
      1. Data Cleaning: Overcoming inconsistencies and missing values required meticulous attention.
      2. Statistical Analysis: Interpreting demographic data accurately demanded collaborative effort.
      3. Visualization: Crafting informative visuals to convey insights effectively posed design challenges.

    Research Topics:
      1. Consumer Behavior Analysis: Exploring psychological factors driving purchasing decisions.
      2. Market Segmentation Strategies: Investigating effective targeting based on demographic characteristics.

    Suggestions for Project Expansion:
      1. Incorporate External Data: Integrate social media analytics or geographic data to enrich customer insights.
      2. Advanced Analytics Techniques: Explore advanced statistical methods and machine learning algorithms for deeper analysis.
      3. Real-Time Monitoring: Develop tools for agile decision-making through continuous customer behavior tracking.

    This summary outlines the demographic analysis of shopping behavior, highlighting key insights, dataset characteristics, team contributions, challenges, research topics, and suggestions for project expansion. Leveraging these insights can enhance marketing strategies and drive business growth in the retail sector.

    References:
      OpenAI. (2022). ChatGPT [Computer software]. Retrieved from https://openai.com/chatgpt
      Mustafa, Z. (2022). Shopping Mall Customer Segmentation Data [Data set]. Kaggle. Retrieved from https://www.kaggle.com/datasets/zubairmustafa/shopping-mall-customer-segmentation-data
      Donkeys. (n.d.). Kaggle Python API [Jupyter Notebook]. Kaggle. Retrieved from https://www.kaggle.com/code/donkeys/kaggle-python-api/notebook
      Pandas-Datareader. (n.d.). Retrieved from https://pypi.org/project/pandas-datareader/
