40 datasets found
  1. ChatGPT Usage by Age Group – Survey Data

    • expresslegalfunding.com
    html
    Updated May 2, 2025
    Cite
    Express Legal Funding (2025). ChatGPT Usage by Age Group – Survey Data [Dataset]. https://expresslegalfunding.com/chatgpt-study/
    Explore at:
    Available download formats: html
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Express Legal Funding
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    60+, 18–29, 30–44, 45–60
    Description

    This dataset presents ChatGPT usage patterns across different age groups, showing the percentage of users who have followed its advice, used it without following advice, or have never used it, based on a 2025 U.S. survey.

  2. Characteristics of ChatGPT users from Germany: implications for the digital...

    • b2find.eudat.eu
    Updated Jul 23, 2024
    Cite
    (2024). Characteristics of ChatGPT users from Germany: implications for the digital divide from web tracking data - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/bcfed45e-617e-5a91-b174-6d676aa137e5
    Explore at:
    Dataset updated
    Jul 23, 2024
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    A major challenge of our time is reducing disparities in access to and effective use of digital technologies, with recent discussions highlighting the role of AI in exacerbating the digital divide. We examine user characteristics that predict usage of the AI-powered conversational agent ChatGPT. We combine behavioral and survey data in a web tracked sample of N=1376 German citizens to investigate differences in ChatGPT activity (usage, visits, and adoption) during the first 11 months from the launch of the service (November 30, 2022). Guided by a model of technology acceptance (UTAUT-2), we examine the role of socio-demographics commonly associated with the digital divide in ChatGPT activity and explore further socio-political attributes identified via stability selection in Lasso regressions. We confirm that lower age and higher education affect ChatGPT usage, but neither gender nor income do. We find full-time employment and more children to be barriers to ChatGPT activity. Using a variety of social media was positively associated with ChatGPT activity. In terms of political variables, political knowledge and political self-efficacy as well as some political behaviors such as voting, debating political issues online and offline and political action online were all associated with ChatGPT activity, with online political debating and political self-efficacy negatively so. Finally, need for cognition and communication skills such as writing, attending meetings, or giving presentations, were also associated with ChatGPT engagement, though chairing/organizing meetings was negatively associated. Our research informs efforts to address digital disparities and promote digital literacy among underserved populations by presenting implications, recommendations, and discussions on ethical and social issues of our findings.

  3. Types of ChatGPT Advice Used – Survey Data

    • expresslegalfunding.com
    html
    Updated May 2, 2025
    Cite
    Express Legal Funding (2025). Types of ChatGPT Advice Used – Survey Data [Dataset]. https://expresslegalfunding.com/chatgpt-study/
    Explore at:
    Available download formats: html
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Express Legal Funding
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Legal Advice, Career Advice, Educational Help, Financial Advice, Medical Information, Relationship Advice, Mental Health Topics, News / Current Events, Product Recommendations
    Description

    This dataset shows the types of advice users sought from ChatGPT based on a 2025 U.S. survey, including education, financial, medical, and legal topics.

  4. 🤖 ChatGPT App Google Store Reviews

    • kaggle.com
    Updated Nov 17, 2023
    Cite
    BwandoWando (2023). 🤖 ChatGPT App Google Store Reviews [Dataset]. http://doi.org/10.34740/kaggle/ds/4017553
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context


    Based on its Wikipedia page:

    ChatGPT (Chat Generative Pre-trained Transformer) is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, that enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.

    These reviews were extracted from the app's Google Store page.

    Usage

    This dataset should paint a good picture of the public's perception of the app over the years. Using this dataset, we can do the following:

    1. Extract sentiments and trends.
    2. Identify which versions of the app had the most positive and the most negative feedback.
    3. Use topic modeling to identify the pain points of the application.

    (AND MANY MORE!)
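
As a rough illustration of use case 2 above, the sketch below groups star ratings by app version and picks the best and worst versions by mean score. The column names `app_version` and `score` and the sample rows are hypothetical stand-ins; the actual Kaggle file may name or structure its columns differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: the real column names in the Kaggle file may differ.
reviews = [
    {"app_version": "1.0.0", "score": 5},
    {"app_version": "1.0.0", "score": 4},
    {"app_version": "1.1.0", "score": 2},
    {"app_version": "1.1.0", "score": 1},
    {"app_version": "1.2.0", "score": 3},
]


def mean_score_by_version(rows):
    """Group star ratings by app version and return the mean per version."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["app_version"]].append(row["score"])
    return {version: mean(values) for version, values in scores.items()}


averages = mean_score_by_version(reviews)
best = max(averages, key=averages.get)    # version with the highest mean score
worst = min(averages, key=averages.get)   # version with the lowest mean score
```

Mean star rating is only a proxy for sentiment; a text-based sentiment model could be substituted for the `score` field without changing the grouping logic.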

    Note

    Images generated using Bing Image Generator

  5. Datasets .csv

    • figshare.com
    txt
    Updated Jan 24, 2024
    Cite
    Yaser Alhasawi (2024). Datasets .csv [Dataset]. http://doi.org/10.6084/m9.figshare.25053146.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    figshare
    Authors
    Yaser Alhasawi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset for this research project was meticulously constructed to investigate the adoption of ChatGPT among students in the United States. The primary objective was to gain insights into the technological barriers and resistances faced by students in integrating ChatGPT into their information systems. The dataset was designed to capture the diverse adoption patterns among students in various public and private schools and universities across the United States. By examining adoption rates, frequency of usage, and the contexts in which ChatGPT is employed, the research sought to provide a comprehensive understanding of how students are incorporating this technology into their information systems. Moreover, by including participants from diverse educational institutions, the research sought to ensure a comprehensive representation of the student population in the United States. This approach aimed to provide nuanced insights into how factors such as educational background, institution type, and technological familiarity influence ChatGPT adoption.

  6. ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset...

    • huggingface.co
    Updated Nov 12, 2024
    Cite
    ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset [Dataset]. https://huggingface.co/datasets/DeepNLP/ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 12, 2024
    Authors
    DeepNLP
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ChatGPT Gemini Claude Perplexity Human Evaluation Multi Aspect Review Dataset

    Introduction

    Human evaluations and reviews with scalar scores of AI services' responses are very useful in LLM Finetuning, Human Preference Alignment, Few-Shot Learning, Bad Case Shooting, etc., but extremely difficult to collect. This dataset is collected from DeepNLP AI Service User Review panel (http://www.deepnlp.org/store), which is an open review website for users to give reviews and upload… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/ChatGPT-Gemini-Claude-Perplexity-Human-Evaluation-Multi-Aspects-Review-Dataset.

  7. ChatGPT Usage by U.S. Census Region – Survey Data

    • expresslegalfunding.com
    html
    Updated May 2, 2025
    Cite
    Express Legal Funding (2025). ChatGPT Usage by U.S. Census Region – Survey Data [Dataset]. https://expresslegalfunding.com/chatgpt-study/
    Explore at:
    Available download formats: html
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Express Legal Funding
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Pacific, Mountain, New England, South Atlantic, Middle Atlantic, East North Central, East South Central, West North Central, West South Central
    Description

    This dataset presents ChatGPT usage patterns across U.S. Census regions, based on a 2025 nationwide survey. It tracks how often users followed, partially used, or never used ChatGPT by state region.

  8. #ChatGPT 1000 Daily 🐦 Tweets

    • kaggle.com
    Updated May 14, 2023
    Cite
    Enric Domingo (2023). #ChatGPT 1000 Daily 🐦 Tweets [Dataset]. http://doi.org/10.34740/kaggle/dsv/5685262
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 14, 2023
    Dataset provided by
    Kaggle
    Authors
    Enric Domingo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    UPDATE: Due to the new Twitter API conditions introduced under Elon Musk, the Twitter (X) API is no longer free to use, and pricing is 100 $/month on the hobby plan. As a result, my automated ETL notebook stopped adding new tweets to this dataset on May 13th 2023.

    This dataset was updated every day with 1000 tweets/day containing any of the words "ChatGPT", "GPT3", or "GPT4", starting from the 3rd of April 2023. Each day's tweets were uploaded 24-72h later, so that the counts of likes, retweets, messages and impressions had enough time to become relevant. Tweets are in any language and were selected randomly from all hours of the day. Some basic filters were applied to try to discard sensitive tweets and spam.

    This dataset can be used for many different applications in data analysis and visualization, as well as NLP sentiment analysis techniques and more.

    Consider upvoting this Dataset and the ETL scheduled Notebook providing new data everyday into it if you found them interesting, thanks! 🤗

    Columns Description:

    • tweet_id: Integer. unique identifier for each tweet. Older tweets have smaller IDs.

    • tweet_created: Timestamp. Time of the tweet's creation.

    • tweet_extracted: Timestamp. The UTC time when the ETL pipeline pulled the tweet and its metadata (likes count, retweets count, etc).

    • text: String. The raw payload text from the tweet.

    • lang: String. Short name for the Tweet text's language.

    • user_id: Integer. Twitter's unique user id.

    • user_name: String. The author's public name on Twitter.

    • user_username: String. The author's Twitter account username (@example)

    • user_location: String. The author's public location.

    • user_description: String. The author's public profile's bio.

    • user_created: Timestamp. Timestamp of user's Twitter account creation.

    • user_followers_count: Integer. The number of followers of the author's account at the moment of the tweet extraction

    • user_following_count: Integer. The number of followed accounts from the author's account at the moment of the Tweet extraction

    • user_tweet_count: Integer. The number of Tweets that the author has published at the moment of the Tweet extraction.

    • user_verified: Boolean. True if the user is verified (blue mark).

    • source: The device/app used to publish the tweet (apparently not working; all values are NaN so far).

    • retweet_count: Integer. Number of retweets to the Tweet at the moment of the Tweet extraction.

    • like_count: Integer. Number of Likes to the Tweet at the moment of the Tweet extraction.

    • reply_count: Integer. Number of reply messages to the Tweet.

    • impression_count: Integer. Number of times the Tweet has been seen at the moment of the Tweet extraction.

    More info: Tweets API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet Users API info definition: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user
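
The column types listed above can be applied while reading the file. The following is a minimal sketch using only the Python standard library; it assumes the data is a CSV with a header row, that timestamps are ISO 8601 strings, and that booleans are stored as the text `True`/`False`. None of these storage details are stated in the description, and the sample row is invented for illustration.

```python
import csv
import io
from datetime import datetime

# Column types taken from the column description above.
INT_COLUMNS = [
    "tweet_id", "user_id", "user_followers_count", "user_following_count",
    "user_tweet_count", "retweet_count", "like_count", "reply_count",
    "impression_count",
]
TIMESTAMP_COLUMNS = ["tweet_created", "tweet_extracted", "user_created"]


def parse_row(row):
    """Convert the string fields of one CSV row to the documented types."""
    parsed = dict(row)
    for name in INT_COLUMNS:
        if name in row:
            parsed[name] = int(row[name])
    for name in TIMESTAMP_COLUMNS:
        if name in row:
            # Assumption: timestamps are stored in ISO 8601 form.
            parsed[name] = datetime.fromisoformat(row[name])
    if "user_verified" in row:
        # Assumption: booleans are stored as the text "True" / "False".
        parsed["user_verified"] = row["user_verified"].strip().lower() == "true"
    return parsed


# Invented sample with a subset of the columns, for illustration only.
SAMPLE = (
    "tweet_id,tweet_created,text,lang,user_verified,like_count\n"
    "1642800000000000000,2023-04-03 08:15:00+00:00,Trying ChatGPT today,en,False,12\n"
)
rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Columns absent from a given file are skipped rather than raising an error, which keeps the sketch usable on partial exports.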

  9. Monarch Butterfly Detector Dataset

    • universe.roboflow.com
    zip
    Updated Jun 11, 2023
    Cite
    Scott Cole (2023). Monarch Butterfly Detector Dataset [Dataset]. https://universe.roboflow.com/scott-cole-a3ty4/monarch-butterfly-detector/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 11, 2023
    Dataset authored and provided by
    Scott Cole
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Monarch Butterfly Bounding Boxes
    Description

    Monarch Butterfly Detector

    The Monarch Butterfly Detector is an advanced computer vision model that detects and localizes Monarch butterflies in images. With its cutting-edge technology and high accuracy, this model opens up exciting possibilities for biodiversity monitoring, migration studies, citizen science projects, identification guides, and environmental education.

    Key Features

    • Accurate Detection: The Monarch Butterfly Detector utilizes state-of-the-art computer vision algorithms to accurately identify and localize Monarch butterflies within images.

    • Versatile Use Cases: This powerful model has diverse applications, ranging from scientific research and conservation efforts to citizen science projects and environmental education programs.

    • Easy Integration: The Monarch Butterfly Detector can be seamlessly integrated into existing platforms, apps, or websites, making it accessible to many users and enabling them to contribute effortlessly to butterfly monitoring.

    • User-Friendly Interface: We provide a user-friendly interface/API that allows users to easily interact with the model, upload images, and obtain instant results.

    For detailed documentation and tutorials on using Roboflow, refer to https://docs.roboflow.com/

    Contribution Guidelines

    We welcome contributions from the open-source community to enhance the Monarch Butterfly Detector. If you're interested in contributing, please follow the guidelines outlined in [CONTRIBUTING.md] and submit your pull requests.

    License

    This project is licensed under the [Roboflow License]. For more information, see the [LICENSE] file provided by Roboflow.

    Contact Information

    For any questions, suggestions, or collaborations, please reach out to us at savetheworld at 150left.com

    Congratulations if you have made it this far. 🥳

    🎁🎁🎁I thought I would use another piece of AI to get some more content about this detector I created. Here is the dialog I had with ChatGPT.

    Scott: ChatGPT, can you make my readers drool over my Monarch Butterfly Detector model?

    ChatGPT: Here are 10 captivating outro or call-to-action suggestions to encourage users to try out the Monarch Butterfly Detector and contribute to its success:

    "Join us in preserving the delicate beauty of Monarch butterflies. Try out the Monarch Butterfly Detector today and become a guardian of these magnificent creatures."

    "Unlock the secrets of Monarch butterflies with our cutting-edge detection model. Experience the wonders of computer vision and contribute to vital research efforts by using the Monarch Butterfly Detector."

    "Make a difference in conservation by being a part of the Monarch Butterfly Detector community. Help protect these iconic insects and their habitats with the power of AI and computer vision."

    "Calling all nature enthusiasts and citizen scientists! Contribute to the Monarch Butterfly Detector project and help us gather valuable data on Monarch butterflies. Together, we can make a lasting impact."

    "Explore the hidden world of Monarch butterflies with the Monarch Butterfly Detector. Whether you're a researcher, educator, or simply curious, join us on this journey and unravel the mysteries of these captivating creatures."

    "Become a Monarch detective! Unleash the potential of computer vision and aid in the conservation of Monarch butterflies. Download the Monarch Butterfly Detector and help us protect these extraordinary pollinators."

    "Attention conservationists and researchers! The Monarch Butterfly Detector needs your expertise. Contribute your insights and collaborate with us to further our understanding of Monarch butterflies and support their conservation."

    "Empower yourself with AI and contribute to science. With the Monarch Butterfly Detector, you can be a part of groundbreaking research on Monarch butterflies. Start detecting, start making a difference."

    "Inspire the next generation of environmentalists. Use the Monarch Butterfly Detector as an educational tool and engage students in the wonders of computer vision and the importance of protecting our natural world."

    "Experience the thrill of discovery with the Monarch Butterfly Detector. Join a vibrant community of nature enthusiasts and scientists, and together, let's unlock new insights into the fascinating lives of Monarch butterflies."

    As you can see, AI certainly likes my idea.

  10. chats-data-2023-10-16

    • huggingface.co
    Updated Oct 16, 2023
    + more versions
    Cite
    Collective Cognition (2023). chats-data-2023-10-16 [Dataset]. https://huggingface.co/datasets/CollectiveCognition/chats-data-2023-10-16
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 16, 2023
    Dataset authored and provided by
    Collective Cognition
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for "Collective Cognition ChatGPT Conversations"

    Dataset Description

    Dataset Summary

    The "Collective Cognition ChatGPT Conversations" dataset is a collection of chat logs between users and the ChatGPT model. These conversations have been shared by users on the "Collective Cognition" website. The dataset provides insights into user interactions with language models and can be utilized for multiple purposes, including training, research, and… See the full description on the dataset page: https://huggingface.co/datasets/CollectiveCognition/chats-data-2023-10-16.

  11. ChatGPT Trust Levels by Advice Category – Survey Data

    • expresslegalfunding.com
    html
    Updated May 2, 2025
    Cite
    Express Legal Funding (2025). ChatGPT Trust Levels by Advice Category – Survey Data [Dataset]. https://expresslegalfunding.com/chatgpt-study/
    Explore at:
    Available download formats: html
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Express Legal Funding
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Legal Advice, Career Advice, Educational Help, Financial Advice, Medical Information, Relationship Advice, Mental Health Topics, News / Current Events, Product Recommendations
    Description

    This dataset presents how much users trust ChatGPT across different advice categories, including career, education, financial, legal, and medical advice, based on a 2025 U.S. survey.

  12. Data_Sheet_1_Advanced large language models and visualization tools for data...

    • frontiersin.figshare.com
    txt
    Updated Aug 8, 2024
    Cite
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

    Methods

    This article examines how learners perform and their perspectives when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants among students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

    Results

    The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

    Discussion

    It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.

  13. Supplementary data for the paper 'Personality and acceptance as predictors...

    • data.4tu.nl
    zip
    Updated Mar 28, 2024
    Cite
    Joost de Winter; Dimitra Dodou; Yke Bauke Eisma (2024). Supplementary data for the paper 'Personality and acceptance as predictors of ChatGPT use' [Dataset]. http://doi.org/10.4121/e2e3ac25-e264-4592-b413-254eb4ac5022.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Joost de Winter; Dimitra Dodou; Yke Bauke Eisma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Within a year of its launch, ChatGPT has seen a surge in popularity. While many are drawn to its effectiveness and user-friendly interface, ChatGPT also introduces moral concerns, such as the temptation to present generated text as one’s own. This led us to theorize that personality traits such as Machiavellianism and sensation-seeking may be predictive of ChatGPT usage. We launched two online questionnaires with 2,000 respondents each, in September 2023 and March 2024, respectively. In Questionnaire 1, 22% of respondents were students, and 54% were full-time employees; 32% indicated they used ChatGPT at least weekly. Analysis of our ChatGPT Acceptance Scale revealed two factors, Effectiveness and Concerns, which correlated positively and negatively, respectively, with ChatGPT use frequency. A specific aspect of Machiavellianism (manipulation tactics) was found to predict ChatGPT usage. Questionnaire 2 was a replication of Questionnaire 1, with 21% students and 54% full-time employees, of which 43% indicated using ChatGPT weekly. In Questionnaire 2, more extensive personality scales were used. We found a moderate correlation between Machiavellianism and ChatGPT usage (r = .22) and with an opportunistic attitude towards undisclosed use (r = .30), relationships that largely remained intact after controlling for gender, age, education level, and the respondents’ country. We conclude that covert use of ChatGPT is associated with darker personality traits, something that requires further attention.

  14. DORIS-MAE-v1

    • zenodo.org
    bin, json
    Updated Oct 17, 2023
    + more versions
    Cite
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi (2023). DORIS-MAE-v1 [Dataset]. http://doi.org/10.5281/zenodo.8299749
    Explore at:
    Available download formats: bin, json
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.

    Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving distinct purposes.

    The Query dataset contains 100 human-crafted complex queries spanning across five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.

    The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011-2021, and sourced from arXiv. Each entry includes title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.

    The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (e.g., 0,1,2) representing ChatGPT's final decision.

    Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (e.g., 0,1,2).

    The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:

    ada_embedding_for_DORIS-MAE_v1.pickle
    ├── "Query"
    │   ├── query_id_1 (Embedding of query_1)
    │   ├── query_id_2 (Embedding of query_2)
    │   └── query_id_3 (Embedding of query_3)
    │   .
    │   .
    │   .
    └── "Corpus"
        ├── corpus_id_1 (Embedding of abstract_1)
        ├── corpus_id_2 (Embedding of abstract_2)
        └── corpus_id_3 (Embedding of abstract_3)
        .
        .
        .
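
The layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the pickle deserializes to a dict with "Query" and "Corpus" keys, each mapping an id to a sequence of floats. The helper names (`load_embeddings`, `cosine`, `top_k`) are hypothetical, and the cosine-similarity ranking is only an illustration of how such embeddings might be compared, not a description of how the candidate pools were actually built. A tiny stand-in file is written first so the example runs without the real download.

```python
import math
import os
import pickle
import tempfile


def load_embeddings(path):
    """Return the "Query" and "Corpus" mappings stored in the pickle."""
    # Only unpickle files from a source you trust: pickle can run code.
    with open(path, "rb") as fh:
        data = pickle.load(fh)
    return data["Query"], data["Corpus"]


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, corpus, k=3):
    """Rank corpus ids by similarity to query_vec, best first."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Stand-in data with the assumed layout (2-d vectors instead of real ones).
toy = {
    "Query": {"q1": [1.0, 0.0]},
    "Corpus": {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]},
}
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, "toy_embeddings.pickle")
with open(toy_path, "wb") as fh:
    pickle.dump(toy, fh)

queries, corpus = load_embeddings(toy_path)
ranking = top_k(queries["q1"], corpus, k=2)
print(ranking)
```

To try this on the real file, pass the path of "ada_embedding_for_DORIS-MAE_v1.pickle" to `load_embeddings` instead of the stand-in; the loops work on lists and on array-like vectors alike, though a vectorized library would be faster for a corpus of this size.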

  15. The Dataset for the book chapter on "Classifying User Intent for Effective...

    • figshare.com
    xlsx
    Updated Dec 27, 2023
    Cite
    Seyedmoein Mohsenimofidi; Akshy Sripad Raghavendra Prasad; Aida Zahid; Usman Rafiq; Xiaofeng Wang; Muhammad Attal Idris (2023). The Dataset for the book chapter on "Classifying User Intent for Effective Prompt Engineering: A Case of a Chatbot for StartupTeams". [Dataset]. http://doi.org/10.6084/m9.figshare.24847920.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Dec 27, 2023
    Dataset provided by
    figshare
    Authors
    Seyedmoein Mohsenimofidi; Akshy Sripad Raghavendra Prasad; Aida Zahid; Usman Rafiq; Xiaofeng Wang; Muhammad Attal Idris
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was used to write a book chapter titled "Classifying User Intent for Effective Prompt Engineering: A Case of a Chatbot for Startup Teams". The dataset contains the following five resources:
      • Startup questions and intent classifications: a list of possible questions and their classification into four intents, namely reflecting on own experience, seeking information, brainstorming, and seeking advice.
      • Prompt_Book_v1: a brief guide on how questions are classified, a description of prompt patterns and templates, and a matching of purposes to prompt patterns.
      • Questions_classification_script: the Python script used in our work to classify user intent.
      • Survey_questionnaire: the original survey questions asked of the participants.
      • survey_responses: survey responses from study respondents.

  16. Outcome of ChatGPT Advice – Survey Data

    • expresslegalfunding.com
    html
    Updated May 2, 2025
    Cite
    Express Legal Funding (2025). Outcome of ChatGPT Advice – Survey Data [Dataset]. https://expresslegalfunding.com/chatgpt-study/
    Explore at:
    Available download formats: html
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Express Legal Funding
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Unsure – Not sure yet, Helpful – It led to a good result, Neutral – It made no real difference, Harmful – It caused problems or a bad result
    Description

    This dataset summarizes how ChatGPT users rated the outcomes of the advice they received, including whether it was helpful, harmful, neutral, or uncertain, based on a 2025 U.S. survey.

  17. Healthcare Workforce Mental Health Dataset

    • kaggle.com
    Updated Feb 16, 2025
    Cite
    Rivalytics (2025). Healthcare Workforce Mental Health Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10768196
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 16, 2025
    Dataset provided by
    Kaggle
    Authors
    Rivalytics
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    📌**Context**

    The Healthcare Workforce Mental Health Dataset is designed to explore workplace mental health challenges in the healthcare industry, an environment known for high stress and burnout rates.

    This dataset enables users to analyze key trends related to:

    💠 Workplace Stressors: Examining the impact of heavy workloads, poor work environments, and emotional demands.

    💠 Mental Health Outcomes: Understanding how stress and burnout influence job satisfaction, absenteeism, and turnover intention.

    💠 Educational & Analytical Applications: A valuable resource for data analysts, students, and career changers looking to practice skills in data exploration and data visualization.

    To help users gain deeper insights, this dataset is fully compatible with a Power BI Dashboard, available as part of a complete analytics bundle for enhanced visualization and reporting.

    📌**Source**

    This dataset was synthetically generated using the following methods:

    💠 Python & Data Science Techniques: Probabilistic modeling to simulate realistic data distributions. Industry-informed variable relationships based on healthcare workforce studies.

    💠 Guidance & Validation Using AI (ChatGPT): Assisted in refining dataset realism and logical mappings.

    💠 Industry Research & Reports: Based on insights from WHO, CDC, OSHA, and academic studies on workplace stress and mental health in healthcare settings.

    📌**Inspiration**

    This dataset was inspired by ongoing discussions in healthcare regarding burnout, mental health, and staff retention. The goal is to bridge the gap between raw data and actionable insights by providing a structured, analyst-friendly dataset.

    For those who want a ready-to-use reporting solution, a Power BI Dashboard Template is available, designed for interactive data exploration, workforce insights, and stress factor analysis.

    📌**Important Note** This dataset is synthetic and intended for educational purposes only. It is not real-world employee data and should not be used for actual decision-making or policy implementation.

  18. Minimal dataset.

    • figshare.com
    txt
    Updated Mar 8, 2024
    + more versions
    Cite
    Avishek Choudhury; Safa Elkefi; Achraf Tounsi (2024). Minimal dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0296151.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Avishek Choudhury; Safa Elkefi; Achraf Tounsi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As ChatGPT emerges as a potential ally in healthcare decision-making, it is imperative to investigate how users leverage and perceive it. The repurposing of technology is innovative but brings risks, especially since AI’s effectiveness depends on the data it’s fed. In healthcare, ChatGPT might provide sound advice based on current medical knowledge, which could turn into misinformation if its data sources later include erroneous information. Our study assesses user perceptions of ChatGPT, particularly of those who used ChatGPT for healthcare-related queries. By examining factors such as competence, reliability, transparency, trustworthiness, security, and persuasiveness of ChatGPT, the research aimed to understand how users rely on ChatGPT for health-related decision-making. A web-based survey was distributed to U.S. adults using ChatGPT at least once a month. Bayesian Linear Regression was used to understand how much ChatGPT aids in informed decision-making. This analysis was conducted on subsets of respondents, both those who used ChatGPT for healthcare decisions and those who did not. Qualitative data from open-ended questions were analyzed using content analysis, with thematic coding to extract public opinions on urban environmental policies. Six hundred and seven individuals responded to the survey. Respondents were distributed across 306 US cities of which 20 participants were from rural cities. Of all the respondents, 44 used ChatGPT for health-related queries and decision-making. In the healthcare context, the most effective model highlights ’Competent + Trustworthy + ChatGPT for healthcare queries’, underscoring the critical importance of perceived competence and trustworthiness specifically in the realm of healthcare applications of ChatGPT. 
    On the other hand, the non-healthcare context reveals a broader spectrum of influential factors in its best model, which includes ’Trustworthy + Secure + Benefits outweigh risks + Satisfaction + Willing to take decisions + Intent to use + Persuasive’. In conclusion, our study findings suggest a clear demarcation in user expectations and requirements from AI systems based on the context of their use. We advocate for a balanced approach where technological advancement and user readiness are harmonized.

  19. ICEIS 2025 - Support Documentation for Study Execution - "Exploring the Use...

    • figshare.com
    docx
    Updated Jul 16, 2025
    Cite
    Felipe Sonntag Manzoni; Ana Carolina Oran; Rávella Thuany Carvalho Rodrigues (2025). ICEIS 2025 - Support Documentation for Study Execution - "Exploring the Use of ChatGPT for the Generation of User Story Based Test Cases: An Experimental Study" [Dataset]. http://doi.org/10.6084/m9.figshare.28053713.v2
    Explore at:
    Available download formats: docx
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    figshare
    Authors
    Felipe Sonntag Manzoni; Ana Carolina Oran; Rávella Thuany Carvalho Rodrigues
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset and supporting documents used to execute the study, as provided by the authors. The raw data includes the scripts used by the evaluators and the TAM forms they submitted during the evaluation process. This raw data is part of the work published at ICEIS 2025 under the title "Exploring the Use of ChatGPT for the Generation of User Story Based Test Cases: An Experimental Study". All data presented here is licensed under CC BY 4.0, whose terms apply to any copyright matter.

  20. Dataset: Automatic Generation of Explainability Requirements and Software...

    • zenodo.org
    Updated Jul 9, 2025
    Cite
    Martin Obaidi; Jannik Fischbach; Jakob Droste; Hannah Deters; Marc Herrmann; Jil Ann-Christin Klünder; Kurt Schneider (2025). Dataset: Automatic Generation of Explainability Requirements and Software Explanations From User Reviews [Dataset]. http://doi.org/10.5281/zenodo.15839753
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martin Obaidi; Jannik Fischbach; Jakob Droste; Hannah Deters; Marc Herrmann; Jil Ann-Christin Klünder; Kurt Schneider
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset: Automatic Generation of Explainability Requirements and Software Explanations From User Reviews

    Authors:
    Martin Obaidi, Jannik Fischbach, Jakob Droste, Hannah Deters, Marc Herrmann, Jil Klünder, Steffen Krätzig, Hugo Villamizar, and Kurt Schneider

    This replication package accompanies the study on Automatic Generation of Explainability Requirements and Software Explanations from User Reviews. It includes all datasets and scripts necessary to reproduce the evaluations and analyses presented in the paper. The dataset consists of 58 user reviews with explainability needs, for which explainability requirements and explanations were manually and automatically generated.

    The package is divided into three key components:

    1. Evaluation of Explainability Requirements and Explanations
      • Data from two studies where participants evaluated manually and ChatGPT-generated explainability requirements and explanations.
      • Preference votes and reasoning categories (Clarity, Style, Tone, Correctness, Level of Detail, Relevance) assigned by study participants.
    2. Formulation of Explainability Requirements and Explanations
      • Dataset from a workshop with four requirements engineers, who manually formulated explainability requirements and explanations based on user reviews.
    3. Tool for Automated Explainability Generation
      • Source code of a UI-based tool that automatically generates summaries, explainability requirements, and explanations from user reviews via ChatGPT API calls.
      • The tool allows manual refinement of generated outputs and supports requirements engineers in analyzing user feedback.

    A README file is provided, detailing the folder structure, study methodology, and instructions for reproducing the results. This package ensures transparency and enables further research on LLM-supported explainability requirement engineering.

    This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Grant No.: 470146331, project softXplain (2022–2025).

    Citation

    If you use this dataset, please cite the following publication:

    Obaidi, M., Fischbach, J., Droste, J., Deters, H., Herrmann, M., Klünder, J., Krätzig, S., Villamizar, H., Schneider, K.: Automatic Generation of Explainability Requirements and Software Explanations From User Reviews. 2025 IEEE 33rd International Requirements Engineering Conference Workshops (REW).

    License

    This dataset is provided under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

    Contact

    For questions regarding the dataset, please contact the corresponding author as listed in the publication.
