76 datasets found
  1. Data_Sheet_1_Advanced large language models and visualization tools for data...

    • frontiersin.figshare.com
    txt
    Updated Aug 8, 2024
    Cite
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
    Explore at:
    Available download formats: txt
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

    Methods: This article examines learners' performance and perspectives when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, comprising students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

    Results: The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

    Discussion: It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.

  2. Research data supporting "Using ChatGPT for Thematic Analysis Working Paper:...

    • repository.cam.ac.uk
    csv, pdf
    Updated May 17, 2024
    Cite
    Turobov, Aleksei; Coyle, Diane; Harding, Verity (2024). Research data supporting "Using ChatGPT for Thematic Analysis Working Paper: UN policy documents 2017-2024." [Dataset]. http://doi.org/10.17863/CAM.108401
    Explore at:
    Available download formats: pdf (64191 bytes), csv (254606 bytes), csv (17873 bytes)
    Dataset updated
    May 17, 2024
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Turobov, Aleksei; Coyle, Diane; Harding, Verity
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Nations
    Description

    Data contain 63 UN policy documents and press releases from the Digital Library covering the years 2017 to March 2024 and over 700 distinct codes for Thematic Analysis generated by the custom GPT model developed for the AI & Geopolitics Project (AIxGEO). More information can be found in the ReadMe file.

  3. Data from: ChatGPT in education: A discourse analysis of worries and...

    • socialmediaarchive.org
    csv, json, txt
    Updated Sep 26, 2023
    Cite
    (2023). ChatGPT in education: A discourse analysis of worries and concerns on social media [Dataset]. https://socialmediaarchive.org/record/54
    Explore at:
    Available download formats: csv (6528597), json (248465998), txt (4908229)
    Dataset updated
    Sep 26, 2023
    Description

    The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We collected Twitter data to identify key concerns related to the use of ChatGPT in education. This dataset is used to support the study "ChatGPT in education: A discourse analysis of worries and concerns on social media."

    In this study, we particularly explored two research questions. RQ1 (Concerns): What are the key concerns that Twitter users perceive with using ChatGPT in education? RQ2 (Accounts): Which accounts are implicated in the discussion of these concerns? In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.

  4. Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 4, 2025
    Cite
    Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung (2025). Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects. [Dataset]. http://doi.org/10.1371/journal.pone.0324841.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects.

  5. gpt-oss-120B (high) Output Speed by Provider

    • artificialanalysis.ai
    Updated Dec 30, 2023
    Cite
    Artificial Analysis (2023). gpt-oss-120B (high) Output Speed by Provider [Dataset]. https://artificialanalysis.ai/
    Dataset updated
    Dec 30, 2023
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Output Speed: Output Tokens per Second by Provider

  6. Summary of the scores of GPT-3.5, GPT-4, and GPT-4o in Stage 1 SPTEMD.

    • plos.figshare.com
    xls
    Updated Jun 4, 2025
    Cite
    Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung (2025). Summary of the scores of GPT-3.5, GPT-4, and GPT-4o in Stage 1 SPTEMD. [Dataset]. http://doi.org/10.1371/journal.pone.0324841.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of the scores of GPT-3.5, GPT-4, and GPT-4o in Stage 1 SPTEMD.

  7. Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its...

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    Wang, Jieshu; Kiran, Elif; S.R. Aurora (also known as Mai P. Trinh); Simeone, Michael; Lobo, José (2024). Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its Performance in the Public Sector Workforce [Dataset]. http://doi.org/10.7910/DVN/P3CDHS
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Wang, Jieshu; Kiran, Elif; S.R. Aurora (also known as Mai P. Trinh); Simeone, Michael; Lobo, José
    Description

    This repository contains two datasets used in the study exploring the impact of Generative AI, specifically ChatGPT, on the public sector workforce in the United States. The datasets provide detailed information on the core tasks of public sector occupations and their estimated performance metrics, including potential for automation and augmentation by ChatGPT. These estimations are generated by OpenAI’s GPT-4 model (GPT-4-1106-preview) through OpenAI API.

  8. Seconds to First Answer Token Received by Model

    • artificialanalysis.ai
    Updated Dec 30, 2023
    Cite
    Artificial Analysis (2023). Seconds to First Answer Token Received by Model [Dataset]. https://artificialanalysis.ai/
    Dataset updated
    Dec 30, 2023
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Seconds to First Answer Token Received; Accounts for Reasoning Model 'Thinking' time by Model

  9. Data Sheet 1_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    xlsx
    Updated Feb 5, 2025
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 1_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
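
    The fidelity checks named in the Methods (a two-sample t-test and 95% CI overlap) can be illustrated roughly as follows. This is a minimal sketch, not the authors' code: it assumes Welch's variant of the t statistic and a normal approximation for the confidence interval, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ci95(sample):
    """Approximate 95% CI for the mean (normal approximation, an assumption)."""
    m = mean(sample)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return (m - half_width, m + half_width)


def intervals_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def welch_t(x, y):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_x = stdev(x) ** 2 / len(x)
    var_y = stdev(y) ** 2 / len(y)
    return (mean(x) - mean(y)) / math.sqrt(var_x + var_y)
```

    Converting the t statistic into a p-value needs the Student t distribution, which the standard library does not provide; a statistics package would be the usual choice for that step.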

  10. Data and Code for: Generative AI for Economic Research: Use Cases and...

    • openicpsr.org
    delimited
    Updated Oct 21, 2023
    Cite
    Anton Korinek (2023). Data and Code for: Generative AI for Economic Research: Use Cases and Implications for Economists [Dataset]. http://doi.org/10.3886/E194623V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Oct 21, 2023
    Dataset provided by
    American Economic Association (http://www.aeaweb.org/)
    Authors
    Anton Korinek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generative AI, in particular large language models (LLMs) such as ChatGPT, has the potential to revolutionize research. I describe dozens of use cases along six domains in which LLMs are starting to become useful as both research assistants and tutors: ideation and feedback, writing, background research, data analysis, coding, and mathematical derivations. I provide general instructions and demonstrate specific examples of how to take advantage of each of these, classifying the LLM capabilities from experimental to highly useful. I argue that economists can reap significant productivity gains by taking advantage of generative AI to automate micro tasks. Moreover, these gains will grow as the performance of AI systems across all of these domains will continue to improve. I also speculate on the longer-term implications of AI-powered cognitive automation for economic research.

    The resources provided here contain the prompts and code to reproduce the chats with GPT-3.5, GPT-4, ChatGPT and Claude 2 that are listed in the paper.

  11. Seconds to Output 500 Tokens, including reasoning model 'thinking' time by...

    • artificialanalysis.ai
    Updated Dec 30, 2023
    Cite
    Artificial Analysis (2023). Seconds to Output 500 Tokens, including reasoning model 'thinking' time by Model [Dataset]. https://artificialanalysis.ai/
    Dataset updated
    Dec 30, 2023
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Seconds to Output 500 Tokens, including reasoning model 'thinking' time; Lower is better by Model

  12. Chat GPT Data

    • scidb.cn
    Updated Aug 14, 2024
    Cite
    Emmanuel Mensah Kparl; Iddris Faisal (2024). Chat GPT Data [Dataset]. http://doi.org/10.57760/sciencedb.11927
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 14, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Emmanuel Mensah Kparl; Iddris Faisal
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is the data we used for our analysis.

  13. SynthFluencers: AI-Generated Influencers

    • kaggle.com
    Updated Jan 21, 2024
    Cite
    AnthonyTherrien (2024). SynthFluencers: AI-Generated Influencers [Dataset]. http://doi.org/10.34740/kaggle/dsv/7444505
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    AnthonyTherrien
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Introduction

    Background

    Exploring the creation of a unique dataset of synthetic influencer profiles using AI technologies, including OpenAI's GPT-3.5.

    Methodology

    Data Generation Process

    1. Influencer Profile Generation: Profiles are generated with demographic details like age, gender, etc.
    2. Location Allocation: Randomly assigning U.S. states or Canadian provinces based on population distribution.
    3. GPT-3.5 Integration: Generating detailed backstories for each influencer profile using OpenAI's GPT-3.5-turbo-instruct model.
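
    Step 2 (location allocation weighted by population) could look roughly like the sketch below. It is an illustration under stated assumptions, not the dataset author's script: the region list and weights are placeholder values rather than real census figures, and `allocate_locations` is a hypothetical helper.

```python
import random

# Placeholder relative weights for illustration only; NOT real population figures.
REGION_WEIGHTS = {
    "California": 39,
    "Texas": 30,
    "Ontario": 15,
    "Quebec": 9,
}


def allocate_locations(n, weights=REGION_WEIGHTS, seed=None):
    """Draw n regions at random, each with probability proportional to its weight."""
    rng = random.Random(seed)  # a seeded generator keeps the draw reproducible
    regions = list(weights)
    return rng.choices(regions, weights=[weights[r] for r in regions], k=n)
```

    Passing a fixed `seed` makes repeated runs return the same allocation, which helps when a generated dataset needs to be reproducible.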

    Dataset Overview

    Structure

    • The dataset contains profiles with attributes like Name, Age, Sex, Lifestyle, Country of Origin, State or Province, Education Level, MBTI Personality and Backstory.

    Applications and Use Cases

    Potential Uses

    • Market Research: Understanding influencer dynamics in different niches.
    • AI Training: Enhancing the realism and diversity of AI-generated personas.
    • Social Media Strategy: Informing content creation and marketing strategies.

    Analysis and Insights

    Statistical Breakdown

    • Distribution of influencers across various lifestyles and locations.
    • Correlation between attractiveness ratings and lifestyle niches.

    Key Insights

    • Predominant trends in influencer personas based on demographics and location.

    Challenges and Limitations

    Ethical Considerations

    • The impact of synthetic influencers on real-world perceptions and digital marketing.

    Limitations of AI

    • Challenges in capturing the full depth of human characteristics and experiences.

    Conclusion

    Summary

    • This dataset provides a unique lens into the world of synthetic influencers, blending AI creativity with insights into social media dynamics.
  14. A dataset of 1500-word stories generated by gpt-4o-mini for 236...

    • dataverse.no
    • search.dataone.org
    Updated May 28, 2025
    Cite
    Jill Walker Rettberg; Jill Walker Rettberg; Hermann Wigers; Hermann Wigers (2025). A dataset of 1500-word stories generated by gpt-4o-mini for 236 nationalities [Dataset]. http://doi.org/10.18710/VM2K4O
    Explore at:
    Available download formats: text/comma-separated-values (18583), txt (19740), zip (42408986)
    Dataset updated
    May 28, 2025
    Dataset provided by
    DataverseNO
    Authors
    Jill Walker Rettberg; Jill Walker Rettberg; Hermann Wigers; Hermann Wigers
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    European Research Council
    Research Council of Norway
    Description

    We created a dataset of stories generated by OpenAI’s gpt-4o-mini by using a Python script to construct prompts that were sent to the OpenAI API. We used Statistics Norway’s list of 252 countries, added demonyms for each country, for example Norwegian for Norway, and removed countries without demonyms, leaving us with 236 countries. Our base prompt was “Write a 1500 word potential {demonym} story”, and we generated 50 stories for each country. The scripts used to generate the data, and additional scripts for analysis are available at the GitHub repository https://github.com/MachineVisionUiB/GPT_stories
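
    The prompt construction described above can be sketched as follows. This is only an illustration of the templating step, not the code from the linked repository: the base prompt and the count of 50 come from the description, while `build_prompts` and its tuple layout are hypothetical, and no API call is made.

```python
BASE_PROMPT = "Write a 1500 word potential {demonym} story"  # quoted in the description
STORIES_PER_COUNTRY = 50  # stories generated per country, per the description


def build_prompts(demonyms, n=STORIES_PER_COUNTRY):
    """Yield (demonym, run_index, prompt) for every story to be requested."""
    for demonym in demonyms:
        prompt = BASE_PROMPT.format(demonym=demonym)
        for run_index in range(n):
            yield demonym, run_index, prompt


# Example with a single demonym taken from the description.
jobs = list(build_prompts(["Norwegian"]))
```

    With 236 demonyms and 50 stories each, the same loop would yield 11,800 prompts.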

  15. Data from: Developing Students’ Statistical Expertise Through Writing in the...

    • tandf.figshare.com
    pdf
    Updated Jun 30, 2025
    Cite
    Laura S. DeLuca; Alex Reinhart; Gordon Weinberg; Michael Laudenbach; Sydney Miller; David West Brown (2025). Developing Students’ Statistical Expertise Through Writing in the Age of AI [Dataset]. http://doi.org/10.6084/m9.figshare.28883205.v2
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Laura S. DeLuca; Alex Reinhart; Gordon Weinberg; Michael Laudenbach; Sydney Miller; David West Brown
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As large language models (LLMs) such as GPT have become more accessible, concerns about their potential effects on students’ learning have grown. In data science education, the specter of students’ turning to LLMs raises multiple issues, as writing is a means not just of conveying information but of developing their statistical reasoning. In our study, we engage with questions surrounding LLMs and their pedagogical impact by: (a) quantitatively and qualitatively describing how select LLMs write report introductions and complete data analysis reports; and (b) comparing patterns in texts authored by LLMs to those authored by students and by published researchers. Our results show distinct differences between machine-generated and human-generated writing, as well as between novice and expert writing. Those differences are evident in how writers manage information, modulate confidence, signal importance, and report statistics. The findings can help inform classroom instruction, whether that instruction is aimed at dissuading the use of LLMs or at guiding their use as a productivity tool. It also has implications for students’ development as statistical thinkers and writers. What happens when they offload the work of data science to a model that doesn’t write quite like a data scientist? Supplementary materials for this article are available online.

  16. The GPT Group SWOT, PESTLE, Porters Five Force and Financial Analysis

    • quaintel.com
    Updated Aug 3, 2023
    Cite
    Quaintel Research Solutions (2023). The GPT Group SWOT, PESTLE, Porters Five Force and Financial Analysis [Dataset]. https://quaintel.com/store/report/the-gpt-group-company-profile-swot-pestle-value-chain-analysis
    Dataset updated
    Aug 3, 2023
    Dataset authored and provided by
    Quaintel Research Solutions
    License

    https://quaintel.com/privacy-policy

    Area covered
    Global
    Description

    The GPT Group Company Profile, Opportunities, Challenges and Risk (SWOT, PESTLE and Value Chain); Corporate and ESG Strategies; Competitive Intelligence; Financial KPIs; Operational KPIs; Recent Trends

  17. Global Generative Pre-trained Transformer (GPT) Market Risk Analysis...

    • statsndata.org
    excel, pdf
    Updated Jul 2025
    Cite
    Stats N Data (2025). Global Generative Pre-trained Transformer (GPT) Market Risk Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/generative-pre-trained-transformer-gpt-market-70513
    Explore at:
    Available download formats: pdf, excel
    Dataset updated
    Jul 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Generative Pre-trained Transformer (GPT) market has emerged as a transformative force in the realm of artificial intelligence, fundamentally reshaping how businesses and industries approach language processing tasks. As organizations increasingly seek to harness the power of advanced AI, the GPT technology is ra

  18. AI Financial Market Data

    • kaggle.com
    Updated Aug 6, 2025
    Cite
    Data Science Lovers (2025). AI Financial Market Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/ai-financial-and-market-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Science Lovers
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    📹Project Video available on YouTube - https://youtu.be/WmJYHz_qn5s

    Realistic Synthetic - AI Financial & Market Data for Gemini (Google), ChatGPT (OpenAI), Llama (Meta)

    This dataset provides a synthetic, daily record of financial market activity for companies involved in Artificial Intelligence (AI). It includes key financial metrics and events that could influence a company's stock performance, such as the launch of Llama by Meta, GPT by OpenAI, and Gemini by Google. It records how much the companies spend on R&D for their AI products and services and how much revenue they generate. The data covers January 1, 2015, to December 31, 2024, for three companies: OpenAI, Google, and Meta.

    This data is available as a CSV file. We are going to analyze this data set using a pandas DataFrame.

    This analysis will be helpful for those working in the finance or stock market domain.

    From this dataset, we extract various insights using Python in our Project.

    1) How much the companies spent on R&D

    2) Revenue Earned by the companies

    3) Date-wise Impact on the Stock

    4) Events when Maximum Stock Impact was observed

    5) AI Revenue Growth of the companies

    6) Correlation between the columns

    7) Expenditure vs Revenue year-by-year

    8) Event Impact Analysis

    9) Change in the index with respect to Year & Company

    These are the main Features/Columns available in the dataset :

    1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.

    2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".

    3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.

    4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.

    5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.

    6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.

    7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
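
    As a rough illustration of insight 7 (expenditure vs. revenue year by year), the sketch below totals two of the columns listed above per company and year. It makes several assumptions: dates are ISO formatted (YYYY-MM-DD), daily values are meant to be summed, and the sample rows are invented purely to make the example runnable. It uses the standard `csv` module rather than pandas so that it has no dependencies.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; they are NOT taken from the dataset.
SAMPLE_CSV = """Date,Company,R&D_Spending_USD_Mn,AI_Revenue_USD_Mn
2015-01-01,OpenAI,5.0,1.0
2015-01-02,OpenAI,6.0,1.5
2015-01-01,Meta,80.0,40.0
2016-01-01,Meta,90.0,45.0
"""


def yearly_totals(csv_file):
    """Sum R&D spending and AI revenue per (company, year) pair."""
    totals = defaultdict(lambda: {"rd": 0.0, "revenue": 0.0})
    for row in csv.DictReader(csv_file):
        key = (row["Company"], row["Date"][:4])  # year = first four characters
        totals[key]["rd"] += float(row["R&D_Spending_USD_Mn"])
        totals[key]["revenue"] += float(row["AI_Revenue_USD_Mn"])
    return dict(totals)


totals = yearly_totals(io.StringIO(SAMPLE_CSV))
```

    To run this against the real file, pass an open file handle in place of the `io.StringIO` wrapper.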

  19. Comparative Analysis of Artificial Intelligence Platforms: GPT-4 and Google...

    • data.niaid.nih.gov
    Updated Dec 11, 2024
    Cite
    Muluk, Erhan (2024). Comparative Analysis of Artificial Intelligence Platforms: GPT-4 and Google Gemini in Answering Questions about Birth Control Methods [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14387974
    Dataset updated
    Dec 11, 2024
    Dataset authored and provided by
    Muluk, Erhan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Background: Birth control methods (BCMs) are often underutilized or misunderstood, especially among young individuals entering their reproductive years. With the growing reliance on artificial intelligence (AI) platforms for health-related information, this study evaluates the performance of GPT-4 (OpenAI, San Francisco, CA, USA) and Google Gemini (Google, Mountain View, CA, USA) in addressing commonly asked questions about BCMs.

    Methods: Thirty questions, derived from the American College of Obstetricians and Gynecologists website, were posed to both AI platforms. Questions spanned four categories: general contraception, specific contraceptive types, emergency contraception, and other topics. Responses were evaluated using a 5-point rubric assessing accuracy, completeness, and lack of false information. Overall scores were calculated by averaging the rubric scores. Statistical analysis, including the Wilcoxon signed-rank and Kruskal-Wallis tests, was performed to compare performance metrics.

    Results: ChatGPT and Google Gemini both provided high-quality responses, with overall scores averaging 4.38 ± 0.58 and 4.37 ± 0.52, respectively, categorized as "excellent." ChatGPT outperformed in reducing false information (4.70 ± 0.60 vs. 4.47 ± 0.73), while Google Gemini excelled in accuracy (4.53 ± 0.57 vs. 4.30 ± 0.70). Completeness scores were comparable. No significant differences were found in overall performance (p = 0.548), though Google Gemini showed a significant edge in accuracy (p = 0.035). Both platforms scored consistently across question categories, with no statistically significant differences noted.

    Conclusions: GPT-4 and Google Gemini provide reliable and accurate responses to BCM-related queries, with slight differences in strengths. These findings underscore the potential of AI tools in addressing public health information needs, particularly for young individuals seeking guidance on contraception. Further studies with larger datasets may elucidate nuanced differences between AI platforms.
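
The scoring and paired comparison described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the study's analysis code: the scores are made up, the function names `overall_scores` and `signed_rank_statistic` are hypothetical, and only the Wilcoxon signed-rank test statistic is computed (no p-value, and no Kruskal-Wallis test).

```python
from statistics import mean

# Hypothetical rubric scores (1-5) per question; NOT data from the study.
# Each tuple is (accuracy, completeness, lack_of_false_information).
rubric_rows = [(5, 4, 5), (4, 4, 4), (5, 4, 3)]

# Hypothetical paired accuracy scores for the same four questions.
accuracy_a = [5, 4, 3, 4]
accuracy_b = [5, 5, 5, 3]


def overall_scores(rows):
    """Average the three rubric dimensions for each question."""
    return [mean(row) for row in rows]


def signed_rank_statistic(x, y):
    """Wilcoxon signed-rank W: the smaller of the positive and negative rank sums.

    Zero differences are dropped, and tied absolute differences share the
    average of the ranks they span.
    """
    diffs = sorted((a - b for a, b in zip(x, y) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus)


print(overall_scores(rubric_rows))
print(signed_rank_statistic(accuracy_a, accuracy_b))
```

In practice a statistics library would normally be used for the test and its p-value; the hand-rolled version here only makes the ranking step explicit.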

  20. f

    Data from: Ontolomics‑P: Advancing Proteomics Data Interpretation through...

    • acs.figshare.com
    xlsx
    Updated May 6, 2025
    Cite
    Yin Yang; Shisheng Wang; Yuzhe Chen; Xinyuan Wang; Wei Jiang; Youmei Jin; Wenjuan Zeng; Dongbo Wu; Bairong Shen; Hao Yang (2025). Ontolomics‑P: Advancing Proteomics Data Interpretation through GPT-4o Reannotated Topic Ontology and Data-Driven Analysis [Dataset]. http://doi.org/10.1021/acs.analchem.5c00390.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 6, 2025
    Dataset provided by
    ACS Publications
    Authors
    Yin Yang; Shisheng Wang; Yuzhe Chen; Xinyuan Wang; Wei Jiang; Youmei Jin; Wenjuan Zeng; Dongbo Wu; Bairong Shen; Hao Yang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The interpretation of proteomics data often relies on functional enrichment analysis, such as Gene Ontology (GO) enrichment, to uncover the biological functions of proteins, as well as the examination of protein expression patterns across data sets like the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. However, conventional approaches to functional enrichment frequently produce extensive and redundant term lists, complicating interpretation and synthesis. Moreover, the absence of specialized tools tailored to proteomics researchers limits the efficient exploration of protein expression within specific biological contexts. To address these challenges, we developed Ontolomics-P, a user-friendly web-based tool designed to advance proteomics data interpretation. Ontolomics-P integrates topic modeling using latent Dirichlet allocation (LDA) with GO semantic similarity analysis, enabling the consolidation of redundant terms into coherent topics. These topics are further refined and reannotated using the GPT-4o language model, creating a novel topics database that provides precise and interpretable insights into shared biological functions. Additionally, Ontolomics-P incorporates quantitative proteomic data from 10 diverse cancer types archived in the CPTAC database, allowing for a comprehensive exploration of protein expression profiles from a data-driven perspective. Through detailed case studies, we demonstrate the tool’s capacity to streamline workflows, simplify interpretation, and provide actionable biological insights. Ontolomics-P represents a significant advancement in proteomics data analysis, offering innovative solutions for functional annotation, quantitative exploration, and visualization, ultimately empowering researchers to accelerate discoveries in systems biology and beyond.

Cite
Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001

Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv

Related Article
Explore at:
Available download formats: txt
Dataset updated
Aug 8, 2024
Dataset provided by
Frontiers
Authors
Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

Methods: This article examines how learners perform and their perspectives when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, comprising students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

Results: The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

Discussion: It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.
