76 datasets found

Data_Sheet_1_Advanced large language models and visualization tools for data...
frontiersin.figshare.com
txt
Updated Aug 8, 2024
Cite
Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
Available download formats
txt
Unique identifier
https://doi.org/10.3389/feduc.2024.1418006.s001
Dataset updated
Aug 8, 2024
Dataset provided by
Frontiers
Authors
Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.
Methods: This article examines learners' performance and perspectives when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, students and professionals without computational thinking skills, who developed a data analytics project during a short Data Analytics session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.
Results: The results showed the transformative potential of approaches that integrate advanced generative AI tools such as GPT with specialized frameworks such as LIDA. Participants' higher preference for these approaches indicates their superiority over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.
Discussion: It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must understand the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to configure this connection through the API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.
Research data supporting "Using ChatGPT for Thematic Analysis Working Paper:...
repository.cam.ac.uk
csv, pdf
Updated May 17, 2024
Cite
Turobov, Aleksei; Coyle, Diane; Harding, Verity (2024). Research data supporting "Using ChatGPT for Thematic Analysis Working Paper: UN policy documents 2017-2024." [Dataset]. http://doi.org/10.17863/CAM.108401
Available download formats
pdf (64191 bytes), csv (254606 bytes), csv (17873 bytes)
Unique identifier
https://doi.org/10.17863/CAM.108401
Dataset updated
May 17, 2024
Dataset provided by
University of Cambridge
Apollo
Authors
Turobov, Aleksei; Coyle, Diane; Harding, Verity
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United Nations
Description
The data contain 63 UN policy documents and press releases from the Digital Library, covering 2017 to March 2024, together with over 700 distinct codes for Thematic Analysis generated by the custom GPT model developed for the AI & Geopolitics Project (AIxGEO). More information can be found in the ReadMe file.
Data from: ChatGPT in education: A discourse analysis of worries and...
socialmediaarchive.org
csv, json, txt
Updated Sep 26, 2023
Cite
(2023). ChatGPT in education: A discourse analysis of worries and concerns on social media [Dataset]. https://socialmediaarchive.org/record/54
Available download formats
csv (6528597), json (248465998), txt (4908229)
Dataset updated
Sep 26, 2023
Description
The rapid advancements in generative AI models present new opportunities in the education sector. However, it is imperative to acknowledge and address the potential risks and concerns that may arise with their use. We collected Twitter data to identify key concerns related to the use of ChatGPT in education. This dataset is used to support the study "ChatGPT in education: A discourse analysis of worries and concerns on social media."

In this study, we particularly explored two research questions. RQ1 (Concerns): What are the key concerns that Twitter users perceive with using ChatGPT in education? RQ2 (Accounts): Which accounts are implicated in the discussion of these concerns? In summary, our study underscores the importance of responsible and ethical use of AI in education and highlights the need for collaboration among stakeholders to regulate AI policy.
Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Jun 4, 2025
Cite
Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung (2025). Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects. [Dataset]. http://doi.org/10.1371/journal.pone.0324841.t002
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0324841.t002
Dataset updated
Jun 4, 2025
Dataset provided by
PLOS ONE
Authors
Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects.
gpt-oss-120B (high) Output Speed by Provider
artificialanalysis.ai
Updated Dec 30, 2023
Cite
Artificial Analysis (2023). gpt-oss-120B (high) Output Speed by Provider [Dataset]. https://artificialanalysis.ai/
Dataset updated
Dec 30, 2023
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Output Speed: Output Tokens per Second by Provider
Summary of the scores of GPT-3.5, GPT-4, and GPT-4o in Stage 1 SPTEMD.
plos.figshare.com
xls
Updated Jun 4, 2025
Cite
Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung (2025). Summary of the scores of GPT-3.5, GPT-4, and GPT-4o in Stage 1 SPTEMD. [Dataset]. http://doi.org/10.1371/journal.pone.0324841.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0324841.t001
Dataset updated
Jun 4, 2025
Dataset provided by
PLOS ONE
Authors
Yao-Cheng Wu; Yun-Chi Wu; Ya-Chuan Chang; Chia-Ying Yu; Chun-Lin Wu; Wen-Wei Sung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary of the scores of GPT-3.5, GPT-4, and GPT-4o in Stage 1 SPTEMD.
Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its...
search.dataone.org
Updated Sep 24, 2024
Cite
Wang, Jieshu; Kiran, Elif; S.R. Aurora (also known as Mai P. Trinh); Simeone, Michael; Lobo, José (2024). Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its Performance in the Public Sector Workforce [Dataset]. http://doi.org/10.7910/DVN/P3CDHS
Unique identifier
https://doi.org/10.7910/DVN/P3CDHS
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Wang, Jieshu; Kiran, Elif; S.R. Aurora (also known as Mai P. Trinh); Simeone, Michael; Lobo, José
Description
This repository contains two datasets used in the study exploring the impact of Generative AI, specifically ChatGPT, on the public sector workforce in the United States. The datasets provide detailed information on the core tasks of public sector occupations and their estimated performance metrics, including potential for automation and augmentation by ChatGPT. These estimations were generated by OpenAI’s GPT-4 model (GPT-4-1106-preview) through the OpenAI API.
Seconds to First Answer Token Received by Model
artificialanalysis.ai
Updated Dec 30, 2023
Cite
Artificial Analysis (2023). Seconds to First Answer Token Received by Model [Dataset]. https://artificialanalysis.ai/
Dataset updated
Dec 30, 2023
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Seconds to First Answer Token Received by Model; accounts for reasoning model 'thinking' time
Data Sheet 1_Large language models generating synthetic clinical datasets: a...
frontiersin.figshare.com
xlsx
Updated Feb 5, 2025
Cite
Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 1_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s001
Available download formats
xlsx
Unique identifier
https://doi.org/10.3389/frai.2025.1533508.s001
Dataset updated
Feb 5, 2025
Dataset provided by
Frontiers
Authors
Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.
Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.
Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data were assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters: no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
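The fidelity checks named above (BMI cross-verification, two-sample proportion tests, and 95% CI overlap) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the CI uses the normal critical value 1.96 instead of a t-based interval (reasonable only for large samples), and the two-sample t-test is omitted because the standard library has no t distribution (SciPy's `scipy.stats.ttest_ind` is a common choice for that step).

```python
import math
import statistics


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_consistent(weight_kg: float, height_cm: float, reported: float, tol: float = 0.1) -> bool:
    """Cross-verify a generated BMI value against the same record's height and weight."""
    return abs(bmi(weight_kg, height_cm) - reported) <= tol


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Two-sided two-sample proportion z-test with a pooled standard error.

    Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # pooled is 0 or 1, so both samples have the same proportion
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value


def mean_ci95(values) -> tuple:
    """Approximate 95% CI for a mean (assumption: normal critical value 1.96)."""
    m = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m - half_width, m + half_width


def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

For example, `bmi(70, 175)` is about 22.86, and `two_proportion_z_test(50, 100, 50, 100)` returns a z of 0 with a p-value of 1, i.e. no detectable difference between a synthetic and a real proportion.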
Data and Code for: Generative AI for Economic Research: Use Cases and...
openicpsr.org
delimited
Updated Oct 21, 2023
Cite
Anton Korinek (2023). Data and Code for: Generative AI for Economic Research: Use Cases and Implications for Economists [Dataset]. http://doi.org/10.3886/E194623V1
Available download formats
delimited
Unique identifier
https://doi.org/10.3886/E194623V1
Dataset updated
Oct 21, 2023
Dataset provided by
American Economic Association, http://www.aeaweb.org/
Authors
Anton Korinek
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Generative AI, in particular large language models (LLMs) such as ChatGPT, has the potential to revolutionize research. I describe dozens of use cases along six domains in which LLMs are starting to become useful as both research assistants and tutors: ideation and feedback, writing, background research, data analysis, coding, and mathematical derivations. I provide general instructions and demonstrate specific examples of how to take advantage of each of these, classifying the LLM capabilities from experimental to highly useful. I argue that economists can reap significant productivity gains by taking advantage of generative AI to automate micro tasks. Moreover, these gains will grow as the performance of AI systems across all of these domains continues to improve. I also speculate on the longer-term implications of AI-powered cognitive automation for economic research. The resources provided here contain the prompts and code to reproduce the chats with GPT-3.5, GPT-4, ChatGPT, and Claude 2 that are listed in the paper.
Seconds to Output 500 Tokens, including reasoning model 'thinking' time by...
artificialanalysis.ai
Updated Dec 30, 2023
Cite
Artificial Analysis (2023). Seconds to Output 500 Tokens, including reasoning model 'thinking' time by Model [Dataset]. https://artificialanalysis.ai/
Dataset updated
Dec 30, 2023
Dataset authored and provided by
Artificial Analysis
Description
Comparison of Seconds to Output 500 Tokens, including reasoning model 'thinking' time, by Model (lower is better)
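The three Artificial Analysis metrics listed above (output tokens per second, seconds to first answer token, and seconds to output 500 tokens) are related. As a simplification that is an assumption, not the site's published methodology, the end-to-end time can be estimated as the time to the first answer token (which already includes 'thinking' time) plus the remaining tokens streamed at the measured output speed. The helper names and model names below are hypothetical.

```python
from typing import Dict, List, Tuple


def seconds_to_output(n_tokens: int, first_token_s: float, tokens_per_s: float) -> float:
    """Estimate seconds until n_tokens answer tokens have been received.

    Simplifying assumption: first_token_s already includes any reasoning
    ('thinking') time, after which tokens stream at a constant rate.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return first_token_s + n_tokens / tokens_per_s


def rank_by_latency(models: Dict[str, Tuple[float, float]], n_tokens: int = 500) -> List[str]:
    """Order model names by estimated end-to-end time, fastest first (lower is better).

    models maps a name to (seconds_to_first_answer_token, output_tokens_per_second).
    """
    return sorted(models, key=lambda name: seconds_to_output(n_tokens, *models[name]))
```

For instance, a model with a 0.5 s first answer token and 100 tokens/s gives `seconds_to_output(500, 0.5, 100.0) == 5.5`. The constant-rate assumption ignores streaming jitter, so measured values may differ.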
Chat GPT Data
scidb.cn
Updated Aug 14, 2024
Cite
Emmanuel Mensah Kparl; Iddris Faisal (2024). Chat GPT Data [Dataset]. http://doi.org/10.57760/sciencedb.11927
Available download formats
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.11927
Dataset updated
Aug 14, 2024
Dataset provided by
Science Data Bank
Authors
Emmanuel Mensah Kparl; Iddris Faisal
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is the data we used for our analysis.
SynthFluencers: AI-Generated Influencers
kaggle.com
Updated Jan 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AnthonyTherrien (2024). SynthFluencers: AI-Generated Influencers [Dataset]. http://doi.org/10.34740/kaggle/dsv/7444505
Available download formats
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7444505
Dataset updated
Jan 21, 2024
Dataset provided by
Kaggle, http://kaggle.com/
Authors
AnthonyTherrien
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Introduction

Background

Exploring the creation of a unique dataset of synthetic influencer profiles using AI technologies, including OpenAI's GPT-3.5.

Methodology

Data Generation Process

Influencer Profile Generation: Profiles are generated with demographic details like age, gender, etc.

Location Allocation: Randomly assigning U.S. states or Canadian provinces based on population distribution.

GPT-3.5 Integration: Generating detailed backstories for each influencer profile using OpenAI's GPT-3.5-turbo-instruct model.
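The three generation steps above can be sketched as follows. This is a hypothetical reconstruction, not the dataset author's script: the region weights are illustrative placeholders rather than real population figures, the age range and lifestyle list are assumptions, and the backstory step only builds the prompt text instead of calling GPT-3.5-turbo-instruct.

```python
import random

# Illustrative relative weights only; a real run would use actual population figures.
REGION_WEIGHTS = {
    "California": 4.0,
    "Texas": 3.0,
    "Ontario": 2.0,
    "Quebec": 1.0,
}
REGION_COUNTRY = {"California": "United States", "Texas": "United States",
                  "Ontario": "Canada", "Quebec": "Canada"}
LIFESTYLES = ["Fitness", "Travel", "Food", "Tech"]  # hypothetical niche list


def allocate_region(rng: random.Random) -> str:
    """Pick a state or province with probability proportional to its weight."""
    regions = list(REGION_WEIGHTS)
    return rng.choices(regions, weights=[REGION_WEIGHTS[r] for r in regions], k=1)[0]


def make_profile(rng: random.Random) -> dict:
    """Build one synthetic profile plus the text prompt for its backstory."""
    region = allocate_region(rng)
    profile = {
        "Age": rng.randint(18, 45),  # hypothetical age range
        "Sex": rng.choice(["Female", "Male"]),
        "Lifestyle": rng.choice(LIFESTYLES),
        "Country": REGION_COUNTRY[region],
        "State or Province": region,
    }
    # This prompt would be sent to a text-completion model to produce the
    # Backstory field; no API call is made in this sketch.
    profile["BackstoryPrompt"] = (
        f"Write a short backstory for a {profile['Age']}-year-old "
        f"{profile['Lifestyle'].lower()} influencer from {region}, {profile['Country']}."
    )
    return profile
```

Passing a seeded `random.Random` keeps runs reproducible; `random.choices` with `weights` implements the population-proportional allocation described in the Location Allocation step.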

Dataset Overview

Structure

The dataset contains profiles with attributes like Name, Age, Sex, Lifestyle, Country of Origin, State or Province, Education Level, MBTI Personality and Backstory.

Applications and Use Cases

Potential Uses

Market Research: Understanding influencer dynamics in different niches.

AI Training: Enhancing the realism and diversity of AI-generated personas.

Social Media Strategy: Informing content creation and marketing strategies.

Analysis and Insights

Statistical Breakdown

Distribution of influencers across various lifestyles and locations.

Correlation between attractiveness ratings and lifestyle niches.

Key Insights

Predominant trends in influencer personas based on demographics and location.

Challenges and Limitations

Ethical Considerations

The impact of synthetic influencers on real-world perceptions and digital marketing.

Limitations of AI

Challenges in capturing the full depth of human characteristics and experiences.

Conclusion

Summary

This dataset provides a unique lens into the world of synthetic influencers, blending AI creativity with insights into social media dynamics.
A dataset of 1500-word stories generated by gpt-4o-mini for 236...
dataverse.no
search.dataone.org
Updated May 28, 2025
Cite
Jill Walker Rettberg; Hermann Wigers (2025). A dataset of 1500-word stories generated by gpt-4o-mini for 236 nationalities [Dataset]. http://doi.org/10.18710/VM2K4O
Available download formats
text/comma-separated-values (18583), txt (19740), zip (42408986)
Unique identifier
https://doi.org/10.18710/VM2K4O
Dataset updated
May 28, 2025
Dataset provided by
DataverseNO
Authors
Jill Walker Rettberg; Hermann Wigers
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset funded by
European Research Council
Research Council of Norway
Description
We created a dataset of stories generated by OpenAI’s gpt-4o-mini by using a Python script to construct prompts that were sent to the OpenAI API. We used Statistics Norway’s list of 252 countries, added a demonym for each country (for example, Norwegian for Norway), and removed countries without demonyms, leaving 236 countries. Our base prompt was “Write a 1500 word potential {demonym} story”, and we generated 50 stories for each country. The scripts used to generate the data, and additional scripts for analysis, are available in the GitHub repository https://github.com/MachineVisionUiB/GPT_stories
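The authors' actual scripts are in the linked repository; the sketch below is an independent illustration of only the prompt-construction step described above. The country-to-demonym mapping is a hypothetical input, and no API call is made. With 236 countries and 50 stories each, the procedure yields 236 × 50 = 11,800 prompts.

```python
from typing import Dict, List, Optional, Tuple

BASE_PROMPT = "Write a 1500 word potential {demonym} story"
STORIES_PER_COUNTRY = 50


def build_prompts(demonyms: Dict[str, Optional[str]],
                  per_country: int = STORIES_PER_COUNTRY) -> List[Tuple[str, str]]:
    """Return (country, prompt) pairs, one per story to be generated.

    Countries whose demonym is missing (None or empty) are skipped,
    mirroring the described removal of countries without demonyms.
    """
    prompts: List[Tuple[str, str]] = []
    for country, demonym in demonyms.items():
        if not demonym:
            continue
        prompt = BASE_PROMPT.format(demonym=demonym)
        prompts.extend((country, prompt) for _ in range(per_country))
    return prompts
```

For example, `build_prompts({"Norway": "Norwegian", "Nowhere": None})` returns 50 copies of `("Norway", "Write a 1500 word potential Norwegian story")`; each pair would then be sent as a separate request so that the 50 stories are sampled independently.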
Data from: Developing Students’ Statistical Expertise Through Writing in the...
tandf.figshare.com
pdf
Updated Jun 30, 2025
Cite
Laura S. DeLuca; Alex Reinhart; Gordon Weinberg; Michael Laudenbach; Sydney Miller; David West Brown (2025). Developing Students’ Statistical Expertise Through Writing in the Age of AI [Dataset]. http://doi.org/10.6084/m9.figshare.28883205.v2
Available download formats
pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.28883205.v2
Dataset updated
Jun 30, 2025
Dataset provided by
Taylor & Francis
Authors
Laura S. DeLuca; Alex Reinhart; Gordon Weinberg; Michael Laudenbach; Sydney Miller; David West Brown
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As large language models (LLMs) such as GPT have become more accessible, concerns about their potential effects on students’ learning have grown. In data science education, the specter of students’ turning to LLMs raises multiple issues, as writing is a means not just of conveying information but of developing their statistical reasoning. In our study, we engage with questions surrounding LLMs and their pedagogical impact by: (a) quantitatively and qualitatively describing how select LLMs write report introductions and complete data analysis reports; and (b) comparing patterns in texts authored by LLMs to those authored by students and by published researchers. Our results show distinct differences between machine-generated and human-generated writing, as well as between novice and expert writing. Those differences are evident in how writers manage information, modulate confidence, signal importance, and report statistics. The findings can help inform classroom instruction, whether that instruction is aimed at dissuading the use of LLMs or at guiding their use as a productivity tool. They also have implications for students’ development as statistical thinkers and writers. What happens when they offload the work of data science to a model that doesn’t write quite like a data scientist? Supplementary materials for this article are available online.
The GPT Group SWOT, PESTLE, Porters Five Force and Financial Analysis
quaintel.com
Updated Aug 3, 2023
Cite
Quaintel Research Solutions (2023). The GPT Group SWOT, PESTLE, Porters Five Force and Financial Analysis [Dataset]. https://quaintel.com/store/report/the-gpt-group-company-profile-swot-pestle-value-chain-analysis
Dataset updated
Aug 3, 2023
Dataset authored and provided by
Quaintel Research Solutions
License
https://quaintel.com/privacy-policy
Area covered
Global
Description
The GPT Group Company Profile, Opportunities, Challenges and Risk (SWOT, PESTLE and Value Chain); Corporate and ESG Strategies; Competitive Intelligence; Financial KPIs; Operational KPIs; Recent Trends: …
Global Generative Pre-trained Transformer (GPT) Market Risk Analysis...
statsndata.org
excel, pdf
Updated Jul 2025
Cite
Stats N Data (2025). Global Generative Pre-trained Transformer (GPT) Market Risk Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/generative-pre-trained-transformer-gpt-market-70513
Available download formats
pdf, excel
Dataset updated
Jul 2025
Dataset authored and provided by
Stats N Data
License
https://www.statsndata.org/how-to-order
Area covered
Global
Description
The Generative Pre-trained Transformer (GPT) market has emerged as a transformative force in the realm of artificial intelligence, fundamentally reshaping how businesses and industries approach language processing tasks. As organizations increasingly seek to harness the power of advanced AI, the GPT technology is ra
AI Financial Market Data
kaggle.com
Updated Aug 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Science Lovers (2025). AI Financial Market Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/ai-financial-and-market-data
Available download formats
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 6, 2025
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Data Science Lovers
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
📹Project Video available on YouTube - https://youtu.be/WmJYHz_qn5s

Realistic Synthetic AI Financial & Market Data for Gemini (Google), ChatGPT (OpenAI), Llama (Meta)

This dataset provides a synthetic daily record of financial market activity related to companies involved in Artificial Intelligence (AI). It includes key financial metrics and events that could influence a company's stock performance, such as the launch of Llama by Meta, GPT by OpenAI, and Gemini by Google. It records how much the companies spend on R&D for their AI products and services and how much revenue they generate. The data run from January 1, 2015, to December 31, 2024, and cover OpenAI, Google, and Meta.

This data is available as a CSV file. We are going to analyze this data set using the Pandas DataFrame.

This analysis will be helpful for those working in finance or the stock market.

From this dataset, we extract various insights using Python in our Project.

1) How much did the companies spend on R&D?

2) Revenue Earned by the companies

3) Date-wise Impact on the Stock

4) Events when Maximum Stock Impact was observed

5) AI Revenue Growth of the companies

6) Correlation between the columns

7) Expenditure vs Revenue year-by-year

8) Event Impact Analysis

9) Change in the index by year and company

These are the main Features/Columns available in the dataset :

1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.

2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".

3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.

4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.

5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.

6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.

7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
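The project itself analyzes the file with a pandas DataFrame; the sketch below is a standard-library alternative (using `csv.DictReader`) covering two of the listed questions: expenditure vs revenue year-by-year, and the events with the largest stock impact. The column names come from the list above; the rest is assumption: dates are taken to be ISO formatted (`YYYY-MM-DD`), daily R&D and revenue values are assumed to be summable into yearly totals, and days without an event are assumed to have a blank or "None" Event value. The function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, IO, List, Tuple


def load_rows(f: IO[str]) -> List[dict]:
    """Read the CSV and convert the numeric columns used below to floats."""
    rows = []
    for raw in csv.DictReader(f):
        rows.append({
            "Date": raw["Date"],
            "Company": raw["Company"],
            "rd": float(raw["R&D_Spending_USD_Mn"]),
            "revenue": float(raw["AI_Revenue_USD_Mn"]),
            "impact": float(raw["Stock_Impact_%"]),
            "Event": (raw.get("Event") or "").strip(),
        })
    return rows


def spend_vs_revenue_by_year(rows: List[dict]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map (company, year) to (total R&D spending, total AI revenue), in USD millions."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in rows:
        key = (r["Company"], r["Date"][:4])  # assumes ISO dates (YYYY-MM-DD)
        totals[key][0] += r["rd"]
        totals[key][1] += r["revenue"]
    return {k: (v[0], v[1]) for k, v in totals.items()}


def top_impact_events(rows: List[dict], n: int = 5) -> List[dict]:
    """Return the n event rows with the highest Stock_Impact_% (largest rise first)."""
    # Assumption: non-event days have a blank or "None" Event value.
    events = [r for r in rows if r["Event"] and r["Event"].lower() != "none"]
    return sorted(events, key=lambda r: r["impact"], reverse=True)[:n]
```

Typical use is `with open(path, newline="") as f: rows = load_rows(f)`, where `path` points to the downloaded CSV. In pandas the yearly totals would be a `groupby` on company and year followed by `sum()`.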
Comparative Analysis of Artificial Intelligence Platforms: GPT-4 and Google...
data.niaid.nih.gov
Updated Dec 11, 2024
Cite
Muluk, Erhan (2024). Comparative Analysis of Artificial Intelligence Platforms: GPT-4 and Google Gemini in Answering Questions about Birth Control Methods [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14387974
Dataset updated
Dec 11, 2024
Dataset authored and provided by
Muluk, Erhan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract
Background: Birth control methods (BCMs) are often underutilized or misunderstood, especially among young individuals entering their reproductive years. With the growing reliance on artificial intelligence (AI) platforms for health-related information, this study evaluates the performance of GPT-4 (OpenAI, San Francisco, CA, USA) and Google Gemini (Google, Mountain View, CA, USA) in addressing commonly asked questions about BCMs.
Methods: Thirty questions, derived from the American College of Obstetricians and Gynecologists website, were posed to both AI platforms. Questions spanned four categories: general contraception, specific contraceptive types, emergency contraception, and other topics. Responses were evaluated using a 5-point rubric assessing accuracy, completeness, and lack of false information. Overall scores were calculated by averaging the rubric scores. Statistical analysis, including the Wilcoxon signed-rank and Kruskal-Wallis tests, was performed to compare performance metrics.
Results: ChatGPT and Google Gemini both provided high-quality responses, with overall scores averaging 4.38 ± 0.58 and 4.37 ± 0.52, respectively, categorized as "excellent." ChatGPT performed better at avoiding false information (4.70 ± 0.60 vs. 4.47 ± 0.73), while Google Gemini excelled in accuracy (4.53 ± 0.57 vs. 4.30 ± 0.70). Completeness scores were comparable. No significant differences were found in overall performance (p = 0.548), though Google Gemini showed a significant edge in accuracy (p = 0.035). Both platforms scored consistently across question categories, with no statistically significant differences noted.
Conclusions: GPT-4 and Google Gemini provide reliable and accurate responses to BCM-related queries, with slight differences in strengths. These findings underscore the potential of AI tools in addressing public health information needs, particularly for young individuals seeking guidance on contraception. Further studies with larger datasets may elucidate nuanced differences between AI platforms.
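Because both platforms answered the same 30 questions, the rubric scores form paired samples, which is why a Wilcoxon signed-rank test fits the comparison described above. The sketch below computes only the signed-rank statistics W+ and W- (it does not compute a p-value; SciPy's `scipy.stats.wilcoxon` is a common choice for that). It is an illustration, not the study's analysis code: zero differences are dropped, following Wilcoxon's original convention, and tied absolute differences, which are common with 5-point rubric scores, receive average ranks. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j (0-based) -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped before ranking.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus
```

As a check, W+ + W- always equals n(n + 1)/2 for the n non-zero differences; for example, scores `[5, 4, 3, 5]` vs `[4, 4, 5, 3]` leave three non-zero differences and give W+ = 3.5 and W- = 2.5.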
Data from: Ontolomics‑P: Advancing Proteomics Data Interpretation through...
acs.figshare.com
xlsx
Updated May 6, 2025
Cite
Yin Yang; Shisheng Wang; Yuzhe Chen; Xinyuan Wang; Wei Jiang; Youmei Jin; Wenjuan Zeng; Dongbo Wu; Bairong Shen; Hao Yang (2025). Ontolomics‑P: Advancing Proteomics Data Interpretation through GPT-4o Reannotated Topic Ontology and Data-Driven Analysis [Dataset]. http://doi.org/10.1021/acs.analchem.5c00390.s002
Available download formats
xlsx
Unique identifier
https://doi.org/10.1021/acs.analchem.5c00390.s002
Dataset updated
May 6, 2025
Dataset provided by
ACS Publications
Authors
Yin Yang; Shisheng Wang; Yuzhe Chen; Xinyuan Wang; Wei Jiang; Youmei Jin; Wenjuan Zeng; Dongbo Wu; Bairong Shen; Hao Yang
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The interpretation of proteomics data often relies on functional enrichment analysis, such as Gene Ontology (GO) enrichment, to uncover the biological functions of proteins, as well as the examination of protein expression patterns across data sets like the Clinical Proteomic Tumor Analysis Consortium (CPTAC) database. However, conventional approaches to functional enrichment frequently produce extensive and redundant term lists, complicating interpretation and synthesis. Moreover, the absence of specialized tools tailored to proteomics researchers limits the efficient exploration of protein expression within specific biological contexts. To address these challenges, we developed Ontolomics-P, a user-friendly web-based tool designed to advance proteomics data interpretation. Ontolomics-P integrates topic modeling using latent Dirichlet allocation (LDA) with GO semantic similarity analysis, enabling the consolidation of redundant terms into coherent topics. These topics are further refined and reannotated using the GPT-4o language model, creating a novel topics database that provides precise and interpretable insights into shared biological functions. Additionally, Ontolomics-P incorporates quantitative proteomic data from 10 diverse cancer types archived in the CPTAC database, allowing for a comprehensive exploration of protein expression profiles from a data-driven perspective. Through detailed case studies, we demonstrate the tool’s capacity to streamline workflows, simplify interpretation, and provide actionable biological insights. Ontolomics-P represents a significant advancement in proteomics data analysis, offering innovative solutions for functional annotation, quantitative exploration, and visualization, ultimately empowering researchers to accelerate discoveries in systems biology and beyond.

Cite

Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001

Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv

Available download formats: txt

Unique identifier

https://doi.org/10.3389/feduc.2024.1418006.s001

Dataset updated

Aug 8, 2024

Dataset provided by

Frontiers

Authors

Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to the computational sciences. However, the desired outcomes have not been consistently achieved. This study analyzes the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the generated solutions. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practice.

Methods: This article examines how learners perform, and what their perspectives are, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, comprising students and professionals without computational thinking skills, who developed a data analytics project during a short Data Analytics session. The case study examined the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

Results: The results show the transformative potential of approaches that integrate advanced generative AI tools such as GPT with specialized frameworks such as LIDA. The participants' stronger preference for these approaches indicates their advantage over traditional development methods. Our findings also suggest that the learning curves of the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Overall, the findings suggest that integrating LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

Discussion: When using general-purpose generative AI tools such as ChatGPT, users must understand the data analytics process and take responsibility for filtering out potential errors or incomplete coverage of a data analytics project's requirements. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure the connection between LIDA and GPT via the API. There is a significant opportunity for generative AI tools to improve their performance by providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practice, particularly in developing computational thinking skills.

Data_Sheet_1_Advanced large language models and visualization tools for data...

Research data supporting "Using ChatGPT for Thematic Analysis Working Paper:...

Data from: ChatGPT in education: A discourse analysis of worries and...

Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects.

gpt-oss-120B (high) Output Speed by Provider

Summary of the scores of GPT-3.5, GPT-4, and GPT-4o in Stage 1 SPTEMD.

Replication Data for: ChatGPT on ChatGPT: An Exploratory Analysis of its...

Seconds to First Answer Token Received by Model

Data Sheet 1_Large language models generating synthetic clinical datasets: a...

Data and Code for: Generative AI for Economic Research: Use Cases and...

Seconds to Output 500 Tokens, including reasoning model 'thinking' time by...

Chat GPT Data

SynthFluencers: AI-Generated Influencers

A dataset of 1500-word stories generated by gpt-4o-mini for 236...

Data from: Developing Students’ Statistical Expertise Through Writing in the...

The GPT Group SWOT, PESTLE, Porters Five Force and Financial Analysis

Global Generative Pre-trained Transformer (GPT) Market Risk Analysis...

AI Financial Market Data

📹Project Video available on YouTube - https://youtu.be/WmJYHz_qn5s

Realistic Synthetic - AI Financial & Market Data for Gemini (Google), ChatGPT (OpenAI), Llama (Meta)

Comparative Analysis of Artificial Intelligence Platforms: GPT-4 and Google...

Data from: Ontolomics‑P: Advancing Proteomics Data Interpretation through...
