100+ datasets found
  1. Daily active users of DeepSeek 2025

    • statista.com
    Updated Mar 11, 2025
    Cite
    Statista (2025). Daily active users of DeepSeek 2025 [Dataset]. https://www.statista.com/statistics/1561128/deepseek-daily-active-users/
    Dataset updated
    Mar 11, 2025
    Dataset authored and provided by
    Statista (statista.com)
    Time period covered
    Jan 11, 2025 - Feb 15, 2025
    Area covered
    China
    Description

    As of mid-February 2025, the Chinese AI chatbot DeepSeek had around 47 million daily active users. When DeepSeek released the research paper illustrating its chatbot's capabilities, the company gained a global audience, and the number of daily active users skyrocketed.

  2. Firms planned LLM model usage in commercial deployments worldwide 2024

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). Firms planned LLM model usage in commercial deployments worldwide 2024 [Dataset]. https://www.statista.com/statistics/1485176/choice-of-llm-models-for-commercial-deployment-global/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (statista.com)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    As of 2024, over **** of global firms planned to use LLMs (Llama and Llama-like models), while ** percent chose embedding models (BERT and family) in their commercial deployments. Additionally, only ***** percent planned to utilize multi-modal models.

  3. Bitext-customer-support-llm-chatbot-training-dataset

    • huggingface.co
    • opendatalab.com
    + more versions
    Cite
    Bitext, Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
    Explore at:
    Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0: https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.

  4. Global ranking of LLM tools in 2023

    • statista.com
    Updated Jun 25, 2025
    Cite
    Statista (2025). Global ranking of LLM tools in 2023 [Dataset]. https://www.statista.com/statistics/1458138/leading-llm-tools/
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Statista (statista.com)
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    In 2023, Claude 3 Opus was the large language model (LLM) tool with the highest worldwide average, at ***** percent. Close behind, in second place, was Gemini 1.5 Pro, with an average of about ** percent.

  5. ChatGPT Revenue and Usage Statistics (2025)

    • businessofapps.com
    Updated Feb 9, 2023
    Cite
    Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/
    Dataset updated
    Feb 9, 2023
    Dataset authored and provided by
    Business of Apps
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    ChatGPT has taken the world by storm, setting a record as the fastest app to reach 100 million users, which it hit in two months. The implications of this tool are far-reaching, universities...

  6. Global Large Language Model (LLM) Market Size By Component, By Application,...

    • verifiedmarketresearch.com
    Updated Jul 25, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Large Language Model (LLM) Market Size By Component, By Application, By Deployment Mode, By Organization Size, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/large-language-model-llm-market/
    Dataset updated
    Jul 25, 2024
    Dataset authored and provided by
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Large Language Model (LLM) Market size was valued at USD 4.6 Billion in 2023 and is projected to reach USD 64.9 Billion by 2031, growing at a CAGR of 32.1% during the forecast period 2024-2031.

    Global Large Language Model (LLM) Market Drivers

    The market drivers for the Large Language Model (LLM) Market can be influenced by various factors. These may include:

    Advancements in AI and Machine Learning: Continuous improvements in AI algorithms and machine learning techniques are pushing the capabilities of LLMs, making them more attractive for a variety of applications.

    Increasing Demand for Automation: Businesses and industries are increasingly seeking automation solutions for customer service, content creation, and data analysis, which drives the demand for LLMs.

    Rising Investments in AI: A significant influx of investment from both the private and public sectors into AI research and development is fostering the growth of the LLM market.

    Expanding Application Areas: LLMs are being applied in a wider range of fields, such as healthcare, finance, legal, and education, which broadens their market scope.

    Enhanced Computing Power: Improvements in computing infrastructure, including the advent of advanced GPUs and cloud computing services, are making it feasible to train and deploy large language models more efficiently.

    Growing Digital Transformation Initiatives: Companies undergoing digital transformation are adopting LLMs to leverage their natural language understanding and generation capabilities for improved business processes.

    Increased Availability of Data: The abundance of text data from the internet and other sources provides the training material needed to develop more sophisticated LLMs.

    Consumer Demand for Better User Experiences: There is a growing expectation for intuitive and responsive user interfaces enabled by LLMs, particularly in applications like virtual assistants and chatbots.

    Developments in Natural Language Processing: Progress in natural language processing (NLP) techniques contributes to more effective and efficient LLMs, enhancing their practical utility and market value.

    Regulatory and Compliance Requirements: Certain industries are leveraging LLMs to ensure compliance with legal and regulatory standards by automating documentation and reporting tasks.

  7. Data from: Dataset used to generate Variability-Driven User-Story using LLM...

    • dataverse.cirad.fr
    application/x-gzip +1
    Updated Feb 7, 2025
    Cite
    Marianne Huchard; Alain Gutierrez; Zhang Yulin (Huaxi); Alexandre Bazin; Pierre Martin (2025). Dataset used to generate Variability-Driven User-Story using LLM and Triadic Concept Analysis [Dataset]. http://doi.org/10.18167/DVN1/GNJMAV
    Explore at:
    text/markdown (226), application/x-gzip (33538), application/x-gzip (29961), application/x-gzip (23669), text/markdown (2072), application/x-gzip (50676), application/x-gzip (1061)
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    CIRAD Dataverse
    Authors
    Marianne Huchard; Alain Gutierrez; Zhang Yulin (Huaxi); Alexandre Bazin; Pierre Martin
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    France, Herault, Montpellier
    Description

    The diffusion of knowledge extracted from the Knomana knowledge base (Silvie et al., 2021), or inferred using Knomana, requires developing a specific application for each type of end user (e.g., researcher vs. farmer). This means developing a family of similar applications, which can be done using the software product line paradigm. A challenge is to identify, analyze, and structure the functionalities required by each user before developing such applications. The approach considered is to use websites offering similar functionalities to design those of Knomana; the solution evaluated consists in using an LLM.

  8. LLM prompts in the context of machine learning

    • kaggle.com
    Updated Jul 1, 2024
    Cite
    Jordan Nelson (2024). LLM prompts in the context of machine learning [Dataset]. https://www.kaggle.com/datasets/jordanln/llm-prompts-in-the-context-of-machine-learning
    Explore at:
    Croissant
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    Kaggle
    Authors
    Jordan Nelson
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is an extension of my previous work on creating a dataset for natural language processing tasks. It leverages binary representation to characterise various machine learning models. The attributes in the dataset are derived from a dictionary, which was constructed from a corpus of prompts typically provided to a large language model (LLM). These prompts reference specific machine learning algorithms and their implementations. For instance, consider a user asking an LLM or a generative AI to create a Multi-Layer Perceptron (MLP) model for a particular application. By applying this concept to multiple machine learning models, we constructed our corpus. This corpus was then transformed into the current dataset using a bag-of-words approach. In this dataset, each attribute corresponds to a word from our dictionary, represented as a binary value: 1 indicates the presence of the word in a given prompt, and 0 indicates its absence.

    At the end of each entry, there is a label. Each entry in the dataset pertains to a single class, where each class represents a distinct machine learning model or algorithm. This dataset is intended for multi-class classification tasks, not multi-label classification, as each entry is associated with only one label and does not belong to multiple labels simultaneously.

    This dataset has been utilised with a Convolutional Neural Network (CNN) using the Keras Automodel API, achieving impressive training and testing accuracy rates exceeding 97%. Post-training, the model's predictive performance was rigorously evaluated in a production environment, where it continued to demonstrate exceptional accuracy. For this evaluation, we employed a series of questions, which are listed below. These questions were intentionally designed to be similar to ensure that the model can effectively distinguish between different machine learning models, even when the prompts are closely related.
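The binary bag-of-words encoding described above is straightforward to sketch. A minimal illustration in Python (the vocabulary and prompts here are hypothetical, not the dataset's actual dictionary):

```python
# Minimal sketch of binary bag-of-words encoding over a prompt corpus.
# The corpus and vocabulary below are invented for illustration only.

def build_vocab(corpus):
    """Collect the sorted set of distinct words across all prompts."""
    return sorted({word for prompt in corpus for word in prompt.lower().split()})

def encode(prompt, vocab):
    """Binary vector: 1 if a vocabulary word appears in the prompt, else 0."""
    words = set(prompt.lower().split())
    return [1 if w in words else 0 for w in vocab]

corpus = [
    "create a knn model to classify emails",
    "build a decision tree for fraud detection",
]
vocab = build_vocab(corpus)          # 13 distinct words
vector = encode("create a knn classifier", vocab)
```

Each encoded vector would then be paired with a single class label (the referenced algorithm), matching the multi-class setup described above.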

    KNN How would you create a KNN model to classify emails as spam or not spam based on their content and metadata? How could you implement a KNN model to classify handwritten digits using the MNIST dataset? How would you use a KNN approach to build a recommendation system for suggesting movies to users based on their ratings and preferences? How could you employ a KNN algorithm to predict the price of a house based on features such as its location, size, and number of bedrooms etc? Can you create a KNN model for classifying different species of flowers based on their petal length, petal width, sepal length, and sepal width? How would you utilise a KNN model to predict the sentiment (positive, negative, or neutral) of text reviews or comments? Can you create a KNN model for me that could be used in malware classification? Can you make me a KNN model that can detect a network intrusion when looking at encrypted network traffic? Can you make a KNN model that would predict the stock price of a given stock for the next week? Can you create a KNN model that could be used to detect malware when using a dataset relating to certain permissions a piece of software may have access to?

    Decision Tree Can you describe the steps involved in building a decision tree model to classify medical images as malignant or benign for cancer diagnosis and return a model for me? How can you utilise a decision tree approach to develop a model for classifying news articles into different categories (e.g., politics, sports, entertainment) based on their textual content? What approach would you take to create a decision tree model for recommending personalised university courses to students based on their academic strengths and weaknesses? Can you describe how to create a decision tree model for identifying potential fraud in financial transactions based on transaction history, user behaviour, and other relevant data? In what ways might you apply a decision tree model to classify customer complaints into different categories determining the severity of language used? Can you create a decision tree classifier for me? Can you make me a decision tree model that will help me determine the best course of action across a given set of strategies? Can you create a decision tree model for me that can recommend certain cars to customers based on their preferences and budget? How can you make a decision tree model that will predict the movement of star constellations in the sky based on data provided by the NASA website? How do I create a decision tree for time-series forecasting?

    Random Forest Can you describe the steps involved in building a random forest model to classify different types of anomalies in network traffic data for cybersecurity purposes and return the code for me? In what ways could you implement a random forest model to predict the severity of traffic congestion in urban areas based on historical traffic patterns, weather...

  9. h

    llm_robot

    • huggingface.co
    Updated May 5, 2024
    Cite
    Aryaduta (2024). llm_robot [Dataset]. https://huggingface.co/datasets/Aryaduta/llm_robot
    Explore at:
    Croissant
    Dataset updated
    May 5, 2024
    Authors
    Aryaduta
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Robotic Plan Generation

    This dataset is for training LLMs for robotic plan generation.

      Dataset Details

      Dataset Description

    The aim is to provide a dataset containing context (in this example, an arm robot manipulating two objects) and a user goal. The output should be a JSON string containing the high-level functions to be executed by the robot.
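As a purely hypothetical illustration of that context-to-plan idea (the field names and function names below are invented for this sketch, not the dataset's actual schema, which is shown on the dataset page):

```python
import json

# Hypothetical context and goal, mirroring the description above:
# an arm robot and two manipulated objects.
context = "An arm robot with a gripper; two objects on the table: a cube and a ball."
goal = "Pick up the cube and place it next to the ball."

# The model's output would be a JSON string of high-level functions
# for the robot to execute in order.
plan = json.dumps([
    {"function": "move_to", "args": {"object": "cube"}},
    {"function": "grasp", "args": {"object": "cube"}},
    {"function": "move_to", "args": {"object": "ball"}},
    {"function": "release", "args": {}},
])

steps = json.loads(plan)  # the executor would iterate over these steps
```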

      Dataset Structure

      Data Instances
    

    A JSON-formatted example… See the full description on the dataset page: https://huggingface.co/datasets/Aryaduta/llm_robot.

  10. Large Language Model (LLM) Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 18, 2025
    Cite
    Market Research Forecast (2025). Large Language Model (LLM) Report [Dataset]. https://www.marketresearchforecast.com/reports/large-language-model-llm-38890
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Large Language Model (LLM) market is experiencing explosive growth, projected to reach a substantial size driven by advancements in artificial intelligence and increasing demand across diverse sectors. The market's compound annual growth rate (CAGR) of 34.5% from 2019 to 2024 indicates a rapid expansion, and this momentum is expected to continue through 2033. The 2024 market size of $11.38 billion (assuming the reported "11380" is in millions of dollars) underscores the significant investment and adoption of LLMs. Key drivers include the increasing availability of large datasets for training, advancements in deep learning algorithms, and the growing need for sophisticated natural language processing capabilities across various applications.

    The market segmentation highlights the diverse applications of LLMs, with the Medical, Financial, and Industrial sectors being prominent early adopters. The availability of LLMs with varying parameter counts ("Hundreds of Billions" and "Trillions") reflects the spectrum of capabilities and corresponding resource requirements, influencing the market's pricing and target user base. The presence of major technology companies like Google, Microsoft, Amazon, and Meta further solidifies the market's significance and competitive landscape.

    The rapid adoption of LLMs is further fueled by ongoing research and development, leading to improvements in model accuracy, efficiency, and accessibility. While specific constraints are not provided, potential challenges include the ethical implications of LLMs, concerns regarding data privacy and security, and the ongoing need for robust infrastructure to support computationally intensive model training and deployment. Geographical distribution shows a strong presence in North America and Asia Pacific, with Europe and other regions exhibiting significant growth potential.

    The forecast period (2025-2033) offers substantial opportunity for continued market expansion, particularly as LLMs become more integrated into everyday applications and services, transforming various industries. The diverse range of companies involved reflects the significant interest and investment in this transformative technology, promising further innovation and market expansion.

  11. DeepSeek AI Statistics and Facts (2025)

    • coolest-gadgets.com
    Updated Jan 29, 2025
    Cite
    Coolest Gadgets (2025). DeepSeek AI Statistics and Facts (2025) [Dataset]. https://coolest-gadgets.com/deepseek-ai-statistics/
    Dataset updated
    Jan 29, 2025
    Dataset authored and provided by
    Coolest Gadgets
    License

    https://coolest-gadgets.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    DeepSeek AI Statistics: DeepSeek AI, founded by Liang Wenfeng in May 2023, has quickly emerged as a significant competitor in the global artificial intelligence market, particularly recognized for its cost-effective and large-scale models. Despite the strong presence of U.S.-based companies like OpenAI, DeepSeek made a notable entry into the international arena in January 2025. The company benefits from unique funding provided by High-Flyer, a quantitative hedge fund also established by Wenfeng. This support allows DeepSeek to focus on long-term projects without the influence of external investors.

    The core team at DeepSeek is composed of young and talented graduates from top Chinese universities, providing a fresh perspective and a deep understanding of AI development. The company prioritizes technical skills over traditional experience in its hiring practices, fostering a culture of innovation and efficiency.

    DeepSeek has achieved significant milestones, including the release of the DeepSeek Coder in November 2023, an open-source model designed for coding tasks. Following this, they launched the DeepSeek LLM, which features 67 billion parameters. In May 2024, they unveiled DeepSeek-V2, a model that sparked a price competition in the Chinese AI market due to its affordability and impressive performance. The success of this model led major Chinese tech companies to lower their prices in order to remain competitive.

    Introducing DeepSeek LLM

    (Source: github.com/deepseek-ai/DeepSeek-LLM)

    The more advanced DeepSeek-Coder-V2 has been introduced, boasting 236 billion parameters and a context length capacity of up to 128,000 tokens. This model is available via an API, priced at USD 0.14 per million input tokens and USD 0.28 per million output tokens. This pricing structure highlights the company's commitment to providing accessible and efficient AI solutions.
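At those per-million-token rates, the cost of a request follows directly from its token counts. A quick sketch (prices as quoted above; the token counts are invented for illustration):

```python
# API cost at the quoted DeepSeek-Coder-V2 prices:
# USD 0.14 per million input tokens, USD 0.28 per million output tokens.
PRICE_IN, PRICE_OUT = 0.14, 0.28  # USD per 1M tokens

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, given its token counts."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# e.g. a hypothetical workload of 2M input tokens and 1M output tokens:
cost = api_cost(2_000_000, 1_000_000)  # 0.28 + 0.28 = 0.56 USD
```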

  12. lmsys-chat-1m

    • huggingface.co
    Updated May 8, 2024
    + more versions
    Cite
    Aarush Sah (2024). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/AarushSah/lmsys-chat-1m
    Explore at:
    Croissant
    Dataset updated
    May 8, 2024
    Authors
    Aarush Sah
    Description

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of use"… See the full description on the dataset page: https://huggingface.co/datasets/AarushSah/lmsys-chat-1m.
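Conversation text in the OpenAI API JSON format is a list of role/content messages. A minimal sketch of reading one such record (the sample record and its field names here are invented for illustration, not drawn from the dataset):

```python
import json

# Hypothetical record mirroring the fields named above (conversation ID,
# model name, conversation in OpenAI API JSON format, language tag).
record = {
    "conversation_id": "abc123",
    "model": "vicuna-13b",
    "language": "English",
    "conversation": json.dumps([
        {"role": "user", "content": "What is an LLM?"},
        {"role": "assistant", "content": "A large language model."},
    ]),
}

# Parse the conversation and pull out the assistant's replies.
turns = json.loads(record["conversation"])
assistant_turns = [t["content"] for t in turns if t["role"] == "assistant"]
```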

  13. Bitext-retail-banking-llm-chatbot-training-dataset

    • huggingface.co
    Updated Jul 16, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-retail-banking-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset
    Explore at:
    Croissant
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0: https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail Banking Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail Banking] sector can be easily achieved using our two-step approach to LLM Fine-Tuning.… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset.

  14. Palantir Technologies Overview

    • bullfincher.io
    Updated May 31, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bullfincher (2025). Palantir Technologies Overview [Dataset]. https://bullfincher.io/companies/palantir-technologies/overview
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    Bullfincher
    License

    https://bullfincher.io/privacy-policy

    Description

    Palantir Technologies Inc. builds and deploys software platforms for the intelligence community to assist in counterterrorism investigations and operations in the United States, the United Kingdom, and internationally. The company provides Palantir Gotham, a software platform which enables users to identify patterns hidden deep within datasets, ranging from signals intelligence sources to reports from confidential informants, as well as facilitates the handoff between analysts and operational users, helping operators plan and execute real-world responses to threats that have been identified within the platform. It also offers Palantir Foundry, a platform that transforms the ways organizations operate by creating a central operating system for their data; and allows individual users to integrate and analyze the data they need in one place. In addition, it provides Palantir Apollo, a software that delivers software and updates across the business, as well as enables customers to deploy their software virtually in any environment; and Palantir Artificial Intelligence Platform (AIP) that provides unified access to open-source, self-hosted, and commercial large language models (LLM) that can transform structured and unstructured data into LLM-understandable objects and can turn organizations' actions and processes into tools for humans and LLM-driven agents. The company was incorporated in 2003 and is headquartered in Denver, Colorado.

  15. Mobile On-Device LLM Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 29, 2025
    Cite
    Growth Market Reports (2025). Mobile On-Device LLM Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/mobile-on-device-llm-market
    Dataset updated
    Jun 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Mobile On-Device LLM Market Outlook



    According to our latest research, the global Mobile On-Device LLM market size was valued at USD 1.92 billion in 2024 and is poised to reach USD 16.8 billion by 2033, expanding at a robust CAGR of 27.4% during the forecast period. This remarkable growth trajectory is primarily driven by the increasing demand for real-time, privacy-preserving AI functionalities directly on mobile devices across various industries. The rapid evolution of mobile hardware, combined with advancements in compact and efficient large language models (LLMs), is enabling a new era of intelligent, on-device applications that do not rely on constant cloud connectivity.
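The stated growth rate can be sanity-checked against the endpoint values with the standard formula CAGR = (end/start)^(1/years) - 1:

```python
# Check the reported figures: USD 1.92B in 2024 growing to USD 16.8B by 2033.
start, end, years = 1.92, 16.8, 9  # USD billions, 2024 -> 2033

cagr = (end / start) ** (1 / years) - 1
# Comes out to roughly 0.27, consistent with the reported 27.4% CAGR.
```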




    One of the most significant growth factors for the Mobile On-Device LLM market is the rising consumer and enterprise awareness regarding data privacy and security. As more sensitive personal and organizational data is processed on mobile devices, there is a heightened need to minimize data transmission to external servers. On-device LLMs address this concern by enabling advanced AI functionalities, such as natural language understanding, text generation, and personalization, to be performed locally. This approach not only enhances privacy but also reduces latency, leading to faster and more reliable user experiences. The proliferation of privacy regulations worldwide, such as GDPR and CCPA, is further accelerating the adoption of on-device AI solutions in smartphones, wearables, and IoT devices.




    Another key driver is the rapid technological advancement in mobile chipsets and memory architectures, which now support the deployment of increasingly sophisticated language models. The integration of AI accelerators and NPUs (Neural Processing Units) in modern mobile devices has made it feasible to run small and medium-scale LLMs efficiently without significant battery drain. As a result, device manufacturers and software developers are leveraging these capabilities to offer innovative applications, including virtual assistants, real-time translation, and context-aware recommendations. The competitive landscape among device OEMs is fostering continuous innovation, with leading brands racing to differentiate their products through advanced on-device AI features powered by LLMs.




    The expanding ecosystem of AI development frameworks and toolkits tailored for mobile environments is also fueling market growth. Open-source initiatives and collaborations between semiconductor companies and AI research organizations have led to the optimization of LLM architectures for resource-constrained devices. This democratization of technology is lowering the entry barriers for app developers and enterprises, enabling a broader range of applications and services to harness the power of on-device language models. The growing developer community, coupled with increasing investments in AI research, is expected to further accelerate the adoption and innovation in the Mobile On-Device LLM market over the next decade.




    From a regional perspective, North America currently dominates the Mobile On-Device LLM market, accounting for over 36% of global revenue in 2024, followed closely by Asia Pacific and Europe. The high penetration of advanced smartphones, robust digital infrastructure, and early adoption of AI technologies contribute to North America's leadership. However, Asia Pacific is expected to witness the fastest growth, with a projected CAGR of 29.1% through 2033, driven by the sheer volume of mobile users, increasing investments in AI-driven innovation, and the rapid expansion of 5G networks. Europe remains a significant market, propelled by stringent data privacy regulations and a strong focus on secure, user-centric AI solutions.

    Model Type Analysis

    The Model Type segment of the Mobile On-Device LLM market is categorized into Small Language Models, Medium Language Models, and Large Language Models. Small Language Models have gained significant traction due to their ability t

  16. Large Language Model (LLM) Market Size | CAGR of 33.7%

    • market.us
    csv, pdf
    Updated Mar 21, 2025
    Cite
    Market.us (2025). Large Language Model (LLM) Market Size | CAGR of 33.7% [Dataset]. https://market.us/report/large-language-model-llm-market/
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Mar 21, 2025
    Dataset provided by
    Market.us
    License

    https://market.us/privacy-policy/https://market.us/privacy-policy/

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    The Large Language Model (LLM) Market is estimated to reach USD 82.1 Billion by 2033, riding on a strong 33.7% CAGR.

  17. LLM market size in Japan FY 2024-2028

    • statista.com
    • ai-chatbox.pro
    Updated Jun 6, 2025
    Cite
    Statista (2025). LLM market size in Japan FY 2024-2028 [Dataset]. https://www.statista.com/statistics/1550077/japan-large-language-model-market-size/
    Explore at:
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Japan
    Description

    The value of the large language model (LLM) market in Japan was projected to reach ** billion Japanese yen in fiscal year 2024. Partly based on the assumption that the market would diversify with the release of specialized and cheaper LLMs from fiscal year 2025 onward, the market size was forecast to more than quadruple by fiscal year 2028.

  18. Large Language Model (LLM) Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). Large Language Model (LLM) Report [Dataset]. https://www.marketreportanalytics.com/reports/large-language-model-llm-52544
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policyhttps://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Large Language Model (LLM) market is experiencing explosive growth, driven by advancements in artificial intelligence and the increasing demand for sophisticated natural language processing capabilities across various sectors. While precise market sizing for 2025 requires proprietary data, publicly available reports and industry analyses suggest a 2025 market value of approximately $20 billion, with a projected compound annual growth rate (CAGR) of 35% from 2025 to 2033. This robust growth is fueled by several key factors. The proliferation of cloud computing services provides the necessary infrastructure for LLM development and deployment. Furthermore, the rising adoption of LLMs in diverse applications, including customer service chatbots, content generation, language translation, and code development, is significantly contributing to market expansion. The trend toward personalized user experiences and the growing need for efficient data analysis further bolster demand.

    However, challenges remain, including concerns about data privacy, ethical considerations surrounding AI bias, and the high computational costs associated with training and deploying large language models. These restraints are expected to moderate growth but not stifle the market's overall upward trajectory.

    Segment analysis reveals significant opportunities within specific application areas. The most prominent segments include customer service (driven by automation needs), content creation (leveraging automated writing and editing tools), and software development (utilizing LLMs for code generation and debugging). By type, cloud-based LLMs are strongly preferred for their scalability and accessibility, while on-premise deployments remain relevant for organizations with stringent data security requirements.

    Geographically, North America and Europe currently hold the largest market share, driven by early adoption and robust technological infrastructure. However, the Asia-Pacific region is poised for rapid growth, particularly in countries like China and India, due to their large populations and rapidly expanding digital economies. The competitive landscape is dynamic, with major technology companies leading the development and deployment of LLMs, alongside numerous startups offering specialized solutions. Over the forecast period, consolidation and strategic partnerships are anticipated, reshaping competitive dynamics and market structure.
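    The implied end point of the $20 billion base and 35% CAGR estimate can be checked with simple compounding (the base year and rate are the report's figures; the projection below is plain arithmetic, not the report's own model):

    ```python
    def project_cagr(base_value: float, cagr: float, years: int) -> float:
        """Compound a base value forward at a constant annual growth rate."""
        return base_value * (1 + cagr) ** years

    # ~$20B in 2025 growing at 35% CAGR through 2033 (8 years of compounding).
    value_2033 = project_cagr(20.0, 0.35, 2033 - 2025)
    print(f"Implied 2033 market size: ${value_2033:.0f}B")  # → roughly $221B
    ```

    A constant-rate projection like this is only a sanity check; actual market forecasts rarely compound uniformly.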

  19. HiST-LLM

    • zenodo.org
    bin, json
    Updated Jan 16, 2025
    Cite
    Jakob Elias Hauser; Jakob Elias Hauser (2025). HiST-LLM [Dataset]. http://doi.org/10.5281/zenodo.14671248
    Explore at:
    Available download formats: bin, json
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Jakob Elias Hauser; Jakob Elias Hauser
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)

    Large Language Models (LLMs) have the potential to transform humanities and social science research, yet their history knowledge and comprehension at a graduate level remain untested. Benchmarking LLMs in history is particularly challenging, given that human knowledge of history is inherently unbalanced, with more information available on Western history and recent periods. We introduce the History Seshat Test for LLMs (Hist-LLM), based on a subset of the Seshat Global History Databank, which provides a structured representation of human historical knowledge, containing 36,000 data points across 600 historical societies and over 2,700 scholarly references. This dataset covers every major world region from the Neolithic period to the Industrial Revolution and includes information reviewed and assembled by history experts and graduate research assistants. Using this dataset, we benchmark a total of seven models from the Gemini, OpenAI, and Llama families. We find that, in a four-choice format, LLMs have a balanced accuracy ranging from 33.6% (Llama-3.1-8B) to 46% (GPT-4-Turbo), outperforming random guessing (25%) but falling short of expert comprehension. LLMs perform better on earlier historical periods. Regionally, performance is more even, though for the more advanced models it is highest for the Americas and lowest in Oceania and Sub-Saharan Africa. Our benchmark shows that while LLMs possess some expert-level historical knowledge, there is considerable room for improvement.

    Dataset links

    Dataset Repository (Github)

    Croissant Metadata (Github)

    Usage

    This dataset can be used to benchmark LLMs on their expert-level history knowledge.

    Loading the dataset

    using Python and Pandas:

    import pandas as pd

    # Main benchmark table: one row per multiple-choice question.
    main = pd.read_parquet("Neurips_HiST-LLM.parquet")
    # Scholarly references backing each data point.
    ref = pd.read_parquet("references.parquet")

    Dataset metadata

    Dataset metadata documented in the croissant.json file.

    Model Fingerprints

    When model fingerprints are available, we created an extra column for each model fingerprint. These columns are named via the following pattern.

    Column Descriptions

    additional_review

    Boolean. Whether data points underwent additional expert review (see Section 3.2 of the paper).

    Q

    The multiple-choice question.

    A

    The expected completion of the prompt.

    polity old id

    ID for polity according to Seshat ids.

    start year str

    String for when polity started existing (in BCE/CE format).

    end year str

    String for when polity stopped existing (in BCE/CE format).

    start year int

    Int for when polity started existing (in BCE/CE format).

    end year int

    Int for when polity stopped existing (in BCE/CE format).

    name

    Polity name.

    nga

    Natural Geographic Area for Polity.

    world_region

    The world region of an NGA (based on the UN regions, with some modifications).

    category

    Immediate parent category of fact from Seshat codebook.

    root cat

    Major category of fact.

    value

    Value of data point.

    variable

    Variable of data point.

    id

    Request id for openai batch requests.

    description

    Description provided by RAs for fact.
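    The paper reports balanced accuracy, under which random guessing on four choices scores 25%. A minimal sketch of that metric, assuming the dataset's `A` column holds the correct choice and a hypothetical model-answer column holds the predictions:

    ```python
    from collections import defaultdict

    def balanced_accuracy(expected, predicted):
        """Mean per-class recall: each answer choice contributes equally,
        regardless of how often it appears as the correct answer."""
        hits, totals = defaultdict(int), defaultdict(int)
        for truth, guess in zip(expected, predicted):
            totals[truth] += 1
            if guess == truth:
                hits[truth] += 1
        return sum(hits[c] / totals[c] for c in totals) / len(totals)

    # Toy four-choice example; real use would pair the `A` column with a
    # model-answer column from the benchmark.
    expected  = ["A", "B", "C", "D", "A", "B"]
    predicted = ["A", "B", "C", "A", "A", "C"]
    print(balanced_accuracy(expected, predicted))  # → 0.625
    ```

    Balancing over classes matters here because correct answers are unlikely to be spread evenly across the four choice positions.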

  20. Synthetic Customer Churn Prediction Dataset

    • opendatabay.com
    .undefined
    Updated May 6, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Customer Churn Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/5d7ef013-5848-4367-bf3b-2ce359587b43
    Explore at:
    Available download formats: .undefined
    Dataset updated
    May 6, 2025
    Dataset provided by
    Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
    Authors
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Retail & Consumer Behavior
    Description

    This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.

    Dataset Features:

    • Customer_Id: Unique identifier for each customer (not included in this dataset for privacy).
    • Gender: Gender of the customer (e.g., "Male," "Female").
    • Partner: Whether the customer has a partner (e.g., "Yes," "No").
    • Dependents: Whether the customer has dependents (e.g., "Yes," "No").
    • Tenure (Months): The number of months the customer has been with the company.
    • PhoneService: Whether the customer has a phone service (e.g., "Yes," "No").
    • MultipleLines: Whether the customer has multiple phone lines (e.g., "Yes," "No phone service").
    • InternetService: Type of internet service (e.g., "DSL," "Fiber optic," "No").
    • OnlineSecurity: Whether the customer has online security services (e.g., "Yes," "No," "No internet service").
    • OnlineBackup: Whether the customer has online backup services (e.g., "Yes," "No," "No internet service").
    • DeviceProtection: Whether the customer has device protection services (e.g., "Yes," "No," "No internet service").
    • TechSupport: Whether the customer has tech support services (e.g., "Yes," "No," "No internet service").
    • StreamingTV: Whether the customer has streaming TV services (e.g., "Yes," "No," "No internet service").
    • StreamingMovies: Whether the customer has streaming movies services (e.g., "Yes," "No," "No internet service").
    • Contract: Type of contract the customer has (e.g., "Month-to-month," "One year," "Two year").
    • PaperlessBilling: Whether the customer uses paperless billing (e.g., "Yes," "No").
    • PaymentMethod: The payment method used by the customer (e.g., "Electronic check," "Credit card," "Bank transfer").
    • MonthlyCharges: Monthly charges billed to the customer.
    • TotalCharges: Total charges incurred by the customer over their tenure.
    • Churn: Whether the customer has churned (e.g., "Yes," "No").

    Distribution:

    https://storage.googleapis.com/opendatabay_public/images/churn_c4aae9d4-3939-4866-a249-35d81c5965dc.png" alt="Synthetic Customer Churn Prediction Dataset Distribution">

    Usage:

    This dataset is useful for a variety of applications, including:

    • Customer Behavior Analysis: To understand factors influencing customer retention and churn.
    • Educational Training: To practice data cleaning, feature engineering, and visualization techniques in customer analytics.
    • Predictive Modeling: To build machine learning models for predicting customer churn based on service usage patterns and demographic information.
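    Before fitting a predictive model, a common first step is to profile churn against a single feature such as contract type. A stdlib-only sketch over hypothetical rows mirroring the schema above (real use would load the actual dataset file, whose name is not given here):

    ```python
    from collections import defaultdict

    # Hypothetical rows matching the Contract and Churn columns described above.
    rows = [
        {"Contract": "Month-to-month", "Churn": "Yes"},
        {"Contract": "Month-to-month", "Churn": "No"},
        {"Contract": "Two year",       "Churn": "No"},
        {"Contract": "One year",       "Churn": "No"},
        {"Contract": "Month-to-month", "Churn": "Yes"},
    ]

    def churn_rate_by(rows, key):
        """Fraction of churned customers within each group of `key`."""
        churned, totals = defaultdict(int), defaultdict(int)
        for row in rows:
            totals[row[key]] += 1
            if row["Churn"] == "Yes":
                churned[row[key]] += 1
        return {group: churned[group] / totals[group] for group in totals}

    print(churn_rate_by(rows, "Contract"))
    ```

    Per-group churn rates like these often become the baseline that a trained classifier must beat, and they highlight which features (here, contract length) carry signal.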

    Coverage:

    This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real customer privacy.

    License:

    CC0 (Public Domain)

    Who can use it:

    • Data scientists and enthusiasts: For developing customer analytics skills and predictive modelling expertise.
    • Business analysts: To understand customer churn drivers and improve retention strategies.
    • Educators and students: For teaching and learning applications in data science and machine learning.