100+ datasets found
  1. Daily active users of DeepSeek 2025

    • statista.com
    Updated Mar 11, 2025
    Cite
    Statista (2025). Daily active users of DeepSeek 2025 [Dataset]. https://www.statista.com/statistics/1561128/deepseek-daily-active-users/
    Dataset updated
    Mar 11, 2025
    Dataset authored and provided by
    Statista (statista.com)
    Time period covered
    Jan 11, 2025 - Feb 15, 2025
    Area covered
    China
    Description

    As of mid-February 2025, the Chinese AI chatbot DeepSeek had around 47 million daily active users. When DeepSeek released the research paper illustrating its chatbot's capabilities, the company gained a global audience, and the number of daily active users skyrocketed.

  2. Firms planned LLM model usage in commercial deployments worldwide 2024

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). Firms planned LLM model usage in commercial deployments worldwide 2024 [Dataset]. https://www.statista.com/statistics/1485176/choice-of-llm-models-for-commercial-deployment-global/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (statista.com)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    As of 2024, over **** of global firms planned to use LLMs (Llama and Llama-like models), while ** percent chose embedding models (BERT and family) in their commercial deployments. Additionally, only ***** percent planned to utilize multi-modal models.

  3. Bitext-customer-support-llm-chatbot-training-dataset

    • huggingface.co
    • opendatalab.com
    + more versions
    Cite
    Bitext, Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
    Explore at:
    Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0: https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.

  4. Global ranking of LLM tools in 2023

    • statista.com
    Updated Jun 25, 2025
    Cite
    Statista (2025). Global ranking of LLM tools in 2023 [Dataset]. https://www.statista.com/statistics/1458138/leading-llm-tools/
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Statista (statista.com)
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    In 2023, Claude 3 Opus was the large language model (LLM) tool with the highest worldwide average, at ***** percent. Close behind, in second place, was Gemini 1.5 Pro, with an average of about ** percent.

  5. ChatGPT Revenue and Usage Statistics (2025)

    • businessofapps.com
    Updated Feb 9, 2023
    Cite
    Business of Apps (2023). ChatGPT Revenue and Usage Statistics (2025) [Dataset]. https://www.businessofapps.com/data/chatgpt-statistics/
    Dataset updated
    Feb 9, 2023
    Dataset authored and provided by
    Business of Apps
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    ChatGPT has taken the world by storm, setting a record as the fastest app to reach 100 million users, which it hit in two months. The implications of this tool are far-reaching, universities...

  6. Global Large Language Model (LLM) Market Size By Component, By Application,...

    • verifiedmarketresearch.com
    Updated Jul 25, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Large Language Model (LLM) Market Size By Component, By Application, By Deployment Mode, By Organization Size, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/large-language-model-llm-market/
    Dataset updated
    Jul 25, 2024
    Dataset authored and provided by
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Large Language Model (LLM) Market size was valued at USD 4.6 Billion in 2023 and is projected to reach USD 64.9 Billion by 2031, growing at a CAGR of 32.1% during the forecast period 2024-2031.

    Global Large Language Model (LLM) Market Drivers

    The market drivers for the Large Language Model (LLM) Market can be influenced by various factors. These may include:

    Advancements in AI and Machine Learning: Continuous improvements in AI algorithms and machine learning techniques are pushing the capabilities of LLMs, making them more attractive for a variety of applications.

    Increasing Demand for Automation: Businesses and industries are increasingly seeking automation solutions for customer service, content creation, and data analysis, which drives the demand for LLMs.

    Rising Investments in AI: A significant influx of investment from both the private and public sectors into AI research and development is fostering the growth of the LLM market.

    Expanding Application Areas: LLMs are being applied in a wider range of fields, such as healthcare, finance, legal, and education, which broadens their market scope.

    Enhanced Computing Power: Improvements in computing infrastructure, including the advent of advanced GPUs and cloud computing services, are making it feasible to train and deploy large language models more efficiently.

    Growing Digital Transformation Initiatives: Companies undergoing digital transformation are adopting LLMs to leverage their natural language understanding and generation capabilities for improved business processes.

    Increased Availability of Data: The abundance of text data from the internet and other sources provides the training material needed to develop more sophisticated LLMs.

    Consumer Demand for Better User Experiences: There is a growing expectation for intuitive and responsive user interfaces enabled by LLMs, particularly in applications like virtual assistants and chatbots.

    Developments in Natural Language Processing: Progress in natural language processing (NLP) techniques contributes to more effective and efficient LLMs, enhancing their practical utility and market value.

    Regulatory and Compliance Requirements: Certain industries are leveraging LLMs to ensure compliance with legal and regulatory standards by automating documentation and reporting tasks.

  7. Data from: Dataset used to generate Variability-Driven User-Story using LLM...

    • dataverse.cirad.fr
    application/x-gzip +1
    Updated Feb 7, 2025
    Cite
    Marianne Huchard; Alain Gutierrez; Zhang Yulin (Huaxi); Alexandre Bazin; Pierre Martin (2025). Dataset used to generate Variability-Driven User-Story using LLM and Triadic Concept Analysis [Dataset]. http://doi.org/10.18167/DVN1/GNJMAV
    Explore at:
    text/markdown (226), application/x-gzip (33538), application/x-gzip (29961), application/x-gzip (23669), text/markdown (2072), application/x-gzip (50676), application/x-gzip (1061)
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    CIRAD Dataverse
    Authors
    Marianne Huchard; Alain Gutierrez; Zhang Yulin (Huaxi); Alexandre Bazin; Pierre Martin
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    France, Herault, Montpellier
    Description

    The diffusion of knowledge extracted from the Knomana knowledge base (Silvie et al., 2021), or inferred using Knomana, requires developing a specific application for each type of end user (e.g., researcher vs. farmer). This means developing a family of similar applications, which can be done using the software product line paradigm. A challenge is to identify, analyze, and structure the functionalities required by each user before developing such applications. The approach considered is to use websites offering similar functionalities to design those of Knomana; the solution evaluated consists in using an LLM.

  8. LLM prompts in the context of machine learning

    • kaggle.com
    Updated Jul 1, 2024
    Cite
    Jordan Nelson (2024). LLM prompts in the context of machine learning [Dataset]. https://www.kaggle.com/datasets/jordanln/llm-prompts-in-the-context-of-machine-learning
    Explore at:
    Croissant
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    Kaggle
    Authors
    Jordan Nelson
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is an extension of my previous work on creating a dataset for natural language processing tasks. It leverages binary representation to characterise various machine learning models. The attributes in the dataset are derived from a dictionary, which was constructed from a corpus of prompts typically provided to a large language model (LLM). These prompts reference specific machine learning algorithms and their implementations. For instance, consider a user asking an LLM or a generative AI to create a Multi-Layer Perceptron (MLP) model for a particular application. By applying this concept to multiple machine learning models, we constructed our corpus. This corpus was then transformed into the current dataset using a bag-of-words approach. In this dataset, each attribute corresponds to a word from our dictionary, represented as a binary value: 1 indicates the presence of the word in a given prompt, and 0 indicates its absence.

    At the end of each entry, there is a label. Each entry in the dataset pertains to a single class, where each class represents a distinct machine learning model or algorithm. This dataset is intended for multi-class classification tasks, not multi-label classification, as each entry is associated with only one label and does not belong to multiple labels simultaneously.

    This dataset has been utilised with a Convolutional Neural Network (CNN) using the Keras Automodel API, achieving impressive training and testing accuracy rates exceeding 97%. Post-training, the model's predictive performance was rigorously evaluated in a production environment, where it continued to demonstrate exceptional accuracy. For this evaluation, we employed a series of questions, which are listed below. These questions were intentionally designed to be similar to ensure that the model can effectively distinguish between different machine learning models, even when the prompts are closely related.
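The binary bag-of-words encoding described above is straightforward to sketch. A minimal illustration in Python (the vocabulary and prompts here are hypothetical, not the dataset's actual dictionary):

```python
# Minimal sketch of binary bag-of-words encoding over a prompt corpus.
# The corpus and vocabulary below are invented for illustration only.

def build_vocab(corpus):
    """Collect the sorted set of distinct words across all prompts."""
    return sorted({word for prompt in corpus for word in prompt.lower().split()})

def encode(prompt, vocab):
    """Binary vector: 1 if a vocabulary word appears in the prompt, else 0."""
    words = set(prompt.lower().split())
    return [1 if w in words else 0 for w in vocab]

corpus = [
    "create a knn model to classify emails",
    "build a decision tree for fraud detection",
]
vocab = build_vocab(corpus)          # 13 distinct words
vector = encode("create a knn classifier", vocab)
```

Each encoded vector would then be paired with a single class label (the referenced algorithm), matching the multi-class setup described above.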

    KNN How would you create a KNN model to classify emails as spam or not spam based on their content and metadata? How could you implement a KNN model to classify handwritten digits using the MNIST dataset? How would you use a KNN approach to build a recommendation system for suggesting movies to users based on their ratings and preferences? How could you employ a KNN algorithm to predict the price of a house based on features such as its location, size, and number of bedrooms etc? Can you create a KNN model for classifying different species of flowers based on their petal length, petal width, sepal length, and sepal width? How would you utilise a KNN model to predict the sentiment (positive, negative, or neutral) of text reviews or comments? Can you create a KNN model for me that could be used in malware classification? Can you make me a KNN model that can detect a network intrusion when looking at encrypted network traffic? Can you make a KNN model that would predict the stock price of a given stock for the next week? Can you create a KNN model that could be used to detect malware when using a dataset relating to certain permissions a piece of software may have access to?

    Decision Tree Can you describe the steps involved in building a decision tree model to classify medical images as malignant or benign for cancer diagnosis and return a model for me? How can you utilise a decision tree approach to develop a model for classifying news articles into different categories (e.g., politics, sports, entertainment) based on their textual content? What approach would you take to create a decision tree model for recommending personalised university courses to students based on their academic strengths and weaknesses? Can you describe how to create a decision tree model for identifying potential fraud in financial transactions based on transaction history, user behaviour, and other relevant data? In what ways might you apply a decision tree model to classify customer complaints into different categories determining the severity of language used? Can you create a decision tree classifier for me? Can you make me a decision tree model that will help me determine the best course of action across a given set of strategies? Can you create a decision tree model for me that can recommend certain cars to customers based on their preferences and budget? How can you make a decision tree model that will predict the movement of star constellations in the sky based on data provided by the NASA website? How do I create a decision tree for time-series forecasting?

    Random Forest Can you describe the steps involved in building a random forest model to classify different types of anomalies in network traffic data for cybersecurity purposes and return the code for me? In what ways could you implement a random forest model to predict the severity of traffic congestion in urban areas based on historical traffic patterns, weather...

  9. h

    llm_robot

    • huggingface.co
    Updated May 5, 2024
    Cite
    Aryaduta (2024). llm_robot [Dataset]. https://huggingface.co/datasets/Aryaduta/llm_robot
    Explore at:
    Croissant
    Dataset updated
    May 5, 2024
    Authors
    Aryaduta
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Robotic Plan Generation

    This dataset is for training LLMs for robotic plan generation.

      Dataset Details

      Dataset Description

    The aim is to provide a dataset containing context (in this example, an arm robot manipulating two objects) and a user goal. The output should be a JSON string containing the high-level functions to be executed by the robot.
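As a purely hypothetical illustration of that context-to-plan idea (the field names and function names below are invented for this sketch, not the dataset's actual schema, which is shown on the dataset page):

```python
import json

# Hypothetical context and goal, mirroring the description above:
# an arm robot and two manipulated objects.
context = "An arm robot with a gripper; two objects on the table: a cube and a ball."
goal = "Pick up the cube and place it next to the ball."

# The model's output would be a JSON string of high-level functions
# for the robot to execute in order.
plan = json.dumps([
    {"function": "move_to", "args": {"object": "cube"}},
    {"function": "grasp", "args": {"object": "cube"}},
    {"function": "move_to", "args": {"object": "ball"}},
    {"function": "release", "args": {}},
])

steps = json.loads(plan)  # the executor would iterate over these steps
```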

      Dataset Structure

      Data Instances
    

    A JSON-formatted example… See the full description on the dataset page: https://huggingface.co/datasets/Aryaduta/llm_robot.

  10. Large Language Model (LLM) Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 18, 2025
    Cite
    Market Research Forecast (2025). Large Language Model (LLM) Report [Dataset]. https://www.marketresearchforecast.com/reports/large-language-model-llm-38890
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Large Language Model (LLM) market is experiencing explosive growth, projected to reach a substantial size driven by advancements in artificial intelligence and increasing demand across diverse sectors. The market's compound annual growth rate (CAGR) of 34.5% from 2019 to 2024 indicates a rapid expansion, and this momentum is expected to continue through 2033. The 2024 market size of $11.38 billion (assuming the reported "11380" is in millions of dollars) underscores the significant investment and adoption of LLMs. Key drivers include the increasing availability of large datasets for training, advancements in deep learning algorithms, and the growing need for sophisticated natural language processing capabilities across various applications.

    The market segmentation highlights the diverse applications of LLMs, with the Medical, Financial, and Industrial sectors being prominent early adopters. The availability of LLMs with varying parameter counts ("Hundreds of Billions" and "Trillions") reflects the spectrum of capabilities and corresponding resource requirements, influencing the market's pricing and target user base. The presence of major technology companies like Google, Microsoft, Amazon, and Meta further solidifies the market's significance and competitive landscape.

    The rapid adoption of LLMs is further fueled by ongoing research and development, leading to improvements in model accuracy, efficiency, and accessibility. While specific constraints are not provided, potential challenges include the ethical implications of LLMs, concerns regarding data privacy and security, and the ongoing need for robust infrastructure to support computationally intensive model training and deployment. Geographical distribution shows a strong presence in North America and Asia Pacific, with Europe and other regions exhibiting significant growth potential.

    The forecast period (2025-2033) offers substantial opportunity for continued market expansion, particularly as LLMs become more integrated into everyday applications and services, transforming various industries. The diverse range of companies involved reflects the significant interest and investment in this transformative technology, promising further innovation and market expansion.

  11. DeepSeek AI Statistics and Facts (2025)

    • coolest-gadgets.com
    Updated Jan 29, 2025
    Cite
    Coolest Gadgets (2025). DeepSeek AI Statistics and Facts (2025) [Dataset]. https://coolest-gadgets.com/deepseek-ai-statistics/
    Dataset updated
    Jan 29, 2025
    Dataset authored and provided by
    Coolest Gadgets
    License

    https://coolest-gadgets.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    DeepSeek AI Statistics: DeepSeek AI, founded by Liang Wenfeng in May 2023, has quickly emerged as a significant competitor in the global artificial intelligence market, particularly recognized for its cost-effective and large-scale models. Despite the strong presence of U.S.-based companies like OpenAI, DeepSeek made a notable entry into the international arena in January 2025. The company benefits from unique funding provided by High-Flyer, a quantitative hedge fund also established by Wenfeng. This support allows DeepSeek to focus on long-term projects without the influence of external investors.

    The core team at DeepSeek is composed of young and talented graduates from top Chinese universities, providing a fresh perspective and a deep understanding of AI development. The company prioritizes technical skills over traditional experience in its hiring practices, fostering a culture of innovation and efficiency.

    DeepSeek has achieved significant milestones, including the release of the DeepSeek Coder in November 2023, an open-source model designed for coding tasks. Following this, they launched the DeepSeek LLM, which features 67 billion parameters. In May 2024, they unveiled DeepSeek-V2, a model that sparked a price competition in the Chinese AI market due to its affordability and impressive performance. The success of this model led major Chinese tech companies to lower their prices in order to remain competitive.

    Introducing DeepSeek LLM

    (Source: github.com/deepseek-ai/DeepSeek-LLM)

    The more advanced DeepSeek-Coder-V2 has been introduced, boasting 236 billion parameters and a context length capacity of up to 128,000 tokens. This model is available via an API, priced at USD 0.14 per million input tokens and USD 0.28 per million output tokens. This pricing structure highlights the company's commitment to providing accessible and efficient AI solutions.
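At those per-million-token rates, the cost of a request follows directly from its token counts. A quick sketch (prices as quoted above; the token counts are invented for illustration):

```python
# API cost at the quoted DeepSeek-Coder-V2 prices:
# USD 0.14 per million input tokens, USD 0.28 per million output tokens.
PRICE_IN, PRICE_OUT = 0.14, 0.28  # USD per 1M tokens

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, given its token counts."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# e.g. a hypothetical workload of 2M input tokens and 1M output tokens:
cost = api_cost(2_000_000, 1_000_000)  # 0.28 + 0.28 = 0.56 USD
```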

  12. lmsys-chat-1m

    • huggingface.co
    Updated May 8, 2024
    + more versions
    Cite
    Aarush Sah (2024). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/AarushSah/lmsys-chat-1m
    Explore at:
    Croissant
    Dataset updated
    May 8, 2024
    Authors
    Aarush Sah
    Description

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of use"… See the full description on the dataset page: https://huggingface.co/datasets/AarushSah/lmsys-chat-1m.
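Conversation text in the OpenAI API JSON format is a list of role/content messages. A minimal sketch of reading one such record (the sample record and its field names here are invented for illustration, not drawn from the dataset):

```python
import json

# Hypothetical record mirroring the fields named above (conversation ID,
# model name, conversation in OpenAI API JSON format, language tag).
record = {
    "conversation_id": "abc123",
    "model": "vicuna-13b",
    "language": "English",
    "conversation": json.dumps([
        {"role": "user", "content": "What is an LLM?"},
        {"role": "assistant", "content": "A large language model."},
    ]),
}

# Parse the conversation and pull out the assistant's replies.
turns = json.loads(record["conversation"])
assistant_turns = [t["content"] for t in turns if t["role"] == "assistant"]
```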

  13. Bitext-retail-banking-llm-chatbot-training-dataset

    • huggingface.co
    Updated Jul 16, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-retail-banking-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset
    Explore at:
    Croissant
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Bitext
    License

    CDLA-Sharing-1.0: https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail Banking Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail Banking] sector can be easily achieved using our two-step approach to LLM Fine-Tuning.… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset.

  14. Palantir Technologies Overview

    • bullfincher.io
    Updated May 31, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bullfincher (2025). Palantir Technologies Overview [Dataset]. https://bullfincher.io/companies/palantir-technologies/overview
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    Bullfincher
    License

    https://bullfincher.io/privacy-policy

    Description

    Palantir Technologies Inc. builds and deploys software platforms for the intelligence community to assist in counterterrorism investigations and operations in the United States, the United Kingdom, and internationally. The company provides Palantir Gotham, a software platform which enables users to identify patterns hidden deep within datasets, ranging from signals intelligence sources to reports from confidential informants, as well as facilitates the handoff between analysts and operational users, helping operators plan and execute real-world responses to threats that have been identified within the platform. It also offers Palantir Foundry, a platform that transforms the ways organizations operate by creating a central operating system for their data; and allows individual users to integrate and analyze the data they need in one place. In addition, it provides Palantir Apollo, a software that delivers software and updates across the business, as well as enables customers to deploy their software virtually in any environment; and Palantir Artificial Intelligence Platform (AIP) that provides unified access to open-source, self-hosted, and commercial large language models (LLM) that can transform structured and unstructured data into LLM-understandable objects and can turn organizations' actions and processes into tools for humans and LLM-driven agents. The company was incorporated in 2003 and is headquartered in Denver, Colorado.

  15. Mobile On-Device LLM Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 29, 2025
    Cite
    Growth Market Reports (2025). Mobile On-Device LLM Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/mobile-on-device-llm-market
    Dataset updated
    Jun 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Mobile On-Device LLM Market Outlook



    According to our latest research, the global Mobile On-Device LLM market size was valued at USD 1.92 billion in 2024 and is poised to reach USD 16.8 billion by 2033, expanding at a robust CAGR of 27.4% during the forecast period. This remarkable growth trajectory is primarily driven by the increasing demand for real-time, privacy-preserving AI functionalities directly on mobile devices across various industries. The rapid evolution of mobile hardware, combined with advancements in compact and efficient large language models (LLMs), is enabling a new era of intelligent, on-device applications that do not rely on constant cloud connectivity.
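The stated growth rate can be sanity-checked against the endpoint values with the standard formula CAGR = (end/start)^(1/years) - 1:

```python
# Check the reported figures: USD 1.92B in 2024 growing to USD 16.8B by 2033.
start, end, years = 1.92, 16.8, 9  # USD billions, 2024 -> 2033

cagr = (end / start) ** (1 / years) - 1
# Comes out to roughly 0.27, consistent with the reported 27.4% CAGR.
```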




    One of the most significant growth factors for the Mobile On-Device LLM market is the rising consumer and enterprise awareness regarding data privacy and security. As more sensitive personal and organizational data is processed on mobile devices, there is a heightened need to minimize data transmission to external servers. On-device LLMs address this concern by enabling advanced AI functionalities, such as natural language understanding, text generation, and personalization, to be performed locally. This approach not only enhances privacy but also reduces latency, leading to faster and more reliable user experiences. The proliferation of privacy regulations worldwide, such as GDPR and CCPA, is further accelerating the adoption of on-device AI solutions in smartphones, wearables, and IoT devices.




    Another key driver is the rapid technological advancement in mobile chipsets and memory architectures, which now support the deployment of increasingly sophisticated language models. The integration of AI accelerators and NPUs (Neural Processing Units) in modern mobile devices has made it feasible to run small and medium-scale LLMs efficiently without significant battery drain. As a result, device manufacturers and software developers are leveraging these capabilities to offer innovative applications, including virtual assistants, real-time translation, and context-aware recommendations. The competitive landscape among device OEMs is fostering continuous innovation, with leading brands racing to differentiate their products through advanced on-device AI features powered by LLMs.




    The expanding ecosystem of AI development frameworks and toolkits tailored for mobile environments is also fueling market growth. Open-source initiatives and collaborations between semiconductor companies and AI research organizations have led to the optimization of LLM architectures for resource-constrained devices. This democratization of technology is lowering the entry barriers for app developers and enterprises, enabling a broader range of applications and services to harness the power of on-device language models. The growing developer community, coupled with increasing investments in AI research, is expected to further accelerate the adoption and innovation in the Mobile On-Device LLM market over the next decade.




    From a regional perspective, North America currently dominates the Mobile On-Device LLM market, accounting for over 36% of global revenue in 2024, followed closely by Asia Pacific and Europe. The high penetration of advanced smartphones, robust digital infrastructure, and early adoption of AI technologies contribute to North America's leadership. However, Asia Pacific is expected to witness the fastest growth, with a projected CAGR of 29.1% through 2033, driven by the sheer volume of mobile users, increasing investments in AI-driven innovation, and the rapid expansion of 5G networks. Europe remains a significant market, propelled by stringent data privacy regulations and a strong focus on secure, user-centric AI solutions.

    Model Type Analysis

    The Model Type segment of the Mobile On-Device LLM market is categorized into Small Language Models, Medium Language Models, and Large Language Models. Small Language Models have gained significant traction due to their ability t

  16. Large Language Model (LLM) Market Size | CAGR of 33.7%

    • market.us
    csv, pdf
    Updated Mar 21, 2025
    Cite
    Market.us (2025). Large Language Model (LLM) Market Size | CAGR of 33.7% [Dataset]. https://market.us/report/large-language-model-llm-market/
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Mar 21, 2025
    Dataset provided by
    Market.us
    License

    https://market.us/privacy-policy/https://market.us/privacy-policy/

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    The Large Language Model (LLM) Market is estimated to reach USD 82.1 Billion by 2033, riding on a strong 33.7% CAGR.

  17. LLM market size in Japan FY 2024-2028

    • statista.com
    • ai-chatbox.pro
    Updated Jun 6, 2025
    Cite
    Statista (2025). LLM market size in Japan FY 2024-2028 [Dataset]. https://www.statista.com/statistics/1550077/japan-large-language-model-market-size/
    Explore at:
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Japan
    Description

    The value of the large language model (LLM) market in Japan was projected to reach ** billion Japanese yen in fiscal year 2024. Partly based on the assumption that the market would diversify with the release of specialized and cheaper LLMs from fiscal year 2025 onward, the market size was forecast to more than quadruple by fiscal year 2028.

  18. Large Language Model (LLM) Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). Large Language Model (LLM) Report [Dataset]. https://www.marketreportanalytics.com/reports/large-language-model-llm-52544
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policyhttps://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Large Language Model (LLM) market is experiencing explosive growth, driven by advancements in artificial intelligence and the increasing demand for sophisticated natural language processing capabilities across various sectors. While precise market sizing for 2025 requires proprietary data, publicly available reports and industry analyses suggest a 2025 market value of approximately $20 billion, with a projected compound annual growth rate (CAGR) of 35% from 2025 to 2033. This robust growth is fueled by several key factors. The proliferation of cloud computing services provides the necessary infrastructure for LLM development and deployment. Furthermore, the rising adoption of LLMs in diverse applications, including customer service chatbots, content generation, language translation, and code development, is significantly contributing to market expansion. The trend toward personalized user experiences and the growing need for efficient data analysis further bolster demand.

    However, challenges remain, including concerns about data privacy, ethical considerations surrounding AI bias, and the high computational costs associated with training and deploying large language models. These restraints are expected to moderate growth but not stifle the market's overall upward trajectory.

    Segment analysis reveals significant opportunities within specific application areas. The most prominent segments include customer service (driven by automation needs), content creation (leveraging automated writing and editing tools), and software development (utilizing LLMs for code generation and debugging). By type, cloud-based LLMs are strongly preferred for their scalability and accessibility, while on-premise deployments remain relevant for organizations with stringent data security requirements.

    Geographically, North America and Europe currently hold the largest market share, driven by early adoption and robust technological infrastructure. However, the Asia-Pacific region is poised for rapid growth, particularly in countries like China and India, due to their large populations and rapidly expanding digital economies. The competitive landscape is dynamic, with major technology companies leading the development and deployment of LLMs, alongside numerous startups offering specialized solutions. Over the forecast period, consolidation and strategic partnerships are anticipated, reshaping competitive dynamics and market structure.
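    The implied end point of the $20 billion base and 35% CAGR estimate can be checked with simple compounding (the base year and rate are the report's figures; the projection below is plain arithmetic, not the report's own model):

    ```python
    def project_cagr(base_value: float, cagr: float, years: int) -> float:
        """Compound a base value forward at a constant annual growth rate."""
        return base_value * (1 + cagr) ** years

    # ~$20B in 2025 growing at 35% CAGR through 2033 (8 years of compounding).
    value_2033 = project_cagr(20.0, 0.35, 2033 - 2025)
    print(f"Implied 2033 market size: ${value_2033:.0f}B")  # → roughly $221B
    ```

    A constant-rate projection like this is only a sanity check; actual market forecasts rarely compound uniformly.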

  19. HiST-LLM

    • zenodo.org
    bin, json
    Updated Jan 16, 2025
    Cite
    Jakob Elias Hauser; Jakob Elias Hauser (2025). HiST-LLM [Dataset]. http://doi.org/10.5281/zenodo.14671248
    Explore at:
    Available download formats: bin, json
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Jakob Elias Hauser; Jakob Elias Hauser
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)

    Large Language Models (LLMs) have the potential to transform humanities and social science research, yet their history knowledge and comprehension at a graduate level remain untested. Benchmarking LLMs in history is particularly challenging, given that human knowledge of history is inherently unbalanced, with more information available on Western history and recent periods. We introduce the History Seshat Test for LLMs (Hist-LLM), based on a subset of the Seshat Global History Databank, which provides a structured representation of human historical knowledge, containing 36,000 data points across 600 historical societies and over 2,700 scholarly references. This dataset covers every major world region from the Neolithic period to the Industrial Revolution and includes information reviewed and assembled by history experts and graduate research assistants. Using this dataset, we benchmark a total of seven models from the Gemini, OpenAI, and Llama families. We find that, in a four-choice format, LLMs have a balanced accuracy ranging from 33.6% (Llama-3.1-8B) to 46% (GPT-4-Turbo), outperforming random guessing (25%) but falling short of expert comprehension. LLMs perform better on earlier historical periods. Regionally, performance is more even, though for the more advanced models it is highest for the Americas and lowest in Oceania and Sub-Saharan Africa. Our benchmark shows that while LLMs possess some expert-level historical knowledge, there is considerable room for improvement.

    Dataset links

    Dataset Repository (Github)

    Croissant Metadata (Github)

    Usage

    This dataset can be used to benchmark LLMs on their expert-level history knowledge.

    Loading the dataset

    using Python and Pandas:

    import pandas as pd

    # Main benchmark table: one row per multiple-choice question.
    main = pd.read_parquet("Neurips_HiST-LLM.parquet")
    # Scholarly references backing each data point.
    ref = pd.read_parquet("references.parquet")

    Dataset metadata

    Dataset metadata documented in the croissant.json file.

    Model Fingerprints

    When model fingerprints are available, we created an extra column for each model fingerprint. These columns are named via the following pattern.

    Column Descriptions

    additional_review

    Boolean. Whether data points underwent additional expert review (see Section 3.2 of the paper).

    Q

    The multiple-choice question.

    A

    The expected completion of the prompt.

    polity old id

    ID for polity according to Seshat ids.

    start year str

    String for when polity started existing (in BCE/CE format).

    end year str

    String for when polity stopped existing (in BCE/CE format).

    start year int

    Int for when polity started existing (in BCE/CE format).

    end year int

    Int for when polity stopped existing (in BCE/CE format).

    name

    Polity name.

    nga

    Natural Geographic Area for Polity.

    world_region

    The world region of an NGA (based on the UN regions, with some modifications).

    category

    Immediate parent category of fact from Seshat codebook.

    root cat

    Major category of fact.

    value

    Value of data point.

    variable

    Variable of data point.

    id

    Request id for openai batch requests.

    description

    Description provided by RAs for fact.
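    The paper reports balanced accuracy, under which random guessing on four choices scores 25%. A minimal sketch of that metric, assuming the dataset's `A` column holds the correct choice and a hypothetical model-answer column holds the predictions:

    ```python
    from collections import defaultdict

    def balanced_accuracy(expected, predicted):
        """Mean per-class recall: each answer choice contributes equally,
        regardless of how often it appears as the correct answer."""
        hits, totals = defaultdict(int), defaultdict(int)
        for truth, guess in zip(expected, predicted):
            totals[truth] += 1
            if guess == truth:
                hits[truth] += 1
        return sum(hits[c] / totals[c] for c in totals) / len(totals)

    # Toy four-choice example; real use would pair the `A` column with a
    # model-answer column from the benchmark.
    expected  = ["A", "B", "C", "D", "A", "B"]
    predicted = ["A", "B", "C", "A", "A", "C"]
    print(balanced_accuracy(expected, predicted))  # → 0.625
    ```

    Balancing over classes matters here because correct answers are unlikely to be spread evenly across the four choice positions.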

  20. Synthetic Customer Churn Prediction Dataset

    • opendatabay.com
    .undefined
    Updated May 6, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Customer Churn Prediction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/5d7ef013-5848-4367-bf3b-2ce359587b43
    Explore at:
    Available download formats: .undefined
    Dataset updated
    May 6, 2025
    Dataset provided by
    Buy & Sell Data | Opendatabay - AI & Synthetic Data Marketplace
    Authors
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Retail & Consumer Behavior
    Description

    This Synthetic Customer Churn Prediction Dataset has been designed as an educational resource for exploring data science, machine learning, and predictive modelling techniques in a customer retention context. The dataset simulates key attributes relevant to customer churn analysis, such as service usage, contract details, and customer demographics. It allows users to practice data manipulation, visualization, and the development of models to predict churn behaviour in industries like telecommunications, subscription services, or utilities.

    Dataset Features:

    • Customer_Id: Unique identifier for each customer (not included in this dataset for privacy).
    • Gender: Gender of the customer (e.g., "Male," "Female").
    • Partner: Whether the customer has a partner (e.g., "Yes," "No").
    • Dependents: Whether the customer has dependents (e.g., "Yes," "No").
    • Tenure (Months): The number of months the customer has been with the company.
    • PhoneService: Whether the customer has a phone service (e.g., "Yes," "No").
    • MultipleLines: Whether the customer has multiple phone lines (e.g., "Yes," "No phone service").
    • InternetService: Type of internet service (e.g., "DSL," "Fiber optic," "No").
    • OnlineSecurity: Whether the customer has online security services (e.g., "Yes," "No," "No internet service").
    • OnlineBackup: Whether the customer has online backup services (e.g., "Yes," "No," "No internet service").
    • DeviceProtection: Whether the customer has device protection services (e.g., "Yes," "No," "No internet service").
    • TechSupport: Whether the customer has tech support services (e.g., "Yes," "No," "No internet service").
    • StreamingTV: Whether the customer has streaming TV services (e.g., "Yes," "No," "No internet service").
    • StreamingMovies: Whether the customer has streaming movies services (e.g., "Yes," "No," "No internet service").
    • Contract: Type of contract the customer has (e.g., "Month-to-month," "One year," "Two year").
    • PaperlessBilling: Whether the customer uses paperless billing (e.g., "Yes," "No").
    • PaymentMethod: The payment method used by the customer (e.g., "Electronic check," "Credit card," "Bank transfer").
    • MonthlyCharges: Monthly charges billed to the customer.
    • TotalCharges: Total charges incurred by the customer over their tenure.
    • Churn: Whether the customer has churned (e.g., "Yes," "No").

    Distribution:

    https://storage.googleapis.com/opendatabay_public/images/churn_c4aae9d4-3939-4866-a249-35d81c5965dc.png" alt="Synthetic Customer Churn Prediction Dataset Distribution">

    Usage:

    This dataset is useful for a variety of applications, including:

    • Customer Behavior Analysis: To understand factors influencing customer retention and churn.
    • Educational Training: To practice data cleaning, feature engineering, and visualization techniques in customer analytics.
    • Predictive Modeling: To build machine learning models for predicting customer churn based on service usage patterns and demographic information.
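    Before fitting a predictive model, a common first step is to profile churn against a single feature such as contract type. A stdlib-only sketch over hypothetical rows mirroring the schema above (real use would load the actual dataset file, whose name is not given here):

    ```python
    from collections import defaultdict

    # Hypothetical rows matching the Contract and Churn columns described above.
    rows = [
        {"Contract": "Month-to-month", "Churn": "Yes"},
        {"Contract": "Month-to-month", "Churn": "No"},
        {"Contract": "Two year",       "Churn": "No"},
        {"Contract": "One year",       "Churn": "No"},
        {"Contract": "Month-to-month", "Churn": "Yes"},
    ]

    def churn_rate_by(rows, key):
        """Fraction of churned customers within each group of `key`."""
        churned, totals = defaultdict(int), defaultdict(int)
        for row in rows:
            totals[row[key]] += 1
            if row["Churn"] == "Yes":
                churned[row[key]] += 1
        return {group: churned[group] / totals[group] for group in totals}

    print(churn_rate_by(rows, "Contract"))
    ```

    Per-group churn rates like these often become the baseline that a trained classifier must beat, and they highlight which features (here, contract length) carry signal.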

    Coverage:

    This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real customer privacy.

    License:

    CC0 (Public Domain)

    Who can use it:

    • Data scientists and enthusiasts: For developing customer analytics skills and predictive modelling expertise.
    • Business analysts: To understand customer churn drivers and improve retention strategies.
    • Educators and students: For teaching and learning applications in data science and machine learning.