53 datasets found
  1. Machine learning software market share worldwide 2021

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2025). Machine learning software market share worldwide 2021 [Dataset]. https://www.statista.com/statistics/1258541/machine-learning-market-share-technology-worldwide/
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2021
    Area covered
    Worldwide
    Description

    Newsle led the global machine learning industry in 2021 with a market share of ***** percent, followed by TensorFlow and Torch. According to the source, machine learning software is used to apply artificial intelligence (AI) so that systems can automatically learn and improve from experience without being explicitly programmed to do so.

  2. Use case frequency of machine learning and artificial intelligence 2020-2021...

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Use case frequency of machine learning and artificial intelligence 2020-2021 [Dataset]. https://www.statista.com/statistics/1111204/machine-learning-use-case-frequency/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2020
    Area covered
    Worldwide
    Description

    In 2021, improving customer experience was the top artificial intelligence and machine learning use case, at ** percent. Deploying machine learning and artificial intelligence can advance a variety of business processes.

  3. State of AI/machine learning in hospitals in the U.S. in 2021

    • statista.com
    Updated Jun 23, 2021
    Cite
    Statista (2021). State of AI/machine learning in hospitals in the U.S. in 2021 [Dataset]. https://www.statista.com/statistics/1249788/ai-machine-learning-in-hospitals-in-the-us/
    Explore at:
    Dataset updated
    Jun 23, 2021
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2021 - Apr 2021
    Area covered
    United States
    Description

    According to a survey conducted among healthcare providers in the United States in April 2021, ** percent of respondents reported that artificial intelligence (AI)/machine learning efforts in their hospital or health system were in the pilot stage, with rollout still to be decided, while a further ** percent said their efforts were early-stage initiatives.

  4. Data from: Covid-19 and AI: Unexpected Challenges and Lessons

    • explore.openaire.eu
    Updated Jan 1, 2021
    Cite
    Benjamin Guedj (2021). Covid-19 and AI: Unexpected Challenges and Lessons [Dataset]. https://explore.openaire.eu/search/other?orpId=od_165::81d628d490820c4e7b17a3f0eeca78d6
    Explore at:
    Dataset updated
    Jan 1, 2021
    Authors
    Benjamin Guedj
    Description

    On May 21st, 2021, we held the webinar "Covid-19 and AI: unexpected challenges and lessons". This short note presents its highlights.

  5. Machine learning challenges in companies 2018-2021

    • statista.com
    Updated Apr 6, 2022
    Cite
    Statista (2022). Machine learning challenges in companies 2018-2021 [Dataset]. https://www.statista.com/statistics/1111249/machine-learning-challenges/
    Explore at:
    Dataset updated
    Apr 6, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2020
    Area covered
    Worldwide
    Description

    According to a recent survey, 56 percent of respondents reported issues with security and auditability requirements when deploying machine learning and artificial intelligence in 2021. Auditability is the degree to which a transaction can be traced from the originator to the approver and final disposition.

  6. AI/machine learning medical device market worldwide 2021-2032

    • statista.com
    • ai-chatbox.pro
    Updated Jun 24, 2025
    Cite
    Statista (2025). AI/machine learning medical device market worldwide 2021-2032 [Dataset]. https://www.statista.com/statistics/1419774/ai-machine-learning-medical-device-market/
    Explore at:
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2021
    Area covered
    Worldwide
    Description

    In 2021, the AI and machine learning medical device market was valued at around *** billion U.S. dollars globally. By 2032, the market was forecast to increase to a value of **** billion U.S. dollars.

  7. California STD Statistics (2001-2021)

    • kaggle.com
    Updated Jul 11, 2024
    Cite
    Mir Tahmid (2024). California STD Statistics (2001-2021) [Dataset]. https://www.kaggle.com/datasets/tahmidmir/stds-in-california
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Kaggle
    Authors
    Mir Tahmid
    Area covered
    California
    Description

    The dataset "California STD Statistics (2001-2021).csv" contains reported cases of sexually transmitted diseases (STDs), namely chlamydia, gonorrhea, and early syphilis (which includes primary, secondary, and early latent syphilis), across California counties from 2001 to 2021. The data include the number of cases, population estimates, and calculated rates of infection. They are segmented by disease type, county, year, and sex, providing an overview of STD prevalence and trends over a 20-year period.

    Column Descriptions

    Disease: The type of sexually transmitted disease (e.g., Chlamydia, Gonorrhea).

    County: The name of the county where the data was collected.

    Year: The year when the data was recorded.

    Sex: The sex of the population (Female, Male, Total).

    Cases: The number of reported cases of the disease.

    Population: The estimated population of the county for the given year and sex.

    Rate: The rate of infection per 100,000 people.

    Lower 95% CI: The lower bound of the 95% confidence interval for the rate.

    Upper 95% CI: The upper bound of the 95% confidence interval for the rate.

    Annotation Code: Additional annotation codes that are sparsely populated.
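
    As a rough illustration of how the Rate column relates to Cases and Population, the sketch below recomputes a rate per 100,000 people using the column names listed above. The sample rows are invented for illustration and are not taken from the dataset, and the helper names rate_per_100k and recompute_rates are hypothetical.

```python
import csv
import io


def rate_per_100k(cases: int, population: int) -> float:
    """Return cases per 100,000 people; 0.0 when the population is not positive."""
    if population <= 0:
        return 0.0
    return cases * 100_000 / population


# Invented sample rows using the documented column names (not real data).
SAMPLE_CSV = """Disease,County,Year,Sex,Cases,Population
Chlamydia,Example County,2020,Total,250,50000
Gonorrhea,Example County,2020,Total,40,50000
"""


def recompute_rates(csv_text: str) -> list[dict]:
    """Parse CSV text and attach a recomputed Rate value to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["Rate"] = rate_per_100k(int(row["Cases"]), int(row["Population"]))
        rows.append(row)
    return rows


for r in recompute_rates(SAMPLE_CSV):
    print(r["Disease"], r["Year"], round(r["Rate"], 1))
```

    Comparing a recomputed value against the published Rate column is one way to sanity-check a downloaded copy; small differences may arise from rounding in the source.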

    Acknowledgement: All rights reserved by CalHHS.

    Usage: CalHHS Open Data Portal Terms of Use

    License: CalHHS reserves all rights and terms of use for this data.

    The terms and the source dataset are available at these links:

    https://data.chhs.ca.gov/pages/terms

    https://data.chhs.ca.gov/dataset/stds-in-california-by-disease-county-year-and-sex

    LAST MODIFIED: June 4, 2024.

  8. AI in HIV/AIDS Market By Tools (Machine Learning, Natural Language...

    • fnfresearch.com
    pdf
    Updated Jun 25, 2025
    Cite
    Facts and Factors (2025). AI in HIV/AIDS Market By Tools (Machine Learning, Natural Language Processing, Image Processing, Speech Recognition, And Others), By Data (Patient Data, Insurance Information, And Hospital Stay), By Types Of Algorithms (Artificial Neural Networks, Deep Learning, And Others): Global and Regional Industry Perspective, Comprehensive Analysis, and Forecast 2021 – 2026 [Dataset]. https://www.fnfresearch.com/ai-in-hivaids-market
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Facts and Factors
    License

    https://www.fnfresearch.com/privacy-policy

    Time period covered
    2022 - 2030
    Area covered
    Global
    Description

    [197+ Pages Report] Global AI in HIV/AIDS market size & share expected to reach revenue of USD 400.7 Million by 2026, with a CAGR of 8.9% during the projected period. AI has been transforming the landscape of technology breakthroughs with its impact being felt across several sectors.

  9. Data archive for paper "Copula-based synthetic data augmentation for...

    • zenodo.org
    zip
    Updated Mar 15, 2022
    Cite
    David Meyer (2022). Data archive for paper "Copula-based synthetic data augmentation for machine-learning emulators" [Dataset]. http://doi.org/10.5281/zenodo.5150327
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 15, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Meyer
    Description

    Overview

    This is the data archive for paper "Copula-based synthetic data augmentation for machine-learning emulators". It contains the paper’s data archive with model outputs (see results folder) and the Singularity image for (optionally) re-running experiments.

    For the Python tool used to generate synthetic data, please refer to Synthia.

    Requirements

    *Although PBS is not a strict requirement, it is required to run the helper scripts included in this repository. Depending on your system settings and resource availability, you may need to modify the PBS parameters at the top of the submit scripts stored in the hpc directory (e.g. #PBS -lwalltime=72:00:00).

    Usage

    To reproduce the results from the experiments described in the paper, first fit all copula models to the reduced NWP-SAF dataset with:

    qsub hpc/fit.sh

    then, to generate synthetic data, run all machine learning model configurations, and compute the relevant statistics use:

    qsub hpc/stats.sh
    qsub hpc/ml_control.sh
    qsub hpc/ml_synth.sh

    Finally, to plot all artifacts included in the paper use:

    qsub hpc/plot.sh

    Licence

    Code released under MIT license. Data from the reduced NWP-SAF dataset released under CC BY 4.0.

  10. Types of data used by ML, DS, and AI developers worldwide 2021

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Types of data used by ML, DS, and AI developers worldwide 2021 [Dataset]. https://www.statista.com/statistics/1241924/worldwide-software-developer-data-uses/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2020 - Feb 2021
    Area covered
    Worldwide
    Description

    According to the survey, ** percent of machine learning, data science, and artificial intelligence developers work with unstructured text data, which makes it the most popular type of data for developers. Tabular data is the second most popular type of data, with ** percent usage.

  11. NeurIPS 2021 - submission statistics

    • kaggle.com
    Updated Nov 22, 2021
    Cite
    Konrad Banachewicz (2021). NeurIPS 2021 - submission statistics [Dataset]. https://www.kaggle.com/konradb/neurips-2021-submission-statistics/code
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 22, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Konrad Banachewicz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
  12. Ultimate_Analysis

    • data.mendeley.com
    Updated Jan 28, 2022
    Cite
    Akara Kijkarncharoensin (2022). Ultimate_Analysis [Dataset]. http://doi.org/10.17632/t8x96g88p3.2
    Explore at:
    Dataset updated
    Jan 28, 2022
    Authors
    Akara Kijkarncharoensin
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This database studies performance inconsistency in biomass higher heating value (HHV) models based on ultimate analysis. The null hypothesis of the research is that the rank of a biomass HHV model is consistent. Fifteen biomass models are trained and tested on four datasets. In each dataset, rank invariance among these 15 models indicates performance consistency.

    The database includes the datasets and source code used to analyze the performance consistency of biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source code implements the biomass HHV machine learning models using MATLAB object-oriented programming (OOP). The models consist of eight regressions, four supervised learning models, and three neural networks.

    An Excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV records from 20 literature sources. The worksheet's column names indicate the elements of the ultimate analysis on a % dry basis. The HHV column gives the higher heating value in MJ/kg. The next worksheet, "Full Residuals," stores the model-testing residuals from the 20-fold cross-validation. The article (Kijkarncharoensin & Innet, 2021) verifies performance consistency through these residuals. The other worksheets present the literature datasets used to train and test model performance in various publications.

    A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The folder list in this file reflects the class structure of the machine learning models. These classes extend MATLAB's Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script named "runStudyUltimate.m" is the article's main program for analyzing the performance consistency of the biomass HHV model through the ultimate analysis. The script loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.

    The first section of the MATLAB script generates the most accurate model by optimizing the model's hyperparameters. The first run takes a few hours to train the machine learning models through trial and error. The trained models can be saved in a MATLAB .mat file and loaded back into the MATLAB workspace. The rest of the script, separated by script section breaks, performs the residual analysis to inspect performance consistency. It also produces a 3D scatter plot of the biomass data and box plots of the prediction residuals. Finally, the interpretation of these results is discussed in the author's article.

    Reference : Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.

  13. Notable AI Models

    • epoch.ai
    csv
    Cite
    Epoch AI, Notable AI Models [Dataset]. https://epoch.ai/data/notable-ai-models
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    Epoch AI
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Global
    Variables measured
    https://epoch.ai/data/notable-ai-models-documentation#records
    Measurement technique
    https://epoch.ai/data/notable-ai-models-documentation#records
    Description

    Our most comprehensive database of AI models, containing over 800 models that are state of the art, highly cited, or otherwise historically notable. It tracks key factors driving machine learning progress and includes over 300 training compute estimates.

  14. Artificial Intelligence Market By Component (Hardware, Software, &...

    • fnfresearch.com
    pdf
    Updated Jun 21, 2025
    Cite
    Facts and Factors (2025). Artificial Intelligence Market By Component (Hardware, Software, & Services), By Technology (Deep Learning, Machine Learning, Natural Language Processing, & Machine Vision), By Deployment (Cloud-Based& On-Premises), By End-Verticals (Healthcare, Retail, BSFI, Automotive, Advertising and Media, Manufacturing, Agricultural, & Others), And By Regions - Global Industry Perspective, COVID-19 Impact Analysis, Market Valuation, Business Strategies, Comprehensive Study, Latest Trends & Forecast 2021 - 2026 [Dataset]. https://www.fnfresearch.com/artificial-intelligence-market
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 21, 2025
    Dataset provided by
    Authors
    Facts and Factors
    License

    https://www.fnfresearch.com/privacy-policy

    Time period covered
    2022 - 2030
    Area covered
    Global
    Description

    [219+ Pages Report] The global artificial intelligence market is projected to reach a value of USD 299.64 billion by 2026, growing at a CAGR of 35.6% during 2021-2026.
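
    To make the compound annual growth rate (CAGR) figure concrete, the sketch below applies the standard relation end = start * (1 + rate) ** years. It assumes annual compounding over five periods with 2021 as the base year, which the description does not state explicitly; the implied 2021 value it prints is therefore an illustration, not a figure from the report. The function names are hypothetical.

```python
def project_value(start: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return start * (1 + cagr) ** years


def implied_start(end: float, cagr: float, years: int) -> float:
    """Invert the compounding to recover the starting value implied by an end value."""
    return end / (1 + cagr) ** years


END_2026 = 299.64  # USD billion, as quoted in the description
CAGR = 0.356       # 35.6% per year, as quoted in the description
YEARS = 5          # assumption: five compounding periods, 2021 to 2026

base = implied_start(END_2026, CAGR, YEARS)
print(f"Implied 2021 value under these assumptions: {base:.2f} billion USD")
```

    Changing YEARS shows how sensitive the implied base value is to the choice of base year.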

  15. Proximate_Analysis

    • data.mendeley.com
    Updated Jan 25, 2022
    Cite
    Akara Kijkarncharoensin (2022). Proximate_Analysis [Dataset]. http://doi.org/10.17632/g36dhg826s.2
    Explore at:
    Dataset updated
    Jan 25, 2022
    Authors
    Akara Kijkarncharoensin
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This database studies performance inconsistency in biomass higher heating value (HHV) models based on proximate analysis. The null hypothesis of the research is that the rank of a biomass HHV model is consistent. Fifteen biomass models are trained and tested on four datasets. In each dataset, rank invariance among these 15 models indicates performance consistency.

    The database includes the datasets and source code used to analyze the performance consistency of biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source code implements the biomass HHV machine learning models using MATLAB object-oriented programming (OOP). The models consist of eight regressions, four supervised learning models, and three neural networks.

    An Excel workbook, "BiomassDataSetProximate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Proximate," contains 803 HHV records from 17 literature sources. The worksheet's column names indicate the elements of the proximate analysis on a % dry basis. The HHV column gives the higher heating value in MJ/kg. The next worksheet, "Full Residuals," stores the model-testing residuals from the 20-fold cross-validation. The article verifies performance consistency through these residuals. The other worksheets present the literature datasets used to train and test model performance in various publications.

    A file named "SourceCodeProximate.rar" collects the MATLAB machine learning models implemented in the article. The folder list in this file reflects the class structure of the machine learning models. These classes extend MATLAB's Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script "runStudyProximate.m" is the article's main program (Kijkarncharoensin & Innet, 2021) for analyzing the performance consistency of the biomass HHV model through the proximate analysis. The script loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.

    The first section of the MATLAB script generates the most accurate model by optimizing the model's hyperparameters. The first run takes a few hours to train the machine learning models through trial and error. The trained models can be saved in a MATLAB .mat file and loaded back into the MATLAB workspace. The rest of the script, separated by script section breaks, performs the residual analysis to inspect performance consistency. It also produces a 3D scatter plot of the biomass data and box plots of the prediction residuals. Finally, the interpretation of these results is discussed in the author's article.

    Reference : Kijkarncharoensin, A., & Innet, S. (2021). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Proximate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.

  16. Available NUTS3 areas by country.

    • plos.figshare.com
    xls
    Updated Oct 13, 2023
    Cite
    Umberto Minora; Stefano Maria Iacus; Filipe Batista e Silva; Francesco Sermi; Spyridon Spyratos (2023). Available NUTS3 areas by country. [Dataset]. http://doi.org/10.1371/journal.pone.0287063.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Umberto Minora; Stefano Maria Iacus; Filipe Batista e Silva; Francesco Sermi; Spyridon Spyratos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The publication of tourism statistics often does not keep up with the highly dynamic tourism demand trends, especially critical during crises. Alternative data sources such as digital traces and web searches represent an important source to potentially fill this gap, since they are generally timely, and available at detailed spatial scale. In this study we explore the potential of human mobility data from the Google Community Mobility Reports to nowcast the number of monthly nights spent at sub-national scale across 11 European countries in 2020, 2021, and the first half of 2022. Using a machine learning implementation, we found that this novel data source is able to predict the tourism demand with high accuracy, and we compare its potential in the tourism domain to web search and mobile phone data. This result paves the way for a more frequent and timely production of tourism statistics by researchers and statistical entities, and their usage to support tourism monitoring and management, although privacy and surveillance concerns still hinder an actual data innovation transition.

  17. Absolute and relative estimation errors and relative model errors for each...

    • plos.figshare.com
    xls
    Updated Oct 13, 2023
    Cite
    Umberto Minora; Stefano Maria Iacus; Filipe Batista e Silva; Francesco Sermi; Spyridon Spyratos (2023). Absolute and relative estimation errors and relative model errors for each country in year 2020, 2021, and 2022. [Dataset]. http://doi.org/10.1371/journal.pone.0287063.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Umberto Minora; Stefano Maria Iacus; Filipe Batista e Silva; Francesco Sermi; Spyridon Spyratos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The first column shows the available countries with ISO 3166-1 alpha-2 country codes (https://www.iso.org/iso-3166-country-codes.html, last accessed 2022/05/16).

  18. Data for A method for assessment of the general circulation model quality...

    • data.taltech.ee
    • data.niaid.nih.gov
    Updated Mar 11, 2025
    Cite
    Ilja Maljutenko; Urmas Raudsepp (2025). Data for A method for assessment of the general circulation model quality using k-means clustering algorithm [Dataset]. http://doi.org/10.5281/zenodo.4588510
    Explore at:
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    TalTech Data Repository
    Authors
    Ilja Maljutenko; Urmas Raudsepp
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2021
    Description

    The dataset consists of simulated and observed salinity/temperature data which were used in the manuscript "A method for assessment of the general circulation model quality using k-means clustering algorithm" submitted to Geoscientific Model Development.
    The model simulation dataset is from long-term 3D circulation model simulation (Maljutenko and Raudsepp 2014, 2019). The observations are from the "Baltic Sea - Eutrophication and Acidity aggregated datasets 1902/2017 v2018" SMHI (2018).

    The files are simple comma-separated tables without headers.
    The Dout-t_z_lat_lon_Smod_Sobs_Tmod_Tobs.csv file contains columns with the following variables [units]:
    Time [MATLAB datenum units], vertical coordinate [m], latitude [°N], longitude [°E], model salinity [g/kg], observed salinity [g/kg], model temperature [°C], observed temperature [°C].

    The Dout-t_z_lat_lon_dS_dT_K1_K2_K3_K4_K5_K6_K7_K8_K9.csv file contains columns with the following variables [units]:
    The first four columns are the same as in the previous file, followed by salinity error [g/kg] and temperature error [°C]; columns 7-8 are integers showing the cluster to which the error pair is assigned.
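
    Because the files have no header row and store time as MATLAB datenum values, reading them outside MATLAB needs two small steps: naming the columns and converting the time. The sketch below does both for the first file's layout using only the Python standard library. The short column names, the helper names, and the sample line are assumptions made for illustration, not part of the archive.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical short names, in the column order described for the first file.
COLUMNS = ["time", "z_m", "lat", "lon", "s_mod", "s_obs", "t_mod", "t_obs"]


def datenum_to_datetime(datenum: float) -> datetime:
    """Convert a MATLAB datenum (fractional days) to a Python datetime."""
    days = int(datenum)
    frac = datenum - days
    # MATLAB datenums run 366 days ahead of Python's proleptic ordinals.
    return datetime.fromordinal(days) + timedelta(days=frac) - timedelta(days=366)


def read_rows(csv_text: str) -> list[dict]:
    """Parse headerless CSV text into dicts with a converted time field."""
    rows = []
    for rec in csv.reader(io.StringIO(csv_text)):
        row = dict(zip(COLUMNS, (float(v) for v in rec)))
        row["time"] = datenum_to_datetime(row["time"])
        rows.append(row)
    return rows


# Invented sample line (not real data).
SAMPLE = "738157.5,10,59.5,24.7,6.1,6.3,4.2,4.0\n"
print(read_rows(SAMPLE)[0])
```

    For the real files, the same functions could be applied to the text returned by reading the csv file from disk.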

    do_clust_valid_DataFig.m is a MATLAB script that reads the two csv files (and optionally the mask file Model_mask.mat), performs the clustering analysis, and creates the plots used in the manuscript. The script is organized into %% blocks that can be executed separately (default: ctrl+enter).

    The k-means function from the MATLAB Statistics and Machine Learning Toolbox is used.

    Additional software used in do_clust_valid_DataFig.m:

    Author's auxiliary formatting scripts (script/):
    datetick_cst.m
    do_fitfig.m
    do_skipticks.m
    do_skipticks_y.m

    Colormaps are generated using cbrewer.m (Charles, 2021).
    Moving average smoothing is performed using nanmoving_average.m (Aguilera, 2021).

  19. Dataset and code of groudwater nitrate for machine learning

    • zenodo.org
    bin
    Updated Apr 15, 2024
    Cite
    Xuan Guo; Xuan Guo (2024). Dataset and code of groudwater nitrate for machine learning [Dataset]. http://doi.org/10.5281/zenodo.10974026
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Xuan Guo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Groundwater nitrate data and related data for the North China Plain (NCP). The data include groundwater nitrate concentrations collected from more than 4,000 sites (wells) in the NCP from 2005 to 2021. Samples were collected at every site each year in May (before the rainy season) and October (after the rainy season). During sampling, basic information about well location, groundwater depth, farmland planting pattern, and soil type was collected. Sampling wells were divided into three types according to depth: shallow (0–30 m), medium (30–100 m), and deep (> 100 m). The planting patterns mainly involved intensive croplands, grain crops, vegetable crops, and orchards. The soil type of each sampling site was obtained from the China soil database (http://vdb3.soil.csdb.cn/).

    The socio-economic and agricultural information for the study areas (taking the districts of municipalities and the prefecture-level cities of provinces as basic units) was acquired from the China Statistical Yearbook (http://www.stats.gov.cn/sj/ndsj/). The data include agricultural planting area, grain crop area, vegetable planting area, orchard planting area, and total facility agricultural area; fertilizer amount, nitrogen fertilizer amount, and nitrogen fertilizer amount per unit area; total output value of agriculture, forestry, animal husbandry, and fishery, agricultural output value, forestry output value, animal husbandry output value, and fishery output value; Gross Domestic Product (GDP) and per capita GDP; and total population and rural population.

  20. Data from: The Economic Bomb: A Strategic Financial Warfare Tactic

    • data.mendeley.com
    Updated Feb 21, 2025
    Cite
    Nicolin Decker (2025). The Economic Bomb: A Strategic Financial Warfare Tactic [Dataset]. http://doi.org/10.17632/xn9ws8x6j7.2
    Explore at:
    Dataset updated
    Feb 21, 2025
    Authors
    Nicolin Decker
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset provides evidence supporting the hypothesis that institutional shorting, ETF outflows, whale wallet movements, and media sentiment drive Bitcoin’s volatility and price manipulation. Central to this dataset is the Decker Sentiment-Short Interest Model (DSSIM)—an original equation developed by Nicolin Decker to quantify the relationship between market sentiment and institutional short interest. By combining sentiment scores from Natural Language Processing (NLP) and short positioning data, DSSIM offers a flexible framework for analyzing volatility in Bitcoin and other assets.

    The dataset spans January 2021 to December 2024, capturing daily market activity and key price events. Each file aligns with DSSIM’s variables, enabling replication and further analysis of the findings in the doctoral-level thesis The Economic Bomb: A Strategic Financial Warfare Tactic.

    Key Components:

    BTC_Price_Data.csv: Daily BTC/USD closing prices from Binance, Coinbase, and Bitstamp, serving as the baseline for volatility and return calculations.

    ETF_Holdings_Over_Time_Thesis.csv: Daily BTC holdings of ETFs (Grayscale, BlackRock, and Fidelity), illustrating cumulative outflows and their liquidity impact.

    ETF_Outflows_Price_Impact_Data.csv: Correlates ETF outflows with BTC volatility, highlighting timing and magnitude.

    Institutional_Shorting_Data.csv: Daily BTC short positions from Binance, BitMEX, Bybit, and OKX, serving as input for DSSIM’s short interest variable.

    Whale_Wallet_Movements.csv: Tracks large BTC wallet movements, revealing sell-offs preceding price crashes and influencing DSSIM’s residual noise component.

    Market_Liquidity_Data.csv: Daily BTC trading volume, order book depth, and liquidity ratios, validating DSSIM’s predictive capabilities.

    Media_Sentiment_Scores.csv: Daily sentiment from Twitter, Reddit, Google News, and YouTube, forming DSSIM’s sentiment variable.

    Monte_Carlo_Simulation_Results.csv: Simulates 1,000 BTC price paths to assess potential volatility under market stress.

    VAR_Model_Data.csv: Analyzes ETF outflows’ delayed impact on BTC returns using vector autoregression.

    Volatility_Clustering_Data.csv: Tracks daily BTC returns and 30-day rolling volatility, confirming persistent volatility after institutional actions.

    GARCH_Model_Data.csv: Models BTC volatility using GARCH, validating volatility clustering during market shocks.
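
    For readers unfamiliar with the "30-day rolling volatility" mentioned for Volatility_Clustering_Data.csv, the sketch below shows one common way to compute it from daily closing prices. The description does not say how the dataset's values were derived, so the choices here (simple returns, sample standard deviation, no annualization) are assumptions, and the price series is synthetic.

```python
import math
import statistics


def daily_returns(prices: list[float]) -> list[float]:
    """Simple daily returns: (p_t - p_{t-1}) / p_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def rolling_volatility(returns: list[float], window: int = 30) -> list[float]:
    """Sample standard deviation of returns over each trailing window."""
    return [
        statistics.stdev(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]


# Synthetic price series for illustration only (not real BTC data).
prices = [100.0 * (1 + 0.01 * math.sin(i)) for i in range(40)]
rets = daily_returns(prices)
vol = rolling_volatility(rets, window=30)
print(len(rets), len(vol))
```

    With N prices there are N - 1 returns and N - window rolling values, so the first volatility figure only becomes available once a full window of returns has accumulated.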

    The dataset includes adjustments for major market events, such as the May 2021 Flash Crash, June 2022 Liquidation Crisis, and March 2023 Banking Crisis, ensuring realistic volatility patterns aligned with DSSIM’s modeling of sentiment shifts and institutional shorting.

    Researchers can use DSSIM’s structure and data to explore similar dynamics in other cryptocurrencies, equities, commodities, and forex markets, advancing financial analysis and predictive modeling.

    Access the full dataset: https://drive.google.com/drive/folders/1pnwqBTMF_QSJoC5QcNAPSQpVtOST2n8c?usp=drive_link
